Apache Kafka Flashcards

Question 1

Q

What is Apache Kafka?

Answer

A

Apache Kafka is a popular distributed event streaming platform designed to efficiently manage real-time data feeds.

Question 2

Q

Define [Distributed] in relation to an Event Streaming Platform.

Answer

A

Distributed means Kafka runs on multiple servers (a cluster) instead of a single machine, which allows for high availability and scalability.

Question 3

Q

Define Event Streaming.

Answer

A

Event: A record of something that happened (for example, “user clicked a button”).

Streaming: Events are continuously produced, processed, and consumed in real time.

Kafka ingests, stores, and processes these events efficiently, making it ideal for:

Real-time analytics (tracking user activity)
Message Queuing (connecting microservices)
Log collection (system monitoring)
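
As a rough illustration of the real-time analytics use case, the sketch below models an event as a small record and processes a stream one event at a time. It uses plain Python with no Kafka client; `Event`, `event_stream`, and the sample users are hypothetical names chosen for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A record of something that happened (hypothetical shape)."""
    user: str
    action: str


def event_stream():
    """Stand-in for a continuous feed; a real stream would not end."""
    yield Event("alice", "click")
    yield Event("bob", "click")
    yield Event("alice", "purchase")


# Update a running count as each event arrives, instead of waiting for a batch.
clicks_per_user = Counter()
for event in event_stream():
    if event.action == "click":
        clicks_per_user[event.user] += 1

print(dict(clicks_per_user))
```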

Question 4

Q

What is Message Queuing?

Answer

A

Message Queuing is a method where microservices communicate asynchronously by sending and receiving messages through a queue.

A message broker (Apache Kafka, RabbitMQ, AWS SQS) acts as an intermediary, ensuring messages are delivered reliably between services without direct dependencies.

Question 5

Q

Explain how Message Queuing works.

Answer

A

Producer (Sender) Service: Creates a message (“Order Placed: OrderID 1234”) and sends it to a message queue (Kafka Topic).
Message Broker (Queue System): Stores messages until a consumer retrieves them, ensuring reliable delivery even if the consumer is offline.
Consumer (Receiver) Service: Subscribes to a queue (Kafka Topic) and processes each message when it’s available (“Prepare shipping for OrderID 1234”).
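
The three roles above can be sketched with Python's standard-library `queue` module. This is an in-memory illustration only, not Kafka client code; `broker_queue`, `produce`, and `consume` are hypothetical names.

```python
import queue

# Hypothetical in-memory stand-in for the broker's queue (not a real Kafka topic).
broker_queue = queue.Queue()


def produce(order_id):
    """Producer: create a message and hand it to the broker."""
    broker_queue.put(f"Order Placed: OrderID {order_id}")


def consume():
    """Consumer: take the next message, if any, and act on it."""
    try:
        message = broker_queue.get_nowait()
    except queue.Empty:
        return None  # nothing waiting in the queue
    order_id = message.rsplit(" ", 1)[-1]
    return f"Prepare shipping for OrderID {order_id}"


# The producer sends while no consumer is running; the message waits in the queue.
produce(1234)
result = consume()
print(result)
```

The producer and consumer never call each other directly; each only talks to the queue, which is what keeps the services decoupled.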

Question 6

Q

Advantages of Kafka? (4)

Answer

A

Real-Time Data Processing: Kafka processes data streams as events arrive, enabling businesses to make decisions quickly.
Scalability: Kafka scales horizontally; adding brokers and partitions lets a cluster handle larger data volumes.
Fault Tolerance: Kafka replicates partitions across brokers, so data can survive the failure of an individual broker.
High Throughput: Kafka can process a large amount of data with low latency, making it suitable for applications that require real-time processing.
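
The idea behind fault tolerance can be illustrated with a toy model in which each record is copied to more than one broker. This is a simplified sketch, not how Kafka implements replication; the broker names and the `replicate` helper are hypothetical.

```python
# Toy cluster: each broker is just a list of stored records (illustrative only).
brokers = {"broker-0": [], "broker-1": [], "broker-2": []}


def replicate(record, replicas):
    """Write the same record to every broker in the replica set."""
    for name in replicas:
        brokers[name].append(record)


# Assumed replication factor of 2: the record lives on two brokers.
replicas = ["broker-0", "broker-1"]
replicate("Order Placed: OrderID 1234", replicas)

# Simulate a hardware failure of one replica.
del brokers["broker-0"]

# The record is still readable from the surviving replica.
survivors = [name for name in replicas if name in brokers]
recovered = brokers[survivors[0]]
print(recovered)
```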

Question 7

Q

Define Kafka Clusters.

Answer

A

Kafka Clusters are distributed systems that consist of multiple Kafka brokers working together to handle and process real-time data streams.

Question 8

Q

Define Brokers.

Answer

A

Brokers are the servers at the core of the Kafka cluster. They receive messages from producers, store them in partitions, and serve them to consumers, which fetch (pull) messages from the brokers.

Question 9

Q

Define Topics.

Answer

A

Topics are the named channels through which data is organized and categorized. They can be divided into multiple partitions for better scalability and performance.

Question 10

Q

Define Partition.

Answer

A

Partitions are the fundamental unit of data storage in Kafka. Each topic is divided into one or more partitions, and those partitions are distributed across the brokers in the cluster.
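
One common way records end up in partitions is by hashing a record key, so that records with the same key land in the same partition. The sketch below assumes three partitions and uses CRC32 purely as a stand-in hash; Kafka's Java client uses a different hash (murmur2) for keyed records. `choose_partition` and the sample keys are hypothetical.

```python
import zlib

NUM_PARTITIONS = 3  # assumed partition count for this illustration


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition number.

    CRC32 is a stand-in hash here, not the hash Kafka itself uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each partition is modeled as a list that records are appended to.
partitions = {p: [] for p in range(NUM_PARTITIONS)}

records = [("user-1", "click"), ("user-2", "view"), ("user-1", "purchase")]
for key, value in records:
    partitions[choose_partition(key)].append((key, value))

# Records sharing a key stay together, in the order they were written.
user1_partition = partitions[choose_partition("user-1")]
print([r for r in user1_partition if r[0] == "user-1"])
```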

Question 11

Q

Define Producers.

Answer

A

Producers publish data to Kafka topics. They send messages to specific topics within the Kafka cluster.

Question 12

Q

Define Consumers.

Answer

A

Consumers subscribe to topics and receive messages from them. They can process the received messages, store them, or perform other actions.

Question 13

Q

Define Offsets.

Answer

A

Offsets are unique identifiers that represent the position of a message within a specific partition of a topic. They are crucial for tracking how far each consumer has read in a partition.
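
To make offsets concrete, the sketch below models one partition as an append-only list, where a record's offset is its position in that list, and a consumer remembers the next offset it should read. `PartitionLog`, `append`, and `read_from` are hypothetical names for this toy model, not Kafka client APIs.

```python
class PartitionLog:
    """Toy model of a single partition: an append-only list of records."""

    def __init__(self):
        self.records = []

    def append(self, value):
        """Store a record and return the offset it was assigned."""
        self.records.append(value)
        return len(self.records) - 1

    def read_from(self, offset, max_records=10):
        """Return (offset, value) pairs starting at the given offset."""
        chunk = self.records[offset:offset + max_records]
        return list(enumerate(chunk, start=offset))


log = PartitionLog()
offsets = [log.append(v) for v in ["a", "b", "c"]]

# The consumer tracks the next offset to read, starting at the beginning.
position = 0
batch = log.read_from(position, max_records=2)
position = batch[-1][0] + 1  # advance past the last record processed

# After a restart, the consumer resumes from its saved position.
remaining = log.read_from(position)
print(offsets, batch, remaining)
```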

Question 14

Q

Question 15

Q

(15 cards)