Apache Kafka Flashcards
What is Apache Kafka?
Apache Kafka is a popular distributed event streaming platform designed to efficiently manage real-time data feeds.
Define [Distributed] in relation to an Event Streaming Platform.
Distributed means that Kafka runs on multiple servers (a cluster) instead of a single machine, which allows for high availability and scalability.
Define Event Streaming.
Event: A record of something that happened (for example, “user clicked a button”).
Streaming: Events are continuously produced, processed, and consumed in real time.
Kafka ingests, stores, and processes these events efficiently, making it ideal for:
- Real-time analytics (tracking user activity)
- Message queuing (connecting microservices)
- Log collection (system monitoring)
What is Message Queuing?
Message Queuing is a method where microservices communicate asynchronously by sending and receiving messages through a queue.
A message broker (such as Apache Kafka, RabbitMQ, or Amazon SQS) acts as an intermediary, ensuring messages are delivered reliably between services without the services depending on each other directly.
Explain how Message Queuing works.
- Producer (Sender) Service: Creates a message (“Order Placed: OrderID 1234”) and sends it to a message queue (a Kafka topic).
- Message Broker (Queue System): Stores messages until consumers retrieve them, ensuring reliable delivery even if a consumer is offline. Kafka in particular keeps messages for a configured retention period rather than deleting them as soon as they are read.
- Consumer (Receiver) Service: Subscribes to the queue (a Kafka topic) and processes each message when it becomes available (“Prepare shipping for OrderID 1234”).
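The producer, broker, and consumer roles can be sketched with a small in-memory queue. This is only a toy model using the Python standard library: the class `InMemoryQueue` and its `send`/`receive` methods are hypothetical names chosen for illustration, not part of any Kafka client API.

```python
from collections import deque


class InMemoryQueue:
    """Toy stand-in for a message broker queue (hypothetical, not a Kafka API)."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        # The broker stores the message until a consumer asks for it.
        self._messages.append(message)

    def receive(self):
        # Return the oldest waiting message, or None if the queue is empty.
        return self._messages.popleft() if self._messages else None


queue = InMemoryQueue()

# Producer service: creates a message and sends it to the queue.
queue.send("Order Placed: OrderID 1234")

# The consumer may be offline at this point; the message simply waits.

# Consumer service: retrieves the message later and acts on it.
message = queue.receive()
order_id = message.split("OrderID ")[1]
result = f"Prepare shipping for OrderID {order_id}"
print(result)
```

The producer never calls the consumer directly; both only know about the queue, which is what makes the communication asynchronous.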
Advantages of Kafka? (4)
- Real-Time Data Processing: Kafka allows for the processing of real-time data streams, enabling businesses to make decisions quickly.
- Scalability: Kafka scales horizontally, so adding brokers and partitions lets it manage larger volumes of data without a significant loss of performance.
- Fault Tolerance: Kafka can replicate data across brokers, so data is not lost when a single machine fails, provided the replication factor is high enough.
- High Throughput: Kafka can process a large amount of data with low latency, making it suitable for applications that require real-time processing.
Define Kafka Clusters.
Kafka Clusters are distributed systems that consist of multiple Kafka brokers working together to handle and process real-time data streams.
Define Brokers.
Brokers are the core of the Kafka cluster. They receive messages from producers, store them in partitions, and serve them to consumers that request them.
Define Topics.
Topics are the channels through which data is organized and categorized. They can be divided into multiple partitions for better scalability and performance.
Define Partition.
Partitions are the fundamental unit of data storage in Kafka. When a topic is divided into multiple partitions, those partitions are distributed across the brokers in the cluster.
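As a rough illustration of how messages and partitions might be spread out, the sketch below maps a message key to a partition and assigns partitions to brokers round-robin. Everything here is an assumption made for the example: the function names are hypothetical, and `zlib.crc32` is only a stand-in hash, not the hash that Kafka's own partitioner uses.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition index (stand-in hash, not Kafka's partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions_to_brokers(num_partitions: int, brokers: list) -> dict:
    """Spread partitions across brokers round-robin (simplified assumption)."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


brokers = ["broker-1", "broker-2", "broker-3"]
assignment = assign_partitions_to_brokers(4, brokers)

# Messages with the same key always land in the same partition.
first = choose_partition("OrderID 1234", 4)
second = choose_partition("OrderID 1234", 4)
print(assignment, first == second)
```

Because the same key always hashes to the same partition, events for one order stay in order relative to each other, while different keys can be handled by different brokers in parallel.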
Define Producers.
Producers publish data to Kafka by sending messages to specific topics within the cluster.
Define Consumers.
Consumers subscribe to topics and receive messages from them. They can process the received messages, store them, or perform other actions.
Define Offsets.
Offsets are sequential integer identifiers that mark the position of a message within a specific partition of a topic. An offset is unique only within its partition, and consumers use offsets to track how far they have read.
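A minimal sketch of offsets, assuming a partition can be modeled as an append-only list. The class `PartitionLog` and the variable `next_offset` are hypothetical names for this example and do not correspond to a real Kafka client API.

```python
class PartitionLog:
    """Toy append-only log for one partition (hypothetical, not a Kafka API)."""

    def __init__(self):
        self._records = []

    def append(self, value) -> int:
        # Each new record gets the next offset: 0, 1, 2, ...
        self._records.append(value)
        return len(self._records) - 1

    def read_from(self, offset: int) -> list:
        # Return (offset, value) pairs starting at the given offset.
        return [(i, self._records[i]) for i in range(offset, len(self._records))]


log = PartitionLog()
for event in ["click", "scroll", "purchase"]:
    log.append(event)

# The consumer reads two records, then remembers where to continue.
next_offset = 0
first_batch = log.read_from(next_offset)[:2]
next_offset = first_batch[-1][0] + 1

# After a restart, the consumer resumes from its saved offset.
remaining = log.read_from(next_offset)
print(first_batch, next_offset, remaining)
```

Saving `next_offset` is what lets a consumer stop and later pick up where it left off without rereading or skipping records.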