Apache Kafka Flashcards
What is Apache Kafka?
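An open-source distributed messaging (event streaming) platform, originally developed at LinkedIn and now maintained by the Apache Software Foundation. It is written in Scala and Java.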
What is Kafka’s purpose?
Kafka facilitates distributed messaging between applications, reducing coupling and redundant code.
What is the need for Kafka?
In systems with multiple applications (e.g., inventory, customer care, and review checker), direct communication leads to tight coupling and redundant code.
Kafka acts as a message broker, decoupling these applications and simplifying communication.
What is the difference between Kafka and other message brokers?
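Kafka uses the “dumb broker, smart consumer” model: the broker only stores messages, and each consumer keeps track of which messages it has already read. Kafka has traditionally relied on an external tool, Apache ZooKeeper, for cluster coordination; newer versions can use the built-in KRaft mode instead.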
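Kafka is also designed for high throughput. Figures on the order of 100,000 messages per second for Kafka versus about 20,000 for other brokers are commonly quoted, but actual numbers depend on hardware, message size, and configuration.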
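The “dumb broker, smart consumer” idea can be illustrated with a minimal in-memory sketch. This is not Kafka's API: `PartitionLog` and `Consumer` are hypothetical names invented for the illustration. The point is only that the log stores records, while each consumer remembers its own read position.

```python
class PartitionLog:
    """Toy stand-in for broker-side storage: an append-only list."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Store a record and return the offset it was written at."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """The consumer, not the log, remembers how far it has read."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = PartitionLog()
for event in ["order-1", "order-2", "order-3"]:
    log.append(event)

# Two independent consumers read the same log at their own pace.
billing = Consumer(log)
audit = Consumer(log)

first_billing_batch = billing.poll(max_records=2)
audit_batch = audit.poll()
```

Because reading does not remove anything from the log, `audit` still sees every record even though `billing` has already consumed some of them.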
What is a producer?
An application that sends (publishes) messages to Kafka.
What is a consumer?
Reads messages from Kafka.
What is a broker?
A Kafka server that stores messages and serves them to consumers.
What is a topic?
A named, logical grouping of messages. Producers write to a topic and consumers read from it.
What are partitions?
Subdivisions of a topic that distribute load. Each partition is an ordered log of messages, and different partitions can be stored on different nodes (brokers).
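One common way to spread messages over partitions is to hash the message key, so that messages with the same key always land in the same partition. The sketch below assumes that approach; `choose_partition` is a hypothetical helper, and `zlib.crc32` is used only as a convenient stand-in hash (Kafka's own default partitioner uses a different hash function).

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition index; the same key always maps to the same index."""
    # Assumption: crc32 stands in for the real partitioner's hash function.
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 3
partitions = {p: [] for p in range(NUM_PARTITIONS)}

messages = [
    (b"customer-1", "created"),
    (b"customer-2", "created"),
    (b"customer-1", "updated"),
]

for key, value in messages:
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, value))
```

Both `customer-1` messages end up in the same partition, in the order they were sent, which is what lets a consumer see all events for one key in sequence.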
What is a consumer group?
A set of consumers that share the work of reading a topic. Each partition is assigned to exactly one consumer in the group, so the load is distributed across the group's members.
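How partitions are divided among the members of a group can be sketched with a simple round-robin assignment. This is an illustration only: `assign_partitions` is a hypothetical function, and Kafka's real assignment strategies are configurable and more involved.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Six partitions shared by three consumers: two partitions each.
assignment = assign_partitions([0, 1, 2, 3, 4, 5], ["c1", "c2", "c3"])

# More consumers than partitions: the extra consumer gets nothing to read.
small_topic = assign_partitions([0, 1], ["c1", "c2", "c3"])
```

The second call shows why adding consumers beyond the number of partitions does not add throughput: a partition is never split between two members of the same group.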
What is a producer failure?
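If a producer fails, the messages it has already written remain in Kafka and stay available for consumers to fetch. Messages are kept according to the topic's retention settings, not deleted once they have been read.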
What is a consumer failure?
If a consumer fails, partitions assigned to it can be reassigned to other consumers in the same group.
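The reassignment can be sketched as follows, assuming a simple "give each orphaned partition to the least-loaded survivor" rule. `rebalance_after_failure` is a hypothetical function written for this card; Kafka's actual rebalancing protocol is coordinated by the broker and is not reproduced here.

```python
def rebalance_after_failure(assignment, failed):
    """Return a new assignment with the failed consumer's partitions redistributed."""
    survivors = {c: list(ps) for c, ps in assignment.items() if c != failed}
    if not survivors:
        raise RuntimeError("no consumers left in the group")
    for partition in assignment.get(failed, []):
        # Pick the survivor with the fewest partitions (ties broken by name).
        target = min(survivors, key=lambda c: (len(survivors[c]), c))
        survivors[target].append(partition)
    return survivors


before = {"c1": [0, 3], "c2": [1, 4], "c3": [2, 5]}
after = rebalance_after_failure(before, "c2")
```

After the failure of `c2`, every partition still has exactly one owner, so no part of the topic goes unread.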
What is replication?
Kafka replicates each partition across multiple brokers (typically with a replication factor of 3) to ensure data availability and fault tolerance.
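A replication factor of 3 means each partition is stored on three different brokers. The sketch below shows one simple way such a placement could be computed; `place_replicas` and the broker names are hypothetical, and the real placement logic in Kafka also takes other factors into account.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Assign each partition to `replication_factor` distinct brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for partition in range(num_partitions):
        # Start at a different broker for each partition to spread the load.
        placement[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return placement


placement = place_replicas(
    num_partitions=3,
    brokers=["b0", "b1", "b2", "b3"],
    replication_factor=3,
)
```

With this placement, losing any single broker still leaves at least two copies of every partition.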
What are the offset commit options and what do they do?
Auto Commit: Offsets are committed automatically at regular intervals. Suitable when data integrity requirements are lower.
Sync Commit: The offset is committed synchronously, so the consumer waits for the commit to be confirmed. This gives high accuracy but lower speed.
Async Commit: The offset is committed asynchronously, so the consumer does not wait for confirmation. This balances speed and reliability.
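Why commit timing affects accuracy can be shown with a small simulation. It does not use a real Kafka client and does not model blocking versus non-blocking calls; `run_consumer`, `commit_every`, and `crash_at` are hypothetical names. It only shows that work done since the last commit is repeated after a crash.

```python
def run_consumer(records, committed_offset, commit_every, crash_at=None):
    """Process records from committed_offset, committing every `commit_every` records.

    Returns (processed_records, committed_offset_after_run).
    """
    processed = []
    uncommitted = 0
    for offset in range(committed_offset, len(records)):
        if offset == crash_at:
            # Simulated crash: progress since the last commit is forgotten.
            return processed, committed_offset
        processed.append(records[offset])
        uncommitted += 1
        if uncommitted == commit_every:
            committed_offset = offset + 1  # offset of the next record to read
            uncommitted = 0
    # Clean shutdown: commit whatever is left.
    return processed, len(records)


records = [f"m{i}" for i in range(7)]

# Infrequent commits (as with interval-based auto commit), crash before m5.
first_run, committed = run_consumer(records, 0, commit_every=3, crash_at=5)

# The restarted consumer resumes from the last committed offset.
second_run, committed_after = run_consumer(records, committed, commit_every=3)

# Records handled in both runs, i.e. processed twice.
duplicates = [r for r in second_run if r in first_run]
```

Here `m3` and `m4` are processed twice because they were handled but not yet committed when the crash happened. Committing after every record (`commit_every=1`) avoids the repeats in this simulation at the cost of many more commits, which mirrors the accuracy-versus-speed trade-off described above.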
What is the leader-follower model?
A fault-tolerance mechanism in which each partition has one leader, which is responsible for data writes, and one or more followers, which replicate data from the leader. If the leader fails, a follower takes over as the new leader.
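The leader-follower model can be sketched with a toy class. `ReplicatedPartition` and its methods are hypothetical, replication is simplified to an immediate copy, and the new leader is chosen by name order; in real Kafka, leader election is handled by the cluster and considers which replicas are up to date.

```python
class ReplicatedPartition:
    """Toy model: one leader accepts writes, followers copy from it."""

    def __init__(self, replicas):
        self.logs = {broker: [] for broker in replicas}
        self.leader = replicas[0]

    def write(self, record):
        """Writes go to the leader only; followers then catch up."""
        self.logs[self.leader].append(record)
        self._replicate()

    def _replicate(self):
        leader_log = self.logs[self.leader]
        for broker, log in self.logs.items():
            if broker != self.leader:
                # Copy only the records the follower is missing.
                log.extend(leader_log[len(log):])

    def fail_broker(self, broker):
        """Remove a broker; promote a follower if the leader was lost."""
        del self.logs[broker]
        if broker == self.leader:
            if not self.logs:
                raise RuntimeError("no replicas left for this partition")
            # Assumption: pick the first surviving replica by name.
            self.leader = sorted(self.logs)[0]


partition = ReplicatedPartition(["b0", "b1", "b2"])
partition.write("a")
partition.write("b")

partition.fail_broker("b0")  # the leader goes down
partition.write("c")         # writes continue on the new leader
```

After `b0` fails, `b1` becomes the leader and already holds every record written so far, so the partition stays available without losing data.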