Kafka Flashcards

These flashcards were created from [Hello Interview's Kafka Deep Dive](https://www.hellointerview.com/learn/system-design/deep-dives/kafka)

1
Q

Kafka

What is a broker?

A
  • A broker is a physical or virtual server in the Kafka cluster
  • Each broker is responsible for storing data and serving clients
2
Q

Kafka

What is a partition?

A
  • A partition is an ordered, immutable sequence of messages that is continually appended to.
  • Think of a partition as an append-only log file
  • Partitions are why Kafka can scale: each partition can be consumed independently, so messages can be consumed in parallel
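The partition-as-log idea can be sketched in Python. The `Partition` class and its methods are hypothetical illustrations, not Kafka's API: each append returns the message's offset (its fixed position in the log), and readers consume forward from any offset. Because two partitions are separate logs, they can be read independently, which is what makes parallel consumption possible.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical in-memory model of one partition: an append-only list of messages."""
    messages: list = field(default_factory=list)

    def append(self, message: str) -> int:
        # New messages only ever go on the end; the returned offset is the
        # message's position in the log and never changes afterwards.
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> list:
        # Readers pick a starting offset and read forward in order.
        return self.messages[offset:offset + max_messages]


# Two partitions of the same topic can be consumed independently, in parallel.
p0, p1 = Partition(), Partition()
p0.append("a")
p0.append("b")
p1.append("c")
print(p0.read(0))  # ['a', 'b']
print(p1.read(0))  # ['c']
```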
3
Q

Kafka

What is a topic?

A
  • A topic is a logical grouping of partitions
4
Q

Kafka

What is the difference between a partition and a topic?

A
  • A topic is a logical group of messages
  • A partition is a physical grouping of messages
  • A topic can have multiple partitions and each partition can be on a different broker
  • Topics are a way to organize data
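A rough sketch of a topic as a logical name over several partitions, each placed on a broker. The `assign_partitions` helper and its simple cycle-through-the-brokers placement are hypothetical simplifications; Kafka's real assignment logic is more involved and also places replicas.

```python
def assign_partitions(topic: str, num_partitions: int, brokers: list[str]) -> dict:
    """Place each partition of `topic` on a broker, cycling through the broker list."""
    return {
        f"{topic}-{p}": brokers[p % len(brokers)]
        for p in range(num_partitions)
    }


# One logical topic, four physical partitions spread over three brokers.
placement = assign_partitions("orders", 4, ["broker-1", "broker-2", "broker-3"])
for partition, broker in placement.items():
    print(partition, "->", broker)
```

Here `orders` is the logical grouping a producer or consumer names, while `orders-0` through `orders-3` are the physical units that live on specific brokers.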
5
Q

Kafka

What is a producer?

A
  • Producers write data to topics
6
Q

Kafka

What is a consumer?

A
  • Consumers read data from topics
7
Q

Kafka

What are the two main use cases for Kafka?

A
  • Message queue
  • Stream
8
Q

Kafka

What is a message/record?

A
  • A message is the data structure stored in partitions
  • Messages consist of 4 fields
    • headers
    • key
    • value (required)
    • timestamp
  • The key determines which partition a message is sent to; if the key is absent, the producer spreads messages across partitions instead (e.g., round-robin)
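A hypothetical Python model of a record with the four fields listed above. `Record` and `timestamp_ms` are illustrative names, not a Kafka client class, and headers are simplified to a dict (real Kafka headers are an ordered list of key/value pairs that may repeat keys).

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """Hypothetical model of a Kafka record's four fields, as described on this card."""
    value: bytes                                  # the payload (the required field)
    key: Optional[bytes] = None                   # optional; used to pick the partition
    headers: dict = field(default_factory=dict)   # optional metadata (simplified to a dict)
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))


click = Record(value=b'{"ad_id": 42}', key=b"user-123", headers={"source": "web"})
anonymous = Record(value=b"hello")  # no key, so the partition is not chosen by key
```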
9
Q

Kafka

How does Kafka determine which partition to write a message to?

A
  1. Partition Determination: If the message has a key, the key is hashed and the hash, modulo the number of partitions, selects the partition, so all messages with the same key land on the same partition (as long as the partition count does not change). If there is no key, a round-robin algorithm is used, or another strategy defined in the producer configuration.
  2. Broker Assignment: Once the partition is chosen, the producer consults the cluster metadata, which maps each partition to the broker that currently leads it. The Kafka controller maintains this mapping, and the producer uses it to send the message directly to the broker that owns the target partition.
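Both steps can be sketched in a few lines, assuming a 3-partition topic and a made-up partition-to-leader map (`PARTITION_LEADERS`, `choose_partition`, and `route` are hypothetical). Kafka's Java client hashes keys with murmur2; `zlib.crc32` is swapped in here only because it is a stable hash in the Python standard library (Python's built-in `hash()` is randomized per process for bytes and strings, so it would not give the same partition across runs).

```python
import itertools
import zlib
from typing import Optional

NUM_PARTITIONS = 3

# Hypothetical cluster metadata: partition number -> broker leading that partition.
PARTITION_LEADERS = {0: "broker-1", 1: "broker-2", 2: "broker-3"}

_round_robin = itertools.count()


def choose_partition(key: Optional[bytes]) -> int:
    if key is not None:
        # Step 1a: hash the key and take it modulo the partition count,
        # so the same key always lands on the same partition.
        return zlib.crc32(key) % NUM_PARTITIONS
    # Step 1b: no key, so rotate through the partitions round-robin.
    return next(_round_robin) % NUM_PARTITIONS


def route(key: Optional[bytes]) -> tuple[int, str]:
    # Step 2: look up the broker that leads the chosen partition.
    partition = choose_partition(key)
    return partition, PARTITION_LEADERS[partition]


# The same key always maps to the same (partition, broker) pair.
print(route(b"user-123"), route(b"user-123"))
# Keyless messages rotate through the partitions.
print([route(None)[0] for _ in range(3)])  # [0, 1, 2]
```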
10
Q

Kafka

What are some important benefits of partitions acting as append-only log files?

A
  1. Immutability: messages cannot be altered or deleted once written, which simplifies replication.
  2. Efficiency: because writes only ever go to the end of the file, they are sequential and disk seek time is minimized.
  3. Scalability: the append-only log mechanism enables horizontal scaling to handle increasing load, and partitions can be replicated to enhance fault tolerance.
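To make the efficiency and immutability points concrete, here is a hypothetical append-only log file in Python. `SegmentFile` and its length-prefixed record format are assumptions for illustration, not Kafka's on-disk format: every write lands at the end of the file and existing bytes are never overwritten, so writes are sequential and each record's byte position stays valid.

```python
import os
import struct
import tempfile


class SegmentFile:
    """Hypothetical append-only log file: each record is a 4-byte length + payload."""

    def __init__(self, path: str):
        self.path = path

    def append(self, payload: bytes) -> int:
        # "ab" mode sends every write to the end of the file; there is no
        # update or delete method, so existing records are never rewritten.
        with open(self.path, "ab") as f:
            position = f.seek(0, os.SEEK_END)  # byte position where this record starts
            f.write(struct.pack(">I", len(payload)) + payload)
        return position

    def read_all(self) -> list[bytes]:
        # Reading is sequential too: walk the file from the start.
        records = []
        with open(self.path, "rb") as f:
            while header := f.read(4):
                (length,) = struct.unpack(">I", header)
                records.append(f.read(length))
        return records


with tempfile.TemporaryDirectory() as tmp:
    log = SegmentFile(os.path.join(tmp, "orders-0.log"))
    log.append(b"first")
    log.append(b"second")
    print(log.read_all())  # [b'first', b'second']
```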
11
Q

Kafka

Describe the leader-follower model for replication.

A
  1. Leader Replica Assignment: For each partition, a leader replica is assigned. The leader replica handles all of the reads and writes for the partition. The cluster controller handles assignment of the leader replica and ensures that leader replicas are evenly distributed across the cluster.
  2. Follower Replication: For each partition, several follower replicas exist on different brokers. Follower replicas do not handle client reads or writes; they passively replicate data from the leader replica.
  3. Synchronization and Consistency: Followers continually sync with the leader replica. If the leader fails, a fully synced follower can be promoted to leader.
  4. Controller’s Role in Replication: The controller manages the replication process. When a leader fails, it promotes an in-sync follower to ensure continued availability.
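The four steps above can be simulated in Python. `Replica`, `ReplicatedPartition`, and `fail_leader` are hypothetical names, and "in sync" is simplified to "has exactly the leader's log"; real Kafka instead judges a follower in sync if it has caught up within a configured time window (`replica.lag.time.max.ms`).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    broker: str
    log: list = field(default_factory=list)
    alive: bool = True


class ReplicatedPartition:
    """Hypothetical model of one partition's replicas and controller-driven failover."""

    def __init__(self, brokers: list[str]):
        self.replicas = [Replica(b) for b in brokers]
        # The controller would choose the leader; this sketch just takes the first replica.
        self.leader = self.replicas[0]

    def write(self, message: str) -> None:
        # Clients write only to the leader.
        self.leader.log.append(message)

    def sync_followers(self) -> None:
        # Live followers copy whatever they are missing from the leader's log.
        for r in self.replicas:
            if r is not self.leader and r.alive:
                r.log.extend(self.leader.log[len(r.log):])

    def in_sync_followers(self) -> list[Replica]:
        return [r for r in self.replicas
                if r is not self.leader and r.alive and r.log == self.leader.log]

    def fail_leader(self) -> None:
        # Controller's role: when the leader dies, promote a fully synced follower.
        self.leader.alive = False
        candidates = self.in_sync_followers()
        if not candidates:
            raise RuntimeError("no in-sync follower available to promote")
        self.leader = candidates[0]


p = ReplicatedPartition(["broker-1", "broker-2", "broker-3"])
p.write("m1")
p.write("m2")
p.sync_followers()
p.fail_leader()
print(p.leader.broker, p.leader.log)  # broker-2 ['m1', 'm2']
```

The `RuntimeError` branch shows why syncing matters: a follower that has fallen behind is not eligible for promotion, because promoting it would lose messages.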
12
Q

Kafka

What is a push model?

A

When a consumer is subscribed to a topic, the broker sends (pushes) new messages to the consumer as they arrive.
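A minimal sketch of the general push pattern, not Kafka's consumer API: `PushBroker`, `subscribe`, and `publish` are hypothetical names. The broker decides when delivery happens by calling each subscriber's callback, which also means a slow consumer has no built-in way to control the pace.

```python
from typing import Callable


class PushBroker:
    """Hypothetical push-style broker: it calls each subscriber as soon as a message arrives."""

    def __init__(self):
        self.subscribers: dict[str, list[Callable[[str], None]]] = {}

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic: str, message: str) -> None:
        # Delivery is driven by the broker; the consumer just receives.
        for callback in self.subscribers.get(topic, []):
            callback(message)


received = []
broker = PushBroker()
broker.subscribe("comments", received.append)
broker.publish("comments", "first!")
print(received)  # ['first!']
```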

13
Q

Kafka

What is a pull model?

A

The consumer polls the broker at regular intervals to fetch the latest messages from the topic. This is the model Kafka consumers use.
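A minimal sketch of a pull-style consumer, assuming the topic is just an in-memory list; `PullConsumer` and its `poll` method are hypothetical. The consumer tracks its own offset and asks for data when it is ready, so it sets its own pace instead of being flooded by the broker.

```python
class PullConsumer:
    """Hypothetical pull-style consumer that tracks its own position (offset) in a log."""

    def __init__(self, log: list[str]):
        self.log = log
        self.offset = 0  # index of the next message to read

    def poll(self, max_messages: int = 100) -> list[str]:
        # Fetch up to max_messages starting at the current offset, then advance.
        batch = self.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch


log = ["m1", "m2", "m3"]
consumer = PullConsumer(log)
print(consumer.poll(max_messages=2))  # ['m1', 'm2']
log.append("m4")                      # new message arrives after the first poll
print(consumer.poll())                # ['m3', 'm4']
print(consumer.poll())                # []
```

In a real application this call sits inside a loop; with the Java client, that loop typically calls `KafkaConsumer.poll(Duration)` repeatedly.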

14
Q

Kafka

When should you use Kafka in your interview?

A
  • Message queue
    • Work that can be done asynchronously: YouTube video processing
    • Messages that must be processed in order: Ticketmaster virtual queue
    • Decoupling producers and consumers so they can scale independently: useful when producers generate messages faster than consumers can process them
  • Streams
    • Continuous, immediate processing of incoming data as a real-time flow: Ad Click Aggregator
    • Messages that need to be processed by multiple consumers simultaneously: Facebook Live Comments