Deep Dive - Kafka Flashcards

1
Q

What are the two ways Kafka can be used?

A
  1. As a message queue
  2. As a real-time stream processing platform

The key difference between the two lies in how consumers interact with the data. In a message queue, consumers typically pull messages from the queue when they are ready to process them. In a stream, consumers continuously consume and process messages as they arrive in real-time, similar to drinking from a flowing river.
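The same consumer API serves both patterns; the difference is mostly in how the consuming loop is driven. A minimal sketch, assuming the kafka-python client, a local broker, and a hypothetical "events" topic and group id:

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",                           # hypothetical topic name
    bootstrap_servers="localhost:9092",
    group_id="worker-group",            # hypothetical consumer group
    auto_offset_reset="earliest",
)

# Queue-style: pull a batch when the worker is ready, process it, repeat.
batch = consumer.poll(timeout_ms=1000, max_records=100)
for partition, messages in batch.items():
    for message in messages:
        print("processing queued message:", message.value)

# Alternative, stream-style: iterate forever, handling each record as it arrives.
for message in consumer:
    print("processing streamed record:", message.value)
```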

2
Q

Give examples of when to use Kafka as a message queue

A

1) Processing that can be done asynchronously. YouTube is a good example of this. When a user uploads a video, we can make the standard-definition version available immediately and then put the video (via a link) on a Kafka topic to be transcoded when the system has capacity; see the producer sketch after this list.

2) Need to ensure that messages are processed in order. We could use Kafka for the virtual waiting queue in Design Ticketmaster, which is meant to ensure that users are let into the booking page in the order they arrived.

3) You want to decouple the producer and consumer so that they can scale independently. Usually this means the producer produces messages faster than the consumer can consume them, with the topic acting as a durable buffer between them. This is a common pattern in microservices where you want to ensure that one service can't take down another.
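For the transcoding example in 1), the upload service only needs to publish a small message pointing at the raw video; a pool of transcoder workers consumes it later. A minimal producer sketch, assuming the kafka-python client and hypothetical topic, key, and field names:

```python
import json
from kafka import KafkaProducer

# Assumed local broker; "video-transcode-jobs" is a hypothetical topic name.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Keying by video id keeps all events for one video on the same partition,
# so they are consumed in order.
producer.send(
    "video-transcode-jobs",
    key=b"video-123",
    value={"video_id": "video-123", "source_url": "s3://raw-videos/video-123.mp4"},
)
producer.flush()  # block until the broker has acknowledged the message
```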

3
Q

Give examples of when to use Kafka as a stream processing platform

A

1) You require continuous and immediate processing of incoming data, treating it as a real-time flow. See Design an Ad Click Aggregator for an example where we aggregate click data in real-time.

2) Messages need to be processed by multiple consumers simultaneously. In Design FB Live Comments, we can use Kafka as a pub/sub system to send comments to multiple consumers; see the consumer-group sketch after this list.
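Kafka behaves as pub/sub when each downstream system uses its own consumer group: every group receives every message. A sketch, again assuming kafka-python and hypothetical topic and group names:

```python
from kafka import KafkaConsumer

# Each consumer group gets its own copy of every comment, so the broadcast
# service and the analytics service both see all messages on the topic.
broadcast_consumer = KafkaConsumer(
    "live-comments",                     # hypothetical topic name
    bootstrap_servers="localhost:9092",
    group_id="comment-broadcasters",     # group 1: push comments to viewers
)
analytics_consumer = KafkaConsumer(
    "live-comments",
    bootstrap_servers="localhost:9092",
    group_id="comment-analytics",        # group 2: count and aggregate comments
)
```

Within a single group, Kafka instead splits the topic's partitions across the group's instances, which is how each consuming side scales out independently.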

4
Q

How does Kafka ensure reliability and fault tolerance?

A

Kafka ensures data durability through its replication mechanism. Each partition is replicated across multiple brokers, with one broker acting as the leader and the others as followers. When a producer sends a message, it is written to the leader and then replicated to the followers, so the data remains available even if a broker fails. Producer acknowledgments (the acks setting) play a crucial role here: setting acks=all ensures that the message is acknowledged only when all in-sync replicas have received it, giving the strongest durability guarantee.
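A sketch of the producer side, assuming the kafka-python client; with acks="all" the send only succeeds once the in-sync replicas have the message (topic name and retry count are placeholders):

```python
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    acks="all",   # leader waits for all in-sync replicas before acknowledging
    retries=5,    # retry transient failures, e.g. during a leader election
)

future = producer.send("events", b"important payload")
record_metadata = future.get(timeout=10)  # raises if the write was not acknowledged
print(record_metadata.topic, record_metadata.partition, record_metadata.offset)
```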

A replication factor of 3 is common, meaning each partition has three copies in total: the leader plus two follower replicas.
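The replication factor is set per topic at creation time. A sketch using kafka-python's admin client; the topic name and partition count are placeholders:

```python
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# Three copies of each partition: one leader plus two followers.
admin.create_topics([
    NewTopic(name="events", num_partitions=6, replication_factor=3)
])
```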
