Kafka Flashcards

1
Q

What is Kafka?

A

It is a distributed event streaming platform. Event streaming can be used, for example, to process payments and transactions in real time as they happen. Kafka has three key capabilities:

Publish (write) to and subscribe (read from) streams of events
Store streams of events durably and reliably for as long as you want
Process streams of events as they occur

2
Q

How does Kafka work?

A

It is a distributed system consisting of servers (brokers) and clients that communicate over the TCP network protocol. It follows the Pub/Sub pattern, where Producers publish (write) events to Kafka and Consumers subscribe to (read and process) these events.

Events are organized and stored in topics. A topic is similar to a folder in a filesystem, and the events are the files in that folder. Topics are partitioned, meaning a topic is spread over different Kafka brokers. Each topic can also be replicated.
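
For illustration, a minimal producer sketch in Java (the broker address and the "payments" topic are assumptions, not part of the card):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class PaymentProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish (write) one event to the "payments" topic; consumers subscribed
            // to that topic will read and process it.
            producer.send(new ProducerRecord<>("payments", "order-42", "paid"));
        }
    }
}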

3
Q

What are Kafka delivery semantics?

A

At most once: Message is delivered once at most; it may be lost and is never redelivered. No duplicates, but possible missing data.
At least once: Message is delivered one or more times and processed each time. Receipt is guaranteed. Possible duplicates, no missing data.
Exactly once: Message may be delivered one or more times but is processed only once. Receipt is guaranteed. No duplicates, no missing data. (See the sketch below.)
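
A configuration sketch of how these semantics are typically selected with the Java client (the property names are standard Kafka settings; the values and the transactional id are illustrative assumptions):

import java.util.Properties;

public class DeliverySemanticsConfig {
    public static void main(String[] args) {
        // At most once: producer does not wait for acknowledgement and the consumer
        // auto-commits offsets, so a record can be lost but is not redelivered.
        Properties atMostOnce = new Properties();
        atMostOnce.put("acks", "0");
        atMostOnce.put("enable.auto.commit", "true");

        // At least once: producer waits for acknowledgement and retries, consumer
        // commits only after processing, so nothing is lost but reprocessing can occur.
        Properties atLeastOnce = new Properties();
        atLeastOnce.put("acks", "all");
        atLeastOnce.put("enable.auto.commit", "false"); // commit manually after processing

        // Exactly once: idempotent, transactional producer plus consumers that read
        // only committed transactions.
        Properties exactlyOnce = new Properties();
        exactlyOnce.put("enable.idempotence", "true");
        exactlyOnce.put("transactional.id", "payments-tx-1"); // hypothetical id
        exactlyOnce.put("isolation.level", "read_committed"); // consumer-side setting
    }
}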

4
Q

What are Kafka’s Core APIs?

A

Producer API: Publish a stream of records to a topic
Consumer API: Subscribe to a topic
Connect API: Connect Kafka topics to external systems such as databases
Streams API: Stream processing; transforms input streams from one topic into output streams to another topic (see the sketch below)
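
As a sketch of the Streams API, the snippet below reads from one topic, transforms each value, and writes to another (the application id, broker address, and topic names are assumptions):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Properties;

public class UppercaseStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");     // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        StreamsBuilder builder = new StreamsBuilder();
        // The input stream from one topic is transformed into an output stream to another topic.
        KStream<String, String> input = builder.stream("input-topic");
        input.mapValues(value -> value.toUpperCase()).to("output-topic");

        new KafkaStreams(builder.build(), props).start();
    }
}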

5
Q

How is load balancing accomplished in Kafka?

A

Producer load balancing happens automatically: records are spread across a topic's partitions, which live on multiple brokers.
Consumer load balancing is done with consumer groups: the partitions of a topic are divided among the consumers in the group (see the sketch below).
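
A sketch of consumer-side load balancing: every instance started with the same group.id joins the same consumer group, and Kafka divides the topic's partitions among them (broker address, group id, and topic name are assumptions):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class PaymentConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        props.put("group.id", "payment-processors");      // consumers sharing this id split the partitions
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}

Running a second copy of this program triggers a rebalance, after which each instance reads only its assigned subset of partitions.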

6
Q

What is the offset used for, in Kafka?

A

The offset marks each record's position within a partition and is used to track a consumer's read progress. Producers always append to the "tail" of a partition, like appending to a file.
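
A small sketch, assuming an already-configured KafkaConsumer and a hypothetical "payments" topic, of how the offset is used to rewind and to record read progress:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.List;

public class OffsetExample {
    // Replays partition 0 from the beginning, then commits how far it has read.
    static void replay(KafkaConsumer<String, String> consumer) {
        TopicPartition partition = new TopicPartition("payments", 0); // hypothetical topic/partition
        consumer.assign(List.of(partition));
        consumer.seekToBeginning(List.of(partition)); // move the read position (offset) back to the start
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
            System.out.println(record.offset() + ": " + record.value());
        }
        consumer.commitSync(); // persist the new read position
    }
}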

7
Q

Message Queue vs Pub/Sub

A

Ordering of messages: There is no message ordering guarantee in RabbitMQ, whereas Kafka enables ordering using partitions.
Lifetime of messages: In RabbitMQ, messages are removed once they are consumed and acknowledged. Kafka messages are retained according to the topic's retention policy, regardless of whether they have been consumed.
Prioritizing messages: In RabbitMQ, messages can be consumed according to their priority. That is not possible in Kafka.

8
Q

What are the major components of Kafka?

A

Topic: It is a category or feed in which records are saved and published.
Producer: Publishes messages to a topic.
Consumer: Reads messages from a topic. Consumers can be divided into consumer groups; each consumer in a group is responsible for reading a subset of the topic's partitions.
Broker: It is a server that works as part of a Kafka cluster.
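
As an illustration of brokers as cluster members, a hedged AdminClient sketch that lists the nodes of a cluster (the broker address is an assumption):

import org.apache.kafka.clients.admin.AdminClient;
import java.util.Properties;

public class ListBrokers {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical bootstrap broker
        try (AdminClient admin = AdminClient.create(props)) {
            // Each returned node is a broker participating in the cluster.
            admin.describeCluster().nodes().get().forEach(node ->
                    System.out.println("broker id=" + node.id() + " host=" + node.host()));
        }
    }
}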

9
Q

What do you mean by a partition in Kafka?

A

Kafka topics are divided into partitions, each of which holds records in a fixed order. Partitions allow a topic to be parallelized by splitting its data across multiple brokers.

Partitioning lets a single topic span multiple servers, so a topic can hold more data than any single server could.
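
A sketch of creating a partitioned topic with the Java AdminClient (broker address, topic name, and counts are illustrative assumptions):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Properties;

public class CreatePartitionedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions let the topic's data and traffic be spread across the brokers.
            admin.createTopics(List.of(new NewTopic("payments", 6, (short) 1))).all().get();
        }
    }
}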

10
Q

How is replication done in Kafka?

A

Replication is done at the partition level. A replica is a redundant copy of a topic partition. Each partition usually has one or more replicas, meaning the partition's messages are duplicated across several Kafka brokers in the cluster.

11
Q

What is the use-case of Zookeeper in Kafka?

A

It keeps track of the status of Kafka cluster nodes, as well as Kafka topics, partitions, and so on. It is used by Kafka brokers to maintain and coordinate the cluster.

Kafka can be used without ZooKeeper as of version 2.8, though that mode is not yet considered production-ready.

12
Q

Explain the concept of Leader and Follower in Kafka.

A

Each partition has one server that acts as the Leader and one or more servers that act as Followers. The Leader handles all read/write operations for the partition, while the Followers passively replicate the Leader.

13
Q

Why is topic replication important in Kafka? What do you mean by ISR in Kafka?

A

When one broker fails, topic replicas on other brokers remain available, ensuring that data is not lost and the application is not disrupted. The replication factor specifies the number of copies of a topic that are kept across the Kafka cluster.

Replication takes place at the partition level. A replication factor of two, for example, keeps two copies of each partition of the topic.

An In-Sync Replica (ISR) is a replica that is up to date with the partition’s leader.
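
A sketch (Java AdminClient, Kafka 3.1+ for allTopicNames; broker address and topic name are assumptions) that creates a topic with a replication factor of two and then prints each partition's leader, replicas, and ISR:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.admin.TopicDescription;
import java.util.List;
import java.util.Properties;

public class ReplicatedTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        try (AdminClient admin = AdminClient.create(props)) {
            // Replication factor 2: every partition is kept on two brokers.
            admin.createTopics(List.of(new NewTopic("payments", 3, (short) 2))).all().get();

            TopicDescription desc = admin.describeTopics(List.of("payments"))
                    .allTopicNames().get().get("payments");
            // For each partition: the leader broker, the full replica set, and the in-sync replicas.
            desc.partitions().forEach(p ->
                    System.out.printf("partition=%d leader=%d replicas=%s isr=%s%n",
                            p.partition(), p.leader().id(), p.replicas(), p.isr()));
        }
    }
}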

14
Q

What do you mean by Kafka schema registry?

A

It stores the schemas used on a topic and is used to ensure that the schema used by the producer and the schema expected by the consumer stay compatible.
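
A configuration-only sketch of how a producer is typically pointed at a schema registry (Confluent's Avro serializer and the registry URL are assumptions, not something stated on this card):

import java.util.Properties;

public class SchemaRegistryProducerConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // The Avro serializer registers/looks up the record's schema in the registry,
        // so consumers using the same registry deserialize with a matching schema.
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081"); // hypothetical registry address
    }
}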

15
Q

Describe partitioning key in Kafka

A

Each record (message) has a key and a value, with the key being optional. The record's key is used for partitioning. Each topic has one or more partitions. A partition is a data structure: an append-only sequence of records, ordered by the time they were appended. Once a record is written to a partition, it is given an offset, a sequential id that reflects the record's position and uniquely identifies it within the partition.

Partitioning is done using the record's key: the Kafka producer hashes the key to determine which partition the record is written to, so records with the same key land in the same partition.
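
A sketch showing that records with the same key land in the same partition under the default partitioner (broker address, topic, and key are assumptions):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class KeyedProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 3; i++) {
                // Same key "customer-7" every time, so the default partitioner hashes it
                // to the same partition and the records keep their order.
                RecordMetadata meta = producer
                        .send(new ProducerRecord<>("payments", "customer-7", "event-" + i))
                        .get();
                System.out.printf("key=customer-7 -> partition=%d offset=%d%n",
                        meta.partition(), meta.offset());
            }
        }
    }
}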
