Kafka Flashcards
What is Kafka?
Kafka is a pub/sub messaging system. It can also be used as an event streaming platform.
What are the 6 main Kafka components?
Draw a diagram
- Message: key-value pair and additional metadata
- Producer: client that produces messages to a topic
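- Topic: named category that messages are published to
- Partition: append-only commit log that holds an ordered subset of a topic's messages
- Broker: server that stores partitions and serves producer and consumer requests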
- Consumer: client that consumes messages from a topic
What are the 5 main components of a message?
- Key: byte[]
- Value: byte[]
- Offset: long
- Timestamp: long
- Headers: optional list of key-value pairs (String key, byte[] value)
Although Kafka does not require a data format for the content of its messages, why is it important to declare one?
Declaring a data format decouples the producer from the consumer. This can be done by defining a schema and storing it in a shared repository. This way, the producer and consumer can both use the messages without direct coordination.
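One way to picture the shared-schema idea is the sketch below. It is a toy, standard-library stand-in: ORDER_SCHEMA, validate, serialize, and deserialize are hypothetical names, and JSON with a hand-written type check takes the place of a real schema format such as Avro or Protobuf.

```python
import json

# Hypothetical shared schema: field name -> required Python type.
# Both the producer and the consumer import this one definition.
ORDER_SCHEMA = {"order_id": int, "customer": str, "total": float}


def validate(record: dict, schema: dict) -> None:
    """Raise ValueError if the record does not match the schema."""
    if set(record) != set(schema):
        raise ValueError(f"expected fields {sorted(schema)}, got {sorted(record)}")
    for field, expected_type in schema.items():
        if not isinstance(record[field], expected_type):
            raise ValueError(f"field {field!r} must be {expected_type.__name__}")


def serialize(record: dict) -> bytes:
    """Producer side: check against the schema, then encode to bytes."""
    validate(record, ORDER_SCHEMA)
    return json.dumps(record).encode("utf-8")


def deserialize(payload: bytes) -> dict:
    """Consumer side: decode the bytes, then check against the same schema."""
    record = json.loads(payload.decode("utf-8"))
    validate(record, ORDER_SCHEMA)
    return record
```

Because both sides depend only on the schema, either one can be rewritten or redeployed without talking to the other, as long as the schema is respected.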
What is the purpose of a message key?
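The purpose of a message key is to control which partition a message is sent to. With the default partitioner, messages that share a key are hashed to the same partition, which keeps them in order relative to each other.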
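A minimal sketch of how a key can pin messages to a partition. Kafka's Java client hashes the serialized key with murmur2; this sketch substitutes zlib.crc32 from Python's standard library as a stand-in, and choose_partition is a hypothetical helper, not a Kafka API.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a serialized key to a partition index."""
    # Assumption: crc32 stands in for the murmur2 hash the Java client uses.
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    # The two "user-1" records land on the same partition.
    for key in (b"user-1", b"user-2", b"user-1"):
        print(key, "->", choose_partition(key, 6))
```

The mapping only stays stable while the partition count is unchanged, since the count is the modulus.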
By default, are messages sent to Kafka in batches or one at a time?
By default, messages are sent to Kafka in batches. The producer groups messages bound for the same partition into a batch, which reduces network overhead.
Describe the flow of a message starting with the producer and ending with the consumer
The producer serializes the message and then uses a partitioner to decide which partition the message will be sent to. Under default settings, the producer accumulates messages bound for the same partition into a batch, and each batch is sent to the broker that leads that partition. The consumer continuously polls the brokers, and each poll returns a batch of messages from the consumer's assigned partitions. These messages are deserialized and then processed by the consumer.
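The flow above can be traced in a small in-memory simulation. This is a sketch under stated assumptions, not a Kafka client: ToyCluster and its methods are hypothetical, JSON stands in for the serializer, crc32 stands in for Kafka's key hash, and batches are capped by record count rather than by bytes.

```python
import json
import zlib
from collections import defaultdict


class ToyCluster:
    """In-memory stand-in for a producer buffer plus broker-side partition logs."""

    def __init__(self, num_partitions: int = 3, batch_records: int = 2):
        self.num_partitions = num_partitions
        # Assumption: batches are capped by record count; Kafka caps by bytes.
        self.batch_records = batch_records
        self.pending = defaultdict(list)  # partition -> records buffered in the producer
        self.logs = defaultdict(list)     # partition -> records stored on the "broker"

    def send(self, key: str, value: dict) -> int:
        key_bytes = key.encode("utf-8")                           # 1. serialize
        value_bytes = json.dumps(value).encode("utf-8")
        partition = zlib.crc32(key_bytes) % self.num_partitions   # 2. pick a partition
        self.pending[partition].append((key_bytes, value_bytes))  # 3. add to the batch
        if len(self.pending[partition]) >= self.batch_records:
            self.flush(partition)                                 # 4. ship the full batch
        return partition

    def flush(self, partition: int) -> None:
        """Append the buffered batch to the partition log."""
        self.logs[partition].extend(self.pending.pop(partition, []))

    def poll(self, partition: int, offset: int, max_records: int = 10) -> list:
        batch = self.logs[partition][offset:offset + max_records]      # 5. fetch a batch
        return [(k.decode("utf-8"), json.loads(v)) for k, v in batch]  # 6. deserialize


if __name__ == "__main__":
    cluster = ToyCluster()
    for seq in range(4):
        partition = cluster.send("order-7", {"seq": seq})
    for key, value in cluster.poll(partition, offset=0):
        print(key, value)
```

A record is invisible to poll until its batch has been flushed, which is the latency cost paid for the reduced network overhead.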
Why are partitions important?
Partitions are important because they enable replication and parallel processing. Because partitions can be distributed and replicated across separate brokers, a topic can be read and written in parallel and can survive the loss of a broker.
Do partitions guarantee order at the partition or topic level?
Kafka guarantees order at the partition level, not the topic level. If you need order across a whole topic, use a single partition for that topic.
What is an offset?
An offset is an integer that identifies the position of a message within a partition. Offsets are assigned by Kafka, and consumers commit them after processing the messages returned by poll().
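The poll-then-commit cycle can be sketched with a toy consumer over a single partition. ToyConsumer is a hypothetical class, not part of any Kafka client, and a plain Python list stands in for the partition log. The committed value is the offset of the next message to read, which is how Kafka records committed positions.

```python
class ToyConsumer:
    """Tracks a read position and a committed offset over one partition log."""

    def __init__(self, partition_log: list):
        self.log = partition_log
        self.position = 0   # offset of the next record poll() will return
        self.committed = 0  # offset to resume from after a restart

    def poll(self, max_records: int = 2) -> list:
        """Return up to max_records (offset, value) pairs and advance the position."""
        batch = self.log[self.position:self.position + max_records]
        records = list(enumerate(batch, start=self.position))
        self.position += len(batch)
        return records

    def commit(self) -> None:
        """Record that everything before the current position has been processed."""
        self.committed = self.position

    def restart(self) -> None:
        """Simulate a crash and restart: resume from the last committed offset."""
        self.position = self.committed


if __name__ == "__main__":
    consumer = ToyConsumer(["a", "b", "c", "d", "e"])
    print(consumer.poll())   # first batch
    consumer.commit()
    print(consumer.poll())   # second batch, processed but not committed
    consumer.restart()
    print(consumer.poll())   # second batch is delivered again
```

The redelivery after the restart is why committing only after processing gives at-least-once delivery.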
Does a producer balance messages over all partitions of a topic evenly by default?
Not necessarily. With keys, the balance depends on how the keys hash; without keys, records stick to one partition per batch, so the load evens out only over time. The behavior is controlled by the producer setting partitioner.class, which determines which partition to send a record to when records are produced. Available options are:
- If not set, the default partitioning logic is used. This strategy sends records to a partition until at least batch.size bytes are produced to the partition. It works with the strategy:
  1) If no partition is specified but a key is present, choose a partition based on a hash of the key.
  2) If no partition or key is present, choose the sticky partition that changes when at least batch.size bytes are produced to the partition.
- org.apache.kafka.clients.producer.RoundRobinPartitioner: A partitioning strategy where each record in a series of consecutive records is sent to a different partition, regardless of whether the ‘key’ is provided or not, until partitions run out and the process starts over again. Note: There’s a known issue that will cause uneven distribution when a new batch is created. See KAFKA-9965 for more detail.
Implementing the org.apache.kafka.clients.producer.Partitioner interface allows you to plug in a custom partitioner.
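The default logic described above (hash the key if there is one, otherwise stay on one partition until batch.size bytes have gone to it) can be sketched as follows. ToyDefaultPartitioner is a hypothetical class, crc32 stands in for Kafka's key hash, and always moving to a different partition on a switch is a simplification made for this sketch.

```python
import random
import zlib
from typing import Optional


class ToyDefaultPartitioner:
    """Keyed records are hashed; unkeyed records follow a sticky partition."""

    def __init__(self, num_partitions: int, batch_size: int, seed: int = 0):
        self.num_partitions = num_partitions
        self.batch_size = batch_size           # bytes to produce before switching
        self.rng = random.Random(seed)         # seeded so runs are repeatable
        self.sticky = self.rng.randrange(num_partitions)
        self.bytes_on_sticky = 0

    def partition(self, key: Optional[bytes], value: bytes) -> int:
        if key is not None:
            # Rule 1: a key is present, so hash it (crc32 is a stand-in hash).
            return zlib.crc32(key) % self.num_partitions
        # Rule 2: no key, so use the current sticky partition.
        chosen = self.sticky
        self.bytes_on_sticky += len(value)
        if self.bytes_on_sticky >= self.batch_size and self.num_partitions > 1:
            # Assumption of this sketch: always move to a different partition.
            others = [p for p in range(self.num_partitions) if p != chosen]
            self.sticky = self.rng.choice(others)
            self.bytes_on_sticky = 0
        return chosen
```

With a batch_size of 40 and 10-byte values, the first four unkeyed records share a partition and the fifth moves to another one, so the load is balanced per batch rather than per record.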
Can a partition be consumed by more than one instance of a consumer group?
No. While an instance of a consumer group can consume messages from multiple partitions, a partition can only be consumed by a single instance of that consumer group at a time.
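The one-owner-per-partition rule can be illustrated with a simple round-robin assignment. The assign function is a hypothetical sketch; Kafka's built-in assignors are more elaborate, but they preserve the same invariant that each partition has exactly one owner within a group.

```python
def assign(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    # Six partitions, four consumers: some consumers own two partitions.
    print(assign(range(6), ["c1", "c2", "c3", "c4"]))
    # Two partitions, three consumers: one consumer is left idle.
    print(assign(range(2), ["c1", "c2", "c3"]))
```

The second case shows why adding consumers beyond the partition count adds no parallelism: the extra instances have nothing to read.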
What is the rule of thumb when deciding how many partitions to declare?
As a starting point, declare as many partitions as there are brokers in your cluster. This lets the message load be distributed evenly across the brokers.
What is the primary purpose of the Admin Client?
The primary purpose of the Admin Client is to configure and manage Kafka topics and brokers.
What is disk throughput?
Disk throughput is the average amount of data a storage device can read or write per unit of time. It is usually expressed in MB/s.
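As a worked example of the definition, throughput is the number of bytes transferred divided by the elapsed time. The helper name throughput_mb_per_s is hypothetical, and the sketch assumes decimal megabytes (1 MB = 1,000,000 bytes).

```python
def throughput_mb_per_s(bytes_transferred: int, seconds: float) -> float:
    """Average throughput in decimal megabytes per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_transferred / seconds / 1_000_000


if __name__ == "__main__":
    # 1.5 GB written in 10 seconds averages 150 MB/s.
    print(throughput_mb_per_s(1_500_000_000, 10))
```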