Kafka Consumer Flashcards
What is a consumer?
A consumer is any application that reads data from a Kafka topic by subscribing to it.
Can Kafka consumers subscribe to multiple topics?
Yes, a single Kafka consumer can subscribe to multiple topics.
Why do we need Consumer groups?
Suppose a producer writes data faster than a single consumer can process it. We can spin up multiple consumers and place them in the same logical group, called a consumer group. The group's committed offsets are shared by its members, so once a record has been read by one consumer, the other consumers in the same group will not get that record. This allows the load to be shared.
Do consumers in Consumer Group share offset?
Yes, consumers in the same consumer group share the group's committed offsets, which are tracked per partition.
How many consumers can consume from a partition?
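Within a consumer group, only one consumer can consume from a given partition at a time. Consumers in different groups can read the same partition independently.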
What is the relation between number of partitions and number of consumers?
Within a group, the number of active consumers can be equal to or less than the number of partitions; any consumers beyond that stay idle. Only one consumer updates the offset metadata for a given partition at a time. One consumer may handle one or more partitions, but two consumers in the same group cannot handle the same partition.
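The partition-to-consumer relation can be illustrated with a small toy model. This is a minimal sketch in plain Python, not Kafka's actual assignment code; the function name assign_round_robin and the consumer names are hypothetical, and the consumer list is assumed to be non-empty.

```python
def assign_round_robin(partitions, consumers):
    """Spread partitions over consumers so each partition has exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Fewer consumers than partitions: each consumer owns several partitions.
print(assign_round_robin([0, 1, 2, 3], ["c1", "c2"]))
# More consumers than partitions: the extra consumers own nothing.
print(assign_round_robin([0], ["c1", "c2", "c3"]))
```

No partition ever appears under two consumers, and with one partition and three consumers, two of them end up with an empty list.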
At which level is the metadata about the offset maintained? Consumer or Consumer Group level?
Offset metadata is maintained at the consumer group level, so if the same message needs to be consumed by multiple consumers, they must belong to different consumer groups.
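To see why separate groups each get every message, here is a toy model of committed offsets keyed by group. It is only a sketch: OffsetStore, the group names and the topic name are hypothetical, and starting an unknown group at offset 0 is a simplification made for this example.

```python
class OffsetStore:
    """Toy model: committed offsets keyed by (group, topic, partition)."""

    def __init__(self):
        self._committed = {}

    def commit(self, group, topic, partition, offset):
        self._committed[(group, topic, partition)] = offset

    def committed(self, group, topic, partition):
        # A group with no commit yet starts from 0 in this toy model.
        return self._committed.get((group, topic, partition), 0)


store = OffsetStore()
store.commit("billing", "orders", 0, 42)
print(store.committed("billing", "orders", 0))    # 42
print(store.committed("analytics", "orders", 0))  # 0
```

Progress made by the "billing" group does not move the position of the "analytics" group, so both groups read the full partition.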
Which property defines the consumer group that the consumer will belong to?
The “group.id” property defines the group that the consumer will join.
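As an illustration, consumer settings are often passed as key/value pairs. The sketch below builds such settings as plain Python dictionaries; the helper make_consumer_config, the broker address and the group names are assumptions made for this example, while "group.id" and "bootstrap.servers" are the Kafka property names.

```python
def make_consumer_config(group_id):
    """Build a minimal consumer configuration for the given group."""
    return {
        "bootstrap.servers": "localhost:9092",  # assumed broker address
        "group.id": group_id,                   # decides which group is joined
    }


# Same group.id: these two consumers share the load of the topic.
worker_a = make_consumer_config("order-processors")
worker_b = make_consumer_config("order-processors")
# Different group.id: this consumer gets its own copy of every message.
auditor = make_consumer_config("order-auditors")
```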
What will happen if you have only a few consumers consuming from many partitions?
Each consumer is assigned several partitions. If the consumers cannot keep up with the rate at which messages are published, processing lag builds up.
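Is it possible to give a consumer a specific partition number when starting it?
Yes, a consumer can be given specific partition numbers of a topic instead of subscribing to the whole topic (in the Java client, via assign() rather than subscribe()).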
What will happen if a topic has only one partition and the consumer group has three consumers with the same group name?
In that case, only one consumer will receive data; the other two will stay idle.
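What is the best practice for the number of consumers in a consumer group, supposing that the topic has n partitions?
A topic with n partitions keeps at most n consumers in a group busy. Running n + 1 or n + 2 consumers is sometimes suggested so that the extra ones, which stay idle, can take over as failovers when a consumer goes down.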
What is rebalancing in Kafka and why is it required?
Moving ownership of a partition from one consumer to another is called a rebalance.
Whenever a consumer joins the group, a consumer dies, or a new partition is added, Kafka redistributes the partitions among the consumers in the group. This process of redistributing the load is called rebalancing.
When can rebalancing occur?
1) A new consumer is added
2) A consumer crashes or is DEAD
3) A consumer stops sending heartbeats because it is busy with heavy processing, so it is deemed logically DEAD
4) A new partition is added
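The triggers above can be sketched with a toy group that recomputes its assignment on every membership or partition change. This is an illustrative model only; the ConsumerGroup class, its method names and the round-robin rule are assumptions for this example, not Kafka's implementation.

```python
class ConsumerGroup:
    """Toy group that reassigns partitions whenever members or partitions change."""

    def __init__(self, partition_count):
        self.partition_count = partition_count
        self.members = []
        self.assignment = {}
        self.rebalances = 0

    def _rebalance(self):
        self.rebalances += 1
        self.assignment = {member: [] for member in self.members}
        if not self.members:
            return
        for partition in range(self.partition_count):
            owner = self.members[partition % len(self.members)]
            self.assignment[owner].append(partition)

    def join(self, member):
        self.members.append(member)
        self._rebalance()

    def leave(self, member):
        # Covers both a crash and a consumer declared dead after a timeout.
        self.members.remove(member)
        self._rebalance()

    def add_partition(self):
        self.partition_count += 1
        self._rebalance()


group = ConsumerGroup(partition_count=3)
group.join("c1")
print(group.assignment)  # {'c1': [0, 1, 2]}
group.join("c2")
print(group.assignment)  # {'c1': [0, 2], 'c2': [1]}
group.leave("c1")
print(group.assignment)  # {'c2': [0, 1, 2]}
group.add_partition()
print(group.assignment)  # {'c2': [0, 1, 2, 3]}
```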
What effect does a rebalance have on existing consumers?
Consumers lose their current state, i.e. which partitions are currently assigned to them. During a rebalance, no messages are processed from the partitions that were owned by the dead consumer.
How does the heartbeat work in Kafka, and who monitors it?
Consumers connect to the group co-ordinator and send heartbeats at regular intervals. If a consumer stops sending heartbeats for long enough, its session times out, and the group co-ordinator considers it dead and triggers a rebalance.
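The timeout logic can be sketched as follows. This is a simplified model, not broker code: the GroupCoordinator class and its methods are hypothetical, and time is passed in explicitly as seconds so the example is deterministic. In a real consumer the relevant settings are session.timeout.ms and heartbeat.interval.ms.

```python
class GroupCoordinator:
    """Toy co-ordinator that tracks the last heartbeat time of each member."""

    def __init__(self, session_timeout_s):
        self.session_timeout_s = session_timeout_s
        self.last_heartbeat = {}

    def heartbeat(self, member, now):
        self.last_heartbeat[member] = now

    def expire_dead_members(self, now):
        """Drop members whose session timed out; a non-empty result means a rebalance is needed."""
        dead = [
            member
            for member, seen in self.last_heartbeat.items()
            if now - seen > self.session_timeout_s
        ]
        for member in dead:
            del self.last_heartbeat[member]
        return dead


coordinator = GroupCoordinator(session_timeout_s=10)
coordinator.heartbeat("c1", now=0)
coordinator.heartbeat("c2", now=0)
coordinator.heartbeat("c1", now=8)              # c2 has gone silent
print(coordinator.expire_dead_members(now=11))  # ['c2']
```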
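What is the relation between the group co-ordinator and a topic?
A group co-ordinator is assigned per consumer group, not per topic. It is one of the brokers, and it manages the group's membership and committed offsets for every topic the group consumes.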
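As far as the broker implementation goes, the co-ordinator is picked from the group id rather than from any topic: the group id is hashed onto a partition of the internal __consumer_offsets topic, and the broker leading that partition acts as co-ordinator. The sketch below mirrors that calculation in Python under some assumptions: the function names are hypothetical, the default of 50 partitions corresponds to the usual offsets.topic.num.partitions default, and group ids are assumed to contain only basic (non-surrogate) characters so that Java's hash is reproduced correctly.

```python
def java_string_hash(text):
    """Reproduce Java's String.hashCode() as a signed 32-bit integer."""
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition_for(group_id, offsets_topic_partitions=50):
    """Partition of __consumer_offsets whose leader co-ordinates this group."""
    h = java_string_hash(group_id)
    # Integer.MIN_VALUE has no positive counterpart in Java, so it maps to 0.
    non_negative = 0 if h == -2**31 else abs(h)
    return non_negative % offsets_topic_partitions


print(offsets_partition_for("order-processors"))
```

Two groups reading the same topic can therefore have different co-ordinators, and one group reading many topics still has a single co-ordinator.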
What is a consumer leader?
The first consumer that joins the consumer group becomes the consumer leader for that group.
What is a JoinGroup request?
When a consumer wants to join a group, it sends a JoinGroup request to the group co-ordinator.
What are the responsibilities of the consumer leader?
1) The consumer leader receives the list of all consumers in the consumer group from the group co-ordinator.
2) The leader is responsible for assigning a subset of partitions to each consumer.
3) After deciding the partition assignment, the leader sends this information to the group co-ordinator.
4) The group co-ordinator sends this information to all the consumers.
This process is repeated every time rebalance is triggered.
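The four steps can be walked through with a small simulation. It is a sketch under stated assumptions: run_rebalance and round_robin are hypothetical names, the first member in the join order is treated as the leader, and the hand-off between leader and co-ordinator is modelled as plain function calls rather than network requests. The member list is assumed to be non-empty.

```python
def round_robin(members, partitions):
    """Example assignment strategy used by the leader in this sketch."""
    assignment = {member: [] for member in members}
    for index, partition in enumerate(partitions):
        assignment[members[index % len(members)]].append(partition)
    return assignment


def run_rebalance(join_order, partitions, assignor):
    """Simulate one rebalance round and return (leader, per-member assignment)."""
    # The first consumer whose JoinGroup request arrives becomes the leader.
    leader = join_order[0]
    # Step 1: the co-ordinator hands the full member list to the leader.
    members_seen_by_leader = list(join_order)
    # Steps 2 and 3: the leader decides the assignment and returns it.
    full_assignment = assignor(members_seen_by_leader, partitions)
    # Step 4: the co-ordinator passes each member its partitions.
    delivered = {member: full_assignment.get(member, []) for member in join_order}
    return leader, delivered


leader, delivered = run_rebalance(["c1", "c2"], [0, 1, 2], round_robin)
print(leader)     # c1
print(delivered)  # {'c1': [0, 2], 'c2': [1]}
```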
What is the use of the PartitionAssignor interface?
The consumer leader uses an implementation of the PartitionAssignor interface to decide which partitions should be handled by which consumer.
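The idea of a pluggable assignment strategy can be shown with a Python analogue. This is not the Java interface itself: range_assign and round_robin_assign are hypothetical functions that share one call shape, loosely modelled on range-style and round-robin-style strategies, and both assume at least one member.

```python
def range_assign(members, partition_count):
    """Give each member a contiguous block; earlier members get one extra when uneven."""
    ordered = sorted(members)
    base, extra = divmod(partition_count, len(ordered))
    assignment = {}
    start = 0
    for index, member in enumerate(ordered):
        size = base + (1 if index < extra else 0)
        assignment[member] = list(range(start, start + size))
        start += size
    return assignment


def round_robin_assign(members, partition_count):
    """Deal partitions out one at a time across the sorted members."""
    ordered = sorted(members)
    assignment = {member: [] for member in ordered}
    for partition in range(partition_count):
        assignment[ordered[partition % len(ordered)]].append(partition)
    return assignment


# The leader can swap strategies without changing anything else.
for strategy in (range_assign, round_robin_assign):
    print(strategy.__name__, strategy(["c2", "c1"], 3))
```

With two members and three partitions, the range-style strategy yields c1: [0, 1] and c2: [2], while the round-robin-style strategy yields c1: [0, 2] and c2: [1].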