Kafka Internals Flashcards
What does Kafka use to keep track of all the brokers in the cluster?
Kafka uses ZooKeeper to keep track of all the brokers that are part of the cluster.
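If there are two ZooKeeper ensembles, how many Kafka clusters are there?
Two, assuming each ensemble backs a single cluster. A Kafka cluster is defined by the ZooKeeper ensemble (and chroot path, if one is used) that its brokers register with.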
At which path of Zookeeper are brokers stored?
Brokers are registered under the /brokers/ids path in ZooKeeper.
Each broker is an ephemeral node under this path. Kafka components that watch the path, such as the controller, are notified when brokers join or leave the cluster.
What is the process for choosing the controller of the cluster?
1) When a broker starts, it first registers itself in ZooKeeper under the /brokers/ids path to join the cluster.
2) After that, every broker tries to register itself as the controller of the cluster. The first broker to do so becomes the controller; later attempts fail because a controller already exists.
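The two steps above can be sketched with an in-memory stand-in for ZooKeeper. `FakeZooKeeper`, `NodeExistsError` and `start_broker` are hypothetical names made up for this illustration and are not part of Kafka or of any ZooKeeper client. The only behaviour assumed is that creating a node at an existing path fails, and that the controller claim is made at the /controller path.

```python
class NodeExistsError(Exception):
    """Raised when a node is created at a path that already exists."""


class FakeZooKeeper:
    """Hypothetical in-memory stand-in for ZooKeeper (illustration only)."""

    def __init__(self):
        self.nodes = {}  # path -> data

    def create_ephemeral(self, path, data):
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = data


def start_broker(zk, broker_id):
    """Register the broker, then try to become controller.

    Returns True if this broker won the controller election.
    """
    # Step 1: join the cluster by registering under /brokers/ids.
    zk.create_ephemeral(f"/brokers/ids/{broker_id}", {"id": broker_id})
    # Step 2: try to claim the controller role; only the first attempt succeeds.
    try:
        zk.create_ephemeral("/controller", {"brokerid": broker_id})
        return True
    except NodeExistsError:
        return False


zk = FakeZooKeeper()
results = {broker_id: start_broker(zk, broker_id) for broker_id in (0, 1, 2)}
print(results)                  # {0: True, 1: False, 2: False}
print(zk.nodes["/controller"])  # {'brokerid': 0}
```

Broker 0 wins only because it starts first in this sketch; in a real cluster the winner is whichever broker reaches ZooKeeper first.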
What happens if a broker is registered with an id and a second broker starts with the same id?
The second broker tries to register itself in ZooKeeper with the same id and fails, because a node for that id already exists. So there can be only one broker with a particular id in the cluster.
What happens if an existing broker in the cluster goes down?
If the broker goes down then the Ephemeral node created in /brokers/ids is also removed.
What happens if the broker with id 0 goes down or crashes and a new broker comes up with the same id 0?
Because the old broker with that id is no longer running, the new broker takes the id and registers itself with the cluster successfully.
The new broker with id 0 is assigned the same partitions and topics that the old broker 0 held.
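The last three cards (duplicate id, broker failure, replacement broker) can be sketched together. This is a simplified model, not real ZooKeeper client code: `FakeZooKeeper`, `close_session` and `try_register` are hypothetical names, and a "session" is just a string naming the host that owns a node.

```python
class NodeExistsError(Exception):
    """Raised when a node is created at a path that already exists."""


class FakeZooKeeper:
    """Hypothetical in-memory model: each ephemeral node belongs to a session."""

    def __init__(self):
        self.owner = {}  # path -> session that created the node

    def create_ephemeral(self, session, path):
        if path in self.owner:
            raise NodeExistsError(path)
        self.owner[path] = session

    def close_session(self, session):
        # Ephemeral nodes vanish when the session that created them ends.
        self.owner = {p: s for p, s in self.owner.items() if s != session}


def try_register(zk, session, broker_id):
    """Return True if the broker id could be registered for this session."""
    try:
        zk.create_ephemeral(session, f"/brokers/ids/{broker_id}")
        return True
    except NodeExistsError:
        return False


zk = FakeZooKeeper()
first = try_register(zk, "host-a", 0)        # original broker 0
duplicate = try_register(zk, "host-b", 0)    # same id while host-a is alive
zk.close_session("host-a")                   # broker 0 crashes
replacement = try_register(zk, "host-b", 0)  # id 0 is free again
print(first, duplicate, replacement)         # True False True
```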
What are responsibilities of controller of cluster?
Apart from the usual broker responsibilities, the controller is responsible for electing partition leaders.
Which ephemeral node path is used in Zookeeper to store the controller?
The /controller node is used to store the current controller of the cluster.
How is a new controller chosen when the current controller goes down?
All the brokers watch the /controller path. As soon as the controller goes down, its ephemeral node is removed, the other brokers are notified, and each of them tries to become the controller of the cluster. The first one to create the node wins.
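A minimal sketch of this failover, again with a hypothetical in-memory model rather than a real ZooKeeper client. `FakeZooKeeper`, `watch_delete` and `Broker` are illustrative names. Watches fire once, in registration order, which makes the outcome deterministic here; in a real cluster the brokers race.

```python
class FakeZooKeeper:
    """Hypothetical in-memory model with one-shot delete watches."""

    def __init__(self):
        self.nodes = {}
        self.watchers = {}  # path -> callbacks to fire when the node is deleted

    def create_ephemeral(self, path, data):
        if path in self.nodes:
            return False
        self.nodes[path] = data
        return True

    def watch_delete(self, path, callback):
        self.watchers.setdefault(path, []).append(callback)

    def delete(self, path):
        del self.nodes[path]
        # Fire each watch once, in registration order.
        for callback in self.watchers.pop(path, []):
            callback()


class Broker:
    def __init__(self, zk, broker_id):
        self.zk = zk
        self.broker_id = broker_id
        self.elect()

    def elect(self):
        won = self.zk.create_ephemeral("/controller", self.broker_id)
        if not won:
            # Lost the race: wait for the current controller to disappear.
            self.zk.watch_delete("/controller", self.elect)


zk = FakeZooKeeper()
brokers = [Broker(zk, i) for i in (0, 1, 2)]
before = zk.nodes["/controller"]
zk.delete("/controller")  # the controller (broker 0) goes down
after = zk.nodes["/controller"]
print(before, after)      # 0 1
```

Broker 2 loses the second race as well and simply re-registers its watch, so it is ready for the next failover.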
Where do all the read and the write requests go for a topic partition?
The leader of the partition.
What is ISR (In-Sync Replica) in Kafka?
Replicas that have caught up with the leader are called in-sync replicas. A follower counts as in sync as long as it has caught up with the leader's latest messages within the time allowed by replica.lag.time.max.ms; a follower that stays behind for longer is out of sync.
So when the leader of a partition goes down, any in-sync replica can take its place because it has all the latest messages.
When can replicas go out of sync?
Suppose a follower crashes and then comes back up, or the broker hosting the replica is performing poorly and cannot keep up with the leader. The follower then falls behind from time to time, and while it is behind it is called an out-of-sync replica. By default, an out-of-sync replica cannot take over as leader of the partition because it does not have all the messages.
What are the responsibilities of leader?
Apart from serving reads and writes, the leader is responsible for knowing which follower replicas are in sync, that is, up to date with the leader.
Followers try to stay up to date by replicating messages from the leader as they arrive, but they can fail to stay in sync.
Why do all the read and write requests go through the leader?
To guarantee consistency.
What is the replica.lag.time.max.ms configuration property?
The amount of time a follower can be inactive or behind before it is considered out of sync.
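A minimal sketch of how a time-based lag limit could decide ISR membership. This is not Kafka's actual implementation: the function name, the `last_caught_up_ms` bookkeeping and the 30-second limit are all assumptions made for illustration.

```python
REPLICA_LAG_TIME_MAX_MS = 30_000  # assumed value for this sketch


def in_sync_replicas(leader_id, last_caught_up_ms, now_ms,
                     max_lag_ms=REPLICA_LAG_TIME_MAX_MS):
    """Return the sorted ISR: the leader plus followers that caught up recently.

    last_caught_up_ms maps follower id -> the last time (in ms) that the
    follower was fully caught up with the leader's log.
    """
    isr = {leader_id}
    for follower_id, caught_up_ms in last_caught_up_ms.items():
        if now_ms - caught_up_ms <= max_lag_ms:
            isr.add(follower_id)
    return sorted(isr)


now = 100_000
followers = {2: 99_500, 3: 60_000}  # follower 3 last caught up 40 s ago
print(in_sync_replicas(1, followers, now))  # [1, 2]
```

Follower 2 is half a second behind and stays in the ISR, while follower 3 has exceeded the limit and drops out.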
What is the preferred leader?
1) Each partition has a preferred leader: the replica that was the leader when the partition was first created.
2) It is preferred because when partitions are first created, the leaders are balanced between brokers.
3) As a result, when the preferred leader is the actual leader for every partition in the cluster, the load is distributed evenly between all brokers.
4) With auto.leader.rebalance.enable=true, Kafka checks whether the preferred leader replica is not the current leader but is in sync, and if so triggers a leader election to make the preferred leader the current leader.
What does the auto.leader.rebalance.enable configuration property do?
With auto.leader.rebalance.enable=true, Kafka checks whether the preferred leader replica is not the current leader but is in sync (for example, after it recovers from a failure), and if so triggers a leader election to make the preferred leader the current leader.
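The decision described above can be sketched as a small function. This is an illustration, not the controller's actual code: `preferred_leader_move` is a hypothetical name, and it assumes the preferred leader is the first broker id in the partition's replica list.

```python
def preferred_leader_move(replicas, isr, current_leader):
    """Return the broker id that leadership should move to, or None.

    replicas: replica assignment list; in this sketch the first entry is
    treated as the preferred leader. isr: set of in-sync broker ids.
    """
    preferred = replicas[0]
    if current_leader == preferred:
        return None  # the preferred leader already leads
    if preferred not in isr:
        return None  # the preferred replica has not caught up yet
    return preferred


print(preferred_leader_move([1, 2, 3], {1, 2, 3}, 2))  # 1
print(preferred_leader_move([1, 2, 3], {2, 3}, 2))     # None
print(preferred_leader_move([1, 2, 3], {1, 2, 3}, 1))  # None
```

The second call shows why the in-sync check matters: moving leadership to a replica that is still catching up would lose messages.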
Who elects the preferred leader?
Kafka Controller
What is the format of a Kafka produce or consume request?
A request consists of a request header followed by the request data. The header carries the request type, the request version, the correlation id, and the client id.
Who initiates connections in Kafka?
Clients always initiate connections and send requests; the broker processes the requests and responds to them.
What is the correlation id in a Kafka request header?
It is a number that uniquely identifies the request. It also appears in the response and in error logs, which helps during troubleshooting.
What is the request version in a Kafka request header?
It tells the broker which version of the request the client is using, so that brokers can handle clients of different versions and respond accordingly.
What is the use of the client id in the request header?
It identifies the client that sent the request. This information is useful for troubleshooting.
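To make the four header fields concrete, here is a sketch that packs and unpacks them with Python's `struct` module. The layout (big-endian int16 request type, int16 version, int32 correlation id, then the client id as an int16 length plus UTF-8 bytes) is a simplified reading of the Kafka wire protocol and should be treated as an assumption. The function names and sample values are made up, and the size prefix and request data are left out.

```python
import struct

HEADER_FORMAT = ">hhih"  # request type, version, correlation id, client id length


def encode_request_header(api_key, api_version, correlation_id, client_id):
    """Pack request type, version, correlation id and client id, big-endian."""
    client_bytes = client_id.encode("utf-8")
    fixed = struct.pack(HEADER_FORMAT, api_key, api_version,
                        correlation_id, len(client_bytes))
    return fixed + client_bytes


def decode_request_header(buf):
    """Inverse of encode_request_header; returns a 4-tuple of the fields."""
    api_key, api_version, correlation_id, length = struct.unpack_from(
        HEADER_FORMAT, buf)
    start = struct.calcsize(HEADER_FORMAT)
    client_id = buf[start:start + length].decode("utf-8")
    return api_key, api_version, correlation_id, client_id


# Sample values are arbitrary; 0 is used here as the request type.
header = encode_request_header(0, 7, 42, "demo-client")
print(len(header))                    # 21
print(decode_request_header(header))  # (0, 7, 42, 'demo-client')
```

The broker would echo the correlation id (42 here) in its response, which is how a client matches responses to outstanding requests.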