Fundamentals Flashcards
Basics of Kafka and its vocabulary.
What’s a topic?
A stream of related messages or events.
You can think of it as a sequence of events.
What are partitions?
Topics are divided into smaller pieces (partitions), and the different partitions can be allocated to different brokers in the cluster. This is key to how Kafka scales.
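A minimal sketch of creating a partitioned topic with the Java AdminClient; the broker address, topic name, and counts are example values:

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;
    import java.util.Collections;
    import java.util.Properties;

    public class CreateTopicExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

            try (AdminClient admin = AdminClient.create(props)) {
                // 6 partitions let the topic be spread across brokers;
                // replication factor 3 gives fault tolerance
                NewTopic topic = new NewTopic("orders", 6, (short) 3);
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }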
What are the main responsibilities of a Broker?
Brokers receive data from producers and store it, temporarily in the page cache and then permanently on disk once the OS flushes the page cache.
How long the data is kept around is determined by the retention time (default: 1 week).
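Retention is configurable; these are the standard property names (the values shown are the defaults):

    # server.properties (broker-wide default): keep data for one week
    log.retention.hours=168
    # per-topic override, in milliseconds (also one week here)
    retention.ms=604800000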
How does Producer work?
Producers send messages into topics. Different producers can send messages into the same topic or into different topics.
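A minimal producer sketch in Java, assuming a broker on localhost:9092 and a topic called "orders":

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import java.util.Properties;

    public class ProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // The key ("user-42") determines which partition the record lands on
                producer.send(new ProducerRecord<>("orders", "user-42", "order created"));
            }
        }
    }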
How do Consumers work?
Consumers periodically poll (read) data from Kafka.
Many consumers can poll data from Kafka at the same time.
To allow for parallelism, consumers are organised into consumer groups that split up the work.
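A minimal consumer sketch in Java, assuming the same broker and topic as in the producer sketch; the group.id ("order-processors") is an example name:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class ConsumerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", "order-processors"); // consumers sharing this id split the partitions
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("orders"));
                while (true) {
                    // poll() fetches whatever records have arrived since the last call
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("offset=%d key=%s value=%s%n",
                                record.offset(), record.key(), record.value());
                    }
                }
            }
        }
    }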
One key feature of Kafka is that Producers and Consumers are decoupled. What does that mean?
- They don’t know about the existence of each other.
- They don’t depend on each other.
- They only need to agree on the data format of the records produced and consumed (see the example below).
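For example, the producer's serializer and the consumer's deserializer must match; these class names ship with the standard Kafka clients library:

    # producer side
    value.serializer=org.apache.kafka.common.serialization.StringSerializer
    # consumer side (must match the producer's choice)
    value.deserializer=org.apache.kafka.common.serialization.StringDeserializer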
What does Kafka use ZooKeeper for?
- Cluster management
- Failure detection and recovery (e.g. when a broker goes down)
- Store Access Control Lists (ACLs) used for authorization in the Kafka cluster
What are Segments?
Physical files.
The broker stores messages in memory (the page cache) as they arrive, then periodically flushes them to a physical file.
Data can potentially be endless, so the broker uses a "rolling-file" strategy: it allocates a new file and fills it with messages. When a segment is full or expires, the broker allocates the next one.
Kafka rolls over files in segments to make it easier to manage data retention and remove old data.
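Segment rolling can be tuned with standard broker properties, for example (the values shown are the defaults):

    # roll to a new segment file once the active one reaches ~1 GiB
    log.segment.bytes=1073741824
    # ...or after 7 days, whichever comes first
    log.roll.hours=168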
What’s a log?
A data structure similar to a queue of elements: new elements are appended at the end of the log, and once written they are never changed.
An append-only, write-once data structure.
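A minimal in-memory sketch of the idea, in Java; Kafka's log is persisted to disk, but the access pattern is the same:

    import java.util.ArrayList;
    import java.util.List;

    public class AppendOnlyLog {
        private final List<String> entries = new ArrayList<>();

        // Append a record to the end and return its offset (its position in the log)
        public long append(String record) {
            entries.add(record);
            return entries.size() - 1;
        }

        // Read the record at a given offset; existing entries are never modified
        public String read(long offset) {
            return entries.get((int) offset);
        }
    }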
What’s a stream?
A sequence of events. The sequence has a beginning somewhere in the past.
A stream is immutable. What does that mean?
You can’t change a stream.
In stream processing one never modifies an existing stream, but always generates a new output stream.
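A minimal Kafka Streams sketch of this idea: the input stream is never changed, a new output stream is derived from it. Topic names and the application id are example values:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import java.util.Properties;

    public class StreamExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> input = builder.stream("input-topic");
            // The input stream is left untouched; a new, transformed stream is written out
            input.mapValues(value -> value.toUpperCase()).to("output-topic");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
        }
    }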
What’s a record in Kafka?
A data element in a log or topic. Other terms in use are message and event.
What is the Consumer Offset?
It keeps track of the latest message read, and it is stored in a special Kafka topic (__consumer_offsets).
It can be changed, e.g. to reread messages (see the example below).
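For example, the offsets of a (stopped) consumer group can be rewound with the kafka-consumer-groups tool; the group and topic names here are just examples:

    kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
      --group order-processors --topic orders \
      --reset-offsets --to-earliest --execute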
What is a consumer group?
A consumer group consists of one or more consumer instances. All consumer instances in a consumer group are identical clones of each other.
What’s the purpose of a consumer group?
To increase the throughput of downstream consumption of data flowing into a topic.
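A quick way to see this, assuming the example topic and group names used above: run the console consumer twice with the same group id, and the topic's partitions are split between the two instances:

    # run in two separate terminals; each instance receives a share of the partitions
    kafka-console-consumer.sh --bootstrap-server localhost:9092 \
      --topic orders --group order-processors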