Kafka Flashcards
What is Kafka?
Wikipedia defines Kafka as "an open-source message broker project developed by the Apache Software Foundation, written in Scala, that provides a distributed publish-subscribe messaging system."
Explain the role of the offset?
Messages in the partitions are each assigned a sequential ID number called the offset. The role of the offset is to uniquely identify every message within the partition.
What is a Consumer Group?
Consumer groups are a concept exclusive to Kafka. Every Kafka consumer group consists of one or more consumers that jointly consume a set of subscribed topics, with each partition being read by exactly one consumer in the group.
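A minimal Java consumer sketch to make this concrete (the broker address, group id, and topic name are hypothetical placeholders). Consumers that share the same group.id form one group and split the topic's partitions among themselves; each record also carries the offset described in the previous card:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker address
        props.put("group.id", "my-group");                 // consumers sharing this id form one group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic")); // hypothetical topic
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                // The offset uniquely identifies this record within its partition.
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}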
What is the role of ZooKeeper?
Kafka uses ZooKeeper to manage and coordinate the brokers in the cluster. In older versions, ZooKeeper also stored the offsets of messages consumed for a specific topic and partition by a specific consumer group.
Is it possible to use Kafka without ZooKeeper?
No, it is not possible to bypass ZooKeeper and connect directly to the Kafka server. If, for some reason, ZooKeeper is down, you cannot service any client request.
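For reference, the broker is pointed at ZooKeeper through the zookeeper.connect setting in config/server.properties (the host and port below are the usual defaults):
• In config/server.properties: zookeeper.connect=localhost:2181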
What is a partition in Kafka?
Partitions allow you to parallelize a topic by splitting the data in a particular topic across multiple brokers.
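For example, on a ZooKeeper-based cluster a topic can be created with a chosen number of partitions (the topic name and host are placeholders; newer Kafka versions use --bootstrap-server instead of --zookeeper):
• To create a topic with three partitions: > bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic my-topic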
Explain the concept of Leader and Follower in Kafka?
Every partition in Kafka has one server that plays the role of Leader, and zero or more servers that act as Followers. The Leader handles all read and write requests for the partition, while the Followers passively replicate the Leader. If the Leader fails, one of the Followers will take over as Leader. This ensures load balancing across the servers.
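The leader and followers of each partition can be inspected with the topic describe command (topic name and host are placeholders); its output lists the Leader, Replicas, and Isr for every partition:
• To describe a topic: > bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-topic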
What is the process for starting a Kafka server?
Since Kafka uses ZooKeeper, you must start the ZooKeeper server first and then start the Kafka server.
• To start the ZooKeeper server: > bin/zookeeper-server-start.sh config/zookeeper.properties
• Next, to start the Kafka server: > bin/kafka-server-start.sh config/server.properties
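As a quick smoke test, the console clients that ship with Kafka can then exchange a few messages (the topic name is a placeholder, and these flags vary slightly between Kafka versions):
• To start a console producer: > bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
• To start a console consumer: > bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning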
What is a topic in Kafka?
A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it. For each topic, the Kafka cluster maintains a partitioned log.
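A minimal Java producer sketch that publishes a record to a topic (the broker address, topic name, key, and value are hypothetical):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // Publish one record to the (hypothetical) topic "my-topic".
        producer.send(new ProducerRecord<>("my-topic", "key-1", "hello kafka"));
        producer.close();
    }
}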
What is Serialization and Deserialization (SerDe) in Apache Kafka?
Serialization is the process of converting an object into a stream of bytes for transmission. Apache Kafka stores and transmits these byte arrays in its queue. Deserialization is the opposite of serialization: it converts the byte arrays back into the data type we desire.
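As a sketch, a custom serializer/deserializer pair implements Kafka's Serializer and Deserializer interfaces. This hypothetical example just moves a String to and from UTF-8 bytes; in recent client versions configure() and close() have default implementations, so only the conversion methods are needed (the classes are kept package-private so the sketch compiles in one file):

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serializer;

// Serialization: converts an object into the byte array Kafka stores and transmits.
class Utf8StringSerializer implements Serializer<String> {
    @Override
    public byte[] serialize(String topic, String data) {
        return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
    }
}

// Deserialization: converts the stored byte array back into the desired data type.
class Utf8StringDeserializer implements Deserializer<String> {
    @Override
    public String deserialize(String topic, byte[] data) {
        return data == null ? null : new String(data, StandardCharsets.UTF_8);
    }
}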
How does Avro serialization work?
Avro serializes data using a language-independent schema that is associated with its read and write operations. The schema is usually declared in JSON format, and the data is serialized into a compact binary format that can be deserialized by any application.
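For illustration, here is a hypothetical Avro schema for a user record, declared in JSON as the card describes:

{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id",   "type": "long"},
    {"name": "name", "type": "string"}
  ]
}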
What is Schema Evolution?
Schema evolution is the automatic transformation between versions of an Avro schema: the version the client is using (its local copy) and the version currently contained in the store.
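Continuing the hypothetical User schema from the previous card, a newer version can add an optional field with a default value; Avro then resolves reads between the writer's and reader's schema versions, filling in the default when the field is absent from old data:

{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id",    "type": "long"},
    {"name": "name",  "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}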
What is Kafka Streams?
Kafka Streams can be used to build applications that read data from a Kafka topic (the source), do some analysis or transformation work, and then write the results back to another topic (the sink).
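A minimal Kafka Streams sketch of that source-to-sink pattern (the application id, topic names, and broker address are hypothetical; the transformation here simply upper-cases each value):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app"); // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Read from the source topic, transform each value, write to the sink topic.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("input-topic");    // hypothetical source
        source.mapValues(value -> value.toUpperCase()).to("output-topic"); // hypothetical sink

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}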
Explain Change Data Capture (CDC)?
Change data capture (CDC) is the process of capturing changes made at the data source and applying them throughout the enterprise. CDC minimizes the resources required for ETL (extract, transform, load) processes because it only deals with data changes. The goal of CDC is to ensure data synchronicity.
What is an Avro schema?
Avro is used to define the data schema for a record’s value. … The use of Avro schemas allows serialized values to be stored in a very space-efficient binary format. Each value is stored without any metadata other than a small internal schema identifier, between 1 and 4 bytes in size.