Kafka Flashcards
What is Kafka?
Wikipedia defines Kafka as "an open-source message broker project developed by the Apache Software Foundation, written in Scala, that provides a distributed publish-subscribe messaging system."
Explain the role of the offset?
Messages in the partitions are each assigned a sequential ID number called the offset. The role of the offset is to uniquely identify every message within the partition.
What is a Consumer Group?
Consumer groups are a concept exclusive to Kafka. Every Kafka consumer group consists of one or more consumers that jointly consume a set of subscribed topics, with each partition being read by exactly one consumer in the group.
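A minimal Java consumer sketch to make this concrete (the broker address, group id, and topic name are hypothetical placeholders). Consumers that share the same group.id form one group and split the topic's partitions among themselves; each record also carries the offset described in the previous card:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker address
        props.put("group.id", "my-group");                 // consumers sharing this id form one group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic")); // hypothetical topic
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                // The offset uniquely identifies this record within its partition.
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}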
What is the role of ZooKeeper?
Kafka uses ZooKeeper to manage and coordinate the brokers in the cluster. In older versions, ZooKeeper also stored the offsets of messages consumed for a specific topic and partition by a specific consumer group.
Is it possible to use Kafka without ZooKeeper?
No, it is not possible to bypass ZooKeeper and connect directly to the Kafka server. If, for some reason, ZooKeeper is down, you cannot service any client request.
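For reference, the broker is pointed at ZooKeeper through the zookeeper.connect setting in config/server.properties (the host and port below are the usual defaults):
• In config/server.properties: zookeeper.connect=localhost:2181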
What is a partition in Kafka?
Partitions allow you to parallelize a topic by splitting the data in a particular topic across multiple brokers.
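For example, on a ZooKeeper-based cluster a topic can be created with a chosen number of partitions (the topic name and host are placeholders; newer Kafka versions use --bootstrap-server instead of --zookeeper):
• To create a topic with three partitions: > bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic my-topic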
Explain the concept of Leader and Follower in Kafka?
Every partition in Kafka has one server that plays the role of Leader, and zero or more servers that act as Followers. The Leader handles all read and write requests for the partition, while the Followers passively replicate the Leader. If the Leader fails, one of the Followers will take over as Leader. This ensures load balancing across the servers.
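The leader and followers of each partition can be inspected with the topic describe command (topic name and host are placeholders); its output lists the Leader, Replicas, and Isr for every partition:
• To describe a topic: > bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-topic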
What is the process for starting a Kafka server?
Since Kafka uses ZooKeeper, you must start the ZooKeeper server first and then start the Kafka server.
• To start the ZooKeeper server: > bin/zookeeper-server-start.sh config/zookeeper.properties
• Next, to start the Kafka server: > bin/kafka-server-start.sh config/server.properties
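As a quick smoke test, the console clients that ship with Kafka can then exchange a few messages (the topic name is a placeholder, and these flags vary slightly between Kafka versions):
• To start a console producer: > bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
• To start a console consumer: > bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning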
What is a topic in Kafka?
A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it. For each topic, the Kafka cluster maintains a partitioned log.
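A minimal Java producer sketch that publishes a record to a topic (the broker address, topic name, key, and value are hypothetical):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // Publish one record to the (hypothetical) topic "my-topic".
        producer.send(new ProducerRecord<>("my-topic", "key-1", "hello kafka"));
        producer.close();
    }
}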
What is Serialization and Deserialization (SerDe) in Apache Kafka?
Serialization is the process of converting an object into a stream of bytes for transmission. Apache Kafka stores and transmits these byte arrays in its queue. Deserialization is the opposite of serialization: it converts the byte arrays back into the data type we desire.
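As a sketch, a custom serializer/deserializer pair implements Kafka's Serializer and Deserializer interfaces. This hypothetical example just moves a String to and from UTF-8 bytes; in recent client versions configure() and close() have default implementations, so only the conversion methods are needed (the classes are kept package-private so the sketch compiles in one file):

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serializer;

// Serialization: converts an object into the byte array Kafka stores and transmits.
class Utf8StringSerializer implements Serializer<String> {
    @Override
    public byte[] serialize(String topic, String data) {
        return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
    }
}

// Deserialization: converts the stored byte array back into the desired data type.
class Utf8StringDeserializer implements Deserializer<String> {
    @Override
    public String deserialize(String topic, byte[] data) {
        return data == null ? null : new String(data, StandardCharsets.UTF_8);
    }
}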
How does Avro serialization work?
Avro serializes data using a language-independent schema that is associated with its read and write operations. The schema is usually declared in JSON format, and the data is serialized into a compact binary format that can be deserialized by any application.
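For illustration, here is a hypothetical Avro schema for a user record, declared in JSON as the card describes:

{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id",   "type": "long"},
    {"name": "name", "type": "string"}
  ]
}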
What is Schema Evolution?
Schema evolution is the automatic transformation between versions of an Avro schema: the version the client is using (its local copy) and the version currently contained in the store.
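Continuing the hypothetical User schema from the previous card, a newer version can add an optional field with a default value; Avro then resolves reads between the writer's and reader's schema versions, filling in the default when the field is absent from old data:

{
  "type": "record",
  "name": "User",
  "namespace": "com.example",
  "fields": [
    {"name": "id",    "type": "long"},
    {"name": "name",  "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}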
What is Kafka Streams?
Kafka Streams can be used to build applications that read data from a Kafka topic (the source), do some analysis or transformation work, and then write the results back to another topic (the sink).
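A minimal Kafka Streams sketch of that source-to-sink pattern (the application id, topic names, and broker address are hypothetical; the transformation here simply upper-cases each value):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app"); // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Read from the source topic, transform each value, write to the sink topic.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("input-topic");    // hypothetical source
        source.mapValues(value -> value.toUpperCase()).to("output-topic"); // hypothetical sink

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}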
Explain Change Data Capture (CDC)?
Change data capture (CDC) is the process of capturing changes made at the data source and applying them throughout the enterprise. CDC minimizes the resources required for ETL (extract, transform, load) processes because it only deals with data changes. The goal of CDC is to ensure data synchronicity.
What is an Avro schema?
Avro is used to define the data schema for a record’s value. … The use of Avro schemas allows serialized values to be stored in a very space-efficient binary format. Each value is stored without any metadata other than a small internal schema identifier, between 1 and 4 bytes in size.