Apache Kafka Flashcards
Learn when to implement Kafka
What is Apache Kafka?
A distributed streaming platform for building real-time data pipelines and streaming applications
It provides high throughput, fault tolerance, and horizontal scalability.
What is a Producer in Kafka?
Any system or application that publishes messages to a Kafka topic
Producers send records to Kafka at high throughput.
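In practice a producer uses a client library (Java's kafka-clients, or confluent-kafka in Python). As a minimal sketch of the semantics only, appending a record to a topic's log assigns it the next offset; the class below is a hypothetical in-memory model, not the real client API.

```python
# Hypothetical in-memory model of producer semantics: each appended
# record gets a monotonically increasing offset in the topic's log.

class TopicLog:
    def __init__(self):
        self.records = []

    def append(self, value):
        """Append a record and return its offset, like a producer send."""
        offset = len(self.records)
        self.records.append(value)
        return offset

log = TopicLog()
assert log.append("order-created") == 0
assert log.append("order-shipped") == 1
```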
What is a Consumer in Kafka?
A system that reads messages from Kafka topics
Consumers can subscribe to multiple topics.
What is a Broker in Kafka?
A Kafka server that stores data and serves clients (producers and consumers)
A Kafka cluster can consist of multiple brokers.
What is a Topic in Kafka?
A category or feed name to which messages are written by producers
Consumers subscribe to topics to read messages.
What is a Partition in Kafka?
A unit of parallelism in Kafka, where topics are split into ordered, immutable sequences of messages
The partitions of a topic are distributed across the brokers in the cluster.
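Keyed records always land in the same partition, which preserves per-key ordering. Kafka's default partitioner hashes the key with murmur2; the sketch below stands in with CRC32 to show the idea, not the exact algorithm.

```python
# Sketch of keyed partitioning: same key -> same partition, every time.
# Kafka's default partitioner uses murmur2; CRC32 stands in for it here.
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    return zlib.crc32(key.encode()) % num_partitions

p = partition_for("user-42", 6)
assert partition_for("user-42", 6) == p  # deterministic per key
assert 0 <= p < 6
```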
What is Replication in Kafka?
The process of duplicating each partition across multiple brokers for data durability
Ensures fault tolerance in case of broker failure.
What is a Consumer Group in Kafka?
A group of consumers that work together to consume messages from Kafka topics
Ensures that each partition is consumed by only one member of the group.
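The guarantee that each partition has exactly one owner within the group can be sketched with a simple round-robin assignment (real Kafka offers several assignor strategies; this is an illustration, not the broker's algorithm):

```python
# Sketch of round-robin partition assignment in a consumer group:
# every partition is owned by exactly one consumer in the group.

def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

a = assign([0, 1, 2, 3], ["c1", "c2"])
assert a == {"c1": [0, 2], "c2": [1, 3]}
# No partition is shared between two group members.
assert sorted(p for ps in a.values() for p in ps) == [0, 1, 2, 3]
```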
What role does Zookeeper play in Kafka?
Used for distributed coordination, leader election, and managing broker metadata
Newer Kafka versions replace ZooKeeper with KRaft, Kafka's built-in Raft-based consensus, removing this dependency.
What is Real-Time Stream Processing in Kafka?
Processing data continuously as it arrives, rather than in periodic batches
Ideal for applications requiring up-to-the-second data.
What does Kafka do in terms of decoupling systems?
Decouples different parts of an application, allowing producers and consumers to operate independently
This provides greater flexibility and scalability.
What is Durable Storage in Kafka?
Messages are retained for a configurable period (retention.ms) or until a size limit is reached (retention.bytes)
This allows Kafka to serve as a persistent storage layer.
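Retention can be pictured as pruning records older than the configured window while everything newer stays readable. The model below is a simplification: real Kafka deletes whole log segments, not individual records.

```python
# Simplified sketch of time-based retention: records older than the
# retention window are pruned; newer ones remain readable.

def prune(log, now_ms, retention_ms):
    return [(ts, v) for ts, v in log if now_ms - ts <= retention_ms]

log = [(1_000, "old"), (90_000, "recent"), (99_000, "newest")]
kept = prune(log, now_ms=100_000, retention_ms=60_000)
assert kept == [(90_000, "recent"), (99_000, "newest")]
```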
How does Kafka ensure Scalability and Fault Tolerance?
By replicating partitions and using multiple brokers to handle high-throughput data
If one broker fails, data remains accessible from another.
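Failover can be sketched as follows: each partition has a replica list, and when the leader's broker dies, leadership moves to a surviving replica. This is an illustration of the idea, not the controller's actual election protocol (which also tracks the in-sync replica set).

```python
# Sketch of leader failover: pick the first replica hosted on a live
# broker; if none survives, the partition is offline.

def elect_leader(replicas, live_brokers):
    for broker in replicas:
        if broker in live_brokers:
            return broker
    raise RuntimeError("partition offline: no live replica")

replicas = ["broker-1", "broker-2", "broker-3"]
assert elect_leader(replicas, {"broker-1", "broker-2", "broker-3"}) == "broker-1"
# broker-1 fails; a follower takes over and data stays available.
assert elect_leader(replicas, {"broker-2", "broker-3"}) == "broker-2"
```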
What is Log Aggregation in Kafka?
The process of collecting logs from various services into a central stream for analysis
Useful for microservices architectures.
What is Event Sourcing in Kafka?
Capturing all changes to application state as a series of immutable events
Provides a reliable history of changes for easier debugging.
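The core idea is that current state is never stored directly; it is rebuilt by replaying the immutable event log from the beginning. A minimal sketch with hypothetical account events:

```python
# Sketch of event sourcing: state is derived by replaying an immutable
# event log. Event names ("deposited", "withdrawn") are hypothetical.

def replay(events):
    balance = 0
    for kind, amount in events:
        if kind == "deposited":
            balance += amount
        elif kind == "withdrawn":
            balance -= amount
    return balance

events = [("deposited", 100), ("withdrawn", 30), ("deposited", 5)]
assert replay(events) == 75
```

Because the log is append-only, any past state can be recovered by replaying a prefix of the events.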
How does Kafka relate to Data Integration and ETL?
Serves as a central hub to move data between systems and can be part of ETL pipelines
Useful for integrating various data sources.
What are common use cases for Kafka?
- Real-Time Analytics
- Log Aggregation
- Event-Driven Architectures
- Stream Processing
- Data Integration
Applicable across various industries.
What is Kafka Streams?
A client library for processing and analyzing data stored in Kafka
Enables real-time, scalable, and fault-tolerant stream processing applications.
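The canonical Kafka Streams example is a word count over a stream of text lines. The real API is a Java DSL (flatMapValues, groupBy, count); the plain-Python sketch below only mirrors the shape of that topology.

```python
# Plain-Python sketch mirroring the Kafka Streams word-count topology:
# flatMapValues (split lines) -> groupBy (word) -> count (state store).
from collections import Counter

def word_count(lines):
    counts = Counter()
    for line in lines:                      # each record in the stream
        for word in line.lower().split():   # flatMapValues
            counts[word] += 1               # groupBy + count
    return dict(counts)

counts = word_count(["hello kafka", "hello streams"])
assert counts == {"hello": 2, "kafka": 1, "streams": 1}
```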
What is Kafka Connect?
A tool for connecting Kafka to external systems like databases and file systems
Provides pre-built connectors for many systems.
What is KSQL?
An interactive SQL interface for stream processing in Kafka, now developed as ksqlDB
Allows querying and manipulating streams using SQL-like syntax.
What are the advantages of Kafka?
- High Throughput
- Scalable
- Fault Tolerant
- Low Latency
- Durability
These features make Kafka suitable for high-volume applications.
What are some challenges associated with Kafka?
- Complexity
- Zookeeper Dependency (in older versions; replaced by KRaft)
- Message Ordering (guaranteed only within a partition)
Managing Kafka at scale requires careful configuration.
When is it best to use Kafka?
- High throughput and scalability
- Event-driven communication
- Real-time data streaming
- Durable storage of events
Kafka is a strong fit when several of these requirements apply together.