Apache Kafka Flashcards

Learn when to implement Kafka

1
Q

What is Apache Kafka?

A

A distributed streaming platform for building real-time data pipelines and streaming applications

It handles high throughput, fault tolerance, and scalability.

2
Q

What is a Producer in Kafka?

A

Any system or application that publishes messages to a Kafka topic

Producers send records to Kafka at high throughput.

3
Q

What is a Consumer in Kafka?

A

A system that reads messages from Kafka topics

Consumers can subscribe to multiple topics.

4
Q

What is a Broker in Kafka?

A

A Kafka server that stores data and serves clients (producers and consumers)

A Kafka cluster can consist of multiple brokers.

5
Q

What is a Topic in Kafka?

A

A category or feed name to which messages are written by producers

Consumers subscribe to topics to read messages.

6
Q

What is a Partition in Kafka?

A

A unit of parallelism in Kafka, where topics are split into ordered, immutable sequences of messages

A topic's partitions are spread across the brokers in the cluster, and ordering is guaranteed only within a single partition.
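
A minimal sketch of how a record key could select a partition. The function names and the CRC32 hash are assumptions made for illustration; Kafka's own default partitioner uses a different hash. The point is only that records with the same key land in the same partition and keep their order there.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a record key to a partition index (illustrative hash, not Kafka's)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append(topic, key, value, num_partitions):
    """Append a value to the partition chosen for its key; return that partition."""
    partition = choose_partition(key, num_partitions)
    topic.setdefault(partition, []).append(value)
    return partition


# Hypothetical topic modelled as {partition index: ordered list of values}.
topic = {}
for value in ["created", "paid", "shipped"]:
    append(topic, "order-42", value, num_partitions=3)
```

All three events for the key order-42 end up in one partition, in the order they were sent.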

7
Q

What is Replication in Kafka?

A

The process of duplicating each partition across multiple brokers for data durability

Ensures fault tolerance in case of broker failure.
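
A simplified sketch of replica placement and leader failover. The function names and the placement rule are hypothetical; a real cluster tracks in-sync replicas and lets the controller pick the new leader. The sketch assumes the replication factor does not exceed the number of brokers.

```python
def place_replicas(partition, brokers, replication_factor):
    """Pick replication_factor distinct brokers, offset by the partition number."""
    start = partition % len(brokers)
    return [brokers[(start + i) % len(brokers)] for i in range(replication_factor)]


def elect_leader(replicas, live_brokers):
    """Return the first replica that is still alive, or None if all are down."""
    for broker in replicas:
        if broker in live_brokers:
            return broker
    return None


brokers = ["broker-1", "broker-2", "broker-3"]
replicas = place_replicas(partition=0, brokers=brokers, replication_factor=2)

leader_before = elect_leader(replicas, {"broker-1", "broker-2", "broker-3"})
# broker-1 has failed, so the surviving replica takes over.
leader_after = elect_leader(replicas, {"broker-2", "broker-3"})
```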

8
Q

What is a Consumer Group in Kafka?

A

A group of consumers that work together to consume messages from Kafka topics

Ensures that each partition is consumed by only one member of the group.
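
A rough sketch of the "one owner per partition" rule, using a plain round-robin strategy. The function and consumer names are made up for illustration; Kafka's real assignors are configurable and more involved.

```python
def assign_partitions(partitions, members):
    """Round-robin assignment: every partition gets exactly one owner."""
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        owner = members[index % len(members)]
        assignment[owner].append(partition)
    return assignment


# Two consumers share four partitions.
assignment = assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"])

# A third consumer joins, so the partitions are redistributed.
rebalanced = assign_partitions(
    [0, 1, 2, 3], ["consumer-a", "consumer-b", "consumer-c"]
)
```

Adding members beyond the number of partitions would leave the extra members idle, which is why the partition count caps the parallelism of a group.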

9
Q

What role does ZooKeeper play in Kafka?

A

Used for distributed coordination, leader election, and managing broker metadata

KRaft mode replaces ZooKeeper with a built-in Raft-based metadata quorum, and Kafka 4.0 removed ZooKeeper support entirely.

10
Q

What is Real-Time Stream Processing in Kafka?

A

Continuous streaming of real-time data processed on the fly by applications

Ideal for applications requiring up-to-the-second data.

11
Q

What does Kafka do in terms of decoupling systems?

A

Decouples different parts of an application, allowing producers and consumers to operate independently

This provides greater flexibility and scalability.
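
One way to picture the decoupling is an append-only log where each consumer tracks its own read position. The TopicLog class below is a hypothetical in-memory model, not a Kafka API.

```python
class TopicLog:
    """In-memory stand-in for a single-partition topic (illustration only)."""

    def __init__(self):
        self.records = []   # append-only list of records
        self.offsets = {}   # consumer name -> next offset to read

    def publish(self, record):
        """Append a record and return its offset; the producer never waits on readers."""
        self.records.append(record)
        return len(self.records) - 1

    def poll(self, consumer, max_records=10):
        """Return unread records for this consumer and advance only its offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.records[start:start + max_records]
        self.offsets[consumer] = start + len(batch)
        return batch


log = TopicLog()
log.publish("signup:alice")
log.publish("signup:bob")
billing_batch = log.poll("billing")   # reads the first two records
log.publish("signup:carol")
email_batch = log.poll("email")       # a late consumer still sees all three
billing_next = log.poll("billing")    # only the record it has not seen yet
```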

12
Q

What is Durable Storage in Kafka?

A

Messages are stored in Kafka for a configurable amount of time or until a specified size is reached

This allows Kafka to serve as a persistent storage layer.
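
A small sketch of time-based and size-based retention. The apply_retention function and its parameters are hypothetical; in Kafka the corresponding topic settings are retention.ms and retention.bytes, and deletion happens per log segment rather than per record.

```python
def apply_retention(records, now, max_age_seconds=None, max_bytes=None):
    """Drop old records by age and/or total size.

    records is a list of (timestamp, payload_bytes) tuples, oldest first.
    """
    kept = list(records)
    if max_age_seconds is not None:
        # Time-based retention: keep only records that are young enough.
        kept = [r for r in kept if now - r[0] <= max_age_seconds]
    if max_bytes is not None:
        # Size-based retention: drop the oldest records until the log fits.
        while kept and sum(len(payload) for _, payload in kept) > max_bytes:
            kept.pop(0)
    return kept


records = [(100, b"aaaa"), (200, b"bbbb"), (300, b"cccc")]
by_age = apply_retention(records, now=310, max_age_seconds=150)
by_size = apply_retention(records, now=310, max_bytes=8)
```

In both calls the oldest record is discarded and the two newest remain.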

13
Q

How does Kafka ensure Scalability and Fault Tolerance?

A

By replicating partitions and using multiple brokers to handle high-throughput data

If one broker fails, data remains accessible from another.

14
Q

What is Log Aggregation in Kafka?

A

The process of collecting logs from various services into a central stream for analysis

Useful for microservices architectures.

15
Q

What is Event Sourcing in Kafka?

A

Capturing all changes to application state as a series of immutable events

Provides a reliable history of changes for easier debugging.
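
A minimal sketch of event sourcing, assuming a hypothetical account whose state is rebuilt by replaying its events. The event shapes and function names are invented for illustration; in practice the events would be read from a Kafka topic.

```python
def apply_event(balance, event):
    """Return the new balance after applying a single immutable event."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    raise ValueError(f"unknown event type: {event['type']}")


def replay(events, upto=None):
    """Rebuild state from the start of the log, optionally stopping early."""
    balance = 0
    for event in events[:upto]:
        balance = apply_event(balance, event)
    return balance


events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]

current_balance = replay(events)            # 75
balance_after_two = replay(events, upto=2)  # 70, the state at an earlier point
```

Replaying a prefix of the log is what makes the history useful for debugging: any past state can be reconstructed.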

16
Q

How does Kafka relate to Data Integration and ETL?

A

Serves as a central hub to move data between systems and can be part of ETL pipelines

Useful for integrating various data sources.

17
Q

What are common use cases for Kafka?

A
  • Real-Time Analytics
  • Log Aggregation
  • Event-Driven Architectures
  • Stream Processing
  • Data Integration

Applicable across various industries.

18
Q

What is Kafka Streams?

A

A client library for processing and analyzing data stored in Kafka

Enables real-time, scalable, and fault-tolerant stream processing applications.

19
Q

What is Kafka Connect?

A

A tool for connecting Kafka to external systems like databases and file systems

Provides pre-built connectors for many systems.

20
Q

What is KSQL?

A

An interactive SQL interface for stream processing in Kafka

Allows querying and manipulating streams using SQL-like syntax. It is now developed by Confluent as ksqlDB and is not part of Apache Kafka itself.

21
Q

What are the advantages of Kafka?

A
  • High Throughput
  • Scalable
  • Fault Tolerant
  • Low Latency
  • Durability

These features make Kafka suitable for high-volume applications.

22
Q

What are some challenges associated with Kafka?

A
  • Complexity
  • ZooKeeper Dependency (for clusters not running in KRaft mode)
  • Message Ordering (guaranteed only within a partition, not across a topic)

Managing Kafka at scale requires careful configuration.

23
Q

When is it best to use Kafka?

A
  • High throughput and scalability
  • Event-driven communication
  • Real-time data streaming
  • Durable storage of events

Kafka is most worthwhile when several of these requirements apply at once.