Apache Kafka Flashcards
Learn when to implement Kafka
What is Apache Kafka?
A distributed streaming platform for building real-time data pipelines and streaming applications
It provides high throughput, fault tolerance, and horizontal scalability.
What is a Producer in Kafka?
Any system or application that publishes messages to a Kafka topic
Producers send records to Kafka at high throughput.
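In practice a producer uses a client library (Java's kafka-clients, or confluent-kafka in Python). As a minimal sketch of the semantics only, appending a record to a topic's log assigns it the next offset; the class below is a hypothetical in-memory model, not the real client API.

```python
# Hypothetical in-memory model of producer semantics: each appended
# record gets a monotonically increasing offset in the topic's log.

class TopicLog:
    def __init__(self):
        self.records = []

    def append(self, value):
        """Append a record and return its offset, like a producer send."""
        offset = len(self.records)
        self.records.append(value)
        return offset

log = TopicLog()
assert log.append("order-created") == 0
assert log.append("order-shipped") == 1
```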
What is a Consumer in Kafka?
A system that reads messages from Kafka topics
Consumers can subscribe to multiple topics.
What is a Broker in Kafka?
A Kafka server that stores data and serves clients (producers and consumers)
A Kafka cluster can consist of multiple brokers.
What is a Topic in Kafka?
A category or feed name to which messages are written by producers
Consumers subscribe to topics to read messages.
What is a Partition in Kafka?
A unit of parallelism in Kafka, where topics are split into ordered, immutable sequences of messages
The partitions of a topic are distributed across the brokers in the cluster.
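Keyed records always land in the same partition, which preserves per-key ordering. Kafka's default partitioner hashes the key with murmur2; the sketch below stands in with CRC32 to show the idea, not the exact algorithm.

```python
# Sketch of keyed partitioning: same key -> same partition, every time.
# Kafka's default partitioner uses murmur2; CRC32 stands in for it here.
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    return zlib.crc32(key.encode()) % num_partitions

p = partition_for("user-42", 6)
assert partition_for("user-42", 6) == p  # deterministic per key
assert 0 <= p < 6
```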
What is Replication in Kafka?
The process of duplicating each partition across multiple brokers for data durability
Ensures fault tolerance in case of broker failure.
What is a Consumer Group in Kafka?
A group of consumers that work together to consume messages from Kafka topics
Ensures that each partition is consumed by only one member of the group.
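The guarantee that each partition has exactly one owner within the group can be sketched with a simple round-robin assignment (real Kafka offers several assignor strategies; this is an illustration, not the broker's algorithm):

```python
# Sketch of round-robin partition assignment in a consumer group:
# every partition is owned by exactly one consumer in the group.

def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

a = assign([0, 1, 2, 3], ["c1", "c2"])
assert a == {"c1": [0, 2], "c2": [1, 3]}
# No partition is shared between two group members.
assert sorted(p for ps in a.values() for p in ps) == [0, 1, 2, 3]
```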
What role does Zookeeper play in Kafka?
Used for distributed coordination, leader election, and managing broker metadata
Newer Kafka versions replace ZooKeeper with KRaft, Kafka's built-in Raft-based consensus, removing this dependency.
What is Real-Time Stream Processing in Kafka?
Processing data continuously as it arrives, rather than in periodic batches
Ideal for applications requiring up-to-the-second data.
What does Kafka do in terms of decoupling systems?
Decouples different parts of an application, allowing producers and consumers to operate independently
This provides greater flexibility and scalability.
What is Durable Storage in Kafka?
Messages are retained for a configurable period (retention.ms) or until a size limit is reached (retention.bytes)
This allows Kafka to serve as a persistent storage layer.
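Retention can be pictured as pruning records older than the configured window while everything newer stays readable. The model below is a simplification: real Kafka deletes whole log segments, not individual records.

```python
# Simplified sketch of time-based retention: records older than the
# retention window are pruned; newer ones remain readable.

def prune(log, now_ms, retention_ms):
    return [(ts, v) for ts, v in log if now_ms - ts <= retention_ms]

log = [(1_000, "old"), (90_000, "recent"), (99_000, "newest")]
kept = prune(log, now_ms=100_000, retention_ms=60_000)
assert kept == [(90_000, "recent"), (99_000, "newest")]
```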
How does Kafka ensure Scalability and Fault Tolerance?
By replicating partitions and using multiple brokers to handle high-throughput data
If one broker fails, data remains accessible from another.
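Failover can be sketched as follows: each partition has a replica list, and when the leader's broker dies, leadership moves to a surviving replica. This is an illustration of the idea, not the controller's actual election protocol (which also tracks the in-sync replica set).

```python
# Sketch of leader failover: pick the first replica hosted on a live
# broker; if none survives, the partition is offline.

def elect_leader(replicas, live_brokers):
    for broker in replicas:
        if broker in live_brokers:
            return broker
    raise RuntimeError("partition offline: no live replica")

replicas = ["broker-1", "broker-2", "broker-3"]
assert elect_leader(replicas, {"broker-1", "broker-2", "broker-3"}) == "broker-1"
# broker-1 fails; a follower takes over and data stays available.
assert elect_leader(replicas, {"broker-2", "broker-3"}) == "broker-2"
```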
What is Log Aggregation in Kafka?
The process of collecting logs from various services into a central stream for analysis
Useful for microservices architectures.
What is Event Sourcing in Kafka?
Capturing all changes to application state as a series of immutable events
Provides a reliable history of changes for easier debugging.
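The core idea is that current state is never stored directly; it is rebuilt by replaying the immutable event log from the beginning. A minimal sketch with hypothetical account events:

```python
# Sketch of event sourcing: state is derived by replaying an immutable
# event log. Event names ("deposited", "withdrawn") are hypothetical.

def replay(events):
    balance = 0
    for kind, amount in events:
        if kind == "deposited":
            balance += amount
        elif kind == "withdrawn":
            balance -= amount
    return balance

events = [("deposited", 100), ("withdrawn", 30), ("deposited", 5)]
assert replay(events) == 75
```

Because the log is append-only, any past state can be recovered by replaying a prefix of the events.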
How does Kafka relate to Data Integration and ETL?
Serves as a central hub to move data between systems and can be part of ETL pipelines
Useful for integrating various data sources.
What are common use cases for Kafka?
- Real-Time Analytics
- Log Aggregation
- Event-Driven Architectures
- Stream Processing
- Data Integration
Applicable across various industries.
What is Kafka Streams?
A client library for processing and analyzing data stored in Kafka
Enables real-time, scalable, and fault-tolerant stream processing applications.
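The canonical Kafka Streams example is a word count over a stream of text lines. The real API is a Java DSL (flatMapValues, groupBy, count); the plain-Python sketch below only mirrors the shape of that topology.

```python
# Plain-Python sketch mirroring the Kafka Streams word-count topology:
# flatMapValues (split lines) -> groupBy (word) -> count (state store).
from collections import Counter

def word_count(lines):
    counts = Counter()
    for line in lines:                      # each record in the stream
        for word in line.lower().split():   # flatMapValues
            counts[word] += 1               # groupBy + count
    return dict(counts)

counts = word_count(["hello kafka", "hello streams"])
assert counts == {"hello": 2, "kafka": 1, "streams": 1}
```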
What is Kafka Connect?
A tool for connecting Kafka to external systems like databases and file systems
Provides pre-built connectors for many systems.
What is KSQL?
An interactive SQL interface for stream processing in Kafka, now developed as ksqlDB
Allows querying and manipulating streams using SQL-like syntax.
What are the advantages of Kafka?
- High Throughput
- Scalable
- Fault Tolerant
- Low Latency
- Durability
These features make Kafka suitable for high-volume applications.
What are some challenges associated with Kafka?
- Complexity
- Zookeeper Dependency (in older versions; replaced by KRaft)
- Message Ordering (guaranteed only within a partition)
Managing Kafka at scale requires careful configuration.
When is it best to use Kafka?
- High throughput and scalability
- Event-driven communication
- Real-time data streaming
- Durable storage of events
Kafka is a strong fit when several of these requirements apply together.