Module 11 - Apache Kafka Flashcards
What is Apache Kafka?
Kafka is a stream-processing and event-processing platform designed to handle a high volume of updates in a distributed system.
What are the 3 main features of Apache Kafka?
- Publish-Subscribe (message-oriented communication)
- Real-time stream processing
- Distributed and replicated storage of messages and streams
Kafka supports many different _______ for interacting with other systems.
APIs
The producer and consumer API is useful for what functionality?
Providing asynchronous communication between applications
What are some uses for Apache Kafka?
Any of:
- High-throughput messaging
- Website activity tracking
- Metric collection
- Low-latency log aggregation
- Stream processing
- Event sourcing and commit logging
The Processor API can be used to implement both _____ as well as ______ operations
stateless
stateful
With the Apache Kafka processor API, ______ operations are achieved through the use of _____ stores
stateful
state
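To make the stateless/stateful distinction concrete, here is a minimal sketch in plain Python. It does not use the Kafka Processor API; the names `to_lower` and `CountingProcessor` are hypothetical, and the dict is only an in-memory stand-in for a state store.

```python
from collections import defaultdict


def to_lower(record):
    """Stateless: the output depends only on the current record."""
    return record.lower()


class CountingProcessor:
    """Stateful: keeps a running count per key.

    The dict is a hypothetical in-memory stand-in for a Kafka state store.
    """

    def __init__(self):
        self.store = defaultdict(int)

    def process(self, key):
        # The result depends on every record seen so far for this key.
        self.store[key] += 1
        return key, self.store[key]


processor = CountingProcessor()
results = [processor.process(to_lower(word)) for word in ["Click", "view", "click"]]
```

The second "click" yields a count of 2 only because the processor remembered the first one; that memory is what a state store provides.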
What is a Topic in Kafka?
A topic is a stream of records, or a log of events in the message queue.
What exactly is a “record”?
A collection of data items arranged for processing by a program
In Kafka, you create different _____ to hold different kinds of _____. Different ______ hold filtered and transformed _____ of the same kind of event.
topics
events
topics
versions
A _____ is a stream of records. It is stored as a partitioned _____. The _____ period for ______ records is ______
topic
log
retention
published
configurable
Why is a topic's log partitioned into multiple logs?
Improve scalability, throughput, and storage capacity
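A rough sketch of how records might be spread across partitions by key. The partition count and the `partition_for` helper are assumptions for illustration, and CRC32 is used only because it is deterministic and in the standard library; Kafka's own partitioner uses a different hash.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count for one topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Illustrative hash only; not the hash Kafka itself uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each partition is its own append-only log.
partitions = [[] for _ in range(NUM_PARTITIONS)]
for key, value in [("user-1", "login"), ("user-2", "login"), ("user-1", "logout")]:
    partitions[partition_for(key)].append((key, value))
```

Records with the same key land in the same partition and keep their relative order, while different partitions can be stored and served by different brokers.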
For each consumer/reader, Kafka stores the ______ of the next record to be read. This is represented as an ______ in the log.
position
offset
Kafka stores state for the position of the next ______ to be read/consumed by the reader/consumer
record
What is the benefit of having the Kafka system store the state of the next record to be read instead of the reader?
The reader does not need to maintain the state itself. Fault tolerance also improves, since the Kafka system stores the order and position of each reader/consumer.
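The idea of the system tracking each reader's position can be sketched as follows. `TinyLog` is a hypothetical in-memory stand-in for a single partition's log, not a Kafka API.

```python
class TinyLog:
    """Hypothetical in-memory stand-in for one partition's log."""

    def __init__(self):
        self.records = []
        self.offsets = {}  # reader name -> offset of the next record to read

    def append(self, record):
        self.records.append(record)

    def poll(self, reader):
        # The log, not the reader, remembers where the reader left off.
        offset = self.offsets.get(reader, 0)
        if offset >= len(self.records):
            return None  # nothing new for this reader
        self.offsets[reader] = offset + 1
        return self.records[offset]


log = TinyLog()
for event in ["a", "b", "c"]:
    log.append(event)

first = log.poll("reader-1")
second = log.poll("reader-1")
other = log.poll("reader-2")
```

Here `reader-1` and `reader-2` advance independently through the same log, and a reader that restarts without any local state would resume from its stored offset.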
What are 3 characteristics of producers in Kafka?
- Push records to Kafka brokers and choose which partition to contact for a given topic
- Can batch records and send them to the broker asynchronously
- Can perform idempotent delivery (avoids duplicate commits)
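One way to picture idempotent delivery is a broker that ignores a retried record it has already appended. The sketch below assumes each producer tags records with its own ID and an increasing sequence number; `DedupingBroker` and its method names are hypothetical.

```python
class DedupingBroker:
    """Hypothetical broker that drops records it has already appended."""

    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer id -> highest sequence number appended

    def append(self, producer_id, seq, record):
        if seq <= self.last_seq.get(producer_id, -1):
            return False  # duplicate of an already-appended record
        self.log.append(record)
        self.last_seq[producer_id] = seq
        return True


broker = DedupingBroker()
broker.append("p1", 0, "order-created")
# The producer retries because the acknowledgement was lost.
retry_accepted = broker.append("p1", 0, "order-created")
broker.append("p1", 1, "order-paid")
```

The retry is rejected, so the log holds each record exactly once even though the producer sent one of them twice.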
What is the benefit of the producer being able to batch records and send them to the broker asynchronously?
Much better throughput, with a negligible latency penalty
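The throughput gain comes from sending fewer, larger requests. A simplified, synchronous sketch of the batching part (the `BatchingProducer` class and its `batch_size` parameter are hypothetical, and no real network or background thread is involved):

```python
class BatchingProducer:
    """Hypothetical producer that groups records before sending."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []
        self.requests = []  # each entry is one simulated request to the broker

    def send(self, record):
        # Records accumulate locally until the batch is full.
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.requests.append(list(self.buffer))
            self.buffer.clear()


producer = BatchingProducer(batch_size=3)
for i in range(7):
    producer.send(f"event-{i}")
producer.flush()  # send whatever is left in the buffer
```

Seven records reach the simulated broker in three requests instead of seven; the cost is that a record may wait briefly in the buffer before it is sent.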
In Kafka, producers choose which _____ to contact for a given ______
partition
topic