Module 11 - Apache Kafka Flashcards

1
Q

What is Apache Kafka?

A

Kafka is a stream-processing and event-processing platform designed to handle high volumes of updates (events) in a distributed system

2
Q

What are the 3 main features of Apache Kafka?

A
  • Publish-Subscribe (message-oriented communication)
  • Real-time stream processing
  • Distributed and replicated storage of messages and streams
3
Q

Kafka supports many different _______ for interacting with other systems.

A

APIs

4
Q

The Producer and Consumer APIs are useful for what functionality?

A

Providing asynchronous communication between applications

5
Q

What are some uses for Apache Kafka?

A

Any of:

  • High-throughput messaging
  • Website activity tracking
  • Metric collection
  • Low-latency log aggregation
  • Stream processing
  • Event sourcing and commit logging
6
Q

The Processor API can be used to implement both _____ and ______ operations

A

stateless

stateful

7
Q

With the Apache Kafka Processor API, ______ operations are achieved through the use of _____ stores

A

stateful

state
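A minimal Python sketch of the stateless/stateful distinction. The class names and the `process(key, value)` signature are hypothetical and only loosely modeled on the Processor API's per-record callback. The plain dict merely stands in for a state store. Real Kafka Streams state stores are typically persistent (RocksDB by default) and backed by a changelog topic for fault tolerance.

```python
# Hypothetical processors, loosely modeled on the Processor API's per-record callback.


class UppercaseProcessor:
    """Stateless: the output depends only on the current record."""

    def process(self, key, value):
        return key, value.upper()


class CountProcessor:
    """Stateful: remembers how many records it has seen per key."""

    def __init__(self):
        # In-memory dict standing in for a key-value state store
        self.store = {}

    def process(self, key, value):
        self.store[key] = self.store.get(key, 0) + 1
        return key, self.store[key]


stateless = UppercaseProcessor()
counter = CountProcessor()
for k, v in [("a", "x"), ("b", "y"), ("a", "z")]:
    print(stateless.process(k, v), counter.process(k, v))
```

The stateless processor could be restarted at any time without losing anything. The stateful one would lose its counts, which is why real state stores are made durable.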

8
Q

What is a Topic in Kafka?

A

A topic is a stream of records, or a log of events in the message queue.

9
Q

What exactly is a “record”?

A

A collection of data items arranged for processing by a program. In Kafka, each record consists of a key, a value, a timestamp, and optional headers.

10
Q

In Kafka, you create different _____ to hold different kinds of _____. Different ______ hold filtered and transformed _____ of the same kind of event.

A

topics
events
topics
versions

11
Q

A _____ is a stream of records. It is stored as a partitioned _____. The _____ period for ______ records is ______

A
topic
log
retention
published
configurable
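This toy Python model shows a topic stored as a partitioned log. Offsets are numbered per partition, and time-based retention drops old records without renumbering the ones that remain. The `ToyTopic` class is hypothetical and simplified. Real Kafka configures time-based retention with the `retention.ms` topic setting and deletes whole log segments rather than individual records.

```python
from collections import namedtuple

# A toy record: key, value, and a timestamp in milliseconds
Record = namedtuple("Record", ["key", "value", "timestamp"])


class ToyTopic:
    """Hypothetical model of a topic: one append-only list (log) per partition."""

    def __init__(self, num_partitions, retention_ms):
        self.retention_ms = retention_ms
        self.partitions = [[] for _ in range(num_partitions)]
        # Offset of the oldest record still retained in each partition
        self.start_offsets = [0] * num_partitions

    def append(self, partition, record):
        """Append a record and return its offset within that partition."""
        log = self.partitions[partition]
        log.append(record)
        return self.start_offsets[partition] + len(log) - 1

    def enforce_retention(self, now_ms):
        """Drop records older than the retention period; surviving offsets are unchanged."""
        for p, log in enumerate(self.partitions):
            while log and now_ms - log[0].timestamp > self.retention_ms:
                log.pop(0)
                self.start_offsets[p] += 1

    def read(self, partition, offset):
        return self.partitions[partition][offset - self.start_offsets[partition]]


topic = ToyTopic(num_partitions=2, retention_ms=1000)
topic.append(0, Record("a", "1", 0))
topic.append(0, Record("a", "2", 500))
topic.append(1, Record("b", "3", 600))
topic.enforce_retention(now_ms=1200)
print(topic.start_offsets, topic.read(0, 1).value)
```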
12
Q

Why do we split a topic's log into multiple partitions?

A

Improve scalability, throughput, and storage capacity

13
Q

For each consumer/reader, Kafka stores the ______ of the next record to be read. This is represented as an ______ in the log.

A

position

offset

14
Q

Kafka stores state for the position of the next ______ to be read/consumed by the reader/consumer

A

record

15
Q

What is the benefit of having the Kafka system store the state of the next record to be read instead of the reader?

A

The reader does not need to maintain this state itself. Fault tolerance also improves: because Kafka stores the position (offset) of each reader/consumer, a restarted or replacement consumer can resume where the previous one left off.
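This sketch shows why keeping the position on the Kafka side helps fault tolerance. The `ToyBroker` and `ToyConsumer` classes are hypothetical. The broker remembers the next offset for each consumer group, so a replacement consumer resumes where the failed one stopped. For simplicity the toy commits automatically on every poll. In real Kafka, consumers commit their offsets and the brokers store them in an internal topic named `__consumer_offsets`.

```python
class ToyBroker:
    """Hypothetical broker: one partition plus the committed offset per consumer group."""

    def __init__(self, records):
        self.log = list(records)
        self.committed = {}  # group id -> offset of the next record to read

    def fetch(self, group, max_records):
        start = self.committed.get(group, 0)
        return start, self.log[start:start + max_records]

    def commit(self, group, next_offset):
        self.committed[group] = next_offset


class ToyConsumer:
    """Keeps no position of its own; asks the broker where to resume."""

    def __init__(self, broker, group):
        self.broker = broker
        self.group = group

    def poll(self, max_records=2):
        # Pull a batch, then record the new position on the broker side
        start, batch = self.broker.fetch(self.group, max_records)
        self.broker.commit(self.group, start + len(batch))
        return batch


broker = ToyBroker(["r0", "r1", "r2", "r3", "r4"])
first = ToyConsumer(broker, "billing")
print(first.poll())
# Simulate a crash: a brand-new consumer in the same group picks up at offset 2
replacement = ToyConsumer(broker, "billing")
print(replacement.poll())
```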

16
Q

What are 3 characteristics of producers in Kafka?

A
  1. Pushes records to Kafka brokers, and chooses which partition to contact for a given topic
  2. Can batch records and send them to broker asynchronously
  3. Can perform idempotent delivery (avoids duplicate commits)
17
Q

What is the benefit of the producer being able to batch records and send them to the broker asynchronously?

A

Much better throughput, with only a small latency penalty
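A minimal sketch of producer-side batching, assuming each call to `send_request` costs one network round trip. `BatchingProducer` is hypothetical. The real producer tunes this behavior with settings such as `batch.size` (in bytes) and `linger.ms` and sends in the background. This toy is synchronous and counts records instead of bytes.

```python
class BatchingProducer:
    """Toy producer that buffers records and sends them to the broker in batches.

    `send_request` stands in for one network request to the broker.
    """

    def __init__(self, send_request, batch_size):
        self.send_request = send_request
        self.batch_size = batch_size
        self.buffer = []

    def send(self, record):
        # Buffer the record; only make a request once the batch is full
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered as one request."""
        if self.buffer:
            self.send_request(list(self.buffer))
            self.buffer.clear()


requests = []
producer = BatchingProducer(requests.append, batch_size=100)
for i in range(1000):
    producer.send(f"event-{i}")
producer.flush()
print(len(requests), "requests for 1000 records")
```

Sending 1000 records costs 10 requests instead of 1000. The price is that a record may wait in the buffer until its batch fills.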

18
Q

In Kafka, producers choose which _____ to contact for a given ______

A

partition

topic
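This sketch shows one way a producer can choose a partition. Records with a key are hashed, so the same key always lands in the same partition. Keyless records are spread round-robin. `choose_partition` is hypothetical. Kafka's default partitioner hashes keys with murmur2, not `zlib.crc32`. Newer clients also use a "sticky" strategy rather than plain round-robin for records without a key.

```python
import itertools
import zlib
from typing import Optional

# Shared counter used to spread keyless records across partitions
_round_robin = itertools.count()


def choose_partition(key: Optional[bytes], num_partitions: int) -> int:
    """Pick a partition: hash keyed records, rotate through partitions for keyless ones."""
    if key is None:
        return next(_round_robin) % num_partitions
    # crc32 stands in for Kafka's murmur2; any stable hash keeps same key -> same partition
    return zlib.crc32(key) % num_partitions


print(choose_partition(b"user-42", 6), choose_partition(b"user-42", 6))
print([choose_partition(None, 3) for _ in range(3)])
```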

19
Q

What are 2 characteristics of consumers in Kafka?

A
  1. Pulls records in batches from a Kafka broker, which advances the consumer’s offset in the topic
  2. Able to achieve “exactly once” semantics when a client consumes from one topic and produces to another
20
Q

A consumer in Kafka pulls ______ in ______ from a broker. The broker advances that consumer’s _____ within the topic.

A

records
batches
offset

21
Q

When a client consumes from one topic and produces to another, Kafka achieves _____ _____ semantics

A

exactly once

22
Q

A Kafka broker is also known as a Kafka ______

A

server

23
Q

The producer can push records _____ at a time, or it can _____ them

A

one

batch

24
Q

What does “idempotent delivery” mean in the context of Kafka?

How is it ensured in Kafka?

A

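If a producer pushes the same message twice (for example, when it retries after a lost acknowledgement), Kafka avoids committing a duplicate

It is ensured by tagging writes with a unique identifier: each idempotent producer is assigned a producer ID, and each batch it sends carries a per-partition sequence number, so the broker can detect and discard duplicates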
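A toy Python model of broker-side deduplication using a (producer ID, sequence number) pair. `IdempotentPartition` is hypothetical and simplified. Besides dropping exact duplicates, real brokers also reject out-of-order sequence gaps, which this sketch ignores.

```python
class IdempotentPartition:
    """Toy partition that drops retried writes using (producer_id, sequence)."""

    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer_id -> highest sequence number already appended

    def append(self, producer_id, seq, value):
        """Return True if the write was appended, False if it was a duplicate."""
        if seq <= self.last_seq.get(producer_id, -1):
            return False  # already written: this is a retry of an earlier send
        self.last_seq[producer_id] = seq
        self.log.append(value)
        return True


part = IdempotentPartition()
part.append(7, 0, "a")
part.append(7, 1, "b")
part.append(7, 1, "b")  # retry after a lost acknowledgement is ignored
print(part.log)
```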

25
Q

Producers will produce _____ but consumers can consume in _____

A

individually

groups
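Within a consumer group, each partition of a topic is read by exactly one member of the group, so members share the work. This simplified round-robin function illustrates that idea. It is hypothetical. Kafka ships several real assignment strategies (for example, range and round-robin) that also handle multiple topics and rebalancing.

```python
def assign_round_robin(partitions, consumers):
    """Simplified assignor: deal partitions out to group members in turn.

    Every partition goes to exactly one consumer, so no two members of the
    group read the same partition.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


print(assign_round_robin([0, 1, 2, 3, 4], ["consumer-1", "consumer-2"]))
```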

26
Q

Kafka internally uses a ____ ____ to store the state of a _____ operator

A

state store

stateful

27
Q

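What are the console producer and console consumer?

A

Command-line tools provided with Kafka (kafka-console-producer.sh and kafka-console-consumer.sh) for writing records to a topic and reading records from it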

28
Q

In Kafka, there are two ways to interpret the semantics of a stream. What are they, and what are their properties?

A

Record stream: Each record represents a state transition (e.g., the balance of account ABC is increased by XYZ)

Change-log stream: Each record represents a state (e.g., account ABC has a balance of XYZ)
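The same pair of records means different things under the two interpretations. This sketch uses plain dicts rather than the Kafka Streams API. Read as a record stream, the values are deltas and must be summed. Read as a change-log stream, each value is the new state and simply replaces the old one. Both function names are hypothetical.

```python
from collections import defaultdict


def balances_from_record_stream(events):
    """Record stream: each (account, delta) is a state transition, so deltas are summed."""
    totals = defaultdict(int)
    for account, delta in events:
        totals[account] += delta
    return dict(totals)


def balances_from_changelog(updates):
    """Change-log stream: each (account, balance) is the new state, so later values overwrite."""
    table = {}
    for account, balance in updates:
        table[account] = balance
    return table


records = [("ABC", 100), ("ABC", 50)]
print(balances_from_record_stream(records), balances_from_changelog(records))
```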

29
Q

What is the representation for Record Streams and Change-log streams in the Kafka Streams API?

A

KStream for record streams

KTable for change-log streams

30
Q

The _____ of streams and tables refers to the fact that change-log streams and tables are logically _______

A

duality

interchangeable

31
Q

Explain how change-log streams and tables are logically interchangeable.

Show how they can be represented as tables

A
  • Each record in a change-log stream defines one row of the table, and overwrites any prior row for the same key
  • A table can be viewed as a snapshot of the latest value for each key in a change-log stream

This is why a change-log stream in Kafka is represented using a KTable
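A small sketch of the stream/table duality. It replays a change-log stream and records the table after each record, so every snapshot holds the latest value per key. The function name is hypothetical. Treating a `None` value as a delete mirrors Kafka's "tombstone" convention (a record with a null value deletes its key from a KTable).

```python
def table_snapshots(changelog):
    """Replay a change-log stream and return the table state after every record.

    A value of None deletes the key (Kafka calls such a record a tombstone).
    """
    table, history = {}, []
    for key, value in changelog:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value  # overwrite any prior row for the same key
        history.append(dict(table))
    return history


for snapshot in table_snapshots([("alice", 10), ("bob", 5), ("alice", 12), ("bob", None)]):
    print(snapshot)
```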

32
Q

What is a KGroupedStream object in Kafka?

A

KGroupedStream is an abstraction of a grouped record stream of KeyValue pairs. It is an intermediate representation of a KStream in order to apply an aggregation operation on the original KStream records.

33
Q

What are some transformations for converting from a KStream to a KGroupedStream object in Kafka?

A
  • groupBy
  • groupByKey

34
Q

What are some transformations for converting from a KGroupedStream object to a KTable object in Kafka?

A
  • count
  • reduce
  • aggregate
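A batch-style Python model of the KStream → KGroupedStream → KTable pipeline with `count`. The helper names `group_by` and `count` are hypothetical. The real DSL is incremental: the resulting KTable is updated as each record arrives instead of being computed once. Grouping by the existing key plays the role of groupByKey, and grouping by a derived key plays the role of groupBy.

```python
def group_by(stream, key_fn):
    """Toy KStream -> KGroupedStream: bucket each record's value under a (possibly new) key."""
    groups = {}
    for key, value in stream:
        groups.setdefault(key_fn(key, value), []).append(value)
    return groups


def count(groups):
    """Toy KGroupedStream -> KTable: one row per group key holding its record count."""
    return {k: len(values) for k, values in groups.items()}


clicks = [("u1", "home"), ("u2", "cart"), ("u1", "cart")]
print(count(group_by(clicks, lambda k, v: k)))  # keep the existing key, like groupByKey
print(count(group_by(clicks, lambda k, v: v)))  # re-key by page, like groupBy
```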
35
Q

What is the transformation for converting from a KTable object to a KStream object in Kafka?

A
  • toStream
36
Q

What is a KStream in Kafka?

A

An abstraction of a record stream of KeyValue pairs which represent events.

37
Q

What is a KTable in Kafka?

A

An abstraction of a change-log stream from a primary-keyed table. Each entry represents a state.

38
Q

What does windowing allow for in Kafka?

A

Windowing lets you control how records that have the same key are grouped into “windows” for stateful operations such as aggregations or joins.

39
Q

What are “hopping time windows” in Kafka?

A

Windows are defined by

  • size
  • advance interval (also known as “hop”)

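For example, every 10 seconds (the hop), compute some transformation over the last 60 seconds (the size).

Hopping windows overlap when the advance interval is smaller than the size. Kafka Streams requires the advance interval to be no larger than the window size, so its hopping windows do not leave gaps.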
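Using the card's example, this sketch computes which hopping windows contain a given event time. It assumes window starts are aligned to multiples of the hop (counted from time 0) and that windows starting before 0 are skipped. `hopping_windows` is a hypothetical helper, not a Kafka Streams method.

```python
def hopping_windows(timestamp, size, advance):
    """Return the [start, end) windows that contain `timestamp`.

    Window starts are aligned to multiples of `advance` (i.e. to time 0), and
    windows that would start before 0 are skipped.
    """
    if advance <= 0:
        raise ValueError("advance must be positive")
    windows = []
    # Latest window start at or before the timestamp; then walk backwards one hop at a time
    start = timestamp - timestamp % advance
    while start >= 0 and start > timestamp - size:
        windows.append((start, start + size))
        start -= advance
    return sorted(windows)


# Every 10 s (the hop), look at the last 60 s (the size): an event at t = 65 s
# falls into six overlapping windows.
print(hopping_windows(65, size=60, advance=10))
```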

40
Q

What are “Tumbling time windows” in Kafka?

A

Special case of hopping windows where the window size equals the advance interval. Windows never overlap and leave no gaps.

Each record belongs to exactly one window
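With size equal to the advance interval, each timestamp maps to a single window, found by rounding down to a multiple of the size. This toy Python sketch illustrates that. The helper names are hypothetical, and windows are assumed to be aligned to time 0.

```python
def tumbling_window(timestamp, size):
    """Return the single [start, end) window that contains `timestamp`."""
    start = timestamp - timestamp % size
    return (start, start + size)


def tumble(timestamps, size):
    """Bucket timestamps into tumbling windows; every timestamp lands in exactly one bucket."""
    windows = {}
    for t in timestamps:
        windows.setdefault(tumbling_window(t, size), []).append(t)
    return windows


print(tumble([0, 59, 60, 125], size=60))
```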

41
Q

What are “Sliding windows” in Kafka?

A

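Windows that slide continuously over the time axis. In older Kafka Streams versions they were used only for joins; since Kafka 2.7, sliding windows can also be used for aggregations.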
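In a windowed join, two records match when their keys are equal and their timestamps are within a given time difference of each other. Kafka Streams expresses this with `JoinWindows`. This brute-force Python sketch only illustrates the matching rule. The function name and the (key, timestamp, value) tuple layout are hypothetical.

```python
def windowed_join(left, right, window):
    """Join two keyed streams: records match when keys are equal and their
    timestamps differ by at most `window`.

    Each stream is a list of (key, timestamp, value) tuples.
    """
    return [
        (lk, lv, rv)
        for lk, lt, lv in left
        for rk, rt, rv in right
        if lk == rk and abs(lt - rt) <= window
    ]


clicks = [("u1", 100, "click")]
views = [("u1", 95, "view-a"), ("u1", 200, "view-b"), ("u2", 100, "view-c")]
print(windowed_join(clicks, views, window=10))
```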

42
Q

What are “Session windows” in Kafka?

A

Windows that aggregate data by periods of activity. A new session is created when the period of inactivity exceeds a given threshold (the inactivity gap).
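A toy Python sketch of session windowing for one key. Event times within the inactivity gap of the previous event extend the current session, and a longer pause starts a new one. `sessionize` is hypothetical. It sorts its input first, whereas Kafka Streams handles out-of-order records by merging sessions as they arrive.

```python
def sessionize(timestamps, inactivity_gap):
    """Group event times into sessions; a pause longer than `inactivity_gap`
    starts a new session. Returns (first, last) event time per session."""
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= inactivity_gap:
            sessions[-1].append(t)  # still active: extend the current session
        else:
            sessions.append([t])  # gap exceeded (or first event): open a new session
    return [(s[0], s[-1]) for s in sessions]


print(sessionize([1, 3, 4, 20, 22, 50], inactivity_gap=5))
```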

43
Q

Kafka is a distributed ____ streaming platform that lets you read, write, store, and process _____

A

event

events

44
Q

Kafka Streams Transformations are available in two types: ______ and ______

A

Stateless

Stateful

45
Q

KTable behaviour: If a key in the stream exists, it will be ______. If the key does not exist it will be ______.

A

updated

inserted

46
Q

________ is an abstraction of a grouped record stream of KeyValue pairs. It is an intermediate representation of a _____ in order to apply an ______ operation on the original KStream records.

A

KGroupedStream
KStream
aggregation