Kafka Extended API Flashcards

1
Q

What are Source Connectors used for?

A

To get data from Common Data Sources

Source Connectors are responsible for ingesting data into Kafka.

2
Q

What are Sink Connectors used for?

A

To publish data in Common Data Stores

Sink Connectors send data from Kafka to external systems.

3
Q

What is a task in Kafka Connect?

A

A task is the unit of work that actually copies data. Each task is linked to a connector configuration, and a single connector's job may be split across several tasks.

4
Q

What is a Kafka Connect Worker?

A

A worker is a single Java process that executes tasks.

5
Q

What is Standalone Mode in Kafka Connect?

A

A single process runs all connectors and tasks. It is easy to set up for development but offers no fault tolerance and no scalability.

6
Q

What is Distributed Mode in Kafka Connect?

A

Multiple workers run the connectors and tasks. It is easy to scale and is fault tolerant: if a worker fails, its work is rebalanced onto the remaining workers.

7
Q

Define a stream in Kafka.

A

A sequence of immutable data records that is fully ordered, can be replayed, and is fault tolerant.

8
Q

What is a stream processor?

A

A node in the processor topology that transforms incoming streams, record by record, and may create a new stream from it.

9
Q

What is a topology in Kafka?

A

A graph of processors chained together by streams.

10
Q

What is a Source Processor?

A

A processor that takes its data directly from a Kafka Topic.

11
Q

What is a Sink Processor?

A

A processor that sends stream data directly to a Kafka topic.

12
Q

What characterizes KStreams?

A

Every record is an insert, as in a log. A KStream represents an infinite, unbounded stream of data.

13
Q

What characterizes KTables?

A

Every record is an upsert when its value is non-null and a delete when its value is null. A KTable is similar to a database table.
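The contrast with a KStream can be illustrated with a small plain-Python model. This is only a sketch of the semantics, not the Kafka Streams API; the function names `as_kstream` and `as_ktable` are made up for illustration.

```python
def as_kstream(records):
    """Read records as a stream: every record is an insert, nothing is overwritten."""
    return list(records)


def as_ktable(records):
    """Read records as a table: non-null values upsert, null values delete the key."""
    table = {}
    for key, value in records:
        if value is None:
            table.pop(key, None)   # null value (tombstone): remove the key if present
        else:
            table[key] = value     # upsert: insert a new key or overwrite an old value
    return table


# The same four records, interpreted in two ways.
records = [("alice", "Paris"), ("bob", "Rome"), ("alice", "Berlin"), ("bob", None)]
stream_view = as_kstream(records)   # all four records are kept
table_view = as_ktable(records)     # only the latest state per key survives
```

Read as a stream, the input keeps all four records; read as a table, only alice's latest value remains because bob was deleted by the null value.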

14
Q

When should you use KStreams?

A

When reading from a topic that is not compacted, or when each new record carries only partial information (an event rather than the full current state for its key).

15
Q

When should you use KTables?

A

When reading from a log-compacted topic, or when you need a structure like a database table.

16
Q

Define stateless transformation.

A

A transformation where the result only depends on the data-point being processed.

17
Q

Define stateful transformation.

A

A transformation where the result depends on external information or state.

18
Q

What does MapValues do?

A

Affects only values, does not change keys, and doesn’t trigger a repartition.

19
Q

What does Map do?

A

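Can change both keys and values, so it marks the stream for repartitioning. The repartition actually happens only if a key-based operation, such as a grouping or a join, follows.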

20
Q

What does Filter do?

A

Produces zero or one record per input record, doesn’t change keys or values, and doesn’t trigger a repartition.

21
Q

What is FilterNot?

A

The inverse of the filter operation.

22
Q

What does FlatMapValues do?

A

Produces zero, one, or more records without changing keys, and doesn’t trigger a repartition.

23
Q

What does FlatMap do?

A

Produces zero, one, or more records per input record and can change keys as well as values, so it marks the stream for repartitioning.

24
Q

What does KStream Branch do?

A

Branches a KStream based on one or more predicates, resulting in multiple KStreams.
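As a rough illustration, branching can be modeled in plain Python. This sketch assumes the usual first-match behavior (a record goes to the branch of the first predicate it satisfies and is dropped if it satisfies none); the function name `branch` and the sample data are made up, and this is not the Kafka Streams API itself.

```python
def branch(stream, *predicates):
    """Split a list of (key, value) records into one list per predicate."""
    branches = [[] for _ in predicates]
    for key, value in stream:
        for i, predicate in enumerate(predicates):
            if predicate(key, value):
                branches[i].append((key, value))
                break  # first matching predicate wins; later ones are not tried
        # a record that matches no predicate is dropped
    return branches


stream = [("a", 5), ("b", -3), ("c", 0)]
positive, negative = branch(
    stream,
    lambda k, v: v > 0,
    lambda k, v: v < 0,
)
```

Here the record with value 0 matches neither predicate, so it appears in no branch.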

25
Q

What does SelectKey do?

A

Assigns a new key to each record of a KStream and marks the data for repartitioning.
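The stateless operations on the preceding cards can be compared side by side with a plain-Python model over lists of (key, value) pairs. This is a sketch of the semantics only; the function names below are hypothetical and are not the Kafka Streams method signatures.

```python
def map_values(stream, fn):
    """Transform each value; keys are untouched, so no repartition is needed."""
    return [(k, fn(v)) for k, v in stream]


def filter_records(stream, predicate):
    """Keep a record (one output) or drop it (zero outputs); nothing is modified."""
    return [(k, v) for k, v in stream if predicate(k, v)]


def flat_map_values(stream, fn):
    """Turn each value into zero, one, or more values under the original key."""
    return [(k, out) for k, v in stream for out in fn(v)]


def select_key(stream, fn):
    """Assign a new key to each record; in Kafka Streams this marks the
    stream for repartitioning because the key has changed."""
    return [(fn(k, v), v) for k, v in stream]


stream = [("u1", "hello world"), ("u2", "kafka")]
upper = map_values(stream, str.upper)
words = flat_map_values(stream, str.split)
k_words = filter_records(words, lambda k, v: v.startswith("k"))
by_word = select_key(words, lambda k, v: v)
```

Only `select_key` changes the keys, which is why it is the only operation in this model that would require records to move between partitions.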

26
Q

What can you read from Kafka?

A

A topic as a KStream, KTable, or GlobalKTable.

27
Q

What can you write to Kafka?

A

Any KStream or KTable back to Kafka.

28
Q

What is log compaction?

A

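A cleanup policy that keeps at least the latest value for each key in a partition. Older records with the same key are removed, and the order of the remaining messages is preserved.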
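The effect of compaction on a log can be sketched in plain Python. This is a simplified model under stated assumptions: it ignores segments, tombstones, and timing, and the function name `compact` is made up for illustration.

```python
def compact(log):
    """Keep only the latest record per key, preserving the order of survivors."""
    # Position of the last occurrence of each key in the log.
    last_index = {key: i for i, (key, _) in enumerate(log)}
    # A record survives only if it is the last one written for its key.
    return [(key, value) for i, (key, value) in enumerate(log) if last_index[key] == i]


log = [("k1", "a"), ("k2", "b"), ("k1", "c"), ("k3", "d"), ("k2", "e")]
compacted = compact(log)
```

The surviving records keep their original relative order; only the superseded values for `k1` and `k2` are gone.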

29
Q

What are the myths about log compaction?

A
  1. It does not prevent duplicate data from being written to Kafka.
  2. It does not prevent consumers from reading duplicates.
  3. It can fail occasionally.
30
Q

What is the significance of KStream and KTable duality?

A

A stream can be a changelog of a table, and a table can be a snapshot of the latest value for each key.
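The duality can be demonstrated with a small plain-Python model: every update to a table is appended to a changelog, and replaying that changelog rebuilds the table. The class and function names here are hypothetical and only illustrate the idea.

```python
class ChangeloggedTable:
    """A table that records every update it receives in a changelog stream."""

    def __init__(self):
        self.rows = {}
        self.changelog = []

    def put(self, key, value):
        self.rows[key] = value
        self.changelog.append((key, value))  # table -> stream of changes


def replay(changelog):
    """Rebuild a table by applying the changelog in order (stream -> table)."""
    rows = {}
    for key, value in changelog:
        rows[key] = value  # later records overwrite earlier ones for the same key
    return rows


table = ChangeloggedTable()
table.put("alice", 1)
table.put("bob", 7)
table.put("alice", 2)
rebuilt = replay(table.changelog)
```

The changelog holds three records while the table holds two rows, yet replaying the former reproduces the latter exactly.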

31
Q

How can you transform a KTable to a KStream?

A

With one line of code: calling toStream() on the KTable returns its changelog as a KStream.

32
Q

What methods can be used to transform a KStream to a KTable?

A
  • Chain groupByKey() and aggregation step
  • Write back to Kafka and read as KTable
33
Q

What does KTable GroupBy do?

A

Allows more aggregations within a KTable and triggers a repartition.

34
Q

What is KGroupedStream?

A

The intermediate type obtained after calling groupBy() or groupByKey() on a KStream. Aggregations such as count, reduce, and aggregate are applied to it.

35
Q

What does the Count method do on KGroupedStream?

A

Counts the number of records by grouped key, ignoring null keys or values.

36
Q

What is the difference between Aggregate and Reduce?

A

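Aggregate requires an initializer, an adder, a Serde, and a State Store, and its result may have a different type from the input values. Reduce needs only a reducer, and its result must have the same type as the input values.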
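A plain-Python model makes the difference concrete. It is a sketch of the semantics rather than the Kafka Streams API: Serdes and state stores are left out, and the names `aggregate` and `reduce_stream` are made up.

```python
def aggregate(grouped, initializer, adder):
    """Per key: start from initializer(), then fold each value in with adder.
    The result type can differ from the value type."""
    result = {}
    for key, value in grouped:
        if key not in result:
            result[key] = initializer()
        result[key] = adder(key, value, result[key])
    return result


def reduce_stream(grouped, reducer):
    """Per key: the first value is the starting point, so the result
    necessarily has the same type as the values."""
    result = {}
    for key, value in grouped:
        result[key] = reducer(result[key], value) if key in result else value
    return result


records = [("a", "xx"), ("b", "y"), ("a", "zzz")]
# String values in, integer totals out: only aggregate can change the type.
total_length = aggregate(records, lambda: 0, lambda k, v, agg: agg + len(v))
# String values in, string result out.
concatenated = reduce_stream(records, lambda agg, v: agg + v)
```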

37
Q

What does KStream Peek do?

A

Applies a side-effect operation to a KStream while returning the same KStream.

38
Q

What is Exactly Once Semantics?

A

Guarantees that the processing of each message, and the writing of its result back to Kafka, take effect exactly once.

39
Q

What is At Least Once Semantics?

A

No message is lost, but a message may be received and processed more than once under certain conditions, such as a broker reboot.

40
Q

What’s windowing?

A

Windowing gives a snapshot of an aggregate within a given timeframe.

41
Q

What are the four types of windowing?

A

Tumbling: a special case of the hopping window in which the advance interval equals the window size, so windows never overlap and no record is counted twice.
Hopping: bounded by time. Windows have a fixed size and advance by a fixed interval, so they overlap when the advance is smaller than the size.
Session: windows have a start and an end, but these are not fixed as in tumbling and hopping windows. The boundaries are determined by the events themselves.
Sliding: fixed size, like hopping and tumbling windows, but the window boundaries are driven by the events rather than by fixed time intervals.

42
Q

What do you need to use the Hopping windowing?

A

Define the window size (the duration of the window, which determines which events fall into it) and the advance interval (how far the window moves forward each step, e.g. one minute).
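How the two settings determine which windows an event falls into can be worked out with a little arithmetic. The sketch below assumes windows aligned to timestamp 0 with start-inclusive, end-exclusive bounds; the function name `windows_for` and the sample timestamps are illustrative.

```python
def windows_for(timestamp_ms, size_ms, advance_ms):
    """Return every [start, end) window that contains the given timestamp."""
    # Earliest window start that can still contain the timestamp,
    # rounded down to a multiple of the advance interval.
    start = max(0, timestamp_ms - size_ms + advance_ms) // advance_ms * advance_ms
    windows = []
    while start <= timestamp_ms:
        windows.append((start, start + size_ms))
        start += advance_ms
    return windows


# Hopping: 5-minute windows advancing every minute, event at 2.5 minutes.
hopping = windows_for(150_000, 300_000, 60_000)
# Tumbling: the advance equals the size, so the event lands in exactly one window.
tumbling = windows_for(150_000, 300_000, 300_000)
```

With a one-minute advance the event belongs to three overlapping windows; with the advance equal to the size it belongs to only one.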

43
Q

What is session windowing good for?

A

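Tracking periods of user activity, such as browsing sessions on a website, where a gap of inactivity separates one session from the next.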
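Grouping events into sessions by an inactivity gap can be sketched in plain Python. This is a simplified model for a single key, assuming an event joins the current session when it arrives no later than the gap after the previous event; the function name `sessionize` and the click timestamps are made up.

```python
def sessionize(timestamps, gap):
    """Group event timestamps into (start, end) sessions separated by inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            sessions[-1][1] = ts       # still active: extend the current session
        else:
            sessions.append([ts, ts])  # too long since the last event: new session
    return [tuple(s) for s in sessions]


# Click times in seconds for one user, with a 30-second inactivity gap.
clicks = [0, 10, 25, 100, 110]
sessions = sessionize(clicks, gap=30)
```

The 75-second pause between the third and fourth click exceeds the gap, so the clicks fall into two sessions whose lengths depend entirely on the events.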