Kafka Extended API Flashcards by Hanne Tran

What are Source Connectors used for?

To get data from Common Data Sources

Source Connectors are responsible for ingesting data into Kafka.

How well did you know this?

Not at all

Perfectly

What are Sink Connectors used for?

To publish data in Common Data Stores

Sink Connectors send data from Kafka to external systems.

How well did you know this?

Not at all

Perfectly

What is a task in Kafka Connect?

A task is linked to a connector configuration and executes tasks defined by the connector.

How well did you know this?

Not at all

Perfectly

What is a Kafka Connect Worker?

A worker is a single Java process that executes tasks.

How well did you know this?

Not at all

Perfectly

What is Standalone Mode in Kafka Connect?

A single process runs connectors and tasks, easy for development but lacks fault tolerance and scalability.

How well did you know this?

Not at all

Perfectly

What is Distributed Mode in Kafka Connect?

Multiple workers run connectors and tasks, easy to scale and fault tolerant.

How well did you know this?

Not at all

Perfectly

Define a stream in Kafka.

A sequence of immutable data records that is fully ordered, can be replayed, and is fault tolerant.

How well did you know this?

Not at all

Perfectly

What is a stream processor?

A node in the processor topology that transforms incoming streams, record by record, and may create a new stream from it.

How well did you know this?

Not at all

Perfectly

What is a topology in Kafka?

A graph of processors chained together by streams.

How well did you know this?

Not at all

Perfectly

What is a Source Processor?

A processor that takes its data directly from a Kafka Topic.

How well did you know this?

Not at all

Perfectly

What is a Sink Processor?

A processor that sends stream data directly to a Kafka topic.

How well did you know this?

Not at all

Perfectly

What characterizes KStreams?

All inserts, similar to a log, and represent an infinite, unbounded data stream.

How well did you know this?

Not at all

Perfectly

What characterizes KTables?

All upserts on non-null values, deletes on null values, and are similar to a database table.

How well did you know this?

Not at all

Perfectly

When should you use KStreams?

When reading from a topic that’s not compacted and new data is partial information.

How well did you know this?

Not at all

Perfectly

When should you use KTables?

When reading from a log compacted topic or needing a structure like a database table.

How well did you know this?

Not at all

Perfectly

Define stateless transformation.

A transformation where the result only depends on the data-point being processed.

How well did you know this?

Not at all

Perfectly

Define stateful transformation.

A transformation where the result depends on external information or state.

How well did you know this?

Not at all

Perfectly

What does MapValues do?

Study These Flashcards

Affects only values, does not change keys, and doesn’t trigger a repartition.

What does Map do?

Study These Flashcards

Affects both keys and values and triggers a repartition.

What does Filter do?

Study These Flashcards

Produces zero or one record, doesn’t change keys/values, and doesn’t trigger a repartition.

What is FilterNot?

Study These Flashcards

The inverse of the filter operation.

What does FlatMapValues do?

Study These Flashcards

Produces zero, one, or more records without changing keys, and doesn’t trigger a repartition.

What does FlatMap do?

Study These Flashcards

Changes keys and triggers a repartition.

What does KStream Branch do?

Study These Flashcards

Branches a KStream based on one or more predicates, resulting in multiple KStreams.

What does SelectKey do?

Assigns a new key to the record, changes the key, and marks data for re-partitioning. For KStream

What can you read from Kafka?

A topic as a KStream, KTable, or GlobalKTable.

What can you write to Kafka?

Any KStream or KTable back to Kafka.

What is log compaction?

An optimization that removes some messages while keeping the order of messages.

What are the myths about log compaction?

1. Does not prevent duplicate data. 2. Does not prevent reading duplicates. 3. Can fail occasionally.

What is the significance of KStream and KTable duality?

A stream can be a changelog of a table, and a table can be a snapshot of the latest value for each key.

How can you transform a KTable to a KStream?

In one line of code to keep a changelog of changes.

What methods can be used to transform a KStream to a KTable?

* Chain groupByKey() and aggregation step * Write back to Kafka and read as KTable

What does KTable GroupBy do?

Allows more aggregations within a KTable and triggers a repartition.

What is KGroupedStream?

Obtained after a groupBy/groupByKey() call on a KStream.

What does the Count method do on KGroupedStream?

Counts the number of records by grouped key, ignoring null keys or values.

What is the difference between Aggregate and Reduce?

Aggregate requires an initializer, adder, Serde, and State Store while Reduce must have the same input and output type.

What does KStream Peek do?

Applies a side-effect operation to a KStream while returning the same KStream.

What is Exactly Once Semantics?

Guarantees that data processing and pushing the message back to Kafka happen only once.

What is At Least Once Semantics?

Messages may be received twice under certain conditions, such as broker reboots.

What's windowing?

windowing gives snapshot of an aggregate withing a given timeframe.

what are the 4 types of windowing?

Tumbling: special type of the hopping window where the advanced by is equal to the window size. (don't get duplicate or overlap like hopping) Hopping: bound by time. Has a fixed start and end point and a fixed size. Session: has window start and end, but they are not fixed like in tumbling and hopping. The window boundaries are determined by the events themselves. Sliding: similar to hopping and tumbling cause fixed size. It's driven by events and not time like the others.

What do you need to use the Hopping windowing?

define the window size (duration of the window and what events will fall in this window) and the advanced size (determines how the window advances, eg. advance one minute)

What is session windowing good for?

user browser sessions

Kafka Extended API Flashcards

(43 cards)