8. Streaming Data and Real-Time Processing Flashcards

1
Q

What are the key components of Apache Kafka?

A

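Kafka's core components are producers (which write records), topics (named streams split into partitions), brokers (servers that store partitions and serve reads and writes), and consumers (which read records, usually as part of a consumer group). Cluster metadata is coordinated by ZooKeeper in older deployments or by KRaft in newer ones.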

2
Q

How does Kafka achieve fault tolerance?

A

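Kafka achieves fault tolerance by replicating each topic partition across multiple brokers. One replica acts as the leader and the others follow it; if the leader's broker fails, an in-sync follower is elected as the new leader.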

3
Q

What is the difference between batch and streaming processing?

A

Batch processing runs over a bounded dataset collected in advance, typically on a schedule, while stream processing handles an unbounded sequence of records continuously as they arrive, which gives lower latency.
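
The contrast can be sketched with a simple average. This is only an illustration: `batch_mean`, `streaming_means`, and the `readings` values are hypothetical names and data, not part of any framework.

```python
def batch_mean(values):
    """Batch style: the whole dataset is available before computing."""
    return sum(values) / len(values)


def streaming_means(values):
    """Streaming style: update the result as each value arrives."""
    count, total = 0, 0.0
    for value in values:
        count += 1
        total += value
        # A result is available after every record, not only at the end.
        yield total / count


readings = [4, 8, 6, 2]  # made-up sample data
final_mean = batch_mean(readings)
running_means = list(streaming_means(readings))
```

Both approaches agree on the final value; the streaming version simply exposes intermediate results along the way.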

4
Q

Explain window functions in streaming frameworks like Flink or Spark Streaming.

A

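Window functions group an unbounded stream into finite chunks, usually by time, so that aggregations such as counts or sums can be computed per window. Common types are tumbling (fixed, non-overlapping), sliding (fixed, overlapping), and session (bounded by gaps in activity) windows.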
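
A minimal sketch of a tumbling window in plain Python, assuming events are `(event_time_seconds, key)` pairs. The function name and the sample events are hypothetical; Flink and Spark provide their own window APIs, which this does not reproduce.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (event_time_seconds, key) pairs (assumed shape).
    window_size: window length in seconds.
    """
    counts = defaultdict(int)
    for event_time, key in events:
        # Align each event to the start of the window that contains it.
        window_start = (event_time // window_size) * window_size
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(1, "a"), (4, "a"), (9, "b"), (11, "a"), (19, "b"), (20, "a")]
result = tumbling_window_counts(events, window_size=10)
```

With a 10-second window, the events at times 1, 4, and 9 fall into the window starting at 0, those at 11 and 19 into the window starting at 10, and the event at 20 opens a new window.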

5
Q

How does Spark Streaming ensure exactly-once semantics?

A

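Write-ahead logs alone are not enough. Spark Streaming combines checkpointing and write-ahead logs, which record progress and received data so it can recover after a failure, with replayable sources such as Kafka. End-to-end exactly-once results also require an idempotent or transactional sink, so that replayed output is not applied twice.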
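
Why the sink matters can be sketched without Spark at all. The toy `IdempotentSink` class below is hypothetical and assumes every record carries a unique ID; it shows how a replayed batch leaves the result unchanged.

```python
class IdempotentSink:
    """Toy sink that ignores records it has already applied (illustrative only)."""

    def __init__(self):
        self.total = 0
        self.applied_ids = set()

    def write(self, record_id, amount):
        if record_id in self.applied_ids:
            return False  # duplicate from a replay: skip it
        self.applied_ids.add(record_id)
        self.total += amount
        return True


sink = IdempotentSink()
batch = [("r1", 10), ("r2", 5), ("r3", 7)]  # made-up records

for record_id, amount in batch:
    sink.write(record_id, amount)

# Simulate a failure before progress was saved: the same batch is replayed.
for record_id, amount in batch:
    sink.write(record_id, amount)

total_after_replay = sink.total
```

The source delivered each record twice (at-least-once), but the sink applied each one exactly once.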

6
Q

What is Kafka offset management, and why is it important?

A

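Kafka offset management tracks each consumer group's position in every partition by committing the offset of the next record to read. After a restart or rebalance, consumers resume from the last committed offset. When the offset is committed relative to processing determines the guarantee: committing after processing can replay records (at-least-once), while committing before processing can skip them (at-most-once).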
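
The effect of commit timing can be sketched with a list standing in for a partition. Nothing here uses a Kafka client; `run_consumer` and its `crash_at` parameter are hypothetical, chosen only to simulate a crash between processing and committing.

```python
def run_consumer(log, committed, processed, crash_at=None):
    """Read `log` starting at the committed offset.

    Each record is processed first and its offset committed afterwards,
    so a crash between the two steps leads to a replay (at-least-once).
    Returns the committed offset, i.e. the next offset to read.
    """
    offset = committed
    while offset < len(log):
        processed.append(log[offset])  # processing happens first
        if crash_at == offset:
            return committed  # simulated crash before the commit
        committed = offset + 1  # commit the next offset to read
        offset += 1
    return committed


log = ["m0", "m1", "m2", "m3"]  # stand-in for one partition
processed = []

# First run crashes after processing m2 but before committing it.
committed = run_consumer(log, 0, processed, crash_at=2)
offset_after_crash = committed

# The restart resumes from the last committed offset, so m2 is processed again.
committed = run_consumer(log, committed, processed)
```

No record is lost, but `m2` appears twice in `processed`, which is why downstream processing usually needs to tolerate duplicates.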

7
Q

Explain the concepts of watermarking and late arrival data.

A

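A watermark is a moving threshold on event time that tells the engine how far event time has progressed, typically the maximum event time seen so far minus an allowed delay. Late-arriving data is data whose event time is earlier than that of records already processed. Records that arrive before the watermark passes their window are still included; records older than the watermark are typically dropped or routed to a separate output.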
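
A simplified sketch of that rule, assuming the watermark is the maximum event time seen so far minus a fixed `allowed_lateness`. The function and sample data are hypothetical, and real engines apply the check per window rather than per record.

```python
def split_by_watermark(events, allowed_lateness):
    """Classify events as accepted or late, in arrival order.

    events: (event_time, payload) pairs in the order they arrive.
    """
    max_event_time = float("-inf")
    accepted, late = [], []
    for event_time, payload in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - allowed_lateness
        if event_time >= watermark:
            accepted.append(payload)
        else:
            late.append(payload)  # older than the watermark
    return accepted, late


arrivals = [(10, "a"), (12, "b"), (9, "c"), (20, "d"), (14, "e"), (16, "f")]
accepted, late = split_by_watermark(arrivals, allowed_lateness=5)
```

Event `c` arrives out of order but within the allowed lateness, so it is kept. Event `e` arrives after the watermark has advanced to 15, so it counts as late.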

8
Q

How do you handle out-of-order data in streaming pipelines?

A

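Process by event time rather than arrival time. Buffer records (for example in windowed state) until the watermark indicates that earlier records are unlikely to arrive, then emit results in order; an allowed-lateness setting determines what happens to records that arrive after that.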
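
One way to picture buffering is a min-heap keyed on event time that releases records once the watermark passes them. This is a sketch under assumptions: `reorder`, `max_delay`, and the sample events are hypothetical, and a record delayed by more than `max_delay` would still come out of order.

```python
import heapq


def reorder(events, max_delay):
    """Emit (event_time, payload) pairs in event-time order.

    Records are held in a min-heap and released once the watermark
    (max event time seen minus max_delay) has reached their event time.
    """
    buffer = []
    output = []
    max_seen = float("-inf")
    for event_time, payload in events:
        heapq.heappush(buffer, (event_time, payload))
        max_seen = max(max_seen, event_time)
        watermark = max_seen - max_delay
        # Release everything the watermark has passed, oldest first.
        while buffer and buffer[0][0] <= watermark:
            output.append(heapq.heappop(buffer))
    # End of input: flush whatever is still buffered, in order.
    while buffer:
        output.append(heapq.heappop(buffer))
    return output


arrivals = [(3, "c"), (1, "a"), (2, "b"), (6, "f"), (4, "d"), (5, "e")]
ordered = reorder(arrivals, max_delay=3)
```

A larger `max_delay` tolerates more disorder at the cost of holding records longer before emitting them.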

9
Q

What are the differences between Apache Flink and Spark Streaming?

A

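Apache Flink is a native stream processor that handles records one at a time, which gives low latency, and it has first-class support for event time and state. Spark Streaming processes data in micro-batches, which trades some latency for throughput and reuse of the batch engine. Spark's newer Structured Streaming API also supports event time and watermarks.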

10
Q

How does Kinesis compare with Kafka?

A

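Amazon Kinesis Data Streams is a fully managed AWS service, so capacity is provisioned in shards and AWS operates the infrastructure. Kafka is an open-source distributed streaming platform that you run yourself or consume through a managed offering, with capacity organized into topic partitions.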
