Stream processing Flashcards

1
Q

some examples of streaming data (3)

A
  1. log files generated by customers using a mobile application
  2. social network activity
  3. e-commerce purchases
2
Q

what is stream processing

A

a processing mode where individual records or a small set of records are processed continuously, producing a simple response

3
Q

can streaming data be processed by batch processing?

A

yes

4
Q

what is bounded data?

A

datasets that are finite in size

5
Q

what is unbounded data?

A

datasets that are (at least theoretically) infinite in size and new data can arrive and be made available at any point of time
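A minimal sketch of the bounded/unbounded distinction in plain Python. The generator `unbounded_readings` is a hypothetical stand-in for a real source: it never terminates, so a consumer can only ever look at a prefix of it.

```python
import itertools

# Bounded: a finite dataset, so a final total can be computed once.
bounded = [3, 1, 2]
total = sum(bounded)

def unbounded_readings():
    """Hypothetical sensor that emits readings forever (assumption for illustration)."""
    for n in itertools.count():
        yield n % 3

# Unbounded: sum() would never return, so only a running result is possible.
running_total = 0
for reading in itertools.islice(unbounded_readings(), 6):  # inspect the first 6 only
    running_total += reading
```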

6
Q

What are streaming systems designed with in mind?

A

Unbounded data

7
Q

What is a data surge?

A

a sudden and significant increase in the volume of data flowing through a streaming data processing system.

8
Q

For real-time systems, why is failing to produce a processing result within a time window as bad as not producing a result at all?

A

The events may become “insignificant” and the insights or trends produced may no longer be valid or accurate

9
Q

Examples of streaming data (4)

A
  1. Messages from social platforms (e.g. Twitter)
  2. Internet traffic going through a network device such as a switch
  3. Readings from an IoT device
  4. Interactions of users with a web application
10
Q

Frameworks for the ingestion of unbounded data (7)

A
  1. Apache Kafka
  2. Apache Flume
  3. Amazon Kinesis Firehose
  4. AWS IoT Events
  5. Azure Event Hubs
  6. Azure IoT Hub
  7. Google Cloud Pub/Sub
11
Q

What are streams

A

sequences of immutable records that arrive at some point in time

12
Q

Other phrases for streams (3)

A
  1. event streams
  2. event logs
  3. message queues
13
Q

What type of dataset are streams?

A

datasets in motion

14
Q

What type of dataset are tables?

A

datasets at rest

15
Q

what are the components of processing elements (PE)? (3)

A
  1. input queue
  2. computing element
  3. output queue
16
Q

why are input queues needed? (3)

A
  1. to maintain the order of the incoming events/data
  2. to signal systems to slow down when the rate at which data is arriving might exceed the processing capacity of the stream processing system
  3. to decouple the data source from the processing components allowing for more flexibility in management
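The three PE components and the role of the input queue can be sketched as follows. This is an illustration only, not any framework's API: the class `ProcessingElement` and its methods `offer` and `step` are hypothetical names, and the bounded `queue.Queue` stands in for a real backpressure mechanism.

```python
import queue

class ProcessingElement:
    """Hypothetical PE: input queue -> computing element -> output queue."""

    def __init__(self, compute, capacity):
        self.input_queue = queue.Queue(maxsize=capacity)  # bounded FIFO keeps arrival order
        self.output_queue = queue.Queue()
        self.compute = compute                            # the computing element

    def offer(self, event):
        """Try to enqueue an event; False signals the source to slow down."""
        try:
            self.input_queue.put_nowait(event)
            return True
        except queue.Full:
            return False

    def step(self):
        """Process the oldest waiting event and emit the result."""
        event = self.input_queue.get_nowait()
        self.output_queue.put(self.compute(event))

pe = ProcessingElement(compute=lambda x: x * 2, capacity=2)
accepted = [pe.offer(e) for e in (1, 2, 3)]  # the third event exceeds capacity
pe.step()
pe.step()
results = [pe.output_queue.get_nowait() for _ in range(2)]
```

The source only ever talks to `offer`, so it stays decoupled from how and when the computing element runs.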
17
Q

What is a spout in Apache Storm?

A

Elements that generate streams from external sources

18
Q

What is a stream in Apache Storm?

A

An unbounded sequence of tuples

19
Q

What is a bolt in Apache Storm?

A

A processing element that consumes and generates streams

20
Q

What is a topology in Apache Storm?

A

a graph of spouts and bolts connected by streams
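The spout/bolt/topology vocabulary can be mimicked with plain Python generators. This is not the Apache Storm API: the function names are hypothetical, and the spout reads a hard-coded finite list where a real spout would read an unbounded external source.

```python
def sentence_spout():
    """Spout: generates a stream of tuples from a (here hard-coded) source."""
    for sentence in ["the cat", "the dog"]:
        yield (sentence,)

def split_bolt(stream):
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)

def count_bolt(stream):
    """Bolt: consumes word tuples and emits (word, running count) tuples."""
    counts = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
        yield (word, counts[word])

# Topology: the spout and the two bolts wired together by streams.
topology = count_bolt(split_bolt(sentence_spout()))
output = list(topology)
```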

21
Q

what does the order of tuples in a stream represent?

A

the time at which they arrive at the streaming system

22
Q

what does the term at-least-once processing mean?

A

a guarantee that each message or event in a system will be processed at least once, but potentially more than once - ensuring that no data is lost in the event of failure
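A minimal sketch of why at-least-once delivery can process a message more than once. It assumes a simple redelivery loop in which the handler returns `True` to acknowledge; `deliver_at_least_once` and `flaky_handler` are hypothetical names, not part of any messaging library.

```python
def deliver_at_least_once(messages, handler, max_attempts=3):
    """Redeliver each message until the handler acknowledges it."""
    for msg in messages:
        for _ in range(max_attempts):
            if handler(msg):  # True means the message was acknowledged
                break

processed = []
failed_once = set()

def flaky_handler(msg):
    processed.append(msg)  # the side effect happens before the acknowledgement
    if msg == "b" and msg not in failed_once:
        failed_once.add(msg)
        return False       # acknowledgement lost, so the message is redelivered
    return True

deliver_at_least_once(["a", "b", "c"], flaky_handler)
```

No message is lost, but `"b"` is processed twice, which is why consumers under this guarantee are usually written to tolerate duplicates.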

23
Q

what is event time?

A

the time at which one event has been generated by a source

24
Q

what is processing time?

A

the time at which events are seen by the stream processing system

25
Q

what are unordered streams?

A

streams where the event time ordering is different from the processing time ordering

26
Q

what is processing-time lag?

A

the delay between the generation and observation of an event in the system (this lag will change from tuple to tuple)
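A small worked example of event time, processing time, lag and unordered streams. The three events and their timestamps (in seconds) are invented for illustration.

```python
# (name, event_time, processing_time): hypothetical values in seconds
events = [
    ("e1", 10, 12),
    ("e2", 11, 18),
    ("e3", 13, 14),
]

# Processing-time lag differs from tuple to tuple.
lags = {name: seen - generated for name, generated, seen in events}

# Order by generation time versus order by observation time.
by_event_time = [name for name, _, _ in sorted(events, key=lambda e: e[1])]
by_processing_time = [name for name, _, _ in sorted(events, key=lambda e: e[2])]

# The stream is unordered when the two orderings disagree.
is_unordered = by_event_time != by_processing_time
```

Here `e2` was generated before `e3` but observed after it, so the stream is unordered.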

27
Q

when is event time or processing time relevant in processing? (2)

A
  1. To aggregate tuples and produce an aggregate computation (e.g., count, average, ...)
  2. To observe temporal patterns
28
Q

how do streaming systems deal with temporal dimensions?

A

windowing

29
Q

what are the types of window? (3)

A
  1. fixed
  2. session
  3. sliding
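The three window types can be sketched as functions that assign timestamps to windows. This assumes integer, non-negative timestamps and half-open windows `[start, end)`; all function names and parameters are hypothetical.

```python
def fixed_window(ts, size):
    """Fixed window: each timestamp belongs to exactly one window."""
    start = (ts // size) * size
    return (start, start + size)

def sliding_windows(ts, size, slide):
    """Sliding windows: overlapping windows that start every `slide` units."""
    first = max(((ts - size) // slide + 1) * slide, 0)
    return [(start, start + size) for start in range(first, ts + 1, slide)]

def session_windows(timestamps, gap):
    """Session windows: a new session starts after more than `gap` of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions

fixed = fixed_window(7, size=10)
sliding = sliding_windows(7, size=10, slide=5)
sessions = session_windows([1, 2, 3, 10, 11, 30], gap=5)
```

With a slide smaller than the window size, one timestamp falls into several sliding windows, whereas a fixed window never overlaps its neighbours.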
30
Q

what does stream processing by windowing require? (2)

A
  1. buffers to store tuples
  2. a trigger stream that triggers the computation
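A minimal sketch of the buffer-plus-trigger idea for fixed windows. As a simplifying assumption, the trigger is derived from the timestamps of arriving tuples rather than coming from a separate trigger stream; `windowed_counts` is a hypothetical name.

```python
def windowed_counts(stream, size):
    """Count tuples per fixed window; `stream` yields (timestamp, value) in arrival order."""
    buffer, window_end = [], None
    for ts, value in stream:
        if window_end is None:
            window_end = (ts // size + 1) * size
        while ts >= window_end:            # trigger: the current window has closed
            yield (window_end - size, len(buffer))
            buffer = []                    # release the buffered tuples
            window_end += size
        buffer.append(value)               # buffer the tuple until its window closes
    if buffer:
        yield (window_end - size, len(buffer))  # flush the last, partial window

stream = [(1, "a"), (3, "b"), (12, "c"), (35, "d")]
counts = list(windowed_counts(stream, size=10))
```

Each result is `(window_start, count)`; the window starting at 20 is emitted with a count of 0 because no tuple arrived in it.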
31
Q

how does Spark streaming work? (4)

A
  1. Receives input data streams
  2. Divides the data into micro-batches by temporal windowing
  3. Batches are treated as RDDs and processed by the Spark engine
  4. Results are streamed as batches
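The steps above can be imitated in plain Python to show the idea of micro-batching. This is not the Spark API: `micro_batches` is a hypothetical helper, the records and their arrival times are invented, and a Python list stands in for an RDD.

```python
from itertools import groupby

def micro_batches(records, interval):
    """Group (arrival_time, value) records, sorted by arrival time, into batches."""
    for _, group in groupby(records, key=lambda r: r[0] // interval):
        yield [value for _, value in group]

# Hypothetical input stream: arrival time in seconds, then the payload.
records = [(0.2, "a"), (0.7, "b"), (1.1, "c"), (2.5, "d"), (2.9, "e")]

# Each batch is processed as a whole, and results come out batch by batch.
batch_results = [len(batch) for batch in micro_batches(records, interval=1.0)]
```

A record that arrives just after a batch boundary waits for its whole interval to end before it is processed, which is where micro-batch latency comes from.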
32
Q

What does Spark streaming use for stream processing?

A

Micro-batches

33
Q

what is a key difference between streaming and micro-batch processing?

A

micro-batching has higher latency (delays) - up to the time interval defining the micro-batch

34
Q

why is the inverse reduction function used when using a sliding window?

A

it avoids recomputing the entire aggregation from scratch for each window update: the previous aggregate value is reused, the reduce function adds the incoming elements, and the inverse function removes the outgoing ones
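A minimal sketch of incremental sliding-window aggregation with an inverse function, here for a sum (reduce is `add`, inverse is `sub`). `sliding_reduce` is a hypothetical helper; in Spark Streaming the same idea appears as the inverse-function argument of `reduceByKeyAndWindow`.

```python
from operator import add, sub

def sliding_reduce(batches, window, reduce_fn=add, inverse_fn=sub):
    """Aggregate over the last `window` batches, updating the previous result."""
    results, current = [], None
    for i, value in enumerate(batches):
        # Incoming batch: fold it into the previous aggregate.
        current = value if current is None else reduce_fn(current, value)
        # Outgoing batch: undo its contribution instead of recomputing.
        if i >= window:
            current = inverse_fn(current, batches[i - window])
        if i >= window - 1:
            results.append(current)
    return results

batches = [1, 2, 3, 4, 5]
incremental = sliding_reduce(batches, window=3)
naive = [sum(batches[i:i + 3]) for i in range(len(batches) - 2)]  # recomputes each window
```

Both approaches give the same sums, but the incremental version does a constant amount of work per slide regardless of the window length. It only works when the reduce function has an inverse, as addition does.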

35
Q

What is Kafka described as?

A

“the HDFS of unbounded data sources”