Streaming Flashcards

1
Q

What techniques are required for stream processing?

A
  • A component that acquires events from producers and forwards them to consumers
  • A component that processes events
2
Q

Describe the Publish/subscribe model

A

It connects multiple producers to multiple consumers. Message brokers are systems that sit between producers and consumers and handle reliable message delivery.

3
Q

What are two types of Broker based messaging?

A
  • Fire and forget: the broker acks the message immediately
  • Transaction based: the broker writes the message to permanent storage prior to acking it
4
Q

What does the broker do?

A
  • Buffers the message using disk if appropriate
  • Routes the messages to the appropriate queues
  • Notifies consumers when messages have arrived
5
Q

What do the consumers do?

A
  • Subscribe to the queue that contains their desired messages
  • Acknowledge (ack) receipt of each message
6
Q

Describe the Competing workers messaging pattern

A

Multiple consumers read from a single queue competing for incoming messages.
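A minimal in-process sketch of this pattern, assuming Python's standard `queue.Queue` stands in for the broker queue and threads stand in for consumers. The helper name `run_competing_workers` is hypothetical. Each message is handled by exactly one worker:

```python
import queue
import threading


def run_competing_workers(messages, n_workers=3):
    """Deliver each message to exactly one of several workers sharing one queue."""
    q = queue.Queue()
    for m in messages:
        q.put(m)

    handled = []                 # (worker_id, message) pairs
    lock = threading.Lock()      # protects the shared 'handled' list

    def worker(worker_id):
        while True:
            try:
                m = q.get_nowait()   # workers compete for the next message
            except queue.Empty:
                return               # queue drained, worker stops
            with lock:
                handled.append((worker_id, m))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled


handled = run_competing_workers(range(10))
```

Which worker receives which message is not deterministic; the only guarantee in this sketch is that no message is handled twice.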

7
Q

Describe the fan out messaging pattern

A

Each consumer has its own queue, and every message is replicated to all of these queues.
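As an illustration only, fan out can be sketched with one in-memory queue per subscriber. The class `FanOutBroker` and its methods are hypothetical names, not the API of any real broker:

```python
from collections import deque


class FanOutBroker:
    """Toy broker: every subscriber gets its own queue and its own copy."""

    def __init__(self):
        self.queues = {}             # consumer name -> that consumer's queue

    def subscribe(self, consumer):
        self.queues[consumer] = deque()
        return self.queues[consumer]

    def publish(self, message):
        # Replicate the message into every consumer's queue.
        for q in self.queues.values():
            q.append(message)


broker = FanOutBroker()
queue_a = broker.subscribe("a")
queue_b = broker.subscribe("b")
broker.publish("e1")
broker.publish("e2")

first_for_a = queue_a.popleft()      # consuming from a's queue leaves b's intact
```

Because the queues are independent, one consumer reading a message does not affect what the other consumers still see.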

8
Q

Describe the message routing messaging pattern

A

The producer assigns keys to message metadata and creates topic queues.
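One possible reading of this card, sketched under the assumption that the key carried in the message metadata selects the topic queue a message lands in. `RoutingBroker` and its methods are hypothetical names used only for illustration:

```python
from collections import defaultdict, deque


class RoutingBroker:
    """Toy broker that routes each message to a topic queue chosen by its key."""

    def __init__(self):
        self.topics = defaultdict(deque)   # key -> topic queue, created on demand

    def publish(self, key, payload):
        # The key is metadata; the payload itself is not inspected.
        self.topics[key].append(payload)

    def consume(self, key):
        q = self.topics[key]
        return q.popleft() if q else None


broker = RoutingBroker()
broker.publish("orders", {"id": 1})
broker.publish("payments", {"id": 7})
broker.publish("orders", {"id": 2})

next_order = broker.consume("orders")
next_payment = broker.consume("payments")
```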

9
Q

What are some drawbacks of broker-based messaging?

A
  • Once the message is received it disappears
  • Reprocessing a message is impossible
  • Cannot prove that a message was delivered
10
Q

What is log based messaging?

A
  • Producers append to a log
  • All consumers connect to the log and pull from it
  • A new client starts processing from the beginning of a log
  • The broker partitions the log across a cluster
  • The broker keeps track of current message offset for each consumer per partition
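The points above can be sketched as a toy append-only log with per-consumer offsets. This is an in-memory illustration under simplifying assumptions (no cluster, no persistence); `PartitionedLog`, `append`, and `poll` are hypothetical names:

```python
import zlib


class PartitionedLog:
    """Toy log-based broker: append-only partitions plus per-consumer offsets."""

    def __init__(self, n_partitions=2):
        self.partitions = [[] for _ in range(n_partitions)]
        self.offsets = {}        # (consumer, partition) -> next offset to read

    def append(self, key, message):
        # Assumption: the partition is chosen by a stable hash of the key.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(message)
        return p

    def poll(self, consumer, partition):
        # A consumer never seen before starts at offset 0 (beginning of the log).
        start = self.offsets.get((consumer, partition), 0)
        batch = self.partitions[partition][start:]
        self.offsets[(consumer, partition)] = start + len(batch)
        return batch             # messages stay in the log after being read


log = PartitionedLog(n_partitions=1)
log.append("user-1", "login")
log.append("user-1", "click")
first = log.poll("c1", 0)        # c1 reads everything so far
log.append("user-1", "logout")
second = log.poll("c1", 0)       # c1 resumes from its stored offset
late = log.poll("c2", 0)         # a new consumer replays from the start
```

Unlike a queue, reading does not remove anything: the late consumer `c2` still sees every message.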
11
Q

What are some programming models for stream processing?

A
  • Event sourcing / Command Query Responsibility Segregation (CQRS)
  • Reactive programming
  • DataFlow model
12
Q

Describe Event sourcing / Command Query Responsibility Segregation (CQRS)

A

It captures all changes to an application state as a sequence of events:
* Instead of mutating the state, we store the event that causes the mutation in an immutable log
* State is generated by processing the events
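A small sketch of the idea, assuming a hypothetical bank-account domain in which events are `(kind, amount)` tuples. The current state is never stored directly; it is derived by folding over the log:

```python
from functools import reduce


def apply_event(balance, event):
    """Return the new state produced by applying one event to the old state."""
    kind, amount = event
    if kind == "deposited":
        return balance + amount
    if kind == "withdrawn":
        return balance - amount
    raise ValueError(f"unknown event kind: {kind}")


# The log is only ever appended to; past events are never modified.
events = []
events.append(("deposited", 100))
events.append(("withdrawn", 30))
events.append(("deposited", 5))

balance = reduce(apply_event, events, 0)                # current state
balance_after_two = reduce(apply_event, events[:2], 0)  # state at an earlier point
```

Replaying a prefix of the log reconstructs any earlier state, which is not possible once state has been mutated in place.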

13
Q

What is Reactive Programming?

A

It is a declarative programming paradigm concerned with data streams and propagation of change
* Reactive APIs model event sources as infinite collections to which observers subscribe in order to receive events
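To make this concrete, here is a hand-rolled sketch of an observable with `map` and `filter` operators. It is not the API of any reactive library; the `Observable` class and its methods are hypothetical and exist only to show how a change propagates to subscribers:

```python
class Observable:
    """Minimal push-based event source that observers can subscribe to."""

    def __init__(self):
        self._observers = []

    def subscribe(self, on_next):
        self._observers.append(on_next)

    def emit(self, value):
        # Push the new value to every subscribed observer.
        for observer in self._observers:
            observer(value)

    def map(self, fn):
        out = Observable()
        self.subscribe(lambda v: out.emit(fn(v)))
        return out

    def filter(self, predicate):
        out = Observable()

        def forward(v):
            if predicate(v):
                out.emit(v)

        self.subscribe(forward)
        return out


source = Observable()
seen = []
# Declare the pipeline once; it then reacts to each event as it arrives.
source.filter(lambda v: v % 2 == 0).map(lambda v: v * 10).subscribe(seen.append)

for v in range(5):
    source.emit(v)
```

The pipeline is declared before any data exists, which is the declarative aspect: the code describes how values flow, not when each step runs.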

14
Q

What are the four dimensions of the DataFlow model?

A
  • What: which results are being computed
  • Where: where in event time they are being computed
  • When: when in processing time they are materialized
  • How: how earlier results relate to later refinements
15
Q

What are the two notions of time in stream processing?

A
  • Processing time: The time at which events are observed in the system
  • Event time: The time at which events occurred
16
Q

How are skew and lag calculated for a processing time t?

A
  • Skew = t - s where s is the timestamp of the latest event processed
  • Lag = t - s where s is the actual timestamp of the event
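A worked example of the two formulas, with made-up timestamps in seconds. The function names are hypothetical, and the sketch takes s as given rather than deciding how "latest event processed" is determined:

```python
def skew(t, latest_processed_event_ts):
    """Skew at processing time t: distance to the latest processed event's timestamp."""
    return t - latest_processed_event_ts


def lag(t, event_ts):
    """Lag at processing time t: distance to the event's actual timestamp."""
    return t - event_ts


t = 100                      # current processing time
current_skew = skew(t, 97)   # latest processed event carries timestamp 97
event_lag = lag(t, 90)       # an event that actually occurred at time 90
```

Here the skew is 3 seconds and the lag of the older event is 10 seconds.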
17
Q

What are the 4 types of windows?

A
  • Tumbling: repeating, non-overlapping windows (range = slide)
  • Jumping has an overlapping interval
  • Sliding
    > range > slide
    > Time sliding triggers at regular interval
    > Eviction sliding triggers on a count
  • Session windows -> aggregate batches of user activity and end after a session gap time
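The relationship between range and slide can be sketched with integer timestamps. The helpers below are hypothetical and assume windows are half-open intervals `[start, end)` aligned to multiples of the slide. The parameter is called `size` rather than `range` to avoid shadowing Python's built-in:

```python
def windows_for(ts, size, slide):
    """Return every [start, end) window that contains timestamp ts."""
    last_start = (ts // slide) * slide            # latest window starting at or before ts
    starts = range(last_start, ts - size, -slide) # walk back while the window still covers ts
    return [(s, s + size) for s in sorted(starts) if s >= 0]


def sessions(timestamps, gap):
    """Group timestamps into sessions; a silence longer than gap ends a session."""
    out = []
    for ts in sorted(timestamps):
        if out and ts - out[-1][-1] <= gap:
            out[-1].append(ts)    # still within the current session
        else:
            out.append([ts])      # gap exceeded, start a new session
    return out


tumbling = windows_for(7, size=5, slide=5)    # range = slide: exactly one window
sliding = windows_for(7, size=10, slide=5)    # range > slide: overlapping windows
grouped = sessions([1, 2, 3, 10, 11, 30], gap=5)
```

With range equal to slide, each event belongs to one window; with range greater than slide, the same event is counted in several overlapping windows.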
18
Q

What is a window trigger?

A

It defines when, in processing time, the results of a window are materialized.