Streaming Flashcards
What techniques are required for stream processing?
- A component that acquires events from producers and forwards them to consumers
- A component that processes events
Describe the Publish/subscribe model
It connects multiple producers to multiple consumers. Message brokers sit between producers and consumers and handle reliable message delivery.
What are two types of broker-based messaging?
- Fire and forget: the broker acks the message immediately
- Transaction based: the broker writes the message to permanent storage prior to acking it
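A minimal Python sketch of the two acknowledgement styles. The `Broker` class and its method names are hypothetical, and a plain list stands in for permanent storage:

```python
class Broker:
    """Toy broker (hypothetical): `disk` is a list standing in for permanent storage."""

    def __init__(self):
        self.memory = []  # volatile buffer, lost on a crash
        self.disk = []    # stands in for permanent storage

    def publish_fire_and_forget(self, msg):
        # Ack immediately; the message is only buffered in memory.
        self.memory.append(msg)
        return "ack"

    def publish_transactional(self, msg):
        # Write to permanent storage first, ack only afterwards.
        self.disk.append(msg)
        self.memory.append(msg)
        return "ack"

    def crash_and_recover(self):
        # A crash wipes memory; only persisted messages can be reloaded.
        self.memory = list(self.disk)
        return self.memory
```

Both calls return an ack to the producer, but after `crash_and_recover()` only the transactionally published message is still available.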
What does the broker do?
- Buffers the message using disk if appropriate
- Routes the messages to the appropriate queues
- Notifies consumers when messages have arrived
What do the consumers do?
- Subscribe to the queue that contains their desired messages
- Ack receipt of each message
Describe the Competing workers messaging pattern
Multiple consumers read from a single queue competing for incoming messages.
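A small sketch of competing workers, assuming a hypothetical helper `competing_workers` in which workers take turns pulling from one shared queue. Real brokers hand messages to whichever worker is free; round-robin is used here only to keep the example deterministic:

```python
from collections import deque

def competing_workers(messages, workers):
    """Deliver each message to exactly one worker from a single shared queue."""
    queue = deque(messages)
    received = {w: [] for w in workers}
    turn = 0
    while queue:
        worker = workers[turn % len(workers)]
        # popleft removes the message, so no other worker can see it.
        received[worker].append(queue.popleft())
        turn += 1
    return received

result = competing_workers([1, 2, 3, 4, 5], ["w1", "w2"])
```

Every message ends up with exactly one worker, which is what lets the pattern spread load.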
Describe the fan out messaging pattern
Each consumer has its own queue, and every message is replicated to all of the queues
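A sketch of fan out, assuming a hypothetical helper `fan_out` that copies every message into one queue per consumer:

```python
from collections import deque

def fan_out(messages, consumers):
    """Replicate every message into a separate queue for each consumer."""
    queues = {c: deque() for c in consumers}
    for msg in messages:
        for queue in queues.values():
            queue.append(msg)  # each consumer gets its own copy
    return queues

queues = fan_out(["a", "b"], ["billing", "audit"])
```

Unlike competing workers, every consumer here sees every message.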
Describe the message routing messaging pattern
The producer assigns keys to message metadata; consumers create topic queues by specifying the keys they want to receive.
What are some drawbacks of broker-based messaging?
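A sketch of key-based routing. `RoutingBroker`, `bind`, and `publish` are hypothetical names, and the sketch assumes that a queue is bound to exactly one key and that messages with no matching queue are dropped:

```python
from collections import defaultdict

class RoutingBroker:
    """Toy broker that routes messages to queues by routing key."""

    def __init__(self):
        self.bindings = defaultdict(list)  # routing key -> bound queues

    def bind(self, key):
        # Create a topic queue for the given key and return it.
        queue = []
        self.bindings[key].append(queue)
        return queue

    def publish(self, key, body):
        # Deliver to every queue bound to this key; drop if none match.
        for queue in self.bindings.get(key, []):
            queue.append(body)

broker = RoutingBroker()
errors = broker.bind("error")
info = broker.bind("info")
broker.publish("error", "disk full")
broker.publish("debug", "cache miss")  # no queue bound, so it is dropped
```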
- Once the message is received it disappears
- Reprocessing a message is impossible
- Cannot prove that a message was delivered
What is log based messaging?
- Producers append to a log
- All consumers connect to the log and pull from it
- A new client starts processing from the beginning of a log
- The broker partitions the log across a cluster
- The broker keeps track of current message offset for each consumer per partition
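The points above can be sketched as a toy partitioned log. `PartitionedLog`, `append`, and `poll` are hypothetical names, partitions are in-memory lists rather than cluster nodes, and choosing the partition with a CRC32 hash of the key is an assumption made for illustration:

```python
import zlib

class PartitionedLog:
    """Toy append-only log with per-consumer, per-partition offsets."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self.offsets = {}  # (consumer, partition) -> next offset to read

    def append(self, key, value):
        # Same key always maps to the same partition.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)
        return p

    def poll(self, consumer, partition, max_records=10):
        # A consumer with no stored offset starts at the beginning of the log.
        start = self.offsets.get((consumer, partition), 0)
        records = self.partitions[partition][start:start + max_records]
        self.offsets[(consumer, partition)] = start + len(records)
        return records

log = PartitionedLog(num_partitions=2)
p = log.append("user-1", "login")
log.append("user-1", "click")
first = log.poll("c1", p)       # c1 reads both records
again = log.poll("c1", p)       # nothing new for c1
newcomer = log.poll("c2", p)    # c2 starts from the beginning
```

Reading never removes records, so a new consumer can replay the whole log, which addresses the reprocessing drawback of broker-based messaging.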
What are some programming models for stream processing?
- Event sourcing/Command Query Responsibility Segregation (CQRS)
- Reactive programming
- DataFlow model
Describe Event sourcing/Command Query Responsibility Segregation (CQRS)
It captures all changes to an application state as a sequence of events:
* Instead of mutating the state, we store the event that causes the mutation in an immutable log
* State is generated by processing the events
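A minimal event-sourcing sketch, assuming a hypothetical bank-account example in which events are `(kind, amount)` tuples and the state is a single balance:

```python
from functools import reduce

def apply_event(balance, event):
    """Return the new state produced by applying one event to the old state."""
    kind, amount = event
    if kind == "deposited":
        return balance + amount
    if kind == "withdrawn":
        return balance - amount
    return balance  # unknown events leave the state unchanged

def replay(events, initial=0):
    # State is never stored directly; it is derived by folding over the log.
    return reduce(apply_event, events, initial)

# The log is a tuple so that it cannot be mutated in place.
event_log = (("deposited", 100), ("withdrawn", 30), ("deposited", 5))
balance_now = replay(event_log)
balance_after_two = replay(event_log[:2])
```

Because the log is kept, any past state can be rebuilt by replaying a prefix of it.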
What is Reactive Programming?
It is a declarative programming paradigm concerned with data streams and propagation of change
* Reactive APIs model event sources as infinite collections to which observers subscribe in order to receive events
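A hand-rolled sketch of the idea. The `Observable` class below is hypothetical and is not the API of any reactive library; it only shows observers subscribing to a source and changes propagating through declarative operators:

```python
class Observable:
    """Toy event source: observers subscribe and receive each emitted value."""

    def __init__(self):
        self._observers = []

    def subscribe(self, on_next):
        self._observers.append(on_next)

    def emit(self, value):
        # Push the value to every subscribed observer.
        for on_next in self._observers:
            on_next(value)

    def map(self, fn):
        # Derived stream that emits fn(value) for every upstream value.
        out = Observable()
        self.subscribe(lambda v: out.emit(fn(v)))
        return out

    def filter(self, pred):
        # Derived stream that only forwards values satisfying pred.
        out = Observable()
        self.subscribe(lambda v: out.emit(v) if pred(v) else None)
        return out

source = Observable()
seen = []
source.filter(lambda x: x % 2 == 0).map(lambda x: x * 10).subscribe(seen.append)
for i in range(5):
    source.emit(i)
```

The pipeline is declared once, before any data exists; values then flow through it as they are emitted.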
What are the four dimensions of the DataFlow model?
- What: Results are being computed
- Where: In event time they are being computed
- When: In processing time they are materialized
- How: Earlier results relate to later refinements
What are the two notions of time in stream processing?
- Processing time: The time at which events are observed in the system
- Event time: The time at which events occurred
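A sketch of why the distinction matters, assuming hypothetical events given as `(event_time, processing_time)` pairs in seconds and fixed-size (tumbling) windows. Grouping by event time places a late-arriving event in the window where it actually occurred:

```python
from collections import defaultdict

def tumbling_window_counts(events, size):
    """Count events per fixed-size window, keyed by window start in event time."""
    counts = defaultdict(int)
    for event_time, _processing_time in events:
        window_start = (event_time // size) * size
        counts[window_start] += 1
    return dict(counts)

# The third event occurred at t=4 but was only observed at t=14.
events = [(1, 2), (12, 13), (4, 14), (15, 16)]
by_event_time = tumbling_window_counts(events, size=10)

# Swapping the pair groups the same events by processing time instead.
by_processing_time = tumbling_window_counts([(p, e) for e, p in events], size=10)
```

With these inputs the event-time counts are 2 and 2 for the windows starting at 0 and 10, while the processing-time counts are 1 and 3, because the late event is attributed to the window in which it was observed.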