streaming Flashcards
why do we need streaming data processing?
1)latency: processing data as it arrives gives lower latency than waiting for a batch to complete.
2)workload balancing: processing data as it arrives spreads the work over time, yielding more consistent and predictable resource consumption.
what is streaming, and what are its key characteristics?
a type of data processing engine that is designed with infinite datasets in mind.
1)infinite data
2)infinite computation
3)low-latency result
data stream model
1)time series model: track the changes in an element's state over time.
2)cash register model: track increments only (updates are never negative).
3)turnstile model: record updates, both positive and negative.
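a minimal sketch of how the three models treat the same kind of update, following the wording of the cards above. the function name apply_update and the (key, value) item shape are made up for illustration.

```python
from collections import defaultdict

def apply_update(state, key, value, model):
    """Apply one stream item (key, value) to state under the given model."""
    if model == "time_series":
        # the item carries the element's new state, so it overwrites the old one
        state[key] = value
    elif model == "cash_register":
        # the item is an increment and must not be negative
        if value < 0:
            raise ValueError("cash register model allows increments only")
        state[key] += value
    elif model == "turnstile":
        # the item is an update that may be positive or negative
        state[key] += value
    else:
        raise ValueError(f"unknown model: {model}")
    return state

updates = [("a", 3), ("b", 2), ("a", 5)]
time_series = defaultdict(int)
cash_register = defaultdict(int)
for k, v in updates:
    apply_update(time_series, k, v, "time_series")
    apply_update(cash_register, k, v, "cash_register")

turnstile = defaultdict(int)
for k, v in [("a", 3), ("a", -1), ("b", 2)]:
    apply_update(turnstile, k, v, "turnstile")
```

after the loops, time_series holds the latest value per element, cash_register holds the running totals, and turnstile holds totals that were allowed to go down as well as up.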
streaming style architecture components
1)data provider
2)collecting
3)message queuing
4)analysis
5)data access
6)data consumer
7)long-term storage
8)in-memory storage
what is message queuing, and what are its benefits?
it handles data exchange between components in a streaming architecture, primarily moving data from the collection tier to the analysis tier.
1)decoupling: separates the operations of producers and consumers, which simplifies the system design and improves fault isolation.
2)load management: funnels multiple data streams to multiple consumers, so the load can be distributed efficiently.
3)safe communication: provides reliable data transfer.
what's the producer-broker-consumer model?
the producer generates data and sends it to the broker. the broker manages queues organized by topic and partitions the data for distribution.
the consumer retrieves data from the queues when it is ready.
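a toy in-memory sketch of the model, not how any real broker is implemented. the Broker class, its publish and poll methods, and the byte-sum partitioning rule are all assumptions made for illustration.

```python
from collections import defaultdict, deque

class Broker:
    """Toy in-memory broker: one FIFO queue per (topic, partition)."""

    def __init__(self, num_partitions=2):
        self.num_partitions = num_partitions
        self.queues = defaultdict(deque)

    def publish(self, topic, key, message):
        # derive the partition from the key so equal keys share one queue
        partition = sum(key.encode()) % self.num_partitions
        self.queues[(topic, partition)].append(message)
        return partition

    def poll(self, topic, partition):
        # the consumer pulls when it is ready; None means nothing is waiting
        queue = self.queues[(topic, partition)]
        return queue.popleft() if queue else None

broker = Broker(num_partitions=2)

# producer side: send two messages with the same key
p1 = broker.publish("clicks", "user-1", "page A")
p2 = broker.publish("clicks", "user-1", "page B")

# consumer side: pull from that partition at its own pace
first = broker.poll("clicks", p1)
second = broker.poll("clicks", p1)
empty = broker.poll("clicks", p1)
```

the producer never calls the consumer directly. both only talk to the broker, which is what gives the decoupling.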
what is a durable queue?
a message queue that keeps data stored until it has been safely consumed. this lets it support offline and slow consumers.
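a simplified sketch of the idea, assuming an append-only log and a per-consumer offset. DurableQueue and its methods are hypothetical names. a real durable queue would persist the log to disk and eventually discard fully consumed messages, which this sketch leaves out.

```python
class DurableQueue:
    """Keeps every message and remembers how far each consumer has read."""

    def __init__(self):
        self.log = []       # messages retained in arrival order
        self.offsets = {}   # consumer name -> index of its next message

    def append(self, message):
        self.log.append(message)

    def read(self, consumer):
        # return the next unacknowledged message without advancing
        offset = self.offsets.get(consumer, 0)
        return self.log[offset] if offset < len(self.log) else None

    def ack(self, consumer):
        # advance only after the consumer confirms it handled the message
        offset = self.offsets.get(consumer, 0)
        if offset < len(self.log):
            self.offsets[consumer] = offset + 1

queue = DurableQueue()
queue.append("m1")
queue.append("m2")

first = queue.read("slow")      # first read
again = queue.read("slow")      # no ack yet, so the same message comes back
queue.ack("slow")
second = queue.read("slow")     # moves on only after the ack
late = queue.read("offline")    # a consumer that joins later starts from the beginning
```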
what's the role of analysis in a streaming architecture, and what are its key features?
it is the central tier of the architecture. it processes data streams in near real time using specialized algorithms and models, on a per-time or per-window basis.
it follows a continuous query model:
1)a query is issued once and executed continuously as new data arrives.
2)a query may need to maintain state.
3)stateless queries: each execution is independent.
4)stateful queries: maintain and update state across executions.
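a small sketch of the stateless and stateful cases, using generators to stand in for queries that keep running as data arrives. the function names and the sample readings are made up for illustration.

```python
def stateless_filter(events, threshold):
    """Stateless: each event is judged on its own, nothing is remembered."""
    for event in events:
        if event > threshold:
            yield event

def stateful_running_mean(events):
    """Stateful: keeps a count and a sum that every new event updates."""
    count, total = 0, 0.0
    for event in events:
        count += 1
        total += event
        yield total / count

readings = [10, 30, 20, 40]
high = list(stateless_filter(readings, 25))
means = list(stateful_running_mean(readings))
```

here high is [30, 40] and means is [10.0, 20.0, 20.0, 25.0]. the filter would give the same answer for an event no matter what came before it, while the running mean depends on everything seen so far.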
what's windowing, and what are the different types of windows?
grouping data into manageable chunks for processing. a window is defined by its length and its processing period.
1)sliding windows: fixed windows (period equals length), overlapping windows (period shorter than length), sampling windows (period longer than length).
2)data-driven windows: the length is determined by patterns in the data. useful for user behavior analysis.
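a sketch of how length and period together give the sliding-window variants. it assumes integer timestamps, a first window starting at 0, and a finite list of events for the demo. the function name sliding_windows is made up.

```python
def sliding_windows(events, length, period):
    """Group (timestamp, value) pairs into windows [start, start + length).

    A new window starts every `period` time units, beginning at 0.
    """
    if not events:
        return {}
    last_ts = max(ts for ts, _ in events)
    windows = {}
    start = 0
    while start <= last_ts:
        windows[start] = [v for ts, v in events if start <= ts < start + length]
        start += period
    return windows

events = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e"), (5, "f")]

fixed = sliding_windows(events, length=2, period=2)        # period == length
overlapping = sliding_windows(events, length=4, period=2)  # period < length
sampling = sliding_windows(events, length=1, period=2)     # period > length
```

with fixed windows every event lands in exactly one window, with overlapping windows some events appear in more than one, and with sampling windows some events ("b", "d", "f" here) are never looked at.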
what is stream time and what is event time?
event time is the actual time when the event occurred, as recorded by its source.
stream time is the time when the event enters the streaming system.
what's the difference between windowing by event time and windowing by stream time?
windowing by stream time is more straightforward to implement: there is no need to handle late (out-of-order) data.
windows always close on system-defined timings, so it provides immediate insights.
but ignoring event time can produce inaccurate insights.
windowing by event time is the gold standard of windowing.
but it is impossible to know precisely when a window can be closed, and extending window lifetimes means buffering more data.
and most processing systems lack native support for it.
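a sketch of how the same events fall into different fixed windows depending on which timestamp is used. the field names event_time and stream_time, the window length of 10, and the sample events are all made up for illustration.

```python
from collections import defaultdict

def window_counts(events, time_field, length):
    """Count events per fixed window, keyed by the window's start time."""
    counts = defaultdict(int)
    for event in events:
        start = (event[time_field] // length) * length
        counts[start] += 1
    return dict(counts)

events = [
    {"event_time": 1, "stream_time": 2},
    {"event_time": 8, "stream_time": 9},
    {"event_time": 9, "stream_time": 12},  # delayed in transit
    {"event_time": 3, "stream_time": 14},  # arrives long after it happened
]

by_event_time = window_counts(events, "event_time", 10)
by_stream_time = window_counts(events, "stream_time", 10)
```

by event time all four events belong to the window starting at 0, but by stream time two of them are counted in the window starting at 10. to get the event-time answer right, the window starting at 0 has to stay open until stream time 14, which is the extra buffering the card mentions.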