Stream Data Processing Flashcards
Why can't event stream data be stored directly as big data?
It would result in one small file per event -> too many files
Event hub
A buffer that collects incoming events and stores them in batches.
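The batching idea behind an event hub can be sketched as follows. This is a minimal illustration, not the API of any real event hub product: the class `BatchingBuffer`, its `batch_size` parameter, and the `sink` callback are hypothetical names chosen for this example.

```python
from typing import Callable, List


class BatchingBuffer:
    """Hypothetical stand-in for an event hub: collects events, flushes them in batches."""

    def __init__(self, batch_size: int, sink: Callable[[List[dict]], None]) -> None:
        self.batch_size = batch_size
        self.sink = sink  # called once per batch, e.g. to write one file per batch
        self._pending: List[dict] = []

    def publish(self, event: dict) -> None:
        """Buffer one event and flush as soon as a full batch is available."""
        self._pending.append(event)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Hand all pending events to the sink as a single batch."""
        if self._pending:
            self.sink(self._pending)
            self._pending = []


written_batches: List[List[dict]] = []
buffer = BatchingBuffer(batch_size=3, sink=written_batches.append)
for i in range(7):
    buffer.publish({"id": i})
buffer.flush()  # flush the incomplete final batch
```

Seven events end up in three batches instead of seven separate files, which is the point of buffering before storage.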
The different kinds of stream processing
1 - stream data integration
2 - stream analytics
Stream data integration
Focuses on ingesting and processing data from the sources, targeting ETL use cases
Stream analytics
Targets analytics use cases. Calculates aggregates and detects patterns.
Native streaming
Events are processed one by one as they arrive -> lowest latency, but fault tolerance is costly
Window
A bounded portion of the stream (by time or by item count) on which computations are performed
Three types of windows
1 - fixed/tumbling windows
2 - sliding/hopping windows
3 - session windows
Fixed/tumbling windows
Closes once the window is full, based on the count of items or the elapsed time; windows do not overlap
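A time-based fixed/tumbling window can be sketched like this. The function name `tumbling_windows` and the sample events are assumptions made for illustration; each event is a `(timestamp, value)` pair and lands in exactly one window `[start, start + size)`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tumbling_windows(events: Iterable[Tuple[int, str]], size: int) -> Dict[int, List[str]]:
    """Assign each (timestamp, value) event to exactly one window, keyed by window start."""
    windows: Dict[int, List[str]] = defaultdict(list)
    for timestamp, value in events:
        start = (timestamp // size) * size  # round down to the window boundary
        windows[start].append(value)
    return dict(windows)


events = [(1, "a"), (4, "b"), (10, "c"), (12, "d"), (25, "e")]
tumbling_result = tumbling_windows(events, size=10)
# tumbling_result == {0: ["a", "b"], 10: ["c", "d"], 20: ["e"]}
```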
Sliding/hopping windows
Defined by a window length plus a sliding interval; a new window starts every interval, so windows can overlap
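A sliding/hopping window can be sketched as below. The function `sliding_windows` and its parameters `length` and `slide` are hypothetical names, and the sketch assumes that windows start at multiples of `slide`, beginning at 0. When `slide` is smaller than `length`, one event belongs to several windows.

```python
from typing import Dict, Iterable, List, Tuple


def sliding_windows(
    events: Iterable[Tuple[int, str]], length: int, slide: int
) -> Dict[int, List[str]]:
    """Map each window start to the values whose timestamp falls in [start, start + length)."""
    windows: Dict[int, List[str]] = {}
    for timestamp, value in events:
        # Latest window start that is not after the event.
        start = (timestamp // slide) * slide
        # Walk back through every earlier window that still covers the event.
        while start >= 0 and start > timestamp - length:
            windows.setdefault(start, []).append(value)
            start -= slide
    return windows


sliding_result = sliding_windows([(1, "a"), (6, "b"), (12, "c")], length=10, slide=5)
# sliding_result == {0: ["a", "b"], 5: ["b", "c"], 10: ["c"]}
```

Setting `slide` equal to `length` reduces this to the non-overlapping tumbling case.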
Session windows
Sequences of temporally related events terminated by a gap of inactivity
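Session windows can be sketched by splitting the stream wherever the pause between two consecutive events exceeds a threshold. The function `session_windows` and the `gap` parameter are illustrative names, and treating a pause of exactly `gap` as still belonging to the same session is an assumption of this sketch.

```python
from typing import Iterable, List


def session_windows(timestamps: Iterable[int], gap: int) -> List[List[int]]:
    """Group timestamps into sessions; a pause longer than `gap` starts a new session."""
    sessions: List[List[int]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= gap:
            sessions[-1].append(t)  # still within the current session
        else:
            sessions.append([t])  # inactivity gap exceeded: open a new session
    return sessions


session_result = session_windows([1, 2, 3, 10, 11, 30], gap=5)
# session_result == [[1, 2, 3], [10, 11], [30]]
```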
Which two kinds of queries are there?
1 - ad-hoc queries
2 - standing queries
Standing queries
Queries that are stored and executed continuously as new data arrives
Ad-hoc queries
One-time questions, executed once against the data available at that moment
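The contrast between the two query kinds can be sketched as follows. `StandingQuery`, `ad_hoc_query`, and the temperature events are hypothetical and only illustrate the idea: the standing query is registered once and evaluated on every incoming event, while the ad-hoc query runs a single time over what has been stored so far.

```python
from typing import Callable, Dict, List

Event = Dict[str, int]


class StandingQuery:
    """Registered once, then evaluated against every incoming event."""

    def __init__(self, predicate: Callable[[Event], bool]) -> None:
        self.predicate = predicate
        self.matches: List[Event] = []

    def on_event(self, event: Event) -> None:
        if self.predicate(event):
            self.matches.append(event)


def ad_hoc_query(stored: List[Event], predicate: Callable[[Event], bool]) -> List[Event]:
    """Run once over the events stored so far and return the result."""
    return [e for e in stored if predicate(e)]


store: List[Event] = []
high_temp = StandingQuery(lambda e: e["temp"] > 30)
for event in [{"temp": 25}, {"temp": 35}, {"temp": 40}]:
    store.append(event)
    high_temp.on_event(event)  # the standing query sees each event on arrival

cold = ad_hoc_query(store, lambda e: e["temp"] < 30)  # asked once, after the fact
```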
What are the main differences between batch processing and stream processing?
1 - The input is not controlled by the system
2 - The input timing/rate is often unknown