Week 8-9: Data Streams and Streaming Algorithms Flashcards
Data Stream
An ordered and potentially infinite sequence of elements.
Data streams have the following properties:
1. Potentially infinite (transient, single pass over the data, only summaries can be stored, real-time processing)
2. Non-static (incremental updates, concept drift, forgetting old data)
3. Temporal order may be important
Data streams can be modelled using server logs, a collection of small text files (great for online posts), and an ordered list of potentially infinite elements.
Element/Data Point
An element can include, but isn’t limited to:
1. Event
2. Figure
3. Record
4. Graph (in context of social network data streams)
Data Stream Example 1:
SELECT
System.Timestamp AS OutputTime, dspl AS SensorName,
Avg(temp) AS AvgTemperature
INTO
output
FROM
InputStream TIMESTAMP BY time
GROUP BY TumblingWindow(second,30),dspl
HAVING Avg(temp)>100
Monitor Avg(temp) every 30 seconds and display SensorName, Avg(temp) if AVg(temp) > 100 degrees.
Data Stream Management System (DSMS)
It’s software that handles continuous data stream.
Processor
Streams enter the DSMS via the Processor, and often have different velocity (arrival rate) and data types. Inputs streams need not have the same number of elements.
Archival Storage
It archives streams. Archival Storage isn’t for querying due to very low efficiency of data retrieval.
Limited Working Storage
It serves as storage space in the disk or in the main memory. It’s used for answering queries. It can’t store all data.
Standing Queries
They’re in the Processor and permanently execute and produce output at appropriate times.
Example: “output an alert ,when the variable TEMPERATURE exceeds 25)
Ad-hoc Queries
These are asked once about the current state of a stream or streams. The output of the DSMS, which comes from the Processor, is the answers to the query.
Example: “list hte total number of transactions in the last hour”
Queries: Sampling Data from a Stream
Build a random sample, such as a random subset of the stream.
Queries: Sliding Windows
Find the number of items of type x in the last k elements of the stream.
Queries: Filtering a Data Stream
Select elements with property x from the stream.
Queries: Counting Distinct Elements
Number of distinct elements in the last k elements of the stream.
Queries: Estimating Frequency Moments
Estimate the average/standard deviation of the last k elements.
Queries: Finding Frequent Elements
Return the elements that appear most frequently in the stream. Can specify the k-most elements to return.