College 8: DataStreams Flashcards

1
Q

Characteristics of datastreams

A
  • Incomplete Data: Not all data is available at once.
  • High-Speed: Data arrives very quickly.
  • Limited Storage: The system can only store a small part of the stream.
  • Huge Volume: Continuous, possibly infinite data.
  • One-Pass Algorithms: Can only scan data once.
  • Non-Stationary Distribution: Requires fast, real-time processing.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Handling data streams

A
  • Online learning
  • Sampling
  • Windowing functions
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Online learning

A

Continuously update model with new data.
- Small updates: make small updates as new data arrives.
- Initial training: use initial data to train the model.
- Concept drift: model adapts to new patterns over time

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Sampling datastreams

A
  • Fixed proportion sampling: store a fixed ratio of the data
  • Random sampling: maintain a random sample of fixed size
  • Reservoir sampling: store first s elements, for each new element, replace an old one with a certain probability
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Windowing models

A
  • Sliding window: keep only most recent data items
  • Tumbling window: process data in non-overlapping batches
  • Hopping window: process overlapping data chunks
  • Exponentially decaying window: give more weight to recent data
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Querying datastreams

A
  • Filtering: select elements with specific properties
  • Counting distinct elements: count unique elements in the stream
  • Estimating moments: calculate statistics like average and std
  • Finding frequent elements: identify popular items
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Filtering datastreams

A
  • Bloom filters: efficiently test membership of elements in a set using a bit array. All set to 0 and using k independent hash functions
  • Counting distinct elements: use approximate methods like the Flajolet-Martin algorithm
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q
A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly