week 4 Flashcards

1
Q

process models

A

AND/OR split/join
conformance checking - trace-to-model alignment (remove some processes to optimize flow)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

data streaming

A

sampling - goal - retain properties of the whole data set
reservoir sampling - choose to sample i’th item with probability m/i => you get uniform distribution
order sampling - store k items with smallest random tags

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

frequency algorithms

A

majority - counter = 0, reassign element, else increase/decrease counter, keep element, final element is majority
frequent - find all elements occurring sufficiently frequent, same thing as majority but for k elements, decrease all counters when encountering different element

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

sketching

A

counting nr of distinct items (hard to do with sampling if these distinct items don’t occur enough times)
esentially, a linear transformation of input (using hash functions)
sketch = vector * implicit matrix

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

bloom filter(sketch)

A

k hash functions, add element => all positions return by hash functions are flipped, element was present if all positions corresponding to hash functions are flipped => false positives

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

count-min sketch

A

dxm matrix, d = nr of hash functions, m = nr of vectors
new element => increment 1 position on every row corresponding to value of hash function for particular row
final count = min(corresponding positions over all rows)
final counts are overestimates (collisions), fixed by independence across rows

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

FM sketch

A

the larger the number of distinct elements, the more likely an unusual hash value appears

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

LSH

A

objective - maximize collisions of neighbours
collisions - resolve by AND between multiple projections
splits - resolve by OR between multiple projections
cascade AND and OR combinations for optimal performance

How well did you know this?
1
Not at all
2
3
4
5
Perfectly