week 4 Flashcards

Question 1

Q

process models

Answer

A

AND/OR split/join
conformance checking - trace-to-model alignment (remove some processes to optimize flow)

Question 2

Q

data streaming

Answer

A

sampling - goal - retain properties of the whole data set
reservoir sampling - choose to sample i’th item with probability m/i => you get uniform distribution
order sampling - store k items with smallest random tags

Question 3

Q

frequency algorithms

Answer

A

majority - counter = 0, reassign element, else increase/decrease counter, keep element, final element is majority
frequent - find all elements occurring sufficiently frequent, same thing as majority but for k elements, decrease all counters when encountering different element

Question 4

Q

sketching

Answer

A

counting nr of distinct items (hard to do with sampling if these distinct items don’t occur enough times)
esentially, a linear transformation of input (using hash functions)
sketch = vector * implicit matrix

Question 5

Q

bloom filter(sketch)

Answer

A

k hash functions, add element => all positions return by hash functions are flipped, element was present if all positions corresponding to hash functions are flipped => false positives

Question 6

Q

count-min sketch

Answer

A

dxm matrix, d = nr of hash functions, m = nr of vectors
new element => increment 1 position on every row corresponding to value of hash function for particular row
final count = min(corresponding positions over all rows)
final counts are overestimates (collisions), fixed by independence across rows

Question 7

Q

FM sketch

Answer

A

the larger the number of distinct elements, the more likely an unusual hash value appears

Question 8

Q

LSH

Answer

A

objective - maximize collisions of neighbours
collisions - resolve by AND between multiple projections
splits - resolve by OR between multiple projections
cascade AND and OR combinations for optimal performance

week 4 Flashcards

(8 cards)