week 4 Flashcards
process models
AND/OR split/join
conformance checking - trace-to-model alignment (remove some processes to optimize flow)
data streaming
sampling - goal - retain properties of the whole data set
reservoir sampling - choose to sample i’th item with probability m/i => you get uniform distribution
order sampling - store k items with smallest random tags
frequency algorithms
majority - counter = 0, reassign element, else increase/decrease counter, keep element, final element is majority
frequent - find all elements occurring sufficiently frequent, same thing as majority but for k elements, decrease all counters when encountering different element
sketching
counting nr of distinct items (hard to do with sampling if these distinct items don’t occur enough times)
esentially, a linear transformation of input (using hash functions)
sketch = vector * implicit matrix
bloom filter(sketch)
k hash functions, add element => all positions return by hash functions are flipped, element was present if all positions corresponding to hash functions are flipped => false positives
count-min sketch
dxm matrix, d = nr of hash functions, m = nr of vectors
new element => increment 1 position on every row corresponding to value of hash function for particular row
final count = min(corresponding positions over all rows)
final counts are overestimates (collisions), fixed by independence across rows
FM sketch
the larger the number of distinct elements, the more likely an unusual hash value appears
LSH
objective - maximize collisions of neighbours
collisions - resolve by AND between multiple projections
splits - resolve by OR between multiple projections
cascade AND and OR combinations for optimal performance