week 6 Flashcards
frequent itemsets
sets of items appearing in at least s(threshold) baskets
formulas
support(itemset) = frequency(itemset) / nr of baskets
confidence(itemset1 + itemset2) = support(itemset1 + itemset2) / support(itemset1)
lift (itemset1 + itemset2) = support(itemset1 + itemset2) / (support(itemset1) * support(itemset2))
apriori
1.store in main mem items with frequency > s (threshold) as frequent items
2.cartesian product => all candidate pairs of itemsets of frequent items of size 2
3.store in main mem only itemsets from step 2 with frequency > s(threshold)
PCY
same as apriori, but while doing step 1, also hash the pairs, and instead of step 2 and 3, store only pairs with frequent items whose bit in the bit vector is 1 (frequent bucket)
problem is collisions, overestimation like count-min sketch
multihash
like PCY, but use multiple independent hash tables, frequent itemset if both hash to frequent buckets (combats overestimation)