Association FINAL Flashcards
Association Rules interested in
Observing which objects occur together
Association rules recommending or co-occur?
Seeing which items co-occur
Association Rule Mining
Given a set of transactions, find the rules that will predict the occurrence of an item based on the occurrences of other items in the transaction
Does implications mean casuality?
No, means co-occurrence
{} -> {}
Antecedent -> Consequent
3 types of database
Binary, Transaction, Vertical
Items
I = {x1, x2, …, xm}
A set X within the set of items
Itemset
An itemset of cardinality k
k-itemset
I^(k)
set of all k-itemsets
Transaction identifiers, tids
T = {t1, t2, …, tn}
t within T
tidset
Transaction
Tuple in the form (t, X) where t is a unique transaction identifier and X is an itemset
Support
The support of an itemset X in a dataset D denoted sup(X, D) is the number of transactions in D that contain X
Relative Support
The relative support of X is the fraction of transactions that contain X: sup(X,D)|D|
We use F to
denote the set of all itemsets
We use F^(k)
to denote the set of k-itemsets
Itemset mining problem
Given a minimum support threshold (minsup), find all itemsets X s.t. sup(x) >= minsup
Frequent itemsets
An itemset X is frequent if sup(x) >= minsup where minsup is a user specified minimum support threshold (if minsup is fraction, then relative support is implied)
Total possible subset
2^|I|
Naive approach to generate all itemsets that are frequent
For all x in I:
compute support
if support >= minsup
add to list
The brute force method
Explores the entire itemset search space, regardless of minsup
Goal of Association Rule Mining
Given a set of transactions T, find all the rules having:
support >= minsup
confidence >= mincond
Apriori principle
If an itemset is frequent, then all of its subsets must be frequent as well
Apriori principle 2
If an itemset if infrequent, then all of its supersets must be infrequent as well
A rule is frequent if
the itemset XY is frequent, sup(XY) >= minsup
A rule is strong if
conf >= minconf
Rules are pruned using
confidence
confidence (x->y)
sup(XY)/sup(x)
Unlike support, confidence does not exhibit
the monotone property
If a rule x -> y\x does not satisfy the confidence threshold, then
any rule x’->y\x’, where x’ within X, must not satisfy the confidence threshold as well
What happens if misnup is too high
we may miss interesting low-support items ex: such items may correspond to expensive products that are rarely purchased by customers, but whose patterns are interesting to mine for the retailer
What happens if minsup is too low
We get information overload: too many frequent itemsets and too many spurious rules
How can some high confidence rules be misleading?
High confidence might not imply a meaningful relationship if the consequent is already common in the dataset, irrespective of the antecedent
Confidence measure ignores
the support of the itemset appearing in the rule consequent
which metric accounts for the consequent
lift
lift
conf(x->y)/rsup(y)
value of lift close to 1 implies
that the support of the rule is expected
Good lifts, bad lifts
> 1, «1