Mining Association Rules Flashcards
What’s the motivation for studying Mining Association Rules?
To look for interesting relationships between objects in large datasets.
What are we trying to do when studying Mining Association Rules?
Find all rules that correlate the presence of one set of items with the presence of another set of items. E.g., 80% of customers who buy {diapers} also tend to buy {beer, milk}.
Provide formal notations for the following:
item
itemset
k-itemset
transaction
transaction dataset
- An item: a single object (e.g., a product) that can appear in a basket.
- An itemset: a set of items. E.g., X = {milk, bread, cereal} is an itemset.
- A k-itemset: an itemset with k items.
- A transaction: the items purchased in one basket. It may have a TID (transaction ID).
- A transactional dataset: a set of transactions.
What do we mean when we say X-> Y in Mining Association Rules?
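Customers who buy X also tend to buy Y: transactions that contain itemset X are likely to contain itemset Y as well. It is a statistical tendency, not a guarantee.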
What is support and confidence in Association Rule Mining?
Support is a measure of how frequently an itemset appears in the dataset: the fraction of transactions that contain it.
E.g., half of the baskets at Woolworths contain milk, so the support of {milk} is 0.5, or 50%.
Confidence is a measure of how likely Y is to be bought given that X is bought (X -> Y). E.g., if 80% of the people who buy milk also buy bread, the confidence of {milk} -> {bread} is 0.8.
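As a minimal sketch of both measures, the snippet below computes them on a small made-up list of baskets. The dataset and the function names `support` and `confidence` are assumptions for illustration, chosen so the numbers match the milk and bread example above.

```python
# Hypothetical toy dataset: each transaction is the set of items in one basket.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "bread", "beer"},
    {"milk", "bread", "diapers"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"beer", "diapers"},
    {"eggs"},
    {"beer"},
    {"diapers", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """Estimate of P(Y | X): support of X and Y together divided by support of X.

    Assumes X occurs at least once; otherwise the division is undefined.
    """
    return support(set(x) | set(y), transactions) / support(x, transactions)

print(support({"milk"}, transactions))                # 0.5
print(confidence({"milk"}, {"bread"}, transactions))  # 0.8
```

Milk is in 5 of the 10 baskets (support 0.5), and 4 of those 5 also contain bread (confidence 0.8).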
What do we call association rules that satisfy both the Min_Support and Min_Confidence?
These are Strong Association Rules.
What does minimum support mean?
The minimum frequency we care about.
E.g., if the minimum support count is 3, any itemset that occurs in only 2 transactions is not frequent and is dropped from the analysis.
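The threshold can be illustrated with a short sketch that counts single items and keeps only those meeting a minimum support count of 3. The baskets and variable names here are hypothetical.

```python
from collections import Counter

# Hypothetical baskets; in this sketch "beer" occurs only twice.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "beer"},
    {"bread", "beer"},
    {"milk", "bread"},
]
min_support_count = 3

# Count in how many transactions each item appears.
item_counts = Counter(item for t in transactions for item in t)

# Keep only the items that meet the minimum support count.
frequent_items = {
    item: n for item, n in item_counts.items() if n >= min_support_count
}

print(frequent_items)
```

Here milk and bread each appear 4 times and are kept, while beer (2 times) and eggs (1 time) fall below the threshold.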
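What is the conditional probability formula for confidence?
Confidence(X -> Y) = P(Y | X) = P(X ∪ Y) / P(X) = support(X ∪ Y) / support(X), where P(X ∪ Y) is the probability that a transaction contains every item in both X and Y.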
What is the goal of association rule mining? What do we minimally want for a rule?
The goal of association rule mining is to find all rules having
- support ≥ min_sup threshold
- confidence ≥ min_conf threshold
What algorithms do we use for Mining Association Rules?
- Apriori Algorithm
- Frequent Pattern (FP) Growth Algorithm
What are the two steps in Mining Association Rules?
- Frequent Itemset Generation
– Get all itemsets whose support ≥ min_sup
- Generate high-confidence rules (confidence ≥ min_conf) from each frequent itemset
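The second step can be sketched as follows, assuming step one has already produced support counts for the frequent itemsets. The counts, the function name `rules_from_itemset`, and the 0.7 threshold are all hypothetical.

```python
from itertools import combinations

# Hypothetical support counts, as if produced by frequent itemset generation.
support_count = {
    frozenset({"milk"}): 5,
    frozenset({"bread"}): 8,
    frozenset({"milk", "bread"}): 4,
}

def rules_from_itemset(itemset, support_count, min_conf):
    """Split a frequent itemset into antecedent -> consequent rules
    and keep those whose confidence is at least `min_conf`."""
    itemset = frozenset(itemset)
    rules = []
    # Every non-empty proper subset can serve as the antecedent.
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            consequent = itemset - antecedent
            conf = support_count[itemset] / support_count[antecedent]
            if conf >= min_conf:
                rules.append((antecedent, consequent, conf))
    return rules

rules = rules_from_itemset({"milk", "bread"}, support_count, min_conf=0.7)
for antecedent, consequent, conf in rules:
    print(set(antecedent), "->", set(consequent), conf)
```

With these counts, {milk} -> {bread} has confidence 4/5 = 0.8 and is kept, while {bread} -> {milk} has confidence 4/8 = 0.5 and is discarded.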
What is the principle of the Apriori Algorithm?
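If an itemset is frequent, then all of its subsets must also be frequent. Equivalently, if an itemset is infrequent, every superset of it is infrequent too, so those supersets can be pruned without counting them.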
Explain how to perform the Apriori Algorithm on this itemset.
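A minimal sketch of the level-wise procedure, run on a small hypothetical set of four transactions with a minimum support count of 2 (both assumed for illustration): count the 1-itemsets, then repeatedly join frequent (k-1)-itemsets into k-itemset candidates, prune candidates that have an infrequent subset, and count the survivors. The function name `apriori` and its parameters are illustrative, not taken from a library.

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Scan the dataset and keep candidates that meet the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support_count}

    items = {item for t in transactions for item in t}
    current = count({frozenset([i]) for i in items})  # frequent 1-itemsets
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical transactions over the items A to E.
transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]
result = apriori(transactions, min_support_count=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

On this data, D is dropped at level 1 (count 1). At level 2, {A, B} and {A, E} are infrequent, so the level-3 candidates {A, B, C} and {A, C, E} are pruned without being counted. The only frequent 3-itemset is {B, C, E}, with count 2.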
What are some factors that affect the complexity of the Apriori Algorithm?
- The choice of minimum support threshold
- Dimensionality (number of items) in the data set
- Size of database
- Average transaction width