Lecture 10 - Association Rules Flashcards
Association Rules pt1
- Identify item clusters in event-based or transaction-based databases
- Study of “what goes with what”
- Symptoms related to diagnosis
- Customers who bought X also bought Y
Association Rules also called: Market basket analysis or affinity analysis
Example Association Rules
Market basket databases
- Consist of a large number of transaction records
- Each record lists all items bought by a customer in a single purchase transaction
- Goal: detect groups of items that are consistently purchased together
Information can be used to
- Make decisions on store layouts
- Design the upcoming catalog
- Identify customer segments based on buying patterns
Amazon uses this kind of information for product recommendations!
Rules
- Represented in an IF-THEN format
- “IF” part: antecedent, “THEN” part: consequent
- Both correspond to sets of items (called itemsets)
Itemsets are
- Possible combinations of items (e.g., products)
- Can also be a single item
- NOT records of what people buy
- Antecedent and consequent are disjoint
- I.e., have no items in common
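A minimal sketch of how a rule could be represented in code, assuming items are plain strings. The names `Rule` and `make_rule` and the example items are hypothetical, chosen only for illustration. The check mirrors the note above: antecedent and consequent are itemsets with no items in common.

```python
from typing import FrozenSet, NamedTuple


class Rule(NamedTuple):
    """An IF-THEN rule: IF antecedent THEN consequent (hypothetical helper type)."""
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]


def make_rule(antecedent, consequent):
    """Build a rule, enforcing that both itemsets are non-empty and disjoint."""
    a, c = frozenset(antecedent), frozenset(consequent)
    if not a or not c:
        raise ValueError("antecedent and consequent must both be non-empty")
    if a & c:
        raise ValueError("antecedent and consequent must have no items in common")
    return Rule(a, c)


# Illustrative items only: IF {bread, butter} THEN {milk}
rule = make_rule({"bread", "butter"}, {"milk"})
```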
Example transaction
Finding Association Rules
- One item can appear in many association rules
- Every transaction is one itemset
→ Supports several rules
Two-stage Process:
- Generation of frequent itemsets, e.g., with the Apriori algorithm
- Selecting the strong rules, i.e., applying criteria for judging the strength of the rules
Generation of rules
Frequent Itemsets
Quiz 1
Apriori algorithm
Goal: generate the frequent itemsets
for k items:
- User sets a minimum support criterion
- Generate list of one-item sets
- Drop the ones below the support criterion
- Use the list of one-itemsets to generate the two-itemsets
- Drop the ones below the support criterion
- Use the list of two-itemsets to generate the three-itemsets
- Drop the ones below the support criterion
- …(continue until k-itemsets)
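The steps above can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation. It assumes the minimum support is given as a fraction of all transactions, and the transaction data is made up. The pruning step (a candidate is kept only if all of its smaller subsets were frequent) is the usual Apriori refinement and is not spelled out in the notes above.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset
        return sum(itemset <= t for t in transactions) / n

    # Start with the list of one-item sets
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Drop the candidates below the support criterion
        kept = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                kept[candidate] = s
        frequent.update(kept)

        # Use the surviving k-itemsets to generate the (k+1)-itemsets
        k += 1
        joined = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        current = {
            c for c in joined
            if all(frozenset(sub) in kept for sub in combinations(c, k - 1))
        }
    return frequent


# Made-up transactions for illustration
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
frequent_itemsets = apriori(transactions, min_support=0.5)
for itemset, s in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With this data and a minimum support of 0.5, "eggs" is dropped at the one-itemset stage, so no larger itemset containing it is ever generated.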
Assessment of rule strength
We need to measure the strength of the association implied by a rule
Measures:
- Support
- Confidence
- Lift ratio
Confidence
Compares the co-occurrence of the items in the antecedent and consequent to the occurrence of the items in the antecedent. Shows the percentage of transactions containing the antecedent (A) in which the consequent (C) also appears.
Relationship of Support with Confidence
Support: (Estimated) probability that a transaction selected randomly from the database will contain all items in the antecedent and the consequent. P(hat) (antecedent AND consequent)
Confidence: (Estimated) conditional probability that a transaction selected randomly will include all the items in the consequent given that the transaction includes all the items in the antecedent. P(hat) (consequent | antecedent)
- High value of confidence suggests a strong association rule, i.e., rule in which we are highly confident
- Can be deceptive when antecedent and consequent are independent, e.g.:
- Nearly all customers buy bananas and nearly all customers buy ice cream
- High confidence level of “IF bananas THEN ice-cream”
- Regardless of whether there is an association between the items
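A small sketch of the bananas and ice cream case, using made-up counts: 100 transactions, 90 with bananas, 90 with ice cream, and 81 with both, so the two items are independent by construction. The function names are hypothetical.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions)


def support(antecedent, consequent, transactions):
    """Estimated P(antecedent AND consequent)."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


# Made-up data: the items are independent (0.9 * 0.9 = 0.81)
transactions = (
    [{"bananas", "ice cream"}] * 81
    + [{"bananas"}] * 9
    + [{"ice cream"}] * 9
    + [set()] * 1
)

rule_support = support({"bananas"}, {"ice cream"}, transactions)
rule_confidence = confidence({"bananas"}, {"ice cream"}, transactions)
baseline = count({"ice cream"}, transactions) / len(transactions)
print(rule_support, rule_confidence, baseline)
```

Here the confidence of "IF bananas THEN ice cream" is high, but it equals the share of all transactions that contain ice cream, so the rule adds no information.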
Lift Ratio
- Better way to judge the strength of a rule
- Compares the confidence of the rule with a benchmark value
- Confidence: percentage of antecedent transactions that also have the consequent item set
- Lift: ratio of confidence to benchmark confidence
- Benchmark confidence: transactions with consequent as percentage of all transactions
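Written as a formula, with A the antecedent and C the consequent, and using the estimated probabilities from the support and confidence definitions, the definitions above amount to:

```latex
\[
\text{lift}(A \Rightarrow C)
  = \frac{\text{confidence}(A \Rightarrow C)}{\text{benchmark confidence}}
  = \frac{\hat{P}(C \mid A)}{\hat{P}(C)}
  = \frac{\hat{P}(A \text{ AND } C)}{\hat{P}(A)\,\hat{P}(C)}
\]
```

The last form follows from $\hat{P}(C \mid A) = \hat{P}(A \text{ AND } C) / \hat{P}(A)$ and shows that lift is symmetric in A and C.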
Lift intuition
- Lift is a value between 0 and infinity
- Value>1 indicates that antecedent and consequent are dependent on each other, and the degree of dependence is given by its value
- Value<1 indicates that the presence of the antecedent has a negative effect on the presence of the consequent
- Value≈1 indicates that antecedent and consequent are independent and no rule can be derived from them
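A short sketch of computing lift from transaction counts and reading it according to the three cases above. The counts are made up, and the tolerance `tol` used to decide what counts as "≈1" is an arbitrary assumption for illustration, not a standard threshold.

```python
def lift(n_both, n_antecedent, n_consequent, n_total):
    """Lift = confidence / benchmark confidence, computed from counts."""
    confidence = n_both / n_antecedent      # share of antecedent transactions with the consequent
    benchmark = n_consequent / n_total      # share of all transactions with the consequent
    return confidence / benchmark


def interpret(value, tol=0.05):
    """Map a lift value to the three cases (tol is a hypothetical cutoff)."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positive association" if value > 1.0 else "negative association"


# Made-up counts out of 100 transactions: (both, antecedent, consequent, total)
examples = {
    "independent items": (81, 90, 90, 100),
    "bought together": (40, 50, 50, 100),
    "rarely together": (10, 50, 50, 100),
}
results = {name: lift(*counts) for name, counts in examples.items()}
for name, value in results.items():
    print(name, round(value, 2), interpret(value))
```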
Alternative data representation