Lecture 10 - Association Rules Flashcards

1
Q

Association Rules pt1

A
  • Identify item clusters in event-based or transaction-based databases
  • Study of “what goes with what”
    • Symptoms related to diagnosis
    • Customers who bought X also bought Y

Association Rules also called: Market basket analysis or affinity analysis

2
Q

Example Association Rules

A

Market basket databases

  • Consist of a large number of transaction records
  • Each record lists all items bought by a customer in a single purchase transaction
  • Detect groups of items that are consistently purchased together

Information can be used to

  • Make decisions on store layouts
  • Design the upcoming catalog
  • Identify customer segments based on buying patterns

Amazon uses this information for recommendations!

3
Q

Rules

A
  • Represented in an IF-THEN format
    • “IF” part: antecedent, “THEN” part: consequent
  • Both correspond to sets of items (called itemsets)
  • Itemsets are
    • Possible combinations of items (e.g., products)
    • Can also be a single item
    • NOT records of what people buy
  • Antecedent and consequent are disjoint
    • I.e., have no items in common
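
A minimal sketch of how a rule might be represented in code, assuming Python and made-up item names (the Rule tuple is an illustrative choice, not the lecture's notation):

```python
# A rule is an (antecedent, consequent) pair of itemsets. Itemsets are sets of
# items (possibly a single item), NOT transaction records. Names are made up.
from typing import FrozenSet, NamedTuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]  # the "IF" part
    consequent: FrozenSet[str]  # the "THEN" part


rule = Rule(antecedent=frozenset({"bread", "milk"}),
            consequent=frozenset({"butter"}))

# Antecedent and consequent must be disjoint, i.e. share no items.
assert rule.antecedent.isdisjoint(rule.consequent)
```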
4
Q

Example transaction

A
5
Q

Finding Association Rules

A
  • One item can appear in many association rules
  • Every transaction is one itemset

→ Each transaction can support several rules

Two-stage Process:

  1. Generation of frequent itemsets, e.g., via the Apriori algorithm
  2. Selection of the strong rules, i.e., criteria for judging the strength of the rules (a sketch of the full pipeline follows below)
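
A minimal end-to-end sketch of this two-stage process, assuming the mlxtend library and a made-up toy transaction list; this uses one common open-source implementation and is not the lecture's own code:

```python
# Stage 1: frequent itemsets via Apriori; Stage 2: keep only strong rules.
# Assumes mlxtend is installed; the transactions below are made up.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]

# Encode transactions as a boolean item matrix (one row per transaction).
encoder = TransactionEncoder()
matrix = pd.DataFrame(encoder.fit_transform(transactions),
                      columns=encoder.columns_)

# Stage 1: frequent itemsets above a user-set minimum support.
frequent = apriori(matrix, min_support=0.5, use_colnames=True)

# Stage 2: select rules that pass a strength criterion (here, confidence).
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```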
6
Q

Generation of rules

A
7
Q

Frequent Itemsets

A
8
Q

Quiz 1

A
9
Q

Apriori algorithm

A

Goal: generate the frequent itemsets

for k items:

  • User sets a minimum support criterion
  • Generate the list of one-item sets
  • Drop the ones below the support criterion
  • Use the list of one-itemsets to generate the two-itemsets
  • Drop the ones below the support criterion
  • Use the list of two-itemsets to generate the three-itemsets
  • Drop the ones below the support criterion
  • …(continue until k-itemsets; see the sketch below)
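
A minimal pure-Python sketch of this level-wise procedure, assuming transactions are given as sets of items (the helper name and toy data are illustrative):

```python
def apriori_frequent_itemsets(transactions, min_support):
    """Level-wise generation of frequent itemsets (illustrative sketch)."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions containing every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    # One-item sets that meet the minimum support criterion.
    current = [frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support]
    frequent = list(current)

    while current:
        # Candidate (k+1)-itemsets built from the surviving k-itemsets ...
        candidates = {a | b for a in current for b in current
                      if len(a | b) == len(a) + 1}
        # ... then drop the ones below the support criterion.
        current = [c for c in candidates if support(c) >= min_support]
        frequent.extend(current)
    return frequent


transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
print(apriori_frequent_itemsets(transactions, min_support=0.5))
```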
10
Q

Assessment of rule strength

A

We need to measure the strength of the association implied by a rule

Measures:

  • Support
  • Confidence
  • Lift ratio
11
Q

Confidence

A

Compares the co-occurrence of the items in the antecedent and consequent to the occurrence of the items in the antecedent. Shows the percentage of transactions containing the antecedent (A) that also contain the consequent (C).

12
Q

Relationship of Support with Confidence

A

Support: (Estimated) probability that a randomly selected transaction from the database will contain all the items in the antecedent and the consequent: P̂(antecedent AND consequent)

Confidence: (Estimated) conditional probability that a randomly selected transaction will include all the items in the consequent given that the transaction includes all the items in the antecedent: P̂(consequent | antecedent)

  • High value of confidence suggests a strong association rule, i.e., rule in which we are highly confident
  • Can be deceptive when antecedent and consequent are independent, e.g.,:
    • Nearly all customers buy bananas and nearly all customers buy ice cream
    • High confidence level of “IF bananas THEN ice-cream”
    • Regardless of whether there is an association between the items
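
A minimal numeric sketch of these two measures, with made-up counts for the bananas / ice-cream situation described above:

```python
# Made-up counts illustrating why confidence alone can be deceptive.
n_transactions = 1000
n_bananas = 900      # transactions containing bananas (antecedent)
n_ice_cream = 880    # transactions containing ice cream (consequent)
n_both = 800         # transactions containing both

# Support: estimated P(antecedent AND consequent).
support = n_both / n_transactions    # 0.80

# Confidence: estimated P(consequent | antecedent).
confidence = n_both / n_bananas      # ~0.89

print(f"support={support:.2f}, confidence={confidence:.2f}")
# Confidence is high, yet 88% of all transactions contain ice cream anyway,
# so "IF bananas THEN ice cream" may reflect no real association.
```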
13
Q

Lift Ratio

A
  • Better way to judge the strength of a rule
  • Compares the confidence of the rule with a benchmark value
  • Confidence: percentage of antecedent transactions that also have the consequent item set
  • Lift: ratio of confidence with benchmark confidence
  • Benchmark confidence: transactions with consequent as percentage of all transactions
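
Continuing the same made-up numbers, a minimal sketch of the lift ratio:

```python
# Lift = confidence / benchmark confidence, where the benchmark confidence is
# the share of all transactions containing the consequent. Counts are made up.
n_transactions = 1000
n_bananas = 900      # antecedent transactions
n_ice_cream = 880    # consequent transactions
n_both = 800         # transactions containing both

confidence = n_both / n_bananas                       # ~0.89
benchmark_confidence = n_ice_cream / n_transactions   # 0.88
lift = confidence / benchmark_confidence              # ~1.01

print(f"lift={lift:.2f}")
# Lift close to 1 suggests antecedent and consequent are roughly independent,
# despite the high confidence of the rule.
```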
14
Q

Lift intuition

A
  • Lift is a value between 0 and infinity
  • Value > 1 indicates that antecedent and consequent are dependent on each other, and the degree of dependence is given by the value
  • Value < 1 indicates that the presence of the antecedent has a negative effect on the consequent
  • Value = 1 indicates that antecedent and consequent are independent and no rule can be derived from them
15
Q

Alternative data representation

A
16
Q

Example

A
17
Q

Lecture summary

A
  • Association rules produce rules on associations between items from a data set of transactions
  • Widely used in recommender systems
  • Most popular method is Apriori algorithm
  • To reduce computation, we consider only “frequent” itemsets (i.e., those above a minimum support)
  • Performance is measured by confidence and lift
  • Can produce a profusion of rules; review is required to identify useful rules and to reduce redundancy