Intro and Association Rules Flashcards
Major data mining tasks?
Classification (supervised machine learning)
Clustering (unsupervised machine learning)
Discovery of association rules
What type of data do association rules normally use?
Basket data (binary such as yes/no)
What are association rules?
IF-THEN statements, if x then y (x and y are sets of items)
How do we measure the quality of association rules?
Support (Sup) = no. of transactions having X and Y divided by the Total no. of transactions
Confidence (Conf) = No. of transactions having X and Y divided by the No. of transactions having X
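As a minimal sketch of the two measures above, the snippet below computes Sup and Conf over a small list of baskets. The transactions and the helper names `support` and `confidence` are invented for illustration and are not the data behind the examples on these cards.

```python
# Toy basket data, invented for illustration only.
transactions = [
    {"coffee", "bread"},
    {"coffee", "bread", "milk"},
    {"bread", "milk"},
    {"milk"},
    {"bread"},
]


def support(itemset, transactions):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """(No. of transactions with X and Y) / (no. of transactions with X)."""
    x, y = set(x), set(y)
    n_x = sum(1 for t in transactions if x <= t)
    n_xy = sum(1 for t in transactions if (x | y) <= t)
    return n_xy / n_x if n_x else 0.0


# Rule: IF (coffee) THEN (bread)
sup = support({"coffee", "bread"}, transactions)
conf = confidence({"coffee"}, {"bread"}, transactions)
print(sup, conf)
```

Note that Sup is symmetric in X and Y, while Conf depends on which side is the IF part.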
What is the goal of association rules?
To find ALL rules with Sup ≥ Min_Sup and Conf ≥ Min_Conf
Example of Association Rules
Rules discovered from previous input, assuming Min_Sup = 0.3 and Min_Conf = 0.8
IF (coffee) THEN (bread) Sup = 0.3, Conf = 1
IF (bread) THEN (bread) Sup = 0.4, Conf = 0.8
Reminder:
Support (Sup) = no. of transactions having X and Y divided by the Total no. of transactions
Confidence (Conf) = No. of transactions having X and Y divided by the No. of transactions having X
What are the two phases in Association Rule Algorithms?
Phase 1: Discover all frequent item sets
- i.e., all sets with Sup ≥ Min_Sup, where Min_Sup is specified by the user
Phase 2: Discover all rules with high confidence
- i.e., use the frequent item sets to discover association rules with Conf ≥ Min_Conf, where Min_Conf is specified by the user
Describe the steps in Phase 1 of association rules
1st step - compute support of each set with just 1 item
2nd step - compute support of each set with 2 items
3rd step - compute support of each set with 3 items
In step 2 and 3 of phase 1, what are the optimisations that can be made?
Step 2 optimisation: If a 1-item set is not frequent, no 2-item set containing that item can be frequent, so all item sets containing the infrequent item can be ignored.
Step 3 optimisation: If a 2-item set is not frequent, no 3-item set containing that 2-item set can be frequent, so all item sets containing those 2 items can be ignored.
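The level-by-level steps and the pruning optimisation can be sketched roughly as follows. This is an Apriori-style illustration under assumptions, not necessarily the exact algorithm the cards are based on; the function name `frequent_itemsets` and the toy baskets are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_sup):
    """Return {itemset: support} for every item set with Sup >= min_sup."""
    n = len(transactions)

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Step 1: support of each set with just 1 item.
    items = sorted({i for t in transactions for i in t})
    level = {}
    for i in items:
        s = sup(frozenset([i]))
        if s >= min_sup:
            level[frozenset([i])] = s
    frequent = dict(level)

    # Steps 2, 3, ...: build k-item candidates from frequent (k-1)-item sets.
    k = 2
    while level:
        prev = list(level)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Optimisation: skip any candidate with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {}
        for c in candidates:
            s = sup(c)
            if s >= min_sup:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent


# Toy basket data, invented for illustration only.
toy = [
    {"coffee", "bread"},
    {"coffee", "bread", "milk"},
    {"bread", "milk"},
    {"milk"},
    {"bread"},
]
result = frequent_itemsets(toy, min_sup=0.3)
for itemset, s in result.items():
    print(sorted(itemset), s)
```

In this toy data {coffee, milk} falls below the threshold, so the 3-item set {coffee, bread, milk} is discarded without its support ever being counted.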
How is confidence used in Phase 2 of association rule algorithms?
Generating all possible rules from each frequent set and computing the confidence of each rule, e.g.
- Item set: {coffee, bread}
Candidate Rules:
IF coffee THEN bread: Conf = 1.0
IF bread THEN coffee: Conf = 0.6
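A rough sketch of this rule-generation step: every non-empty proper subset of a frequent item set is tried as the IF part, and the rule is kept when its confidence reaches Min_Conf. The function name `rules_from_itemset` and the baskets are assumptions for illustration, so the confidences will not match the values on the card.

```python
from itertools import combinations


def rules_from_itemset(itemset, transactions, min_conf):
    """Generate rules IF x THEN y from one frequent item set."""
    itemset = frozenset(itemset)

    def count(s):
        return sum(1 for t in transactions if s <= t)

    n_all = count(itemset)  # transactions containing X and Y together
    rules = []
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            x = frozenset(antecedent)
            y = itemset - x
            n_x = count(x)
            conf = n_all / n_x if n_x else 0.0
            if conf >= min_conf:
                rules.append((set(x), set(y), conf))
    return rules


# Toy basket data, invented for illustration only.
toy = [
    {"coffee", "bread"},
    {"coffee", "bread", "milk"},
    {"bread", "milk"},
    {"milk"},
    {"bread"},
]
rules = rules_from_itemset({"coffee", "bread"}, toy, min_conf=0.8)
print(rules)
```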
What is the usefulness of association rules?
For example, putting together items which are frequently bought together.
What do association rules NOT imply?
Causation and predictive power
What are some extensions to the standard association rule algorithm?
Hierarchy of items
Selecting rules “interesting” to the user
Multiple minimum supports
What is the motivation of using a hierarchy of items?
Rules in lower levels may not have minimum support
A hierarchy can be used to prune non-interesting rules
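To illustrate the first point, the sketch below compares the support of an item set at the lowest level with the support of the same set after each item is replaced by its parent category. The two-level hierarchy in `parent` and the baskets are hypothetical.

```python
# Hypothetical two-level hierarchy: specific product -> category.
parent = {
    "skimmed milk": "milk",
    "whole milk": "milk",
    "white bread": "bread",
    "brown bread": "bread",
}

# Toy basket data, invented for illustration only.
transactions = [
    {"skimmed milk", "white bread"},
    {"whole milk", "brown bread"},
    {"skimmed milk", "brown bread"},
    {"whole milk", "white bread"},
    {"whole milk"},
]


def generalise(transaction):
    """Replace every item by its parent category (if it has one)."""
    return {parent.get(item, item) for item in transaction}


def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


low = support({"skimmed milk", "white bread"}, transactions)
high = support({"milk", "bread"}, [generalise(t) for t in transactions])
print(low, high)
```

Here the specific pair appears in only one basket, while the category-level pair appears in four of the five, so a rule may reach Min_Sup only at the higher level.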
Why may “Selecting rules interesting to the user” be used?
The user may be interested in only some of the discovered associations
Preference of the user could refer to items in several levels of a hierarchy
Can ask the user to specify “templates” of interesting and non-interesting associations
Rules can be filtered in a post-processing step or by incorporating restrictions into the mining algorithm (which is computationally faster)
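A minimal sketch of the post-processing option, assuming a made-up template format: each template is a dict that may name items required in the IF part (`antecedent`) and/or the THEN part (`consequent`). The rules, templates, and function names are all hypothetical.

```python
# Rules as (IF part, THEN part, confidence); values invented for illustration.
rules = [
    ({"coffee"}, {"bread"}, 1.0),
    ({"milk"}, {"bread"}, 0.8),
    ({"bread"}, {"butter"}, 0.9),
]


def matches(rule, template):
    """True if the rule contains all items the template asks for."""
    antecedent, consequent, _ = rule
    want_x = template.get("antecedent")
    want_y = template.get("consequent")
    return ((want_x is None or want_x <= antecedent)
            and (want_y is None or want_y <= consequent))


def filter_rules(rules, interesting, uninteresting=()):
    """Keep rules matching an interesting template and no uninteresting one."""
    kept = []
    for rule in rules:
        if any(matches(rule, t) for t in uninteresting):
            continue
        if any(matches(rule, t) for t in interesting):
            kept.append(rule)
    return kept


kept = filter_rules(
    rules,
    interesting=[{"consequent": {"bread"}}],
    uninteresting=[{"antecedent": {"milk"}}],
)
print(kept)
```

Filtering after mining is simple but still pays the cost of discovering every rule first, which is why pushing the restrictions into the mining algorithm is faster.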
Why may “Multiple minimum supports” be used?
The use of a single minimum support for all items assumes that all items are “equally important”, which is an unrealistic assumption in many cases.
Specifying a minimum support for each item allows for a more flexible approach, allowing the discovery of rules with “rare items” without greatly increasing the number of discovered rules.
How to compute the “minimum support” of a rule in “Multiple minimum supports”?
The smallest value of a minimum support among all items that occur in the rule.
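As a small sketch of that definition, the threshold for a rule is the minimum over the per-item minimum supports of every item appearing in it. The item names and values in `min_sup_of_item` are hypothetical.

```python
# Hypothetical per-item minimum supports; a rare item gets a low threshold.
min_sup_of_item = {"bread": 0.4, "milk": 0.3, "caviar": 0.01}


def rule_min_support(antecedent, consequent, min_sup_of_item):
    """Smallest minimum support among all items that occur in the rule."""
    items = set(antecedent) | set(consequent)
    return min(min_sup_of_item[i] for i in items)


# Rule: IF (caviar) THEN (bread)
threshold = rule_min_support({"caviar"}, {"bread"}, min_sup_of_item)
print(threshold)
```

A rule involving the rare item only has to reach that item's low threshold, while rules among common items are still held to their higher thresholds.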