B04 Association Rules Flashcards
Exam Prep
__________ are involved with summarizing or grouping data in new and interesting ways. In these types of models, no single feature is more important than any other. The process of training a __________ is known as __________.
Descriptive Models
Descriptive Model
Unsupervised Learning
The application of machine learning to customer retail transactional data in order to detect patterns in purchasing behavior.
Market Basket Analysis
The result of a market basket analysis is a collection of __________ that specify patterns found in the relationships among items or itemsets.
Association Rules
In the rule {beer, milk} → {diaper} (LHS → RHS), define the LHS and RHS.
LHS: the condition that needs to be met to trigger the rule. RHS: the expected result of meeting that condition.
Association rules imply causality between items. (True/False)
False
Kinds of Association Rules: 1. ______ 2. ______ 3. ______
1. Actionable 2. Trivial 3. Inexplicable
Define Itemset
A collection of one or more items e.g. {beer, bread, diaper, milk, eggs}.
Define Transaction
The itemset for an observation. e.g. T(3) = {milk, diaper, beer, coke}
Define Support Count (σ)
The number of transactions that contain an itemset. e.g. σ({beer, milk, diaper}) = 2
Define Support (s)
How frequently an itemset (or rule) occurs in the dataset.
Support is the fraction of transactions containing the itemset: s(X) = σ(X) / N, where N is the total number of transactions.
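A minimal Python sketch of how support count and support are computed; the five-transaction toy dataset below is illustrative and not taken from the cards:

```python
# Toy transaction database (each transaction is an itemset).
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "coke"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "coke"},
]

def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item in X."""
    return sum(itemset <= t for t in transactions)

def support(itemset, transactions):
    """s(X): fraction of transactions that contain X."""
    return support_count(itemset, transactions) / len(transactions)

print(support_count({"beer", "milk", "diaper"}, transactions))  # 2
print(support({"beer", "milk", "diaper"}, transactions))        # 2/5 = 0.4
```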
Define Confidence
The predictive power or accuracy of a rule.
Confidence is the support of the itemset containing both X and Y divided by the support of the itemset containing only X: c(X → Y) = s(X ∪ Y) / s(X).
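Continuing the sketch above (reusing `transactions` and `support_count`), the confidence of the rule {beer, milk} → {diaper} on the toy data:

```python
def confidence(lhs, rhs, transactions):
    """c(X -> Y) = s(X ∪ Y) / s(X) = σ(X ∪ Y) / σ(X)."""
    return support_count(lhs | rhs, transactions) / support_count(lhs, transactions)

# Both transactions containing {beer, milk} also contain diaper.
print(confidence({"beer", "milk"}, {"diaper"}, transactions))  # 2/2 = 1.0
```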
Define Lift
How much more likely the RHS is to occur, given the LHS, relative to its baseline rate of occurrence in the dataset.
Lift is the confidence of the rule divided by the support of the itemset containing only Y: lift(X → Y) = c(X → Y) / s(Y). A lift greater than 1 means X and Y occur together more often than expected if they were independent.
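Lift for the same rule, again reusing the helpers from the sketches above:

```python
def lift(lhs, rhs, transactions):
    """lift(X -> Y) = c(X -> Y) / s(Y)."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)

# s({diaper}) = 4/5 = 0.8, so lift = 1.0 / 0.8 = 1.25 (> 1: positive association).
print(lift({"beer", "milk"}, {"diaper"}, transactions))  # 1.25
```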
Rules with ___ support may occur simply by chance and are typically not actionable
Low
The Apriori Principle is based on the __________ property of support.
anti-monotone
The support of an itemset _____ (always, sometimes, never) exceeds that of its subsets.
Never
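A quick check of the anti-monotone property on the toy data from the earlier sketch (reuses `support` and `transactions`); the assertion never fails, because a superset can only be contained in a subset of the transactions that contain its subsets:

```python
from itertools import combinations

items = {"beer", "bread", "diaper", "milk"}
for k in range(1, len(items) + 1):
    for subset in combinations(sorted(items), k):
        s_sub = support(set(subset), transactions)
        # Every superset formed by adding one more item has support <= s_sub.
        for extra in items - set(subset):
            assert support(set(subset) | {extra}, transactions) <= s_sub
print("anti-monotone property holds on the toy data")
```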
How do you reduce the computational complexity of association rules?
- Apriori algorithm - prune the itemset lattice
- FP-growth - reduce the number of comparisons by using advanced data structures to store candidate itemsets or to compress the dataset (see the library sketch after this list)
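For context, a hedged sketch using the third-party mlxtend package (an assumption here; the cards do not name any library) that runs both Apriori and FP-growth on one-hot-encoded transactions and then extracts rules:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpgrowth, association_rules

transactions = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "coke"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "coke"],
]

# One-hot encode the transactions into a boolean item matrix.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Same frequent itemsets, different search strategies.
freq_apriori = apriori(onehot, min_support=0.4, use_colnames=True)
freq_fpgrowth = fpgrowth(onehot, min_support=0.4, use_colnames=True)

# Turn frequent itemsets into rules filtered by confidence.
rules = association_rules(freq_apriori, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```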
Approaches to Association Rules include:
- Brute Force Approach - generate every itemset (computationally expensive; see the sketch after this list)
- Frequent Itemset Generation - generate every itemset whose support meets an established minimum (computationally expensive)
- Apriori approach
- FP Growth
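A brute-force / frequent-itemset-generation sketch in plain Python, reusing the `transactions` toy data from the support example; it enumerates every candidate itemset, which is what makes the approach computationally expensive:

```python
from itertools import combinations

def brute_force_frequent_itemsets(transactions, min_support):
    """Enumerate every candidate itemset and keep those meeting min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):  # every possible k-itemset
            s = sum(set(candidate) <= t for t in transactions) / n
            if s >= min_support:
                frequent[frozenset(candidate)] = s
    return frequent

print(brute_force_frequent_itemsets(transactions, min_support=0.4))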
How does the Apriori approach work?
Pruning - if an itemset is identified as infrequent, then its supersets are not generated or tested.
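A minimal Apriori-style sketch (again reusing the toy `transactions`) that generates candidates level by level and prunes any candidate with an infrequent subset:

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Level-wise search: only frequent itemsets are extended, so supersets of
    infrequent itemsets are never generated or counted."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    # L1: frequent single items.
    current = {frozenset([i]) for i in items
               if sum(i in t for t in transactions) / n >= min_support}
    frequent = {s: sum(s <= t for t in transactions) / n for s in current}
    k = 2
    while current:
        # Candidate generation: unions of frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: drop candidates with any infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        # Support counting for the surviving candidates only.
        current = {c for c in candidates
                   if sum(c <= t for t in transactions) / n >= min_support}
        frequent.update({c: sum(c <= t for t in transactions) / n for c in current})
        k += 1
    return frequent

print(apriori_frequent_itemsets(transactions, min_support=0.4))
```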
Advantages of Association Rules?
- Capable of working with large amounts of transactional data.
- Rules are easy to understand.
- Useful for “data mining” and discovering unexpected patterns in data.
Weaknesses of Association Rules?
- Not very useful for small data sets.
- Separating true insight from common sense requires some effort.
- Easy to draw misleading conclusions from random patterns.