Association Analysis Flashcards
a collection of one or more items
Itemset
fraction of transactions that contain an itemset.
Support
an itemset whose support is greater than or equal to the minsup (minimum support) threshold.
Frequent Itemset
an implication of the form X -> Y, where X and Y are itemsets.
Association Rule
2 Rule Evaluation Metrics for Association Analysis
- Support
- Confidence
fraction of transactions that contain both X and Y
Support
{Milk, Bread} -> {Diaper}
Formula:
(# of transactions containing {Milk, Bread, Diaper}) / (total # of transactions)
In the example:
2 of the 5 transactions contain {Milk, Bread, Diaper}
Support = 0.4
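The support calculation above can be sketched in Python. The card does not list the transactions themselves, so the five transactions below are an assumption, chosen only so that the counts match the example (2 of 5). The function name `support` is also illustrative.

```python
# Hypothetical transaction list (not given on the card); assumed so that
# {Milk, Bread, Diaper} appears in 2 of the 5 transactions.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)  # subset test
    return hits / len(transactions)

print(support({"Milk", "Bread", "Diaper"}, transactions))  # 0.4
```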
measures how often the items in Y appear in transactions that contain X.
Confidence
{Milk, Bread} -> {Diaper}
Formula:
(# of transactions containing {Milk, Bread, Diaper}) / (# of transactions containing {Milk, Bread})
In the example:
2 / 3 (3 transactions contain {Milk, Bread}; 2 of them also contain {Diaper})
Confidence = 0.67
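A minimal sketch of the confidence calculation for X -> Y. The transaction list is an assumption (the card does not give it), picked so that {Milk, Bread} occurs 3 times and {Milk, Bread, Diaper} occurs 2 times. The function name `confidence` is illustrative.

```python
# Hypothetical transaction list (not given on the card).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def confidence(X, Y, transactions):
    """Of the transactions that contain X, the fraction that also contain Y."""
    X, Y = set(X), set(Y)
    count_x = sum(1 for t in transactions if X <= t)
    count_xy = sum(1 for t in transactions if (X | Y) <= t)
    # A rule whose antecedent never occurs has no meaningful confidence;
    # returning 0.0 here is a choice made for this sketch.
    return count_xy / count_x if count_x else 0.0

print(round(confidence({"Milk", "Bread"}, {"Diaper"}, transactions), 2))  # 0.67
```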
Given a set of transactions, the goal of association rule mining is to find all rules having:
- Support >= minsup threshold
- Confidence >= minconf threshold
Association Rule Mining Task
an approach where we list all possible association rules, compute the support and confidence for each rule, and prune the rules that fail the minsup or minconf threshold.
Brute Force Approach
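The brute force approach can be sketched as follows: enumerate every rule X -> Y over the item universe, score it, and keep only the rules that pass both thresholds. The transaction list, the threshold values, and the name `brute_force_rules` are assumptions made for illustration. Even with only 6 distinct items the sketch scores 602 candidate rules, which hints at why this approach becomes expensive quickly.

```python
from itertools import combinations

# Hypothetical transaction list (not given on the cards).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
items = sorted(set().union(*transactions))

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def brute_force_rules(minsup, minconf):
    """List every rule X -> Y, compute its metrics, prune by the thresholds."""
    n = len(transactions)
    kept, total = [], 0
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            for r in range(1, k):                 # size of the left-hand side
                for lhs in combinations(combo, r):
                    X = frozenset(lhs)
                    Y = itemset - X
                    total += 1
                    sup = count(itemset) / n
                    cx = count(X)
                    conf = count(itemset) / cx if cx else 0.0
                    if sup >= minsup and conf >= minconf:
                        kept.append((X, Y, sup, conf))
    return kept, total

# Thresholds are arbitrary values chosen for the example.
rules, total = brute_force_rules(minsup=0.4, minconf=0.6)
print(total)  # 602 candidate rules for 6 items
```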
The 2 Step Approach in Mining Association Rules
- Frequent Itemset Generation
- Rule Generation
generate all itemsets whose support >= minsup
Frequent Itemset Generation
3 Generation Strategies in Frequent Itemset Generation
- Reduce number of candidates
- Reduce number of transactions
- Reduce number of comparisons
a principle that states that if an itemset is frequent, then all of its subsets must also be frequent.
Apriori Principle
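One way the Apriori principle reduces the number of candidates is level-wise generation: a k-itemset is counted only if all of its (k-1)-subsets were already found frequent. The sketch below is a simplified illustration under assumed data and an assumed minsup of 0.6; the function name `apriori` and the join step used here are choices made for the example, not a reference implementation.

```python
from itertools import combinations

# Hypothetical transaction list (not given on the cards).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def apriori(transactions, minsup):
    """Return {frequent itemset: support}, generated level by level."""
    n = len(transactions)

    def sup(s):
        return sum(1 for t in transactions if s <= t) / n

    items = set().union(*transactions)
    level = {frozenset([i]) for i in items if sup(frozenset([i])) >= minsup}
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = sup(s)
        k += 1
        # Join: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop a candidate if any (k-1)-subset is not frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Count support only for the candidates that survived pruning.
        level = {c for c in candidates if sup(c) >= minsup}
    return frequent

frequent = apriori(transactions, minsup=0.6)
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), frequent[itemset])
```

With this data, {Bread, Diaper, Beer} is never counted because its subset {Bread, Beer} is not frequent.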
the process of scanning the transaction database to determine the support of each candidate itemset.
Candidate Counting
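a structure in which the candidates are stored so that each transaction is matched only against the candidates in its hashed buckets, which reduces the number of comparisons.
Hash Structure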
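A simplified sketch of candidate counting with a hash structure. It uses a flat Python dict (a hash table) rather than a hash tree, which is a deliberate simplification: each k-subset of a transaction is looked up directly, so a transaction is never compared against candidates it cannot contain. The data, the candidate list, and the name `count_candidates` are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical transaction list (not given on the cards).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def count_candidates(transactions, candidates, k):
    """Count how many transactions contain each candidate k-itemset."""
    counts = {frozenset(c): 0 for c in candidates}  # hash table of candidates
    for t in transactions:
        # Enumerate the k-subsets of the transaction and probe the table,
        # instead of testing every candidate against the transaction.
        for subset in combinations(sorted(t), k):
            key = frozenset(subset)
            if key in counts:
                counts[key] += 1
    return counts

# Hypothetical candidate 2-itemsets.
candidates = [{"Bread", "Milk"}, {"Diaper", "Beer"}, {"Bread", "Beer"}]
counts = count_candidates(transactions, candidates, k=2)
print(counts[frozenset({"Bread", "Milk"})])  # 3
```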