Association Pattern Mining Flashcards
What is the purpose of association pattern mining?
It is used to detect interesting associations between items.
Provide a concrete example of association pattern mining.
90 percent of transactions that included bread and butter also included milk.
What is the pipeline for association rule mining?
Data Preparation
Frequent Itemset Mining
Interestingness Measurement
How are items represented in association pattern mining?
Generally, with a binary representation: each transaction is a 0/1 vector with one entry per item.
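A minimal sketch of the binary representation, assuming a hypothetical toy set of transactions (the item names are illustrative only):

```python
# Hypothetical toy transactions; the item names are illustrative only.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]

# Fix a column order over the universe of items.
items = sorted(set().union(*transactions))

# One row per transaction, one 0/1 column per item.
binary_rows = [[1 if item in t else 0 for item in items] for t in transactions]
```

Each row then reads as "which items does this transaction contain", in the column order given by items.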
What is the support of an itemset?
The number of transactions that contain the itemset as a subset, divided by the total number of transactions in the dataset.
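Support as a fraction of transactions can be sketched as follows; the function name and the toy data are assumptions made for illustration:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    containing = sum(1 for t in transactions if itemset <= set(t))
    return containing / len(transactions)

# Hypothetical toy data.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]
sup_bb = support({"bread", "butter"}, transactions)  # 2 of the 4 transactions
```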
Briefly explain the idea behind the Apriori algorithm.
The key idea is that every subset of a frequent itemset is also frequent or, equivalently, every superset of an infrequent itemset is also infrequent.
For two itemsets X and Y, when can we say that the rule X ⇒ Y is an association rule?
- Sup(X ∪ Y) >= minsup, and
- Conf(X ⇒ Y) >= minconf
where Conf(X ⇒ Y) = Sup(X ∪ Y) / Sup(X)
How do we mine association rules given minsup and minconf?
1. F ← all frequent itemsets with sup >= minsup.
2. Using F, generate rules with conf >= minconf.
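The two steps can be sketched as below. This is only an illustration under stated assumptions: step 1 uses brute-force enumeration (fine for toy data, not a real miner), confidence is computed as Sup(X ∪ Y) / Sup(X), minsup is assumed to be positive, and the names mine_rules and support are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def mine_rules(transactions, minsup, minconf):
    items = sorted(set().union(*transactions))
    # Step 1: brute-force enumeration of frequent itemsets (toy data only).
    frequent = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if support(frozenset(c), transactions) >= minsup
    ]
    # Step 2: split each frequent itemset Z into X => Z \ X, keep confident rules.
    rules = []
    for z in frequent:
        for k in range(1, len(z)):
            for x in map(frozenset, combinations(sorted(z), k)):
                # Assumes minsup > 0, so support(x) is never zero here.
                conf = support(z, transactions) / support(x, transactions)
                if conf >= minconf:
                    rules.append((set(x), set(z - x), conf))
    return rules

# Hypothetical toy data.
transactions = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk"}),
    frozenset({"milk"}),
]
rules = mine_rules(transactions, minsup=0.5, minconf=0.9)
```

On this toy data the only rule that survives both thresholds is butter ⇒ bread, since every transaction with butter also has bread.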
What is the downside of the brute-force approach, and how can it be mitigated?
Enumerating the lattice directly means checking up to 2^|U| itemsets, where U is the set of all items, so the cost is exponential. A better alternative is the Apriori algorithm, which avoids visiting the whole lattice.
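To see how quickly the lattice grows, here is a small sketch that enumerates every non-empty itemset over n items; the helper name all_itemsets is an assumption for illustration:

```python
from itertools import combinations

def all_itemsets(items):
    """Enumerate every non-empty subset of `items` (the lattice minus the empty set)."""
    items = sorted(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

# Number of non-empty itemsets for 3, 10 and 16 items: 2^n - 1 each.
lattice_sizes = {n: sum(1 for _ in all_itemsets(range(n))) for n in (3, 10, 16)}
```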
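Explain the Apriori algorithm.
Given a minimum support count, we first find the frequent single items.
Then we proceed through the lattice level by level: at each level we generate candidates from the frequent itemsets of the previous level, prune the candidates that have an infrequent subset, and count the remaining ones against the database.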
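A minimal level-wise sketch of Apriori, assuming transactions are given as sets and the threshold is an absolute count. The candidate join here is deliberately naive (all pairwise unions), which is an assumption for brevity rather than the usual prefix-based join:

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: support count} for all itemsets with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 1
    while level:
        k += 1
        # Generate: join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        # Test: count the surviving candidates against the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
    return frequent

# Hypothetical toy data.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]
result = apriori(transactions, min_count=2)
```

On this data the candidate {bread, butter, milk} is pruned without being counted, because its subset {butter, milk} is infrequent.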
What is the downside of the Apriori algorithm, and how can we improve it?
Apriori follows a generate-and-test strategy, and the test (support counting) is the expensive part. Instead of scanning the dataset once per candidate to count it, we iterate over the database once and, for each transaction, update the counts of the candidate itemsets it contains.
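One way to read this improvement is a single pass over the transactions per level, bumping a counter for every candidate found inside each transaction. The sketch below assumes all candidates have the same size k; the function name is hypothetical:

```python
from itertools import combinations

def count_candidates(transactions, candidates, k):
    """One pass over the transactions, updating counts of k-item candidates."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        # Enumerate the k-subsets of this transaction and bump those that are candidates.
        for subset in combinations(sorted(t), k):
            key = frozenset(subset)
            if key in counts:
                counts[key] += 1
    return counts

# Hypothetical toy data.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]
pairs = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk"}),
    frozenset({"butter", "milk"}),
]
pair_counts = count_candidates(transactions, pairs, k=2)
```

Practical implementations usually store the candidates in a hash tree or trie so that long transactions do not require enumerating all of their k-subsets.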
Suggest some more Apriori tricks.
Partitioning: divide the database into a set of disjoint partitions and mine them separately, which allows the process to be parallelized.
Sampling: mine only a sample of the data (for example 10 percent); accuracy is no longer guaranteed, but efficiency improves.
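A rough sketch of the partitioning idea, assuming relative minsup, a brute-force local miner (toy data only), and a final verification pass over the full database. All function names are hypothetical:

```python
from itertools import combinations

def local_frequent(partition, minsup):
    """Brute-force frequent itemsets of one partition (toy-sized data only)."""
    items = sorted(set().union(*partition))
    return {
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if sum(1 for t in partition if set(c) <= t) / len(partition) >= minsup
    }

def partitioned_mining(transactions, minsup, n_parts):
    # Split into disjoint partitions; each could be mined by a separate worker.
    parts = [transactions[i::n_parts] for i in range(n_parts)]
    # An itemset that is frequent globally is frequent in at least one partition,
    # so the union of local results is a complete candidate set.
    candidates = set().union(*(local_frequent(p, minsup) for p in parts if p))
    # Second pass: keep only candidates that are frequent on the full database.
    n = len(transactions)
    return {c for c in candidates
            if sum(1 for t in transactions if c <= t) / n >= minsup}

# Hypothetical toy data.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]
found = partitioned_mining(transactions, minsup=0.5, n_parts=2)
```

Unlike sampling, this variant stays exact because of the second pass; the local step only proposes candidates.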
How can we find frequent itemsets without candidate generation?
FP-growth transforms the database into a tree structure (the FP-tree) and performs the mining on that tree; it then divides the problem and mines each pattern separately to obtain the frequent itemsets used to generate association rules.
Explain the steps of FP-growth.
First count the items; then, within each transaction, sort the items by descending count.
Then create the tree branches from the transactions following this order, sharing common prefixes. (Reminder: work through an FP-growth tree by hand.)
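The construction steps can be sketched as follows. This covers only building the tree, not the recursive mining, and it omits the header table and node links; the class and function names are assumptions, and ties in item counts are broken alphabetically:

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Pass 1: count items and keep the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = FPNode(None)
    # Pass 2: insert each transaction, items sorted by descending count.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            # Reuse the child if this prefix already exists, else create it.
            node = node.children.setdefault(item, FPNode(item, node))
            node.count += 1
    return root

# Hypothetical toy data.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]
root = build_fp_tree(transactions, min_count=2)
```

With this data the item order is bread, milk, butter, so the first three transactions share the bread prefix and the last one starts its own milk branch.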
Does FP-growth traverse the lattice?
Apriori traverses the lattice breadth-first, level by level; FP-growth instead explores it depth-first, growing each pattern separately.