Association Pattern Mining Flashcards

Question 1

Q

What is the purpose of association pattern mining?

Answer

A

It is used to detect interesting associations (co-occurrence relationships) between items.

Question 2

Q

Provide a concrete example of association pattern mining.

Answer

A

90 percent of the transactions that bought bread and butter also bought milk.

Question 3

Q

What is the pipeline for association rule mining?

Answer

A

1. Data preparation
2. Frequent itemset mining
3. Interestingness measurement

Question 4

Q

How are items represented in association pattern mining?

Answer

A

Generally, they are represented in binary form: each transaction is a vector with one position per item, set to 1 if the item is present and 0 otherwise.
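A minimal Python sketch of the binary representation, assuming a small hypothetical list of transactions. Each row has one column per item; ordering the columns alphabetically is an arbitrary choice made here.

```python
# Hypothetical toy transactions (illustrative only).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

# Fix a column order for the item universe (alphabetical here).
items = sorted(set().union(*transactions))


def to_binary(transaction, items):
    """Return a 0/1 row: 1 if the item occurs in the transaction, else 0."""
    return [1 if item in transaction else 0 for item in items]


matrix = [to_binary(t, items) for t in transactions]

print(items)  # ['bread', 'butter', 'eggs', 'milk']
for row in matrix:
    print(row)
```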

Question 5

Q

What is the support of an itemset?

Answer

A

The fraction of transactions in the database that contain the itemset as a subset: Sup(X) = (number of transactions containing X) / (total number of transactions).
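The definition translates directly into a one-line count. This is a sketch over a hypothetical toy dataset, with transactions stored as Python sets so that `<=` performs the subset test.

```python
# Hypothetical toy transactions (illustrative only).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset` as a subset."""
    itemset = set(itemset)
    count = sum(1 for t in transactions if itemset <= t)
    return count / len(transactions)


print(support({"bread"}, transactions))            # 3 of 4 transactions -> 0.75
print(support({"bread", "butter"}, transactions))  # 2 of 4 transactions -> 0.5
```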

Question 6

Q

Briefly explain the Apriori algorithm.

Answer

A

The key idea is the downward-closure (Apriori) property: every subset of a frequent itemset is also frequent; equivalently, every superset of an infrequent itemset is also infrequent.
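To see the property concretely, the sketch below finds the frequent itemsets of a tiny hypothetical dataset by brute force (minsup = 0.5 is an assumed threshold) and then checks that every non-empty proper subset of each frequent itemset is also frequent.

```python
from itertools import combinations

# Hypothetical toy data and threshold (illustrative only).
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
minsup = 0.5


def support(itemset):
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)


items = sorted(set().union(*transactions))
all_itemsets = [frozenset(c) for k in range(1, len(items) + 1)
                for c in combinations(items, k)]
frequent = {s for s in all_itemsets if support(s) >= minsup}


def closure_holds(frequent):
    """True if every non-empty proper subset of each frequent itemset is frequent."""
    for s in frequent:
        for k in range(1, len(s)):
            for sub in combinations(s, k):
                if frozenset(sub) not in frequent:
                    return False
    return True


print(sorted(sorted(s) for s in frequent))
print(closure_holds(frequent))  # True
```

Here {a, b, c} is infrequent (support 0.4), so by the same property none of its supersets could be frequent either.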

Question 7

Q

Given two itemsets X and Y, when can we say that the rule X ⇒ Y is an association rule?

Answer

A

X and Y are disjoint (X ∩ Y = ∅),
Sup(X ∪ Y) >= minsup, and
Conf(X ⇒ Y) >= minconf
where Conf(X ⇒ Y) = Sup(X ∪ Y) / Sup(X)
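A small Python check of both conditions, assuming a hypothetical four-transaction dataset and illustrative thresholds. It applies the confidence formula Sup(X ∪ Y) / Sup(X) to the bread-and-butter example from Question 2.

```python
# Hypothetical toy transactions (illustrative only).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]


def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(X, Y):
    """Conf(X => Y) = Sup(X ∪ Y) / Sup(X)."""
    return support(X | Y) / support(X)


def is_association_rule(X, Y, minsup, minconf):
    """X => Y qualifies when X, Y are disjoint and both thresholds are met."""
    return (not (X & Y)
            and support(X | Y) >= minsup
            and confidence(X, Y) >= minconf)


X, Y = {"bread", "butter"}, {"milk"}
print(support(X | Y))    # 2 of 4 -> 0.5
print(confidence(X, Y))  # 0.5 / 0.75
print(is_association_rule(X, Y, minsup=0.4, minconf=0.6))  # True
```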

Question 8

Q

How do we mine association rules given minsup and minconf?

Answer

A

1. F ← all frequent itemsets with sup >= minsup.

2. Using F, generate rules with conf >= minconf.
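The two steps can be sketched in Python as follows, assuming a hypothetical toy dataset. Step 1 is done by brute force here only to keep the example short (Apriori or FP-growth would normally be used). Step 2 splits each frequent itemset Z into X ⇒ Z − X and reuses the stored supports, which is possible because every subset X of a frequent Z is itself in F.

```python
from itertools import combinations

# Hypothetical toy transactions (illustrative only).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]


def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(minsup):
    """Step 1 (brute force for brevity): map each itemset with sup >= minsup to its support."""
    items = sorted(set().union(*transactions))
    F = {}
    for k in range(1, len(items) + 1):
        for c in combinations(items, k):
            s = support(frozenset(c))
            if s >= minsup:
                F[frozenset(c)] = s
    return F


def generate_rules(F, minconf):
    """Step 2: for each frequent Z, emit X => Z - X when Sup(Z) / Sup(X) >= minconf."""
    rules = []
    for Z, sup_z in F.items():
        for k in range(1, len(Z)):
            for X in combinations(Z, k):
                X = frozenset(X)
                conf = sup_z / F[X]  # X is in F by downward closure
                if conf >= minconf:
                    rules.append((set(X), set(Z - X), conf))
    return rules


for X, Y, conf in generate_rules(frequent_itemsets(0.5), 0.9):
    print(sorted(X), "=>", sorted(Y), conf)
```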

Question 9

Q

What are the downsides of the brute-force approach, and how can they be mitigated?

Answer

A

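Using the lattice directly means testing all 2^|U| itemsets (U being the set of all items), which is exponential in the number of items and requires traversing the whole lattice. A better alternative is the Apriori algorithm, which uses the downward-closure property to prune itemsets that cannot be frequent.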
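A brute-force sketch that makes the cost visible, assuming a hypothetical dataset with |U| = 5 items: it enumerates every non-empty itemset in the lattice and counts how many candidates it has to test.

```python
from itertools import combinations


def brute_force_frequent(transactions, minsup):
    """Test every non-empty itemset in the lattice; return (frequent, #candidates tested)."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent, tested = [], 0
    for k in range(1, len(items) + 1):
        for c in combinations(items, k):
            tested += 1
            sup = sum(1 for t in transactions if set(c) <= t) / n
            if sup >= minsup:
                frequent.append(frozenset(c))
    return frequent, tested


# Hypothetical toy transactions (illustrative only).
transactions = [{"a", "b"}, {"a", "c"}, {"b", "c", "d"}, {"a", "b", "c", "d", "e"}]
frequent, tested = brute_force_frequent(transactions, minsup=0.5)
print(tested)  # 2**5 - 1 = 31 candidates for |U| = 5
```

Adding one more item doubles the number of candidates, and each candidate costs a full pass over the transactions.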

Question 10

Q

Explain the Apriori algorithm.

Answer

A

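Given a minimum support count, first scan the database and keep the frequent 1-itemsets.
Then proceed level by level: at level k, join frequent (k−1)-itemsets to generate candidate k-itemsets, prune every candidate that has an infrequent (k−1)-subset, scan the database to count the support of the remaining candidates, and keep those that meet the minimum support. Stop when a level produces no frequent itemsets.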
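A minimal level-wise Apriori sketch in Python, assuming transactions are sets of hashable items and `min_count` is an absolute support count. For brevity, the join step unions any two frequent (k−1)-itemsets whose union has size k, rather than the classic sorted-prefix join. After the subset-pruning step both joins leave the same candidates.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise Apriori sketch: returns {frozenset itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for i, a in enumerate(prev) for b in prev[i + 1:]
                      if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Count: one pass over the database for the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical toy transactions (illustrative only).
transactions = [{"bread", "butter", "milk"}, {"bread", "butter"}, {"bread", "milk"},
                {"butter", "milk"}, {"bread", "butter", "milk"}]
for itemset, count in apriori(transactions, min_count=3).items():
    print(sorted(itemset), count)
```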

Question 11

Q

What is a downside of the Apriori algorithm, and how can it be improved?

Answer

A

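In Apriori's generate-and-test strategy, the test step (support counting) is the bottleneck. To improve it, instead of looping over the candidates and scanning the dataset once per candidate, we make a single pass over the database and, for each transaction, update the count of every candidate itemset it contains.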
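The sketch below contrasts the two counting strategies on hypothetical data. The single-pass version enumerates each transaction's k-subsets and looks them up in a dictionary of candidates. This is a simplified stand-in chosen here for brevity; practical implementations often use structures such as hash trees, since enumerating subsets of long transactions can itself be expensive.

```python
from itertools import combinations

# Hypothetical toy transactions and candidate 2-itemsets (illustrative only).
transactions = [frozenset(t) for t in (
    {"a", "b", "c"}, {"a", "b"}, {"b", "c", "d"}, {"a", "c", "d"})]
candidates = [frozenset(p) for p in (
    {"a", "b"}, {"a", "c"}, {"b", "c"}, {"c", "d"}, {"a", "d"})]


def count_per_candidate(transactions, candidates):
    """Naive test step: a full scan of the database for every candidate."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}


def count_single_pass(transactions, candidates, k):
    """One scan: enumerate each transaction's k-subsets and update matching counts."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for sub in combinations(sorted(t), k):
            key = frozenset(sub)
            if key in counts:
                counts[key] += 1
    return counts


print(count_single_pass(transactions, candidates, k=2))
```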

Question 12

Q

Suggest some more tricks for improving Apriori.

Answer

A

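Partitioning: divide the database into a set of disjoint partitions and mine each one separately, which allows the process to be parallelized. An itemset that is frequent in the whole database must be frequent in at least one partition, so one final scan of the full database confirms which local results are globally frequent.
Sampling: mine only a sample of the data (for example 10 percent). Accuracy is not guaranteed, but efficiency improves.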
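A two-phase partitioning sketch, assuming relative support thresholds and a hypothetical dataset. Each partition is mined independently (brute force here, for brevity), the union of local results forms the candidate set, and a single global scan keeps only the itemsets that are frequent overall. The helper names are made up for this example.

```python
from itertools import combinations


def local_frequent(partition, minsup):
    """Brute-force frequent itemsets of one partition (relative support)."""
    items = sorted(set().union(*partition))
    result = set()
    for k in range(1, len(items) + 1):
        for c in combinations(items, k):
            if sum(1 for t in partition if set(c) <= t) / len(partition) >= minsup:
                result.add(frozenset(c))
    return result


def partition_mine(transactions, minsup, num_parts):
    size = -(-len(transactions) // num_parts)  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Phase 1 (parallelizable): mine each partition independently.
    candidates = set().union(*(local_frequent(p, minsup) for p in parts))
    # Phase 2: one scan of the full database keeps only globally frequent candidates.
    n = len(transactions)
    return {c for c in candidates
            if sum(1 for t in transactions if c <= t) / n >= minsup}


# Hypothetical toy transactions (illustrative only).
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}, {"a", "b"}, {"c"}]
print(sorted(sorted(s) for s in partition_mine(transactions, minsup=0.5, num_parts=2)))
```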

Question 13

Q

How can frequent itemsets be found without candidate generation?

Answer

A

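FP-growth transforms the database into a compact tree structure (the FP-tree), and mining is performed on this tree rather than on the raw transactions. It then divides the problem: for each frequent item it builds a conditional pattern base and a conditional FP-tree, and mines them separately and recursively. The resulting frequent itemsets are then used to generate association rules.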

Question 14

Q

Explain the steps of FP-growth.

Answer

A

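1. Scan the database once to count the support of each item, and drop the infrequent items.
2. Within each transaction, sort the remaining items in descending order of support count.
3. Insert each sorted transaction into the tree as a branch from the root, sharing any common prefix with existing branches and incrementing the counts of the nodes along the shared path.
(Do not forget to build an FP-tree by hand on your own.)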
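A minimal sketch of FP-tree construction following these three steps, assuming a hypothetical dataset and an absolute `min_count`. Ties in support are broken alphabetically, an arbitrary choice made here. The header table maps each item to its tree nodes, which is what the later mining phase would walk. The conditional-tree mining itself is not shown.

```python
from collections import defaultdict


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_count):
    # Step 1: count item supports and keep only frequent items.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> nodes holding that item
    for t in transactions:
        # Step 2: frequent items sorted by descending count (ties broken by name).
        ordered = sorted((i for i in t if i in frequent), key=lambda i: (-frequent[i], i))
        # Step 3: insert as a branch, sharing prefixes with existing paths.
        node = root
        for item in ordered:
            if item not in node.children:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
            node.count += 1
    return root, header


def show(node, depth=0):
    """Print the tree, one `item:count` per line, indented by depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


# Hypothetical toy transactions (illustrative only).
transactions = [["bread", "milk"], ["bread", "butter", "milk"],
                ["butter", "eggs"], ["bread", "butter"]]
root, header = build_fp_tree(transactions, min_count=2)
show(root)
```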

Question 15

Q

Does FP-growth traverse the lattice?

Answer

A

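Yes, but depth-first: FP-growth grows patterns one item at a time through recursively built conditional FP-trees, which amounts to a depth-first exploration of the itemset lattice, whereas Apriori traverses the lattice breadth-first (level by level).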

Question 16

Q

What is lift?

Answer

A

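Lift(X ⇒ Y) = Sup(X ∪ Y) / (Sup(X) · Sup(Y)) = Conf(X ⇒ Y) / Sup(Y). It measures how far X and Y are from statistical independence: if lift = 1, they are independent; if lift > 1, X and Y are positively correlated (they occur together more often than expected); if lift < 1, they are negatively correlated (one tends to exclude the other).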
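A short Python sketch of lift over hypothetical toy data, covering the three cases: a positive association, a negative one, and a dataset constructed so that two items are exactly independent.

```python
def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(X, Y, transactions):
    """Lift(X => Y) = Sup(X ∪ Y) / (Sup(X) * Sup(Y))."""
    return support(X | Y, transactions) / (
        support(X, transactions) * support(Y, transactions))


# Hypothetical toy transactions (illustrative only).
transactions = [{"bread", "butter"}, {"bread", "butter"}, {"bread", "jam"}, {"jam"}]
print(lift({"bread"}, {"butter"}, transactions))  # > 1: positively correlated
print(lift({"bread"}, {"jam"}, transactions))     # < 1: negatively correlated

# A dataset where a and b occur independently: Sup(a) = Sup(b) = 0.5, Sup(ab) = 0.25.
independent = [{"a", "b"}, {"a"}, {"b"}, set()]
print(lift({"a"}, {"b"}, independent))            # 1.0
```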