Association Pattern Mining Flashcards

1
Q

What is the purpose of association pattern mining?

A

It detects interesting associations (co-occurrence relationships) between items in transaction data.

2
Q

Provide a concrete example of association pattern mining.

A

90 percent of the transactions that contain bread and butter also contain milk.

3
Q

What is the pipeline for association rule mining?

A

Data Preparation
Frequent Itemset Mining
Interestingness Measurement

4
Q

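How are items represented in association pattern mining?

A

Generally, they are represented in binary form: each transaction is a 0/1 vector over the set of all items, where 1 means the item is present in the transaction.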
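
A minimal sketch of the binary representation. The toy transactions and the helper name `to_binary_matrix` are made up for illustration and are not part of the original card.

```python
def to_binary_matrix(transactions):
    """Encode each transaction as a 0/1 row over the sorted item universe."""
    items = sorted(set().union(*transactions))
    matrix = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, matrix


# Hypothetical toy transactions, used only for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]
items, matrix = to_binary_matrix(transactions)
```

Each row of `matrix` is one transaction and each column is one item, in the order given by `items`.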

5
Q

What is the support of an itemset?

A

The fraction of transactions in the dataset that contain the itemset as a subset: the number of transactions containing the itemset divided by the total number of transactions.
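
A small sketch of this definition, assuming transactions are stored as Python sets. The data and the function name `support` are illustrative assumptions.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical toy transactions, used only for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]
# {bread, butter} appears in 2 of the 4 transactions.
sup_bread_butter = support({"bread", "butter"}, transactions)
```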

6
Q

Briefly explain the idea behind the Apriori algorithm.

A

The whole idea is the Apriori principle: every subset of a frequent itemset is also frequent. Equivalently, every superset of an infrequent itemset is also infrequent.

7
Q

For two given itemsets X and Y, when can we say that the rule X ⇒ Y is an association rule?

A

1. Sup(X ∪ Y) >= minsup, and
2. Conf(X ⇒ Y) >= minconf,
   where Conf(X ⇒ Y) = Sup(X ∪ Y) / Sup(X)
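
The two conditions can be checked as in the sketch below. It uses the standard definition Conf(X ⇒ Y) = Sup(X ∪ Y) / Sup(X) and assumes X occurs in at least one transaction. The data and function names are illustrative assumptions.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x, y, transactions):
    """Conf(X => Y) = Sup(X u Y) / Sup(X); assumes Sup(X) > 0."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def is_association_rule(x, y, transactions, minsup, minconf):
    """True when X => Y meets both the support and confidence thresholds."""
    return (support(set(x) | set(y), transactions) >= minsup
            and confidence(x, y, transactions) >= minconf)


# Hypothetical toy transactions, used only for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]
rule_ok = is_association_rule({"bread"}, {"butter"}, transactions, 0.5, 0.6)
```

Here {bread} ⇒ {butter} has support 2/4 and confidence 2/3, so it passes with minsup = 0.5 and minconf = 0.6.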
8
Q

How do we mine association rules given minsup and minconf?

A

1. F ← all frequent itemsets with sup >= minsup.
2. Using F, generate rules with conf >= minconf.
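
The two steps can be sketched as follows. Step 1 is done here by naive enumeration, which is only reasonable for toy data. The function name `mine_rules` and the dataset are assumptions for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def mine_rules(transactions, minsup, minconf):
    items = sorted(set().union(*transactions))
    # Step 1: F <- all frequent itemsets (naive enumeration of the lattice).
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(frozenset(combo), transactions)
            if s >= minsup:
                frequent[frozenset(combo)] = s
    # Step 2: split each frequent itemset Z into X => Z \ X, keep confident rules.
    rules = []
    for z, sup_z in frequent.items():
        for k in range(1, len(z)):
            for x in combinations(sorted(z), k):
                x = frozenset(x)
                conf = sup_z / frequent[x]  # X is frequent because Z is
                if conf >= minconf:
                    rules.append((set(x), set(z - x), sup_z, conf))
    return rules


# Hypothetical toy transactions, used only for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]
rules = mine_rules(transactions, minsup=0.5, minconf=0.6)
```

Each rule is returned as a tuple (X, Y, support, confidence).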

9
Q

What is the downside of the brute-force approach, and how can it be mitigated?

A

Enumerating the itemset lattice directly means testing up to 2^|U| itemsets, where U is the set of all items. That is exponential, because we loop through the whole lattice. A better alternative is the Apriori algorithm, which prunes large parts of the lattice.
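
To get a feel for the blow-up, the sketch below enumerates every non-empty itemset over a small item universe. The helper name `all_itemsets` and the items are illustrative assumptions.

```python
from itertools import combinations


def all_itemsets(universe):
    """Yield every non-empty itemset in the lattice over `universe`."""
    items = sorted(universe)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


# Hypothetical item universe with 4 items: 2^4 - 1 = 15 non-empty itemsets.
universe = {"bread", "butter", "milk", "eggs"}
lattice = list(all_itemsets(universe))
```

With only 30 items the same enumeration would already produce more than a billion itemsets.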

10
Q

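Explain the Apriori algorithm.

A

Based on a minimum support count, we first determine the frequent single items.
Then we move through the lattice level by level: at each level we generate candidate itemsets from the frequent itemsets of the previous level, count them against the database, and prune the candidates that are not frequent.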
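
A compact sketch of the level-wise procedure, assuming an absolute minimum support count. This is one straightforward way to write Apriori, not a tuned implementation, and the data is made up.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for all itemsets with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        # Generate: join frequent (k-1)-itemsets into k-itemset candidates and
        # drop any candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Test: count the surviving candidates and prune the infrequent ones.
        level = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count >= min_count:
                level[c] = count
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical toy transactions, used only for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]
frequent = apriori(transactions, min_count=2)
```

On this data {butter, milk} occurs only once, so {bread, butter, milk} is never counted at level 3.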

11
Q

What is the downside of the Apriori algorithm, and how can we improve it?

A

In the generate-and-test strategy, the test is support counting, which is the expensive part. To improve it, instead of scanning the dataset once for every candidate, we iterate over the database once per level and, for each transaction, update the counts of the candidate itemsets it contains.
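
One way to sketch this counting scheme is to keep the candidates in a hash table and let each transaction increment the candidates it contains. The function name `count_candidates` and the data are assumptions for illustration.

```python
from itertools import combinations


def count_candidates(transactions, candidates, k):
    """Single pass over the database; each transaction updates its k-subsets."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for subset in combinations(sorted(t), k):
            key = frozenset(subset)
            if key in counts:  # only candidate itemsets are counted
                counts[key] += 1
    return counts


# Hypothetical toy transactions and 2-itemset candidates.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]
candidates = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk"}),
    frozenset({"butter", "milk"}),
]
counts = count_candidates(transactions, candidates, k=2)
```

The database is read once for the whole level, whatever the number of candidates.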

12
Q

Suggest some more Apriori tricks.

A

Partitioning: divide the database into a set of disjoint partitions, which allows the process to be parallelized.
Sampling: mine only a sample of the data (for example 10 percent); accuracy is no longer guaranteed, but efficiency improves.
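
A rough sketch of the partitioning idea, assuming the usual two-phase scheme: an itemset that is frequent globally must be frequent in at least one partition, so the local results serve as candidates for one final global pass. The local miner here is naive enumeration, and all names and data are illustrative.

```python
from itertools import combinations


def local_frequent(partition, minsup):
    """Naive miner for one partition (only sensible for toy data)."""
    items = sorted(set().union(*partition))
    found = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            c = frozenset(combo)
            if sum(1 for t in partition if c <= t) / len(partition) >= minsup:
                found.add(c)
    return found


def partitioned_mining(transactions, minsup, n_parts):
    # Phase 1: mine each disjoint partition independently (parallelizable).
    size = -(-len(transactions) // n_parts)  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    candidates = set().union(*(local_frequent(p, minsup) for p in parts))
    # Phase 2: one global pass keeps only the truly frequent candidates.
    n = len(transactions)
    return {c for c in candidates
            if sum(1 for t in transactions if c <= t) / n >= minsup}


# Hypothetical toy transactions, used only for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]
result = partitioned_mining(transactions, minsup=0.5, n_parts=2)
```

Phase 1 runs sequentially here; in practice each partition could be handed to a separate worker.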

13
Q

How can we find frequent itemsets without candidate generation?

A

FP-growth transforms the database into a tree structure (the FP-tree), and the mining happens on this tree. We then divide the problem and mine each pattern separately; the resulting frequent itemsets are used to generate association rules.

14
Q

Explain the steps of FP-growth.

A

First count the items, then sort the items of each transaction by descending count.
Then create the tree branches from the transactions following this order (do not forget to work through an FP-growth tree on your own).
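
The tree construction can be sketched as below. It follows the standard FP-tree build, which also drops items below the minimum count; ties in the ordering are broken alphabetically here, which is an arbitrary choice. The class and function names are assumptions for illustration, and the mining step itself is not shown.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Pass 1: count every item.
    counts = Counter(item for t in transactions for item in t)
    root = FPNode(None)
    # Pass 2: insert each transaction with its frequent items sorted by
    # descending count, so that common prefixes share a branch.
    for t in transactions:
        ordered = sorted(
            (i for i in t if counts[i] >= min_count),
            key=lambda i: (-counts[i], i),
        )
        node = root
        for item in ordered:
            node = node.children.setdefault(item, FPNode(item, node))
            node.count += 1
    return root


# Hypothetical toy transactions, used only for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]
root = build_fp_tree(transactions, min_count=2)
```

With this data the item order is bread, milk, butter, so three of the four transactions share the branch that starts at bread.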

15
Q

Does FP-growth traverse the lattice?

A

Yes. FP-growth traverses the lattice depth-first, growing one pattern at a time, whereas Apriori traverses it breadth-first, level by level.

16
Q

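What is the lift?

A

Lift measures how far two itemsets X and Y are from being independent: lift(X ⇒ Y) = Sup(X ∪ Y) / (Sup(X) · Sup(Y)).
If lift = 1, X and Y are independent.
If lift > 1, X and Y are positively dependent: they occur together more often than expected.
If lift < 1, X and Y are negatively dependent: they occur together less often than expected.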
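
A small sketch of the computation, using the standard definition lift(X ⇒ Y) = Sup(X ∪ Y) / (Sup(X) · Sup(Y)) with relative supports. It assumes both X and Y occur at least once; the data and names are illustrative.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(x, y, transactions):
    """lift(X => Y) = Sup(X u Y) / (Sup(X) * Sup(Y)); assumes both are > 0."""
    return support(set(x) | set(y), transactions) / (
        support(x, transactions) * support(y, transactions)
    )


# Hypothetical toy transactions, used only for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]
lift_bread_butter = lift({"bread"}, {"butter"}, transactions)  # above 1
lift_bread_milk = lift({"bread"}, {"milk"}, transactions)      # below 1
```

On this data bread and butter occur together more often than independence would predict, while bread and milk occur together slightly less often.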