2. Association Rule Mining Flashcards

1
Q

What is association rule mining?

A

Given a set of transactions, it is the task of finding rules that predict the occurrence of an item based on the occurrences of other items in the transaction.

Example of a rule:
{Milk, Bread} → {Beer}
→ means “implies”

This rule is taken from a list of 5 transactions that contain milk, bread, beer, and other items.

2
Q

What is an itemset?

A

A collection of one or more items

Ex: {Milk, Bread, Diaper}

3
Q

What is the support count of an itemset?

A

The number of transactions in which the itemset appears, often written σ(X).

4
Q

What is the support of an itemset?

A

The fraction of transactions that contain an itemset.

IOW:
support(X) = (# transactions containing X) / (total # transactions)
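
To make these two definitions concrete, here is a minimal Python sketch. The function names (support_count, support) and the small transaction list are illustrative assumptions, not taken from the slides:

    def support_count(itemset, transactions):
        """Number of transactions that contain every item in `itemset`."""
        return sum(1 for t in transactions if itemset <= t)

    def support(itemset, transactions):
        """Fraction of transactions that contain `itemset`."""
        return support_count(itemset, transactions) / len(transactions)

    # Illustrative transaction database (a set of items per transaction)
    transactions = [
        {"Bread", "Milk"},
        {"Bread", "Diaper", "Beer", "Eggs"},
        {"Milk", "Diaper", "Beer", "Coke"},
        {"Bread", "Milk", "Diaper", "Beer"},
        {"Bread", "Milk", "Diaper", "Coke"},
    ]

    print(support_count({"Milk", "Bread"}, transactions))  # 3
    print(support({"Milk", "Bread"}, transactions))        # 3/5 = 0.6
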
5
Q

What is a frequent itemset?

A

An itemset whose support is greater than or equal to some specified minsup (minimum support) threshold

6
Q

What is an association rule?

A

An implication expression of the form X → Y, where X and Y are disjoint itemsets.

Example: {Milk, Diaper} → {Beer}

7
Q

What is the confidence of an association rule?

A

For the rule X → Y:
It measures how often items in Y appear in transactions that contain X.

Ex: {Milk, Diaper} → {Beer}
c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper)

8
Q

What is the support of an association rule?

A

For the rule X → Y:
It is the fraction of transactions that contain both X and Y.

Ex: {Milk, Diaper} → {Beer}
s = σ(Milk, Diaper, Beer) / (total # transactions)

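
Continuing the sketch from card 4 (same illustrative transactions list and support_count helper), rule support and confidence can be computed like this:

    def rule_support(X, Y, transactions):
        """s(X -> Y) = sigma(X u Y) / total # transactions"""
        return support_count(X | Y, transactions) / len(transactions)

    def rule_confidence(X, Y, transactions):
        """c(X -> Y) = sigma(X u Y) / sigma(X)"""
        return support_count(X | Y, transactions) / support_count(X, transactions)

    X, Y = {"Milk", "Diaper"}, {"Beer"}
    print(rule_support(X, Y, transactions))     # 2/5 = 0.4
    print(rule_confidence(X, Y, transactions))  # 2/3 ~= 0.67
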
9
Q

What are the two main tasks of association rule mining?

A

Given a set of transactions T, the goal of association rule mining is to find all rules having

  1. support >= minSup threshold
  2. confidence >= minConf threshold
10
Q

Describe the brute-force approach to association rule mining

A
  1. List out all possible association rules (every combination of items)
  2. Compute the support and confidence for each rule
  3. Prune (remove) rules that fail to meet the minSup and minConf thresholds

*This takes WAY too long to compute for large datasets!* (A sketch of the enumeration follows below.)
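
A minimal sketch of the brute-force enumeration, reusing the illustrative support_count helper and transactions list from card 4. It examines every binary partition of every itemset, which is exponential in the number of distinct items:

    from itertools import combinations

    def brute_force_rules(transactions, min_sup, min_conf):
        items = sorted(set().union(*transactions))
        n = len(transactions)
        rules = []
        # every candidate itemset Z with at least 2 items ...
        for r in range(2, len(items) + 1):
            for Z in map(frozenset, combinations(items, r)):
                s = support_count(Z, transactions) / n
                if s < min_sup:
                    continue  # rule support = support of Z; skipping early also avoids zero divisions
                # ... split into every antecedent/consequent pair X -> (Z - X)
                for k in range(1, len(Z)):
                    for X in map(frozenset, combinations(Z, k)):
                        c = support_count(Z, transactions) / support_count(X, transactions)
                        if c >= min_conf:
                            rules.append((set(X), set(Z - X), s, c))
        return rules

    for X, Y, s, c in brute_force_rules(transactions, min_sup=0.4, min_conf=0.6):
        print(X, "->", Y, f"(s={s:.2f}, c={c:.2f})")
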
11
Q

What is interesting about the support of rules that derive from the same itemset?
(Ex: A → BC, B → AC, …)

A

Rules originating from the same itemset have identical support (though they can have different confidence)

This allows us to reduce the number of times we have to calculate support.

12
Q

What is the two-step approach to mining association rules?

A
  1. Generate all itemsets whose support is >= minSup (frequent itemset generation)
  2. Generate high-confidence rules from each frequent itemset, where each rule is a binary partition of the frequent itemset (step 2 is sketched below)

*Note: frequent itemset generation is still computationally very expensive*

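A sketch of step 2 in isolation, again reusing the illustrative helpers from card 4: given one frequent itemset Z, enumerate its binary partitions and keep the high-confidence rules.

    from itertools import combinations

    def rules_from_itemset(Z, transactions, min_conf):
        """Generate high-confidence rules X -> (Z - X) from frequent itemset Z."""
        Z = frozenset(Z)
        rules = []
        for k in range(1, len(Z)):
            for X in map(frozenset, combinations(Z, k)):
                c = support_count(Z, transactions) / support_count(X, transactions)
                if c >= min_conf:
                    rules.append((set(X), set(Z - X), c))
        return rules

    # e.g. {Beer} -> {Milk, Diaper}, {Milk, Beer} -> {Diaper}, ...
    for X, Y, c in rules_from_itemset({"Milk", "Diaper", "Beer"}, transactions, 0.6):
        print(X, "->", Y, f"(c={c:.2f})")
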
13
Q

If a set AB is infrequent, what other information do we know (a priori)?

A

We know that any superset of AB (any set that contains AB, e.g. ABCDEFG) is also infrequent: every transaction that contains the superset also contains AB, so the superset’s support can never exceed AB’s support.

14
Q

Describe the strategy of reducing the number of candidates (M) to generate frequent itemsets.

A

Without pruning, a complete search has to consider all M = 2^d candidate itemsets (for d items).

We use pruning techniques (such as the Apriori principle) to reduce the number of candidates M that actually need to be counted.

15
Q

Describe the strategy of reducing the number of comparisons in generating frequent itemsets.

A

Use efficient data structures to store the candidates or transactions to minimize the number of comparisons.

There is no need to match every candidate against every transaction.

16
Q

What is the Apriori principle and how does it relate to frequent itemset generation?

A

It states that if an itemset is frequent, then all of its subsets must also be frequent.

IOW: If ABC is frequent, then A, B, C, AB, AC and BC are also frequent, which makes sense because they appear every time that ABC appears.
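
As a quick sanity check, here is the principle verified on the illustrative transactions and support_count helper from card 4 (both assumed, not from the slides):

    from itertools import combinations

    Z = {"Milk", "Diaper", "Beer"}
    # every non-empty subset of Z appears at least as often as Z itself
    for k in range(1, len(Z) + 1):
        for S in combinations(Z, k):
            assert support_count(set(S), transactions) >= support_count(Z, transactions)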

17
Q

What is the anti-monotone property?

A

The property underlying the Apriori principle: the support of an itemset never exceeds the support of any of its subsets. As a consequence,

  1. any subset of a frequent itemset must also be frequent, AND
  2. any superset of an infrequent itemset must also be infrequent
18
Q

Understand the illustration of the Apriori principle.

A

See Slide 13 of Association1.ppt

19
Q

Visualize an example of the Apriori algorithm on a database of transactions.

A

See Slide 14 of Association1.ppt

20
Q

Describe the pseudocode for the Apriori algorithm.

A

First, let k = 1
Generate frequent itemsets of length k
Repeat until no new frequent itemsets are found:
  1. Generate length (k+1) candidate itemsets from the frequent itemsets of length k
  2. Prune candidate itemsets that contain any infrequent subset of length k
  3. Count the support of each candidate by scanning the DB
  4. Eliminate candidates that are infrequent, leaving only those that are frequent
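
A compact Python sketch that follows this pseudocode, once more assuming the illustrative support_count helper and transactions list from card 4. It is a teaching sketch, not an optimized implementation (no hash tree, repeated DB scans):

    from itertools import combinations

    def apriori(transactions, min_sup):
        n = len(transactions)
        # k = 1: frequent individual items
        items = set().union(*transactions)
        freq = {frozenset([i]) for i in items
                if support_count({i}, transactions) / n >= min_sup}
        all_frequent = set(freq)
        k = 1
        while freq:
            # Generate length (k+1) candidates by joining frequent k-itemsets
            candidates = {a | b for a in freq for b in freq if len(a | b) == k + 1}
            # Prune candidates with any infrequent k-subset (Apriori principle)
            candidates = {c for c in candidates
                          if all(frozenset(s) in freq for s in combinations(c, k))}
            # Count support by scanning the DB; keep only frequent candidates
            freq = {c for c in candidates
                    if support_count(c, transactions) / n >= min_sup}
            all_frequent |= freq
            k += 1
        return all_frequent

    # With min_sup = 0.6 this yields four frequent single items and four frequent pairs
    for itemset in sorted(apriori(transactions, min_sup=0.6), key=len):
        print(set(itemset))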