2. Association Rule Mining Flashcards
What is association rule mining?
Given a set of transactions, it’s the task of finding rules that predict the occurrence of an item based on the occurrences of other items in the transaction.
Example of a rule:
{Milk, Bread} –> {Beer}
–> means “implies”
This rule is taken from a list of 5 transactions that contain milk, bread, beer, and other items.
What is an itemset?
A collection of one or more items
Ex: {Milk, Bread, Diaper}
What is the support count of an itemset?
The number of transactions that contain that itemset.
What is the support of an itemset?
The fraction of transactions that contain an itemset.
IOW: # transactions containing itemset / total # transactions
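To make both definitions concrete, here is a minimal Python sketch (not part of the original cards) over an assumed five-transaction toy basket like the one alluded to above; the transaction contents are illustrative:

```python
# Support count and support over an assumed five-transaction basket.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    # Number of transactions containing every item in the itemset.
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    # Fraction of transactions containing the itemset.
    return support_count(itemset, transactions) / len(transactions)

print(support_count({"Milk", "Bread"}, transactions))  # 3
print(support({"Milk", "Bread"}, transactions))        # 0.6
```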
What is a frequent itemset?
An itemset whose support is greater than or equal to some specified minsup (minimum support) threshold
What is an association rule?
An implication expression of the form X –> Y, where X and Y are itemsets.
Example: {Milk, Diaper} –> {Beer}
What is the confidence of an association rule?
For the rule X–>Y:
It measures how often items in Y appear in transactions that contain X
Ex: {Milk, Diaper} –> {Beer}
c = #(Milk, Diaper, Beer) / #(Milk, Diaper)
What is the support of an association rule?
For the rule X–>Y:
It is the fraction of transactions that contain both X and Y
Ex: {Milk, Diaper} –> {Beer}
s = #(Milk, Diaper, Beer) / #Transactions
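Here’s a short sketch computing both rule metrics for the example rule {Milk, Diaper} –> {Beer}, again over an assumed toy transaction list:

```python
# Support and confidence of the rule {Milk, Diaper} -> {Beer},
# using the same assumed toy transactions as above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
X, Y = {"Milk", "Diaper"}, {"Beer"}

count_X  = sum(1 for t in transactions if X <= t)      # 3 transactions contain X
count_XY = sum(1 for t in transactions if X | Y <= t)  # 2 contain both X and Y

s = count_XY / len(transactions)  # support    = 2/5 = 0.4
c = count_XY / count_X            # confidence = 2/3 ≈ 0.67
```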
What are the two main tasks of association rule mining?
Given a set of transactions T, the goal of association rule mining is to find all rules having
- support >= minSup threshold
- confidence >= minConf threshold
Describe the brute-force approach to association rule mining
- List out all possible association rules (every combination of items)
- Compute the support and confidence for each rule
- Prune (remove) rules that fail to meet the minSup and minConf thresholds
*This takes WAY too long to compute for large datasets*
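A minimal sketch of the brute-force approach on the same assumed toy data; it enumerates every binary partition of every itemset, which is exactly why it blows up exponentially:

```python
from itertools import combinations

# Brute force: enumerate every rule X -> Y from every itemset and keep
# the ones that clear both thresholds. Exponential in the number of
# items -- illustration only.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
minsup, minconf = 0.4, 0.6
items = sorted(set().union(*transactions))

def sup(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

rules = []
for k in range(2, len(items) + 1):
    for itemset in map(frozenset, combinations(items, k)):
        s = sup(itemset)  # identical for every rule from this itemset
        if s < minsup:
            continue
        for j in range(1, k):  # every binary partition X -> (itemset - X)
            for X in map(frozenset, combinations(itemset, j)):
                c = s / sup(X)
                if c >= minconf:
                    rules.append((set(X), set(itemset - X), s, c))
```

Note that sup(itemset) is computed once and shared by every rule split from that itemset, which previews the observation in the next card.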
What is interesting about the support of rules that derive from the same itemset?
(Ex: A –> BC, B –> AC, …)
Rules originating from the same itemset have identical support (though they can have different confidence)
This allows us to reduce the number of times we have to calculate support.
What is the two-step approach to mining association rules?
- Generate all itemsets whose support is >= minSup (frequent itemsets)
- Generate high confidence rules from each frequent itemset, where each rule is a binary partition of a frequent itemset
*Note: frequent itemset generation is still computationally very expensive*
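A sketch of the two-step approach under the same toy-data assumptions, with Apriori-style level-wise generation in step 1 (the pruning it relies on is explained in the next two cards) and binary-partition rule generation in step 2:

```python
from itertools import combinations

# Step 1 grows frequent itemsets level by level (a candidate survives
# only if all of its k-subsets are frequent); step 2 splits each
# frequent itemset into every binary partition and keeps the
# high-confidence rules.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
minsup, minconf = 0.4, 0.6

def sup(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Step 1: frequent itemset generation.
frequent = {}
level = [frozenset([i]) for i in set().union(*transactions)]
k = 1
while level:
    level = [c for c in level if sup(c) >= minsup]
    frequent.update({c: sup(c) for c in level})
    # Candidate (k+1)-itemsets whose k-subsets are all frequent.
    level = list({
        a | b for a in level for b in level
        if len(a | b) == k + 1
        and all(frozenset(sub) in frequent for sub in combinations(a | b, k))
    })
    k += 1

# Step 2: rule generation from binary partitions of frequent itemsets.
rules = []
for itemset, s in frequent.items():
    for j in range(1, len(itemset)):
        for X in map(frozenset, combinations(itemset, j)):
            if s / frequent[X] >= minconf:  # subsets of a frequent set are frequent
                rules.append((set(X), set(itemset - X), s, s / frequent[X]))
```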
If a set AB is infrequent, what other information do we know (a priori)?
We know that any superset of AB (i.e., any set that contains AB, such as ABCDEFG) is also infrequent: every transaction that contains the superset also contains AB, so the superset’s support can never exceed AB’s, which is already below minsup.
Describe the strategy of reducing the number of candidates (M) to generate frequent itemsets.
A complete search would have to check all M = 2^d candidate itemsets (where d is the number of items).
Pruning techniques (such as the Apriori principle above) eliminate candidates without ever counting their support, reducing M.
Describe the strategy of reducing the number of comparisons in generating frequent itemsets.
Use efficient data structures to store the candidates or transactions to minimize the number of comparisons.
There is no need to match every candidate against every transaction.
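One common trick, sketched below with an assumed candidate list on the same toy data: store the candidate k-itemsets in a hash set (a simple stand-in for the classic hash tree) and enumerate each transaction’s k-subsets, so each subset costs one O(1) lookup instead of a scan over every candidate:

```python
from collections import Counter
from itertools import combinations

# Count candidate supports with fewer comparisons: hash the candidates,
# then probe with each transaction's k-subsets.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
k = 2
candidates = {frozenset(c) for c in [("Milk", "Diaper"), ("Milk", "Beer"), ("Diaper", "Beer")]}

counts = Counter()
for t in transactions:
    for subset in map(frozenset, combinations(sorted(t), k)):
        if subset in candidates:  # O(1) hash lookup, not a pass over all candidates
            counts[subset] += 1

print(counts[frozenset({"Milk", "Diaper"})])  # 3
```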