Mining Association Rules COPY Flashcards

Question 1

Q

What’s the motivation for studying Mining Association Rules?

Answer

A

To look for interesting relationships between objects in large datasets.

Question 2

Q

What are we trying to do when studying Mining Association Rules?

Answer

A

Find all rules that correlate the presence of one set of items with that of another set of items. E.g., 80% of customers who buy {diapers} also buy {beer, milk}.

Question 3

Q

Provide Formal Notations of the following

item

itemset

k-itemset

transaction

transaction dataset

Answer

A

An item: a single product in a basket.
An itemset: a set of items, e.g., X = {milk, bread, cereal}.
A k-itemset: an itemset with k items.
A transaction: the items purchased in one basket; it may have a TID (transaction ID).
A transactional dataset: a set of transactions.

Question 4

Q

What do we mean when we say X -> Y in Mining Association Rules?

Answer

A

If a customer buys the items in X, they are also likely to buy the items in Y.

Question 5

Q

What is support and confidence in Association Rule Mining?

Answer

A

Support is a measure of how frequently an itemset appears in the transactions.

E.g., half of the people at Woolworths have milk in their basket. The support of {milk} is 0.5, or 50%.

Confidence is a measure of how likely Y is to be bought given that X is bought (X -> Y). E.g., of the people who buy milk, 80% buy bread as well. The confidence of milk -> bread is 0.8.
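As a rough illustration of the two measures, here is a minimal Python sketch. The transactions and the helper names `support` and `confidence` are made up for this example and are not part of the flashcards.

```python
# Toy transactions (hypothetical data, chosen only for illustration).
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Estimate of P(Y | X), computed as support(X and Y together) / support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

# Milk appears in 3 of the 4 baskets.
print(support({"milk"}, transactions))
# Of the 3 milk baskets, 2 also contain bread.
print(confidence({"milk"}, {"bread"}, transactions))
```

Here support is expressed as a fraction; some texts use the raw count of transactions instead.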

Question 6

Q

What do we call association rules that satisfy both the Min_Support and Min_Confidence?

Answer

A

These are Strong Association Rules.

Question 7

Q

What does minimum support mean?

Answer

A

The minimum frequency we care about.

E.g., if the minimum support count is 3, any itemset that occurs only 2 times is not considered frequent and is dropped from the analysis.
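A small sketch of applying a minimum support count to single items, assuming made-up baskets and a hypothetical threshold name `min_support_count`:

```python
from collections import Counter

# Hypothetical baskets, for illustration only.
baskets = [
    ["milk", "bread"],
    ["milk", "eggs"],
    ["milk", "bread", "beer"],
    ["bread", "beer"],
]
min_support_count = 3

# Count each item at most once per basket.
item_counts = Counter(item for basket in baskets for item in set(basket))

# Keep only the items that reach the threshold.
frequent_items = {
    item: n for item, n in item_counts.items() if n >= min_support_count
}
print(frequent_items)
```

With these baskets, beer occurs twice and eggs once, so both fall below the threshold of 3 and are dropped.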

Question 8

Q

What is the conditional probability formula for confidence?

Answer

A

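Confidence(X -> Y) = P(Y | X) = P(X ∪ Y) / P(X) = support(X ∪ Y) / support(X)

Here P(X ∪ Y) is the probability that a transaction contains every item in both X and Y, not the probability that it contains X or Y.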

Question 9

Q

What is the goal of association rule mining? What do we minimally want for a rule?

Answer

A

The goal of association rule mining is to find all rules having

support ≥ min_sup threshold
confidence ≥ min_conf threshold

Question 10

Q

What algorithms do we use for Mining Association Rules?

Answer

A

Apriori Algorithm
Frequent Pattern (FP) Growth Algorithm

Question 11

Q

What are the two steps in Mining Association Rules?

Answer

A

Frequent Itemset Generation

– Get all itemsets whose support ≥ minsup

Rule Generation

– Generate high-confidence rules from each frequent itemset
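The two steps can be sketched in Python as below. This is a brute-force illustration, not Apriori or FP-Growth: it enumerates every possible itemset, which is only practical for tiny examples. The data and the thresholds `MIN_SUP` and `MIN_CONF` are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical transactions, for illustration only.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "milk"},
    {"diapers", "beer", "milk"},
]
MIN_SUP = 0.6   # assumed minimum support (fraction of transactions)
MIN_CONF = 0.7  # assumed minimum confidence

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Step 1: frequent itemset generation (exhaustive enumeration).
items = sorted(set().union(*transactions))
frequent = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
    if support(frozenset(c)) >= MIN_SUP
]

# Step 2: rule generation. Split each frequent itemset into
# a left-hand side X and a right-hand side Y, and keep X -> Y
# when its confidence reaches the threshold.
rules = []
for itemset in frequent:
    for k in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), k):
            lhs = frozenset(lhs)
            rhs = itemset - lhs
            conf = support(itemset) / support(lhs)
            if conf >= MIN_CONF:
                rules.append((set(lhs), set(rhs), conf))

for lhs, rhs, conf in rules:
    print(lhs, "->", rhs, round(conf, 2))
```

Single-item itemsets produce no rules because they cannot be split into two non-empty sides.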

Question 12

Q

What is the principle of the Apriori Algorithm?

Answer

A

If an itemset is frequent, then all of its subsets must also be frequent.
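Read in the other direction, the principle says that if any subset of an itemset is infrequent, the itemset itself cannot be frequent, so it can be pruned without counting. The sketch below shows one simple way to use that pruning level by level. The function name `apriori`, the `min_count` parameter, and the shopping data are illustrative assumptions, and the join step is written for readability rather than speed.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count each candidate and keep those meeting min_count.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: frequent single items.
    current = count({frozenset([i]) for t in transactions for i in t})
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates with any infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical data, for illustration only.
data = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(data, min_count=3)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

In this data {bread, beer} occurs only twice, so {bread, diapers, beer} is pruned before its support is ever counted.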

Question 13

Q

Explain how to perform the Apriori Algorithm on this itemset.

Question 14

Q

What are some factors that affect the complexity of the Apriori Algorithm?

Answer

A

The choice of minimum support threshold
Dimensionality (number of items) in the data set
Size of database
Average transaction width
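To give a feel for why dimensionality matters: with d distinct items there are 2^d - 1 possible non-empty itemsets, so the space of candidates grows exponentially with the number of items. A quick sketch (the function name `candidate_itemsets` is made up for this example):

```python
def candidate_itemsets(num_items):
    """Number of non-empty itemsets over `num_items` distinct items."""
    return 2 ** num_items - 1

for d in (5, 10, 20):
    print(d, candidate_itemsets(d))
```

A lower minimum support threshold lets more of these candidates survive pruning, which is one reason the threshold has such a large effect on running time.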

Question 15

Q