Intro and Association Rules Flashcards

1
Q

Major data mining tasks?

A

Classification (supervised machine learning)
Clustering (unsupervised machine learning)
Discovery of association rules

2
Q

What type of data do association rules normally use?

A

Basket data (binary such as yes/no)

3
Q

What are association rules?

A

IF-THEN statements, if x then y (x and y are sets of items)

4
Q

How do we measure the quality of association rules?

A

Support (Sup) = number of transactions containing both X and Y, divided by the total number of transactions

Confidence (Conf) = number of transactions containing both X and Y, divided by the number of transactions containing X
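
The two measures can be sketched in a few lines of Python. The 10-transaction basket below is hypothetical (the deck does not show its input data); it is chosen so that the coffee/bread figures match the ones quoted in these cards. The function names `support` and `confidence` are illustrative only.

```python
# Hypothetical basket data: each transaction is the set of items bought.
transactions = [
    {"coffee", "bread"},
    {"coffee", "bread"},
    {"coffee", "bread", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"milk", "tea"},
    {"tea"},
    {"tea"},
    {"milk"},
]

def support(x, y, transactions):
    """Transactions containing both X and Y, divided by all transactions."""
    both = x | y
    return sum(1 for t in transactions if both <= t) / len(transactions)

def confidence(x, y, transactions):
    """Transactions containing both X and Y, divided by those containing X."""
    both = x | y
    n_x = sum(1 for t in transactions if x <= t)
    n_xy = sum(1 for t in transactions if both <= t)
    return n_xy / n_x if n_x else 0.0  # no transaction contains X: report 0

print(support({"coffee"}, {"bread"}, transactions))     # 0.3
print(confidence({"coffee"}, {"bread"}, transactions))  # 1.0
```

Here X and Y are passed as Python sets, so `both <= t` is a subset test: the transaction must contain every item of X and Y.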

5
Q

What is the goal of association rules?

A

To find ALL rules with Sup ≥ Min_Sup and Conf ≥ Min_Conf

6
Q

Example of Association Rules

Rules discovered from previous input, assuming Min_Sup = 0.3 and Min_Conf = 0.8

A

IF (coffee) THEN (bread) Sup = 0.3, Conf = 1
IF (bread) THEN (bread) Sup = 0.4, Conf = 0.8

Reminder:
Support (Sup) = number of transactions containing both X and Y, divided by the total number of transactions

Confidence (Conf) = number of transactions containing both X and Y, divided by the number of transactions containing X

7
Q

What are the two phases in Association Rule Algorithms?

A

Phase 1: Discover all frequent item sets
- i.e., all sets with Sup ≥ Min_Sup, where Min_Sup is specified by the user

Phase 2: Discover all rules with high confidence
- i.e., use the frequent item sets to discover association rules with Conf ≥ Min_Conf, where Min_Conf is specified by the user

8
Q

Describe the steps in Phase 1 of association rules

A

1st step - compute support of each set with just 1 item
2nd step - compute support of each set with 2 items
3rd step - compute support of each set with 3 items
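
A minimal sketch of this level-wise counting, assuming a hypothetical 10-transaction basket (the deck's own input data is not shown). It enumerates every item set of a given size and counts its support by brute force, with no pruning; `support_by_size` is an illustrative name.

```python
from itertools import combinations

# Hypothetical basket data: each transaction is the set of items bought.
transactions = [
    {"coffee", "bread"}, {"coffee", "bread"}, {"coffee", "bread", "milk"},
    {"bread", "milk"}, {"bread"}, {"milk"}, {"milk", "tea"},
    {"tea"}, {"tea"}, {"milk"},
]

def support_by_size(transactions, k):
    """Support of every k-item set that can be built from the items seen."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    return {
        combo: sum(1 for t in transactions if set(combo) <= t) / n
        for combo in combinations(items, k)
    }

# Steps 1 to 3: item sets with 1, 2 and 3 items.
for k in (1, 2, 3):
    print(k, support_by_size(transactions, k))
```

Each key is a tuple of item names in alphabetical order, for example `("bread", "coffee")`.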

9
Q

In step 2 and 3 of phase 1, what are the optimisations that can be made?

A

Step 2 optimisation: If a 1-item set is not frequent, no 2-item set containing it can be frequent, so all item sets containing that infrequent item can be ignored.

Step 3 optimisation: If a 2-item set is not frequent, no 3-item set containing it can be frequent, so all item sets containing those 2 items can be ignored.
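
Both optimisations follow the same pattern: a candidate set is worth counting only if every subset one item smaller is already known to be frequent. A rough sketch, where the function name and the example frequent sets are hypothetical:

```python
from itertools import combinations

def candidates_from_frequent(frequent_prev, k):
    """Build k-item candidates whose (k-1)-item subsets are all frequent.

    frequent_prev: set of frozensets, each holding k-1 items.
    """
    items = sorted(set().union(*frequent_prev))
    candidates = set()
    for combo in combinations(items, k):
        subsets = combinations(combo, k - 1)
        if all(frozenset(s) in frequent_prev for s in subsets):
            candidates.add(frozenset(combo))
    return candidates

# Assumed outcome of step 1: these three single items are frequent.
frequent_1 = {frozenset({"bread"}), frozenset({"coffee"}), frozenset({"milk"})}
candidates_2 = candidates_from_frequent(frequent_1, 2)

# Assumed outcome of step 2: {coffee, milk} turned out not to be frequent.
frequent_2 = {frozenset({"bread", "coffee"}), frozenset({"bread", "milk"})}
candidates_3 = candidates_from_frequent(frequent_2, 3)  # empty: pruned
```

Because {coffee, milk} is missing from `frequent_2`, the only possible 3-item set {bread, coffee, milk} is discarded without its support ever being counted.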

10
Q

What is Confidence in association rules?

A

In Phase 2, all possible rules are generated from each frequent item set and the confidence of each rule is computed, e.g.
- Item set: {coffee, bread}
Candidate Rules:
IF coffee THEN bread: Conf = 1.0
IF bread THEN coffee: Conf = 0.6
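
The rule generation step can be sketched as follows. The basket data is hypothetical, chosen so that it reproduces the two confidence values above; `candidate_rules` is an illustrative name.

```python
from itertools import combinations

# Hypothetical basket data: each transaction is the set of items bought.
transactions = [
    {"coffee", "bread"}, {"coffee", "bread"}, {"coffee", "bread", "milk"},
    {"bread", "milk"}, {"bread"}, {"milk"}, {"milk", "tea"},
    {"tea"}, {"tea"}, {"milk"},
]

def count(itemset, transactions):
    """Number of transactions containing every item of the item set."""
    return sum(1 for t in transactions if itemset <= t)

def candidate_rules(itemset, transactions):
    """All IF X THEN Y rules that split the item set, with their confidence."""
    itemset = frozenset(itemset)
    n_all = count(itemset, transactions)
    rules = []
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            x = frozenset(antecedent)
            y = itemset - x
            n_x = count(x, transactions)
            rules.append((x, y, n_all / n_x if n_x else 0.0))
    return rules

min_conf = 0.8  # user-specified threshold
for x, y, conf in candidate_rules({"coffee", "bread"}, transactions):
    verdict = "keep" if conf >= min_conf else "discard"
    print(f"IF {sorted(x)} THEN {sorted(y)}: Conf = {conf} ({verdict})")
```

With Min_Conf = 0.8, only IF coffee THEN bread survives; IF bread THEN coffee is discarded at 0.6.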

11
Q

What is the usefulness of association rules?

A

For example, placing together items that are frequently bought together.

12
Q

What do association rules NOT imply?

A

Causation and predictive power

13
Q

What are some extensions to the standard association rule algorithm?

A

Hierarchy of items
Selecting rules “interesting” to the user
Multiple minimum supports

14
Q

What is the motivation of using a hierarchy of items?

A

Rules in lower levels may not have minimum support
A hierarchy can be used to prune non-interesting rules
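
The first point can be illustrated with a small sketch. The hierarchy, the transactions, and the helper names below are all hypothetical: each specific item is mapped to a parent category, and support is compared at the two levels.

```python
# Hypothetical item hierarchy: specific item -> parent category.
hierarchy = {
    "skim milk": "milk",
    "whole milk": "milk",
    "white bread": "bread",
}

# Hypothetical basket data recorded at the lowest level of the hierarchy.
transactions = [
    {"skim milk", "white bread"},
    {"whole milk", "white bread"},
    {"skim milk"},
    {"whole milk"},
    {"white bread"},
]

def generalise(transaction, hierarchy):
    """Replace each item by its parent category (items without one stay)."""
    return {hierarchy.get(item, item) for item in transaction}

def support(itemset, transactions):
    """Fraction of transactions containing every item of the item set."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

min_sup = 0.3
low_level = support({"skim milk", "white bread"}, transactions)
high_level = support(
    {"milk", "bread"}, [generalise(t, hierarchy) for t in transactions]
)
print(low_level >= min_sup, high_level >= min_sup)  # False True
```

The low-level item set appears in 1 of 5 transactions and misses Min_Sup = 0.3, while its generalised version appears in 2 of 5 and passes.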

15
Q

Why may “Selecting rules interesting to the user” be used?

A

The user may be interested in only some of the discovered associations

Preference of the user could refer to items in several levels of a hierarchy

Can ask the user to specify “templates” of interesting and non-interesting associations

Rules can be filtered in a post-processing step or by incorporating restrictions into the mining algorithm (the latter is computationally faster)

16
Q

Why may “Multiple minimum supports” be used?

A

The use of a single minimum support for all items assumes that all items are “equally important”, which is an unrealistic assumption in many cases.

Specifying a minimum support for each item is more flexible: it allows the discovery of rules involving “rare items” without greatly increasing the number of discovered rules.

17
Q

How to compute the “minimum support” of a rule in “Multiple minimum supports”?

A

The smallest value of a minimum support among all items that occur in the rule.
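
As a short sketch, with hypothetical per-item thresholds and an illustrative function name:

```python
# Hypothetical user-specified minimum support for each item.
item_min_sup = {"bread": 0.3, "milk": 0.3, "caviar": 0.05}

def rule_min_support(x, y, item_min_sup):
    """Smallest per-item minimum support among all items in IF X THEN Y."""
    return min(item_min_sup[item] for item in x | y)

print(rule_min_support({"caviar"}, {"bread"}, item_min_sup))  # 0.05
print(rule_min_support({"milk"}, {"bread"}, item_min_sup))    # 0.3
```

A rule involving the rare item only has to reach the rare item's lower threshold, while rules among common items still face the higher one.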