Part 2: BI association rules Flashcards
Objective of association
The objective is to find interesting associations (relationships) between attributes in a data set. There is no classification, because there is no class variable; this is why association belongs to unsupervised learning.
Rules of association
Antecedent -> consequent
LHS -> RHS
#antecedent (or #LHS) = the number of records in the database that match the antecedent.
Indices for rules
- Support (Coverage) = #(LHS and RHS) / #DB = #(antecedent and consequent) / #DB
- Accuracy (Confidence) = #(LHS and RHS) / #LHS = #(antecedent and consequent) / #antecedent
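A minimal Python sketch computing both indices for one rule over a toy transaction database (the transactions and item names are invented for illustration):

```python
# Toy database: one set of items per record.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

lhs = {"bread"}   # antecedent
rhs = {"milk"}    # consequent

n_db = len(transactions)                              # #DB
n_lhs = sum(lhs <= t for t in transactions)           # #LHS
n_both = sum((lhs | rhs) <= t for t in transactions)  # #(LHS and RHS)

support = n_both / n_db      # Support (Coverage) = 3/5 = 0.60
confidence = n_both / n_lhs  # Accuracy (Confidence) = 3/4 = 0.75
print(f"support = {support:.2f}, confidence = {confidence:.2f}")
```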
Frequent itemsets
- Item: one attribute - value pair
- Itemset: all items occurring in a transaction or record
- Frequent itemset: an itemset with minimal support k, i.e., an itemset whose support is at least a threshold k predefined by the user.
Itemsets association rules
- Association rule: IF - THEN format.
+ LHS, RHS: one item (attribute-value pair) or a conjunction of items.
- One itemset -> many association rules.
Apriori property
e.g. itemset (A, B, C)
support(A, B, C) ≥ k -> support(A, B) ≥ k, support(B, C) ≥ k, etc. for all subsets.
Note: the opposite may not be true. The property holds because every record containing {A, B, C} also contains each of its subsets, so a subset's support count can only be equal or larger.
N-itemsets with minimal support
- Find all 1-itemsets with minimum support.
- Store them in file1.
- Compute all 2-itemsets by combining 1-itemsets.
- Store 2-itemsets with minimum support in file2.
- Compute all 3-itemsets by combining 2-itemsets.
- Store 3-itemsets with minimum support in file3.
- etc. (see the sketch below).
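A minimal Python sketch of this level-wise procedure, holding each level in memory rather than in file1, file2, ... (the transaction format is an assumption for illustration):

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support_count):
    """Level-wise search for all itemsets with support >= min_support_count."""
    items = {i for t in transactions for i in t}
    # Level 1: frequent 1-itemsets (the in-memory "file1").
    level = {frozenset([i]) for i in items
             if sum(i in t for t in transactions) >= min_support_count}
    all_frequent = set(level)
    while level:
        # Combine n-itemsets pairwise into candidate (n+1)-itemsets.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        # Apriori pruning: every n-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, len(c) - 1))}
        # Keep candidates meeting the threshold ("file2", "file3", ...).
        level = {c for c in candidates
                 if sum(c <= t for t in transactions) >= min_support_count}
        all_frequent |= level
    return all_frequent

txns = [{"bread", "milk"}, {"bread", "butter"},
        {"bread", "milk", "butter"}, {"milk"}, {"bread", "milk"}]
print(frequent_itemsets(txns, 3))
# {frozenset({'bread'}), frozenset({'milk'}), frozenset({'bread', 'milk'})}
```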
Finding association rules
- A typical question: "find all association rules with support ≥ s and confidence ≥ c."
Note: the "support" of an association rule is the support of the set of items it mentions.
- Hard part: finding the high-support (frequent) itemsets.
+ Checking the confidence of association rules involving those sets is relatively easy (see the sketch below).
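A sketch of that easy part: given one frequent itemset, enumerate every LHS -> RHS split and keep the rules whose confidence meets the threshold (illustrative code, not a particular library's API):

```python
from itertools import combinations

def rules_from_itemset(itemset, transactions, min_confidence):
    itemset = frozenset(itemset)
    # The support of every rule built from this itemset is the
    # support of the itemset itself.
    n_itemset = sum(itemset <= t for t in transactions)
    rules = []
    for r in range(1, len(itemset)):  # every non-empty proper subset as LHS
        for lhs in map(frozenset, combinations(itemset, r)):
            rhs = itemset - lhs
            confidence = n_itemset / sum(lhs <= t for t in transactions)
            if confidence >= min_confidence:
                rules.append((set(lhs), set(rhs), confidence))
    return rules

txns = [{"bread", "milk"}, {"bread", "butter"},
        {"bread", "milk", "butter"}, {"milk"}, {"bread", "milk"}]
for lhs, rhs, conf in rules_from_itemset({"bread", "milk"}, txns, 0.7):
    print(lhs, "->", rhs, f"(confidence {conf:.2f})")  # both splits: 3/4 = 0.75
```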
Apriori algorithm
- Definition = algorithm for finding association rules.
- Description =
Step 1: find all frequent itemsets with minimal support k.
Step 2: from all frequent itemsets found in step 1, find the association rules with minimal accuracy m.
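For reference, the same two steps are available off the shelf in the mlxtend library (an assumption here, not part of the course material; exact signatures may vary between versions). It expects the database one-hot encoded, one boolean column per item:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["bread", "milk"], ["bread", "butter"],
                ["bread", "milk", "butter"], ["milk"], ["bread", "milk"]]

te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

# Step 1: frequent itemsets with minimal support k (here k = 0.4).
frequent = apriori(df, min_support=0.4, use_colnames=True)
# Step 2: rules with minimal accuracy (confidence) m (here m = 0.7).
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```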
Rule interestingness measures
- Objective measures:
+ support
+ confidence
+ lift
- Subjective measures: a rule (pattern) is interesting if
+ it is unexpected (surprising to the user)
+ it is actionable (the user can do something with it)
Benchmark confidence
- Confidence = #(antecedent and consequent) / #antecedent.
- Assume the antecedent and consequent are independent.
- Then: confidence = PriorProb(consequent), since under independence P(consequent | antecedent) = P(consequent).
Note: the probability of an event can be estimated by the fraction of records in the database in which the event occurs. E.g., if the consequent occurs in 40 of 100 records, the benchmark confidence is 0.40, whatever the antecedent.
Lift measure
Tells us how strong the relation is between the antecedent and the consequent.
Rule: LHS -> RHS or antecedent -> consequent
Lift = Confidence / Prob(RHS) = Prob(LHS and RHS) / (Prob(LHS) * Prob(RHS))
Prob(RHS) = benchmark confidence
We assume that fractions in database are good approximations for probability.
- Lift = 0 -> means that fr(LHS and RHS) = 0.
- Lift = 1 -> means that LHS and RHS are independent.
- Lift >> 1 -> most interesting rule; LHS is a strong indicator for RHS. Sometimes Lift << 1 is also interesting.
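A minimal sketch estimating lift from database fractions, reusing the toy transactions from above:

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
lhs, rhs = {"bread"}, {"milk"}
n = len(transactions)

p_lhs = sum(lhs <= t for t in transactions) / n           # Prob(LHS) = 0.8
p_rhs = sum(rhs <= t for t in transactions) / n           # Prob(RHS) = 0.8 (benchmark confidence)
p_both = sum((lhs | rhs) <= t for t in transactions) / n  # Prob(LHS and RHS) = 0.6

lift = p_both / (p_lhs * p_rhs)
print(f"lift = {lift:.3f}")  # 0.6 / 0.64 = 0.938, just under 1: near-independent
```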
Summary
- Association belongs to unsupervised learning.
- Association rules vs. classification (decision) rules:
+ classification rules predict only one attribute, called the class, whereas association rules find associations between any attributes without distinction.
+ the RHS of an association rule may contain a conjunction of attribute-value pairs, whereas the RHS of a classification rule contains only the class value.
+ association rules are not intended to be used together as a set, whereas classification rules are.