05. Association Rules Flashcards
Association Rules (Market Basket Analysis) is what
Association Rules is an unsupervised, descriptive method for discovering interesting relationships in data. The discovered relationships can be represented as rules or as frequent itemsets. It is commonly used for mining transaction databases.
The Association Rules approach (if “X” is observed, then “Y” has a high probability of being observed) can be applied to which questions
Which products tend to be purchased together?
Of those customers who are similar to this person, what products do they tend to buy?
Of those customers who have purchased this product, what other similar products do they tend to view or purchase?
In the rule “when item X is observed, then item Y is also observed”, what are X and Y
X is called antecedent or left-hand-side (LHS)
Y is called consequent or right-hand-side (RHS)
What is the notation and meaning of a k-itemset
In a k-itemset, k refers to the total number of items in that itemset: {item 1, item 2, …, item k}
What is the underpinning idea of the Apriori algorithm
It is a method of “pruning” the otherwise exponential associations by considering the “downward closure property” which is to say that if an itemset is considered frequent, then any subset of the frequent itemset must also be frequent.
What is a frequent itemset
A frequent itemset has items that appear together often enough. The term “often enough” is formally defined with a minimum support criterion. If the minimum support is set at 0.5, any itemset can be considered a frequent itemset if at least 50% of the transactions contain this itemset. In other words, the support of a frequent itemset should be greater than or equal to the minimum support.
What is the Apriori algorithm method
The Apriori algorithm takes a bottom-up iterative approach to uncovering the frequent itemsets by first determining all the possible items (or 1-itemsets, for example {bread}, {eggs}, {milk}, …) and then identifying which among them are frequent. Assuming the minimum support threshold (or the minimum support criterion) is set at 0.5, the algorithm identifies and retains those itemsets that appear in at least 50% of all transactions and discards the itemsets that have a support less than 0.5 (appear in fewer than 50% of the transactions). It then repeats the process with 2-itemsets, 3-itemsets, and so on, until no larger frequent itemsets are found.
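The iterative passes above can be sketched in Python (the document's own example uses R's arules package; this is a toy illustration with hypothetical baskets, not that library):

```python
from itertools import combinations

# Toy transaction database (hypothetical baskets, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "eggs", "milk"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
    {"bread"},
]
min_support = 0.5  # minimum support criterion

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Pass 1: keep only the frequent 1-itemsets.
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]

# Later passes: join frequent (k-1)-itemsets into k-itemset candidates and
# keep only those meeting the minimum support; the rest are pruned.
all_frequent = list(frequent)
k = 2
while frequent:
    candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
    frequent = [c for c in candidates if support(c) >= min_support]
    all_frequent.extend(frequent)
    k += 1
```

With these baskets, {bread}, {eggs}, and {milk} survive pass 1, while pass 2 keeps {bread, milk} and {eggs, milk} and prunes {bread, eggs} (support 0.4 < 0.5).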
In Association Rules what is Support
Support (X => Y) =
( Number of transactions with both X and Y ) /
The total number of transactions
Support is an indication of how frequently the itemset appears in the dataset - this is just the probability of that combination appearing!
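The support calculation can be shown with a minimal Python sketch over hypothetical baskets (toy data, not from the document):

```python
# Hypothetical transactions; support({X, Y}) is the fraction of baskets
# that contain both X and Y.
transactions = [
    {"bread", "milk"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def support(itemset):
    # Count transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

# {bread, milk} appears in 2 of the 4 transactions.
print(support({"bread", "milk"}))  # 0.5
```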
In Association Rules what is Confidence
Confidence (X => Y) =
( Number of transactions with both X and Y ) /
The total number of transactions containing X
Confidence is an indication of how often the rule has been found to be true
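A small Python sketch of the confidence ratio, using the same hypothetical baskets as above (toy data, not from the document):

```python
transactions = [
    {"bread", "milk"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y):
    """How often transactions containing X also contain Y."""
    return support(X | Y) / support(X)

# bread appears in 3 of 4 baskets, bread+milk in 2 of 4: confidence = 2/3.
print(confidence({"bread"}, {"milk"}))
```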
In Association Rules what is Lift
Lift (X => Y) =
(Support (X and Y)) /
((Support of X)*(Support of Y))
Lift (X => Y) =
P(X,Y) / (P(X)*P(Y))
Lift indicates how much more likely itemset Y is to be picked along with itemset X than by itself, expressed as a ratio
It is a multiplier of the normal chance
(Support of X) * (Support of Y) is the probability of seeing X and Y together if they were entirely independent, i.e. P(X) * P(Y), like independent dice rolls
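A minimal Python sketch of the lift ratio (hypothetical baskets chosen so that bread and butter co-occur more than independence would predict):

```python
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def lift(X, Y):
    """Joint support divided by the support expected under independence."""
    return support(X | Y) / (support(X) * support(Y))

# support(bread) = support(butter) = 0.75, joint support = 0.75,
# so lift = 0.75 / (0.75 * 0.75) = 4/3, i.e. greater than 1.
print(lift({"bread"}, {"butter"}))
```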
In Association Rules what is Leverage
Leverage (X => Y) =
(Support (X and Y)) - ((Support of X)*(Support of Y))
Leverage (X=>Y) =
P(X,Y)-(P(X)*P(Y))
Leverage indicates how much more likely itemset Y is to be picked along with itemset X than by itself, expressed as a difference
(Support of X) * (Support of Y) is the probability of seeing X and Y together if they were entirely independent, i.e. P(X) * P(Y), like independent dice rolls
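The leverage difference can be sketched in Python on the same hypothetical baskets used for lift (toy data, not from the document):

```python
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def leverage(X, Y):
    """Joint support minus the support expected under independence."""
    return support(X | Y) - support(X) * support(Y)

# 0.75 - (0.75 * 0.75) = 0.1875: a positive difference, so the pair
# co-occurs more often than independence would predict.
print(leverage({"bread"}, {"butter"}))
```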
Explain the benefit of knowing Lift
If X occurs independently of Y, then Lift = 1. When two events are independent of each other, no rule can be drawn involving those two events.
If Lift > 1 (greater than 1), the two occurrences are positively dependent on one another, which makes those rules potentially useful for predicting the consequent in future data sets.
If Lift < 1 (less than 1), there is a negative association: purchasing one item reduces the probability of buying the other.
Note that if the lift is zero, the itemsets are mutually exclusive: buying one means not buying the other.
The first iteration of the Apriori algorithm does what
It looks at the support of the itemsets that contain only one item. Since support(X => Y) is (transactions with both X and Y) / (all transactions), and we are looking at X in isolation, the first support calculation is just (transactions with X) / (all transactions), i.e. the percentage of transactions containing X. So if the minimum support is set at 2%, only items that appear in at least 2% of the transactions will be taken to the next level. The rest are “pruned”.
What is the syntax in R for applying the Apriori association algorithm
itemsets = apriori(Groceries, parameter = list(minlen = 1, maxlen = 1, support = 0.02, target = "frequent itemsets"))
What happens in the lead into step two of applying the Apriori association algorithm in R
All of the items that survived the first round are joined into pairs, i.e. if items {1}, {3}, and {7} were considered frequent (had high enough support), the candidate 2-itemsets {1,3}, {1,7}, and {3,7} will now be assessed.
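This pairwise join can be sketched in Python (the survivors below are the hypothetical items from the flashcard, not real data):

```python
from itertools import combinations

# Hypothetical survivors of the first pass (frequent 1-itemsets).
frequent_1 = [{1}, {3}, {7}]

# Join them pairwise into candidate 2-itemsets for the next pass;
# each candidate's support would then be checked against the threshold.
candidates_2 = [a | b for a, b in combinations(frequent_1, 2)]
print(candidates_2)  # [{1, 3}, {1, 7}, {3, 7}]
```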