UNIT 1 - Overview of Association Analysis and Apriori Algorithm Flashcards

1
Q

What is association analysis?

A

It aims to discover items that co-occur frequently within a database.

It can also provide a clear interpretation of how inputs are associated with the output.

2
Q

How is an association rule written?

A

X -> Y (rule support, confidence), where X is known as the antecedent and Y as the consequent of the association rule.

3
Q

What is rule support?

A

Rule support = % of transactions in which X and Y appear together

= P(X and Y)

4
Q

What is confidence?

A

Confidence = likelihood that Y appears when X occurs

= P(Y|X) = P(X and Y)/P(X)
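
The support and confidence formulas can be checked on a toy basket of transactions. The snippet below is an illustrative sketch; the transactions and item names are invented, not from the source.

```python
# Invented market-basket data for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset, transactions):
    """P(itemset): fraction of transactions containing every item in the set."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """P(Y|X) = P(X and Y) / P(X)."""
    return support(x | y, transactions) / support(x, transactions)

# Rule {diapers} -> {beer}: rule support and confidence.
print(support({"diapers"} | {"beer"}, transactions))   # 0.6
print(confidence({"diapers"}, {"beer"}, transactions))  # ≈ 0.75
```

Here diapers and beer co-occur in 3 of 5 baskets (rule support 0.6), and beer appears in 3 of the 4 baskets that contain diapers (confidence 0.75).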

5
Q

A small value of confidence suggests a strong association rule. True or False?

A

False. A high value of confidence suggests a strong association rule.

6
Q

What is the Apriori algorithm?

A

The Apriori algorithm finds all association rules whose support and confidence values are greater than or equal to the user-specified minimum support and minimum confidence thresholds.

7
Q

How does Apriori work?

A

Phase 1: Find all frequent itemsets whose support ≥ the user-specified minimum support. (This phase is computationally expensive.)

Phase 2: From the frequent itemsets, generate association rules whose confidence ≥ the user-specified minimum confidence. (This phase selects the rules with high confidence.)
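
Phase 1 can be sketched in a few lines. This is a minimal illustrative implementation of my own (not SPSS Modeler's), using an invented basket list; it includes the join and prune steps based on downward closure.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Phase 1 of Apriori: level-wise search for frequent itemsets."""
    n = len(transactions)

    def sup(itemset):
        # Support = fraction of transactions containing the whole itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent individual items.
    items = {frozenset([i]) for t in transactions for i in t}
    levels = [{s for s in items if sup(s) >= min_support}]

    k = 1
    while levels[-1]:
        prev = levels[-1]
        # Join step: combine frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune step (downward closure): drop any candidate that has a
        # non-frequent k-subset -- its support need not be counted at all.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k))}
        levels.append({c for c in candidates if sup(c) >= min_support})
        k += 1

    return [s for level in levels for s in level]

# Invented example data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
for itemset in apriori_frequent_itemsets(baskets, min_support=0.6):
    print(sorted(itemset))
```

With min support 0.6, this finds four frequent single items and four frequent pairs; the only 3-item candidate, {bread, milk, diapers}, fails the support test, so the search stops.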

8
Q

What are frequent itemsets?

A

Frequent itemsets are those whose support satisfies the minimum support threshold set by the user.

9
Q

What is the downward closure principle (Apriori principle)?

A

This principle states that all subsets of a frequent itemset must also be frequent.

Equivalently, any superset of a non-frequent itemset cannot be frequent, so the algorithm automatically eliminates all such itemsets without evaluating their support at all.

10
Q

What are the limitations of Classical Support-Confidence Framework?

A
  • Not easy to set the support and confidence thresholds:
    If min support is too low → many meaningless rules
    If min support is too high → only trivial/obvious associations are found
    Confidence is heavily affected by the frequency of the antecedent and consequent
  • Not all rules with high support and high confidence are interesting:
    High-confidence rules may be trivial rules with frequently purchased products in the consequent of the rule.
    E.g., nearly all customers who buy burgers also buy drinks (the confidence will be high regardless of whether there is a real association).
11
Q

What is Posterior confidence?

A

It refers to the confidence of rules with one or more antecedents

12
Q

What is confidence difference?

A

It is the absolute difference between the posterior and prior confidence.

Using this measure, a rule with high posterior confidence will be selected only if its prior confidence is low. Moreover, because the absolute value of the difference is used, this measure can also find unusual rules (where the antecedent makes the consequent less likely).

13
Q

What is confidence ratio?

A

The evaluation measure is e = 1 − min(c_posterior/c_prior, c_prior/c_posterior). This option can find unusual rules and can be more useful than the absolute confidence difference.
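
Both measures can be sketched directly from the definitions above. In this sketch I assume the prior confidence is the consequent's overall support (its confidence with no antecedent); the numbers in the example are invented.

```python
def confidence_difference(prior, posterior):
    """Absolute difference between posterior and prior confidence."""
    return abs(posterior - prior)

def confidence_ratio(prior, posterior):
    """e = 1 - min(posterior/prior, prior/posterior)."""
    return 1 - min(posterior / prior, prior / posterior)

# Invented example: the consequent appears in 60% of all transactions
# (prior = 0.6), but in 90% of transactions containing the antecedent
# (posterior = 0.9).
print(confidence_difference(0.6, 0.9))  # ≈ 0.3
print(confidence_ratio(0.6, 0.9))       # ≈ 0.33
```

A rule that merely matches the consequent's base rate (posterior ≈ prior) scores near 0 on both measures, which is how they filter out the trivial high-confidence rules described in card 10.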

14
Q

Apriori node in IBM SPSS modeler provides several measures of interestingness. What are the four measures?

A

Confidence Difference, Confidence Ratio, Information Difference, and Normalized Chi-Square

15
Q

What is information difference?

A

This measure takes into account the support of a rule, so that for the same prior and posterior confidences, a rule will be valued more highly if it applies to a larger number of cases. Thus, it tends to eliminate the rarely relevant rules sometimes found by other algorithms. However, since it is difficult to translate information gain in bits to percentage differences, the use of this evaluation measure typically requires a bit of experimentation to get a useful setting.

16
Q

What is normalized Chi-Square?

A

With normalization, the Chi-square measure can assume values between 0 (no relationship) and 1 (perfect relationship). The normalized Chi-square measure is not intuitively related to the differences between the prior and posterior confidences, so some experimentation is required to get useful settings.

17
Q

What are some supermarket marketing strategies?

A
  • Plan shelf space: put beer and diapers together to boost sales of both; helps in store layout and planning
  • Provide advertisements: to customers who are likely to buy certain products (e.g., customer A is likely to buy diapers every two weeks)
  • Bundle products with a discount: to increase sales (e.g., customer B likes to buy soy milk and wheat biscuits every Sunday)
18
Q

What is the support?

A

Support = % of transactions in which X appears = P(X)

19
Q

What are the limitations of the Apriori algorithm?

A
  • Needs categorical data
  • Not good with numeric data

20
Q

What are strong rules?

A

Rules that satisfy both a minimum support threshold and a minimum confidence threshold.