UNIT 1 - Overview of Association Analysis and Apriori Algorithm Flashcards

Question 1

Q

What is association analysis?

Answer

A

It aims to discover items that co-occur frequently within a database.

It can also provide a clear interpretation of how inputs are associated with outputs.

Question 2

Q

How is an association rule written?

Answer

A

X -> Y (rule support, confidence), where X is known as the antecedent and Y as the consequent of the association rule.

Question 3

Q

What is rule support?

Answer

A

Rule support = % of times X and Y appear together

= P(X and Y)

Question 4

Q

What is confidence?

Answer

A

Confidence = likelihood that Y appears when X occurs

= P(Y|X) = P(X and Y)/P(X)
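
As a minimal sketch of the rule support and confidence definitions, the following Python computes both over a small made-up list of transactions. The data, item names, and helper names (`support`, `confidence`) are illustrative assumptions, not part of the original cards.

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(Y|X) = P(X and Y) / P(X)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

# Rule: bread -> milk
rule_support = support({"bread", "milk"}, transactions)          # 2 of 4 transactions
rule_confidence = confidence({"bread"}, {"milk"}, transactions)  # 2 of the 3 bread transactions
print(rule_support, rule_confidence)
```

Here bread and milk appear together in half of the transactions, while two out of every three bread transactions also contain milk.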

Question 5

Q

A small value of confidence suggests a strong association rule. True or False?

Answer

A

False. A high value of confidence suggests a strong association rule.

Question 6

Q

What is the Apriori algorithm?

Answer

A

The Apriori algorithm finds all association rules whose support and confidence values are greater than or equal to the user-specified minimum support and minimum confidence thresholds.

Question 7

Q

How does Apriori work?

Answer

A

Phase 1: Find all frequent itemsets whose support >= the user-specified minimum support (this phase is computationally expensive).

Phase 2: From the frequent itemsets, generate association rules whose confidence >= the user-specified minimum confidence (this phase selects the rules with high confidence).
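
The two phases can be sketched in Python as follows. This is a simplified, unoptimized illustration under stated assumptions: the function name `apriori`, the toy transactions, and the threshold values are all hypothetical, and production implementations use far more efficient counting.

```python
from itertools import combinations

def apriori(transactions, min_support, min_confidence):
    """Return (frequent, rules). `frequent` maps frozenset -> support;
    `rules` holds (antecedent, consequent, support, confidence) tuples."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Phase 1: grow frequent itemsets level by level (1 item, 2 items, ...).
    current = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates and keep only
        # candidates whose (k-1)-subsets are all frequent.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]

    # Phase 2: split each frequent itemset into antecedent -> consequent
    # and keep the rules that reach the minimum confidence.
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, s, conf))
    return frequent, rules

# Hypothetical market-basket data.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "chips"},
    {"diapers", "milk"},
    {"beer", "diapers", "milk"},
]
frequent, rules = apriori(baskets, min_support=0.4, min_confidence=0.7)
for antecedent, consequent, s, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), round(s, 2), round(conf, 2))
```

Phase 2 only reuses supports already counted in Phase 1, which is why the first phase dominates the cost.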

Question 8

Q

What is a frequent itemset?

Answer

A

An itemset whose support satisfies the minimum support threshold set by the user.

Question 9

Q

What is the downward closure principle (Apriori principle)?

Answer

A

This principle states that all subsets of a frequent itemset must also be frequent.

Equivalently, any superset of a non-frequent itemset is also non-frequent, so such itemsets are eliminated automatically and their support need not be evaluated at all.
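
A small Python sketch of how this principle prunes candidates: a 3-item candidate is dropped without counting its support as soon as one of its 2-item subsets is missing from the frequent set. The itemsets and the helper name `has_infrequent_subset` are hypothetical.

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_smaller):
    """True if any (k-1)-subset of a k-item candidate is not frequent."""
    k = len(candidate)
    return any(frozenset(s) not in frequent_smaller
               for s in combinations(candidate, k - 1))

# Hypothetical frequent 2-itemsets found in an earlier pass.
frequent_pairs = {
    frozenset({"beer", "diapers"}),
    frozenset({"beer", "chips"}),
    frozenset({"diapers", "chips"}),
    frozenset({"diapers", "milk"}),
}

candidates = [
    frozenset({"beer", "diapers", "chips"}),  # all three pairs are frequent
    frozenset({"beer", "diapers", "milk"}),   # {beer, milk} is not frequent
]

# Only candidates that pass the check need their support counted.
kept = [c for c in candidates if not has_infrequent_subset(c, frequent_pairs)]
print(kept)
```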

Question 10

Q

What are the limitations of the classical support-confidence framework?

Answer

A

It is not easy to set the support and confidence thresholds:
- If the minimum support is too low, many meaningless rules are generated.
- If the minimum support is too high, mostly trivial or obvious associations are found.
Confidence is heavily affected by the frequency of the antecedent and the consequent.
Not all rules with high support and high confidence are interesting:
- High-confidence rules may be trivial rules with frequently purchased products in the consequent.
- E.g., nearly all customers who buy burgers also buy drinks. If drinks are bought in most transactions anyway, the confidence will be high regardless of whether there is a real association.
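
To make the burgers-and-drinks point concrete, here is a short Python calculation with invented counts (all numbers are assumptions chosen for illustration). The rule burgers -> drinks reaches 90% confidence, yet drinks appear in 90% of all transactions anyway, so knowing that a customer bought a burger adds nothing.

```python
# Hypothetical transaction counts.
n_total = 1000    # all transactions
n_burgers = 200   # transactions containing burgers
n_drinks = 900    # transactions containing drinks
n_both = 180      # transactions containing both

rule_support = n_both / n_total   # P(burgers and drinks)
confidence = n_both / n_burgers   # P(drinks | burgers)
baseline = n_drinks / n_total     # P(drinks), ignoring burgers

print(rule_support, confidence, baseline)
# Confidence equals the baseline, so the rule is no better than guessing
# "drinks" for every customer.
```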

Question 11

Q

What is posterior confidence?

Answer

A

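It refers to the confidence of a rule with one or more antecedents. Prior confidence, by contrast, is the frequency of the consequent on its own, with no antecedent.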

Question 12

Q

What is confidence difference?

Answer

A

It’s the absolute difference between posterior and prior confidence.

Using this measure, a rule with a high degree of confidence will only be selected if its prior confidence is low. Moreover, given that the absolute value of the difference is being used, unusual rules may be found by this measure.
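
A minimal Python sketch of this measure, assuming that prior confidence is the frequency of the consequent on its own and posterior confidence is the confidence of the rule. The rule names, values, and the 0.3 cut-off are hypothetical.

```python
def confidence_difference(prior, posterior):
    """Absolute difference between posterior and prior confidence."""
    return abs(posterior - prior)

# (rule, prior confidence, posterior confidence), all values invented.
rules = [
    ("A -> B", 0.90, 0.92),  # high confidence, but the prior is already high
    ("C -> D", 0.10, 0.60),  # confidence rises sharply given the antecedent
    ("E -> F", 0.50, 0.05),  # confidence drops sharply: an unusual rule
]

threshold = 0.3  # hypothetical minimum difference
selected = [name for name, prior, posterior in rules
            if confidence_difference(prior, posterior) >= threshold]
print(selected)
```

The first rule is rejected despite its 92% confidence, while the third is kept because the absolute value also captures large decreases.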

Question 13

Q

What is confidence ratio?

Answer

A

The evaluation measure is e = 1 - min(C_posterior / C_prior, C_prior / C_posterior). This option can find unusual rules and can be more useful than the absolute confidence difference.
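
The formula can be sketched in Python as below, assuming both confidences are strictly positive. The example values are invented; they show one case where the ratio reacts strongly although the absolute difference is tiny.

```python
def confidence_ratio(prior, posterior):
    """e = 1 - min(posterior / prior, prior / posterior)."""
    if prior <= 0 or posterior <= 0:
        raise ValueError("prior and posterior confidence must be positive")
    return 1 - min(posterior / prior, prior / posterior)

def confidence_difference(prior, posterior):
    """Absolute confidence difference, shown for comparison."""
    return abs(posterior - prior)

# Hypothetical rare consequent: 1% overall, 5% given the antecedent.
rare_ratio = confidence_ratio(0.01, 0.05)      # about 0.8
rare_diff = confidence_difference(0.01, 0.05)  # about 0.04

# Hypothetical rule that does not change the confidence at all.
flat_ratio = confidence_ratio(0.9, 0.9)

print(rare_ratio, rare_diff, flat_ratio)
```

With these numbers the rule makes the consequent five times more likely, which the ratio reflects, whereas a difference of about 0.04 would fall below most thresholds.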

Question 14

Q

The Apriori node in IBM SPSS Modeler provides several measures of interestingness. What are the four measures?

Answer

A

Confidence Difference, Confidence Ratio, Information Difference, and Normalized Chi-Square

Question 15

Q

What is information difference?

Answer

A

This measure takes into account the support of a rule, so that for the same prior and posterior confidences, a rule will be valued more highly if it applies to a larger number of cases. Thus, it tends to eliminate the rarely relevant rules sometimes found by other algorithms. However, since it is difficult to translate information gain in bits to percentage differences, the use of this evaluation measure typically requires a bit of experimentation to get a useful setting.

Question 16

Q

What is normalized Chi-Square?

Answer

A

With normalization, the Chi-square measure can assume values between 0 (no relationship) and 1 (perfect relationship). The normalized Chi-square measure is not intuitively related to the differences between the prior and posterior confidences, so some experimentation is required to get useful settings.

Question 17

Q

What are some supermarket marketing strategies?

Answer

A

Plan shelf space: place beer and diapers together to boost sales of both; this helps with store layout and planning.
Provide advertisements: target customers who are likely to buy certain products (e.g., customer A is likely to buy diapers every two weeks).
Bundle products with a discount: to increase sales (e.g., customer B likes to buy soy milk and wheat biscuits every Sunday).

Question 18

Q

What is support?

Answer

A

Support = % of times X appears = P(X)

Question 19

Q

What is a limitation of the Apriori algorithm?

Answer

A

It needs categorical data.

- It does not work well with numeric data.

Question 20

Q

What are strong rules?

Answer

A

Rules that satisfy both a minimum support threshold and a minimum confidence threshold.

(20 cards)