UNIT 1 - Overview of Association Analysis and Apriori Algorithm Flashcards
What is association analysis?
It aims to discover items that co-occur frequently within a database.
It can provide a clear interpretation of how inputs are associated with outputs.
How is an association rule written?
X -> Y (rule support, confidence), where X is known as the antecedent and Y the consequent of the association rule
What is rule support?
Rule support = % of transactions in which X and Y appear together
= P(X and Y)
What is confidence?
Confidence = likelihood that Y appears when X occurs
= P(Y|X) = P(X and Y)/P(X)
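A minimal Python sketch of both measures, assuming each transaction is stored as a set of items. The toy transactions and the function names `support` and `confidence` are made up for illustration.

```python
# Hypothetical toy transactions (not from the course material).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """P(Y|X) = P(X and Y) / P(X). Assumes X occurs at least once."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

# Rule: bread -> milk
rule_support = support({"bread", "milk"}, TRANSACTIONS)          # 2 of 4 transactions
rule_confidence = confidence({"bread"}, {"milk"}, TRANSACTIONS)  # 2 of the 3 bread transactions
```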
A small value of confidence suggests a strong association rule. True or False?
False. A high value of confidence suggests a strong association rule.
What is the Apriori algorithm?
The Apriori algorithm finds all association rules whose support and confidence values are greater than or equal to the user-specified minimum support and minimum confidence thresholds.
How does Apriori work?
Phase 1: Find all frequent itemsets whose support >= user-specified min support (this phase is computationally expensive)
Phase 2: Generate association rules whose confidence >= user-specified min confidence (this phase selects the rules with high confidence)
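The two phases can be sketched roughly as follows in Python. The function name `apriori`, the tuple layout of the returned rules, and the brute-force support counting are assumptions for illustration, not how any particular tool implements the algorithm.

```python
from itertools import combinations

def apriori(transactions, min_support, min_confidence):
    """Return (frequent itemsets with support, rules) for a list of item sets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(itemset):
        # Brute-force count; real implementations are far more careful here.
        return sum(itemset <= t for t in transactions) / n

    # Phase 1: level-wise search for frequent itemsets (size 1, then 2, ...).
    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while level:
        survivors = {c: supp(c) for c in level}
        survivors = {c: s for c, s in survivors.items() if s >= min_support}
        frequent.update(survivors)
        k += 1
        # Join: build size-k candidates from the surviving (k-1)-itemsets.
        level = {a | b for a in survivors for b in survivors if len(a | b) == k}
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        level = {
            c for c in level
            if all(frozenset(s) in survivors for s in combinations(c, k - 1))
        }

    # Phase 2: split each frequent itemset into antecedent -> consequent.
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for x in combinations(itemset, r):
                x = frozenset(x)
                conf = s / frequent[x]  # subsets of a frequent itemset are frequent
                if conf >= min_confidence:
                    rules.append((x, itemset - x, s, conf))
    return frequent, rules
```

Each rule comes back as (antecedent, consequent, rule support, confidence). Most of the work is in Phase 1, because every candidate itemset requires a pass over the transactions.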
What is a frequent itemset?
An itemset whose support satisfies the minimum support threshold set by the user
What is the downward closure principle (Apriori principle)?
This principle states that all subsets of a frequent itemset must also be frequent.
Equivalently, any itemset that contains a non-frequent subset cannot be frequent. This automatically eliminates such itemsets, so their support need not be evaluated at all.
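A small Python sketch of how this principle prunes a candidate before its support is ever counted. The helper name `can_be_frequent` and the example itemsets are hypothetical.

```python
from itertools import combinations

def can_be_frequent(candidate, frequent_smaller):
    """False if any subset one item smaller than `candidate` is not frequent."""
    k = len(candidate)
    return all(
        frozenset(sub) in frequent_smaller
        for sub in combinations(candidate, k - 1)
    )

# Suppose only these 2-itemsets met the minimum support threshold.
frequent_pairs = {
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
}

candidate = frozenset({"bread", "milk", "butter"})
# {milk, butter} is not frequent, so the candidate is discarded
# without scanning the transactions.
keep = can_be_frequent(candidate, frequent_pairs)
```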
What are the limitations of the classical support-confidence framework?
- Not easy to set the support and confidence thresholds
If min support is too low: many meaningless rules are generated
If min support is too high: only trivial/obvious associations are found
- Confidence is heavily affected by the frequency of the antecedent and consequent
- Not all rules with high support and high confidence are interesting
High-confidence rules may be trivial rules with frequently purchased products in the consequent of the rule.
E.g. nearly all customers who buy burgers also buy drinks. If most customers buy drinks anyway, the confidence will be high regardless of whether there is a real association.
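The burger example can be checked with made-up numbers. In this Python sketch the counts are invented so that 90% of all customers buy a drink whether or not they buy a burger.

```python
# Hypothetical basket counts, chosen for illustration only.
transactions = (
    [{"burger", "drink"}] * 18   # burger buyers who also buy a drink
    + [{"burger"}] * 2           # burger buyers without a drink
    + [{"drink"}] * 72           # drink only
    + [{"fries"}] * 8            # neither
)

n = len(transactions)
n_burger = sum("burger" in t for t in transactions)
n_drink = sum("drink" in t for t in transactions)
n_both = sum({"burger", "drink"} <= t for t in transactions)

rule_confidence = n_both / n_burger   # P(drink | burger)
drink_share = n_drink / n             # P(drink) over all customers
```

Here the confidence of burger -> drink is 90%, yet drinks appear in 90% of all baskets, so knowing that a customer bought a burger adds nothing.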
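What is posterior confidence?
It refers to the confidence of a rule with one or more antecedents, i.e. P(Y|X). Prior confidence, by contrast, is the confidence of the consequent on its own, without any antecedent, i.e. P(Y).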
What is confidence difference?
It’s the absolute difference between posterior and prior confidence.
Using this measure, a rule with a high degree of confidence will only be selected if its prior confidence is low. Moreover, given that the absolute value of the difference is being used, unusual rules may be found by this measure.
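As a rough Python illustration, with confidences given as fractions between 0 and 1. The function name and the example values are assumptions.

```python
def confidence_difference(posterior, prior):
    """Absolute difference between posterior and prior confidence."""
    return abs(posterior - prior)

# High confidence, but the consequent is common anyway: scores zero.
trivial = confidence_difference(posterior=0.9, prior=0.9)

# High confidence and a low prior: the rule stands out.
strong = confidence_difference(posterior=0.9, prior=0.3)

# The antecedent makes the consequent much LESS likely. The absolute
# value still gives a large score, which is how unusual rules surface.
unusual = confidence_difference(posterior=0.1, prior=0.7)
```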
What is confidence ratio?
The evaluation measure is e = 1 - min(C_posterior/C_prior, C_prior/C_posterior). This option can find unusual rules and can be more useful than the absolute confidence difference.
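A short Python sketch of the formula, assuming both confidences are greater than zero. The two example rules are invented to show a case where the ratio and the absolute difference disagree.

```python
def confidence_ratio(posterior, prior):
    """e = 1 - min(posterior/prior, prior/posterior); both must be > 0."""
    return 1 - min(posterior / prior, prior / posterior)

# Rare consequent: confidence rises from 2% to 10%.
rare_diff = abs(0.10 - 0.02)                # small absolute difference
rare_ratio = confidence_ratio(0.10, 0.02)   # large, a five-fold change

# Common consequent: confidence rises from 50% to 58%.
common_diff = abs(0.58 - 0.50)              # same absolute difference
common_ratio = confidence_ratio(0.58, 0.50) # much smaller
```

Both rules have the same absolute confidence difference, but the ratio scores the rare-event rule far higher.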
The Apriori node in IBM SPSS Modeler provides several measures of interestingness. What are the four measures?
Confidence Difference, Confidence Ratio, Information Difference, and Normalized Chi-Square
What is information difference?
This measure takes into account the support of a rule, so that for the same prior and posterior confidences, a rule will be valued more highly if it applies to a larger number of cases. Thus, it tends to eliminate the rarely relevant rules sometimes found by other algorithms. However, since it is difficult to translate information gain in bits to percentage differences, the use of this evaluation measure typically requires a bit of experimentation to get a useful setting.