Association Rules and Apriori Flashcards
What are frequent patterns in data commonly referred to as?
Frequent patterns are also known as association rules.
What are the two types of analysis done using frequent patterns?
1. Frequent itemsets, which lead to the discovery of associations and correlations among items.
2. Frequent subsequences, which allow the discovery of patterns across time or positions in a dataset.
What is Market Basket Analysis?
Market Basket Analysis is a type of analysis that identifies sets of items that appear together in transactional datasets, such as which wines are sold together with which dishes in a restaurant.
What is the Apriori Algorithm used for?
The Apriori Algorithm is used to identify frequent itemsets in a dataset.
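As a rough illustration of how frequent itemsets can be found, here is a minimal sketch of a level-wise, Apriori-style search. The function name `apriori`, the `baskets` data, and the `min_support` value are all made up for this example, and the code favors readability over speed.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting `min_support`."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # Count each candidate and keep those that are frequent enough.
        result = {}
        for candidate in candidates:
            s = sum(candidate <= t for t in transactions) / n
            if s >= min_support:
                result[candidate] = s
        return result

    # Level 1: itemsets made of a single item.
    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([item]) for item in items})
    all_frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        all_frequent.update(current)
        k += 1
    return all_frequent


# Hypothetical restaurant orders, for illustration only.
baskets = [
    {"wine", "steak"},
    {"wine", "steak", "salad"},
    {"wine", "salad"},
    {"beer", "steak"},
    {"wine", "steak"},
]
result = apriori(baskets, min_support=0.5)
```

With these toy baskets, only wine, steak, and the pair wine and steak appear in at least half of the orders, so those are the three itemsets returned.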
Name two other frequently used algorithms aside from Apriori.
1. FP-Growth
2. Eclat
What is the objective of frequent substructures?
The objective of frequent substructures is to find interesting subgraphs in data, which can be combinations of frequent itemsets and frequent subsequences.
Define frequent subsequences in the context of datasets.
Frequent subsequences are patterns discovered across time or positions in a dataset, such as the sequential order of items in a purchasing history.
What are some algorithms used for exploring sequences?
1. GSP
2. SPADE
3. PrefixSpan
What issues can arise with association rules?
Problems include redundancy, too many rules (which makes it difficult to find interesting patterns), and too few rules if the minimum support or confidence thresholds are set too high.
What are actionable rules in the context of association rules?
Actionable rules contain high-quality information that can lead to insights and concrete actions.
What is the definition of support in association rules?
Support is the fraction of all observations in which the items of a rule (or itemset) occur together.
What does confidence measure in the context of association rules?
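Confidence is the probability that a rule is correct for a new observation that contains the left-hand side, i.e. the fraction of observations containing the left-hand side that also contain the right-hand side.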
How is lift defined in association rules?
Lift is the ratio by which the confidence of a rule exceeds the expected confidence (the support of the right-hand side), indicating how much more likely the right-hand side is to occur when the left-hand side is present.
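To make the three measures concrete, the following minimal sketch computes them on a tiny set of transactions. The data and the function names `support`, `confidence`, and `lift` are made up for this example; support is taken as the fraction of transactions containing all items, confidence as support of both sides divided by support of the left-hand side, and lift as confidence divided by the support of the right-hand side.

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"wine", "steak"},
    {"wine", "steak", "salad"},
    {"wine", "salad"},
    {"beer", "steak"},
    {"wine", "steak"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Share of transactions containing `lhs` that also contain `rhs`."""
    both = support(set(lhs) | set(rhs), transactions)
    return both / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Confidence divided by the expected confidence (support of `rhs`)."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


rule_support = support({"wine", "steak"}, transactions)
rule_confidence = confidence({"wine"}, {"steak"}, transactions)
rule_lift = lift({"wine"}, {"steak"}, transactions)
```

For the rule wine -> steak in this toy data, the support works out to 3/5, the confidence to 3/4, and the lift to 15/16. A lift slightly below 1 suggests that steak is no more likely to be ordered with wine than it is overall.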
What is required for data to find frequent subsequences?
1. Timestamps or sequencing information, to determine when transactions occurred relative to each other.
2. Identifying information, such as a customer ID, to know which transactions belong to the same entity.
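The two requirements above can be sketched as a small preprocessing step: group transactions by an identifier, then order each group by time. The record layout, the field order, and the function name `build_sequences` are assumptions made for this example, and the timestamps are ISO date strings so that plain string sorting matches chronological order.

```python
from collections import defaultdict

# Hypothetical records: (customer_id, timestamp, items).
records = [
    ("c2", "2024-01-03", {"salad"}),
    ("c1", "2024-01-05", {"steak"}),
    ("c1", "2024-01-01", {"wine"}),
    ("c2", "2024-01-02", {"beer", "steak"}),
]


def build_sequences(records):
    """Return {customer_id: [items, ...]} with each list ordered by time."""
    by_customer = defaultdict(list)
    for customer_id, timestamp, items in records:
        by_customer[customer_id].append((timestamp, items))
    return {
        customer_id: [items for _, items in sorted(events, key=lambda e: e[0])]
        for customer_id, events in by_customer.items()
    }


sequences = build_sequences(records)
```

Each resulting list is one customer's ordered purchase history, which is the kind of input a sequence-mining algorithm would typically expect.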
How are rules generated from frequent itemsets?
Rules are generated by calculating the confidence of each candidate rule derived from the frequent itemsets and removing the rules that do not meet the chosen thresholds, such as minimum confidence.
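A minimal sketch of this step, assuming the supports of the frequent itemsets are already known: each frequent itemset is split into a left-hand and a right-hand side, the confidence is computed, and rules below a threshold are dropped. The `supports` values, the function name `generate_rules`, and the `min_confidence` value are hypothetical. The lookup `supports[lhs]` relies on every subset of a frequent itemset also being frequent.

```python
from itertools import combinations

# Hypothetical supports of frequent itemsets (e.g. produced by Apriori).
supports = {
    frozenset({"wine"}): 0.8,
    frozenset({"steak"}): 0.8,
    frozenset({"wine", "steak"}): 0.6,
}


def generate_rules(supports, min_confidence):
    """Return (lhs, rhs, confidence) for rules meeting `min_confidence`."""
    rules = []
    for itemset, itemset_support in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty left and right side
        for size in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), size):
                lhs = frozenset(lhs_items)
                rhs = itemset - lhs
                # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
                conf = itemset_support / supports[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, rhs, conf))
    return rules


rules = generate_rules(supports, min_confidence=0.7)
```

With these numbers, both wine -> steak and steak -> wine have a confidence of about 0.75 and are kept. Raising `min_confidence` to 0.8 would remove them, which illustrates how thresholds that are too strict can leave too few rules.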