Week 2 Flashcards
What is a “pattern”?
What is a “pattern”? A structure of attributes that represents the intrinsic and important properties of data objects.
What is a pattern in itemset data?
What is a pattern in itemset data? One of three things: a frequent subset of items, an association rule, or a correlation between two items.
How is a subset defined?
How is a subset defined? Given two itemsets X1 and X2, if every item in X1 is also in X2, then X1 is a subset of X2.
How is a superset defined?
How is a superset defined? Given two itemsets X1 and X2, if every item in X1 is also in X2, then X1 is a subset of X2 and, equivalently, X2 is a superset of X1 (X2 contains X1).
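The subset and superset definitions map directly onto Python's built-in set type. A minimal sketch, where the itemsets x1 and x2 are made-up examples and not taken from the course material:

```python
# Hypothetical itemsets chosen only for illustration.
x1 = {"a", "b"}
x2 = {"a", "b", "c"}

# Every item in x1 is also in x2, so x1 is a subset of x2 ...
print(x1.issubset(x2))    # True
# ... and, equivalently, x2 is a superset of x1.
print(x2.issuperset(x1))  # True

# The reverse does not hold, because "c" is missing from x1.
print(x2.issubset(x1))    # False
```

The operators x1 <= x2 and x2 >= x1 are shorthand for the same two checks.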
What is a Frequent Itemset?
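What is a Frequent Itemset? A k-itemset X = {x1, x2, …, xk} is an itemset that contains exactly k items. An itemset is frequent if its support is no less than a minimum support threshold, min_sup.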
How is Support defined?
How is Support defined? The support of an itemset X is the frequency with which X occurs in the transactions of a dataset (database).
What are the two types of Support?
What are the two types of Support? Absolute Support: the number of transactions that contain X. Relative Support: the fraction of transactions that contain X.
What is the formal, mathematical definition of Support?
What is the formal, mathematical definition of Support? An itemset X is frequent if its support sup(X) is no less than a threshold min_sup, that is, sup(X) >= min_sup.
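Written out in symbols, the two kinds of support and the frequency condition could look like the following. The names D (the set of transactions) and T (a single transaction) are notation assumed here for illustration; they do not appear on the cards.

```latex
\begin{align*}
\mathrm{sup}_{\mathrm{abs}}(X) &= \bigl|\{\, T \in D : X \subseteq T \,\}\bigr| \\
\mathrm{sup}_{\mathrm{rel}}(X) &= \frac{\mathrm{sup}_{\mathrm{abs}}(X)}{|D|} \\
X \text{ is frequent} &\iff \mathrm{sup}(X) \ge \mathit{min\_sup}
\end{align*}
```

Here sup(X) and min_sup must be on the same scale: both absolute counts or both relative fractions.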
Support examples:
Support examples (in a dataset of 5 transactions): if {a} appears in 4 of the 5 transactions, its support is 0.8 (80%); if {a, b} appears in 3 of the 5, its support is 0.6 (60%); if {a, b, c} appears in 2 of the 5, its support is 0.4 (40%). With min_sup = 0.5, {a} and {a, b} are frequent, whereas {a, b, c} is not.
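The arithmetic in these examples can be reproduced with a few lines of Python. This is only a sketch: the five transactions are invented so that the counts match the card (the items "d" and "e" are filler), and the function names are hypothetical.

```python
# Hypothetical dataset of 5 transactions, built so that
# {a} occurs in 4, {a, b} in 3, and {a, b, c} in 2 of them.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
    {"b", "e"},
]

def absolute_support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def relative_support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    return absolute_support(itemset, transactions) / len(transactions)

min_sup = 0.5
for itemset in [{"a"}, {"a", "b"}, {"a", "b", "c"}]:
    sup = relative_support(itemset, transactions)
    # An itemset is frequent if its support is no less than min_sup.
    print(sorted(itemset), sup, "frequent" if sup >= min_sup else "not frequent")
```

With this data the three relative supports come out as 0.8, 0.6 and 0.4, so only the first two itemsets pass the 0.5 threshold.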
What is an Association rule?
What is an Association rule? An association rule X -> Y states an association between two itemsets X and Y. Its strength is measured by support and confidence.
How is support defined for an Association?
How is support defined for an Association? For a rule X -> Y, support is the (joint) probability that a transaction contains both X and Y: P(X and Y). Confidence is the conditional probability that a transaction which contains X also contains Y: P(Y | X).
Association examples:
Association examples: sup({a}) = 0.8, sup({b}) = 0.8, sup({a, b}) = 0.6. For {a} -> {b}: of the transactions that contain {a}, how many also contain {b}? If 3 of 4 do, confidence = 0.75, and the rule is written {a} -> {b} [0.6, 0.75], giving support first and confidence second. Book recommendations on Amazon are a good example of support and confidence in use.
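Support and confidence for a rule can be computed from transaction counts. The sketch below uses an invented five-transaction dataset that matches the numbers on the card; the helper names count, rule_support and rule_confidence are hypothetical.

```python
# Hypothetical dataset: {a} in 4 of 5, {b} in 4 of 5, {a, b} in 3 of 5.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
    {"b", "e"},
]

def count(itemset, transactions):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def rule_support(x, y, transactions):
    """P(X and Y): fraction of transactions containing both X and Y."""
    return count(x | y, transactions) / len(transactions)

def rule_confidence(x, y, transactions):
    """P(Y | X): among transactions containing X, the fraction that also contain Y."""
    return count(x | y, transactions) / count(x, transactions)

s = rule_support({"a"}, {"b"}, transactions)
c = rule_confidence({"a"}, {"b"}, transactions)
print(f"{{a}} -> {{b}} [{s}, {c}]")  # {a} -> {b} [0.6, 0.75]
```

Confidence is computed from raw counts (3 / 4) rather than from the two relative supports, which avoids a floating-point rounding artifact in 0.6 / 0.8.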
How do we find frequent itemsets?
How do we find frequent itemsets? 1. Scan every transaction in the database. 2. Enumerate the possible subsets. 3. Check whether their frequency is at least the minimum support threshold (min_sup). 4. For frequent itemsets, calculate the confidence of associations.
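Steps 1 to 3 above can be sketched as a brute-force search. This is a simplified illustration and not an efficient method: it enumerates every non-empty subset of the item universe (rather than subsets per transaction) and rescans all transactions for each one. The dataset and the function name are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical dataset of 5 transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
    {"b", "e"},
]

def frequent_itemsets_brute_force(transactions, min_sup):
    """Return {itemset: relative support} for every itemset with support >= min_sup."""
    items = sorted(set().union(*transactions))  # all distinct items
    n = len(transactions)
    frequent = {}
    for k in range(1, len(items) + 1):
        # Enumerate every possible k-itemset.
        for candidate in combinations(items, k):
            candidate = frozenset(candidate)
            # Scan every transaction to count occurrences of the candidate.
            occurrences = sum(1 for t in transactions if candidate <= t)
            # Keep the candidate if its support is no less than min_sup.
            if occurrences / n >= min_sup:
                frequent[candidate] = occurrences / n
    return frequent

result = frequent_itemsets_brute_force(transactions, min_sup=0.5)
for itemset, sup in result.items():
    print(sorted(itemset), sup)
```

With min_sup = 0.5 this data yields three frequent itemsets: {a}, {b} and {a, b}. The number of candidates grows as 2^n in the number of distinct items, which is the cost that pruning is meant to reduce.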
What is the Downward Closure Property?
What is the Downward Closure Property? Any subset of a frequent itemset must also be frequent. For example, if {a, b, c} is frequent, then {a, b} must also be frequent.
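The property holds because every transaction that contains an itemset also contains each of its subsets, so a subset's support can never be lower. A small check on an invented dataset (the data and the helper name support are assumptions for illustration):

```python
from itertools import combinations

# Hypothetical dataset of 5 transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
    {"b", "e"},
]

def support(itemset):
    """Relative support of itemset in the dataset above."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

x = frozenset({"a", "b", "c"})
# All non-empty proper subsets of x.
subsets = [frozenset(c) for k in range(1, len(x)) for c in combinations(sorted(x), k)]

# Each subset is contained in at least the transactions that contain x,
# so its support is at least support(x).
for s in subsets:
    print(sorted(s), support(s), support(s) >= support(x))
```

So whenever x meets a given min_sup, every one of its subsets meets it as well.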
What is the Apriori Pruning Principle?
What is the Apriori Pruning Principle (APP)? (Apriori: candidate generation and test.) If an itemset is infrequent, none of its supersets can be frequent, so they need not be generated or tested.
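One way the pruning principle can be combined with candidate generation and test is sketched below. It is a simplified, unoptimized illustration of the Apriori idea under assumed names and an invented dataset, not a reference implementation.

```python
from itertools import combinations

# Hypothetical dataset of 5 transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
    {"b", "e"},
]

def apriori(transactions, min_sup):
    """Return {itemset: relative support} for all frequent itemsets."""
    n = len(transactions)

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = set().union(*transactions)
    # Level 1: frequent single items.
    current = {frozenset([i]) for i in items if sup(frozenset([i])) >= min_sup}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = sup(itemset)
        k += 1
        # Candidate generation: join pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Pruning: discard any candidate with an infrequent (k-1)-subset,
        # since a superset of an infrequent itemset cannot be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Test: count support only for the surviving candidates.
        current = {c for c in candidates if sup(c) >= min_sup}
    return frequent

result = apriori(transactions, min_sup=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With min_sup = 0.5, the items c, d and e are infrequent at level 1, so no itemset containing them is ever generated; only {a, b} is tested at level 2.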