Week 2 Flashcards

1
Q

What is a “pattern”?

A

What is a “pattern”? A structure of attributes that represents the intrinsic and important properties of data objects.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is a pattern in itemset data?

A

What is a pattern in itemset data? A frequent subset of items An association rule A correlation of two items

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

How is a subset defined?

A

How is a subset defined? Two itemsets X1 and X2 If every item in X1 is also in X2, the X1 is a subset of X2.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How is a superset defined?

A

How is a superset defined? Two itemsets X1 and X2 If every item in X1 is also in X2 (X1 is in X2), then X1 is a subset of X2; Therefore X2 is a superset of X1 (X2 contains X1).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is a Frequent Itemset?

A

What is a Frequent Itemset? A k-itemset: X = {x1, x2, …, xk} Itemset X contains exactly k items.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How is Support defined?

A

How is Support defined? Support is the frequency of X in a dataset (database).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What are the two types of Support?

A

What are the two types of Support? Absolute Support: the number of transactions that contain X. Relative Support: the fraction of transactions that contain X.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is the formal, mathematical definition of Support?

A

What is the formal, mathematical definition of Support? An itemset if frequent if its support sup(X) is no less than a threshold of min_sup.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Support examples:

A

Support examples: if {a} appears in 4 of 5 items of an itemset, then the level of support is 0.8 or 80% if {a, b} appears in 3 of 5 items of an itemset, then the level of support is 0.6 or 60% if {a, b, c} appears in 2 of 5 items of an itemset, then the level of support is 0.4 or 40% if min_sup = 0.5 then examples 1 and 2 above are frequent whereas 3 is not.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is an Association rule?

A

What is an Association rule? X -> Y The Association rule is a measure of association between two itemsets.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How is support defined for an Association?

A

How is support defined for an Association? X -> Y Support is the (marginal) probability that a transaction contains both X and Y. P( X and Y) Confidence is the conditional probability that a transaction which contains X also contains Y. P(Y | X)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Association examples:

A

Association examples: {a} = 0.8 sup {b} = 0.8 sup {a, b} = 0.6 sup {a} -> {b}: Of the items that contain {a} how many also contain {b} (3 of 4 for example) confidence = 0.75 [0.6, 0.75] Book recommendations on Amazon are a perfect example of support and confidence.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How do we find frequent itemsets?

A

How do we find frequent itemsets? 1. Scan every transaction in the database. 2. Enumerate the possible subsets. 3. Check whether their frequency is above the minimal support value. 4. For frequent itemsets calculate the confidence of associations.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is the Downward Closure Property?

A

What is the “Downward Closure Property?” A: any subset of a frequent itemset must be frequent. if {a, b, c} is frequent then {a, b} must also be frequent.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is the Apriori Pruning Principle?

A

What is the Apriori Pruning Principle (APP)? (Apriori: candidate generation and test) A: If any itemset is infrequent, none of its supersets need to be considered.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Describe the Apriori Algorithm.

A

Describe the Apriori Algorithm. 1. Scan the database once to get frequent 1-itemsets. 2. Generate size k+1 candidate itemsets from length k frequent itemsets 3. Test (count) the candidate against the database. 4. Terminate when no frequent or candidate itemset can be generated.

17
Q

Apriori Algorithm Example Step 1

A

Apriori Algorithm Example Step 1 Items: {b,m,w}, {b,s,m,l}, {m,b,l}, {m,s}, {r,b,l} Minimum Support = 2 Data: {b}=4, {m}=4, {l}=3, {s}=2, {w}=1, {r}=1 Frequent 1-itemsets that meet the min_support: {b}=4, {m}=4, {l}=3, {s}=2

18
Q

Apriori Algorithm Example Step 2

A

Apriori Algorithm Example Step 2 Items: {b,m,w}, {b,s,m,l}, {m,b,l}, {m,s}, {r,b,l} Generate Candidate 2-itemsets (self-join) (nCr) {b, m}, {b, l}, {b, s}, {m, s}, {m, l}, {s, l} Rescan database for occurrences of 2-itemsets {b, m}=3, {b, s}=1, {b, l}=3, {m, s}=2, {m, l}=2, {s, l}=1 Only 4 of the 6 itemsets meet the minimum support.

19
Q

Apriori Algorithm Example Step 3

A

Apriori Algorithm Example Step 3 Items: {b,m,w}, {b,s,m,l}, {m,b,l}, {m,s}, {r,b,l} {b, m}=3, {b, l}=3, {m, s}=2, {m, l}=2 Generate Candidate 3-itemsets (self-join) (nCr) {b,m,l}, {b,m,s}, {m,s,l} Rescan database for occurrences of 3-itemsets {b,m,l}=2, {b,m,s}=1, {m,s,l}=1 One 3-itemset, the algorithm will terminate

20
Q

What are some of the challenges to the Apriori Algorithm?

A

What are some of the challenges to the Apriori Algorithm? -Needs multiple scans of the database -Huge number of candidates (most are not) -Tedious workload of counting the frequency of every candidate

21
Q

What are some of the improvements being attempted to improve the Apriori algorithm?

A

What are some of the improvements being attempted to improve the Apriori algorithm? -Reduce the number of passes on the database -Shrink the number of candidates -Sampling and approximation -Distributed counting (e.g. Map-Reduce) -See Textbook Chapters 6, 7

22
Q

What is a Closed pattern?

A

What is a Closed pattern? Closed pattern: An itemset X is closed if X is frequent and there exists no super-pattern with the same support as X. Every Max pattern is Closed but not all Closed patterns are Max.

23
Q

What is a Max pattern?

A

What is a Max pattern? Max pattern: An itemset X is a max pattern if X is frequent and there exists no super-pattern that is also frequent. Every Max pattern is Closed but not all Closed patterns are Max.

24
Q

What is the benefit of finding Closed and Max patterns?

A

What is the benefit of finding Closed and Max patterns? By searching for max and closed patterns, many shorter patterns can be eliminated as they are just subsets of longer frequent patterns.

25
Q

What is the numeric definition of Jaccard Similarity?

A

What is the numeric definition of Jaccard Similarity? J(A, B) = (number of items in the intersection of two sets) / (number of items in the union of two sets)

26
Q

What is Similarity?

A

What is Similarity? Similarity is a measure of how much two data objects are alike.

27
Q

How does Distance differ from Similarity?

A

How does Distance differ from Similarity? Distance measures the opposite; the greater the distance the greater the dissimilarity.

28
Q

When comparing two sets, how is the interesting measure of Lift calculated?

A

When comparing two sets, how is the interesting measure of Lift calculated?

P(X, Y) = joint probability

Lift = P(Y|X) / P(Y) = P(X, Y) / P(X)P(Y)

29
Q

When comparing two sets, how is Expected Count calculated?

A

When comparing two sets, how is Expected Count calculated? EC = (Row Total) * (Column Total) / (Sample Size)

30
Q

When comparing two sets, how is Chi2 calculated?

A

When comparing two sets, how is Chi2 calculated?

Chi2 = sum( (observed-expected)2 / expected )

31
Q

What does Mutual Information inform?

A

What does Mutual Information inform?

Mutual information measures the mutual dependence between two random variables (X, Y)

32
Q

The amount of information obtained about X through observing Y comes from what field of study?

A

The amount of information obtained about X through observing Y comes from what field of study? This is one the of the classical concepts in Information Theory.

33
Q

What do Chi2 and Mutual information not inform about the correlation of variables?

A

What do Chi2 and Mutual information not inform about the correlation of variables? The direction of the correlation.