Association Flashcards

1
Q

Define support and confidence formulas

A

Support(ABC) = σ(ABC) / N
Confidence(AB->C) = σ(ABC) / σ(AB)
where σ(X) is the support count of X and N is the total number of transactions.
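A minimal sketch of both formulas in Python, run over a made-up five-transaction toy dataset (the items and counts below are illustrative, not from the card):

```python
# Toy transactions; each one is the set of items bought together.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]

def support_count(itemset, transactions):
    """sigma(X): number of transactions containing every item of X."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """sup(X) = sigma(X) / N."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """conf(X -> Y) = sigma(X u Y) / sigma(X)."""
    return (support_count(antecedent | consequent, transactions)
            / support_count(antecedent, transactions))

print(support({"A", "B", "C"}, transactions))       # 2/5 = 0.4
print(confidence({"A", "B"}, {"C"}, transactions))   # 2/3 ~= 0.67
```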

2
Q

What do you know about (AB->C), (A->BC), (AC->B) and (ABC -> C)

A

For the first three, the confidence numerators are all the same, σ(ABC); only the denominators differ. The last one doesn't make sense, since C appears on both sides of the rule.

3
Q

Apriori Principle

A

If an itemset is frequent, all of its subsets must be frequent (a subset's support is at least as large as any superset's). Equivalently, if an itemset is infrequent, all of its supersets must be infrequent.

4
Q

Anti-monotone principle

This phenomenon is called what?

Use the book's word, and describe both monotone and anti-monotone.

A

Essentially the same idea as Apriori: once an itemset is not “interesting”, none of its supersets can be, so they never need to be examined.

Monotone (upward closed): if an itemset satisfies the constraint, so does every superset.

Anti-monotone: if an itemset violates the constraint, so does every superset.

5
Q

How many database scans does association mining with the Apriori method require?

A

Without Apriori pruning: as many scans as the size of your largest candidate k-itemset. With Apriori pruning: as many as the size of the largest frequent ('interesting') k-itemset.

6
Q

4 ways to generate candidate k-itemsets.

Which require pruning?

A
  1. Brute force: generate every possible k-itemset. Prune.
  2. SQL join: join the table of frequent (k-1)-itemsets to itself. Join logic (shown for merging 3-itemsets): first two items identical, third item different. Sounds familiar.
  3. Fk-1 x F1: combine each frequent (k-1)-itemset with each frequent 1-item. Avoid duplicates by keeping every itemset in lexicographic order. Prune.
  4. Fk-1 x Fk-1: merge two frequent (k-1)-itemsets only if their first k-2 items are identical and their last items differ.

All methods require pruning because all run the risk of generating too many candidates.
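A sketch of method 4 (Fk-1 x Fk-1) plus the subset-based pruning pass, assuming frequent (k-1)-itemsets are stored as sorted tuples; the example 2-itemsets at the bottom are made up:

```python
from itertools import combinations

def generate_candidates(frequent_kminus1):
    """Merge two frequent (k-1)-itemsets whose first k-2 items match."""
    ordered = sorted(frequent_kminus1)
    candidates = set()
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            a, b = ordered[i], ordered[j]
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                candidates.add(a + (b[-1],))
    # Prune: every (k-1)-subset of a surviving candidate must be frequent.
    return {
        c for c in candidates
        if all(sub in frequent_kminus1 for sub in combinations(c, len(c) - 1))
    }

f2 = {("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")}
print(generate_candidates(f2))   # {('A', 'B', 'C')}; ('B','C','D') is pruned
```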

7
Q

An itemset is closed if…

An itemset is maximal if…

A

Closed: X is closed if every proper superset of X has strictly smaller support.

Closed frequent itemsets are saved; non-closed itemsets can be discarded, since their supports can be recovered from the closed ones.

Maximal: X is maximal if it is frequent and none of its supersets are frequent.

8
Q

Given this list of closed frequent itemsets:

{C} = 0.3 , {A,C} = 0.2

{B,E} = 0.3 , {B,C,E} = 0.2

What do you know about:

E? EC? D? A? B?

A

E = 0.3 (from its closed superset {B,E}).

EC = 0.2 (its only closed superset is {B,C,E}).

A = 0.2 (from {A,C}); B = 0.3 (from {B,E}).

D is not frequent, since it appears in no closed frequent itemset; its exact support is unknown.

In general, a non-closed itemset's support is the largest support among its closed supersets.
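A small sketch of the rule behind these answers (the support of any frequent itemset is the largest support among its closed frequent supersets), using exactly the closed sets from the card:

```python
closed = {
    frozenset("C"): 0.3,
    frozenset("AC"): 0.2,
    frozenset("BE"): 0.3,
    frozenset("BCE"): 0.2,
}

def derived_support(itemset):
    """Max support over closed supersets; None means 'not frequent'."""
    sups = [s for c, s in closed.items() if itemset <= c]
    return max(sups) if sups else None

for query in ["E", "CE", "A", "B", "D"]:
    print(query, derived_support(frozenset(query)))
# E 0.3, CE 0.2, A 0.2, B 0.3, D None (D is not frequent)
```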

9
Q

What benefit comes from knowing maximal itemsets?

A

From them you can enumerate all frequent itemsets, since every frequent itemset is a subset of some maximal one.

You don't recover their supports, but you know which itemsets are “important” (frequent).
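A sketch of that derivation: every frequent itemset is a non-empty subset of some maximal frequent itemset, so enumerating subsets of the maximal sets recovers the whole collection (the two maximal sets below are made up):

```python
from itertools import combinations

maximal = [frozenset("ACD"), frozenset("BE")]

def all_frequent(maximal_sets):
    """Union of all non-empty subsets of each maximal frequent itemset."""
    frequent = set()
    for m in maximal_sets:
        for r in range(1, len(m) + 1):
            frequent.update(frozenset(s) for s in combinations(m, r))
    return frequent

print(sorted("".join(sorted(f)) for f in all_frequent(maximal)))
# ['A', 'AC', 'ACD', 'AD', 'B', 'BE', 'C', 'CD', 'D', 'E']
```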

10
Q

Confidence based pruning:

Which of these do we care about?

ABC->D > threshold (pass)

ABC->D < threshold

C->ABD > threshold

C->ABD < threshold

A

ABC->D under the threshold is the case we care about.

We can then prune every rule from the same itemset whose antecedent is a subset of ABC: the numerator σ(ABCD) stays the same while the denominator can only grow, so confidence can only drop.

AB->CD, AC->BD, BC->AD are OUT.
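A tiny numeric sketch of why the low-confidence case prunes those rules; the support counts below are hypothetical:

```python
# Hypothetical support counts over some transaction database.
sigma = {"ABCD": 20, "ABC": 50, "AB": 80, "AC": 90, "BC": 70}

def conf(antecedent):
    """Confidence of antecedent -> (remaining items of ABCD)."""
    return sigma["ABCD"] / sigma[antecedent]

print(conf("ABC"))                         # 0.40 -- fails, say, minconf = 0.5
# Shrinking the antecedent keeps the numerator sigma(ABCD) fixed but the
# denominator can only grow, so confidence can only drop further.
print(conf("AB"), conf("AC"), conf("BC"))  # ~0.25 ~0.22 ~0.29, all below 0.40
```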

11
Q

How to implement Multiple Minimum Supports?

What does this do to your monotonicity?

A

Assign a minimum support threshold (MST) to each item (1-itemset); an itemset's threshold is the smallest MST among its items:

MST(Coke) = 3%

MST(Diamonds) = 0.1%

MST(Coke, Diamonds) = min(MST(Coke), MST(Diamonds)) = 0.1%

The result is no longer anti-monotone: an itemset can fail its own threshold while a superset still passes its lower one.
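A minimal sketch of the min-based rule using the card's percentages (the MIS values are illustrative):

```python
mis = {"Coke": 0.03, "Diamonds": 0.001}   # item-specific minimum supports

def itemset_threshold(itemset):
    """An itemset must meet the smallest MIS among its items."""
    return min(mis[item] for item in itemset)

print(itemset_threshold({"Coke"}))              # 0.03
print(itemset_threshold({"Coke", "Diamonds"}))  # 0.001
# E.g., a Coke support of 2% fails the 3% threshold for {Coke}, yet the
# superset {Coke, Diamonds} could still pass its 0.1% threshold, so
# "frequent" is no longer anti-monotone.
```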

12
Q

Give examples of constraints that are monotone and anti-monotone.

A

Anti-monotone: sum(S) <= v (with non-negative values) and support >= minsup; once a set violates them, every superset does. Monotone: sum(S) >= v; once a set satisfies it, every superset does.

13
Q

What datatypes are common for attributes?

A

Boolean (binary)

Discrete (nominal / categorical)

Integer-valued, like age

Ordinal: ordered categories

14
Q

What's an eager classification method?

What is the other type? Give examples of both.

A

Eager methods build the model (e.g., a decision tree) from the training data before any test instance arrives; lazy methods wait until a test instance has to be classified.

Eager:

Hunt's recursive decision-tree algorithm; Naive Bayes.

Lazy:

k-nearest neighbor (it just stores the training records).

15
Q

Name some preprocessing steps

A

Fusing data from multiple sources

Cleaning: Remove duplicates and noise

Selecting relevant records

16
Q

What's post-processing?

A

Usually data visualization.

Verifying results with statistical measures and hypothesis testing.

“Ensures only valid and useful results enter the decision support system.”

17
Q

Classification Vocab.

Give descriptive terms for attributes in classification.

A

The target variable is sometimes called the dependent variable.

The other attributes are called explanatory or independent variables.

18
Q

What are the major methods used in cluster analysis?

A

K-means (the iterative refinement sounds a bit like genetic programming)

Hierarchical: builds something like a family tree (a dendrogram)

Density-based methods

19
Q

What is usually larger? Support ratio or Confidence Ratio?

A

Confidence, always: the two share the numerator σ(AB), but confidence's denominator σ(A) is at most N.

Support = σ(AB) / N

Confidence = σ(AB) / σ(A)

20
Q

Name some places where anti-monotonicity is handy

A

Obviously, Apriori support counting.

Rule generation (if ABC->D doesn't satisfy the confidence threshold, then AB->CD won't).

The candidate generation phase of constraint-based mining.

21
Q

Discuss why infrequent itemsets can be important.

A

If X ∪ Y is infrequent (especially when minsup is very low), then the mixed “negative” itemsets, such as X with ¬Y, ¬X with Y, or ¬X with ¬Y, are likely to be interesting.

22
Q

Define lift, and how you screwed up the homework

A

Lift(X->Y) = sup(X ∪ Y) / (sup(X) * sup(Y)) = conf(X->Y) / sup(Y)

There's an extra N in there when you work with counts: the 1/N in sup(XY) and sup(X) cancels (that ratio is just the confidence), but the 1/N in sup(Y) does not, so in count form lift = N * σ(XY) / (σ(X) * σ(Y)).
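A quick numeric sketch of the two equivalent forms, with made-up counts, showing where the surviving N comes from:

```python
N = 1000                                   # total transactions (made up)
sigma_XY, sigma_X, sigma_Y = 60, 200, 150  # made-up support counts

# Fraction form: sup(XY) / (sup(X) * sup(Y))
lift_fractions = (sigma_XY / N) / ((sigma_X / N) * (sigma_Y / N))

# Count form: the N in sigma(XY)/sigma(X) cancels (that's just the
# confidence), but the N attached to sup(Y) survives as a factor.
lift_counts = N * sigma_XY / (sigma_X * sigma_Y)

print(lift_fractions, lift_counts)   # both ~2.0 (up to float rounding)
```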

23
Q

How are support and confidence affected as you go down a concept hierarchy,

say from Computer to Laptop to Dell,

for the rule Computer -> Printer?

A

Support decreases: fewer transactions contain Dell than contain Computer.

Confidence can increase, because the denominator σ(antecedent) shrinks as the antecedent becomes more specific (though the numerator shrinks too).

24
Q

How does decreasing the support threshold increase computational complexity?

A

Two ways:

More itemsets qualify as frequent, so there is more candidate counting.

The largest frequent itemsets get wider, so more passes over the data are needed.

25
Q

What 3 factors affect association computational complexity?

A

Number of items

Number of transactions

Average Transaction Width

26
Q

3 factors we use to describe itemset generation algorithms (and more!)

A

Complete: finds every interesting itemset.

Sound: finds only interesting itemsets.

Efficient: never generates the same itemset twice.