Association FINAL Flashcards

1
Q

What are association rules interested in?

A

Observing which objects occur together

2
Q

Are association rules about recommending items or about items that co-occur?

A

Seeing which items co-occur

3
Q

Association Rule Mining

A

Given a set of transactions, find the rules that will predict the occurrence of an item based on the occurrences of other items in the transaction

4
Q

Does implication mean causality?

A

No, it means co-occurrence

5
Q

{} -> {}

A

Antecedent -> Consequent

6
Q

3 types of database

A

Binary, Transaction, Vertical
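
As a rough illustration of the three representations, the sketch below starts from a small transaction database and derives the binary and vertical forms from it. The data and the variable names (transactions, binary, vertical) are made up for this example.

```python
# Hypothetical toy data: a transaction database maps each tid to its itemset.
transactions = {
    1: {"bread", "milk"},
    2: {"bread", "butter"},
    3: {"milk"},
}

# The set of items I, in a fixed order so the binary columns are well defined.
items = sorted(set().union(*transactions.values()))

# Binary database: one row per tid, one 0/1 column per item.
binary = {
    tid: [1 if item in itemset else 0 for item in items]
    for tid, itemset in transactions.items()
}

# Vertical database: maps each item to its tidset.
vertical = {
    item: {tid for tid, itemset in transactions.items() if item in itemset}
    for item in items
}

print(items)
print(binary)
print(vertical)
```

All three hold the same information; they differ only in whether the data is indexed by transaction or by item.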

7
Q

Items

A

I = {x1, x2, …, xm}

8
Q

A set X that is a subset of the set of items I

A

Itemset

9
Q

An itemset of cardinality k

A

k-itemset

10
Q

I^(k)

A

set of all k-itemsets

11
Q

Transaction identifiers, tids

A

T = {t1, t2, …, tn}

12
Q

A subset of the set of tids T

A

tidset

13
Q

Transaction

A

Tuple in the form (t, X) where t is a unique transaction identifier and X is an itemset

14
Q

Support

A

The support of an itemset X in a dataset D, denoted sup(X, D), is the number of transactions in D that contain X

15
Q

Relative Support

A

The relative support of X is the fraction of transactions that contain X: rsup(X, D) = sup(X, D) / |D|
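
A minimal sketch of both measures, assuming the dataset is simply a list of Python sets (tids are left out for brevity). The function names sup and rsup follow the notation on the cards; the data is hypothetical.

```python
def sup(itemset, dataset):
    """Absolute support: number of transactions containing every item of itemset."""
    return sum(1 for transaction in dataset if set(itemset) <= transaction)


def rsup(itemset, dataset):
    """Relative support: fraction of transactions containing itemset."""
    return sup(itemset, dataset) / len(dataset)


# Hypothetical toy dataset with four transactions.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(sup({"bread", "milk"}, D))   # 2
print(rsup({"bread", "milk"}, D))  # 0.5
```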

16
Q

We use F to

A

denote the set of all frequent itemsets

17
Q

We use F^(k)

A

to denote the set of frequent k-itemsets

18
Q

Itemset mining problem

A

Given a minimum support threshold (minsup), find all itemsets X such that sup(X) >= minsup

19
Q

Frequent itemsets

A

An itemset X is frequent if sup(X) >= minsup, where minsup is a user-specified minimum support threshold (if minsup is a fraction, then relative support is implied)

20
Q

Total number of possible itemsets (subsets of I)

A

2^|I|

21
Q

Naive approach to generate all itemsets that are frequent

A

For each itemset X that is a subset of I:
    compute sup(X)
    if sup(X) >= minsup:
        add X to the list of frequent itemsets
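
The naive approach can be sketched as follows. This assumes the dataset is a list of sets and that minsup is an absolute count; the function name brute_force_frequent and the data are hypothetical.

```python
from itertools import combinations


def brute_force_frequent(dataset, minsup):
    """Enumerate every non-empty itemset over I and keep the frequent ones."""
    items = sorted(set().union(*dataset))
    frequent = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            # Count the transactions that contain the whole candidate.
            support = sum(1 for t in dataset if set(candidate) <= t)
            if support >= minsup:
                frequent[frozenset(candidate)] = support
    return frequent


# Hypothetical toy dataset with four transactions.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

for itemset, support in brute_force_frequent(D, minsup=2).items():
    print(sorted(itemset), support)
```

Every one of the 2^|I| - 1 non-empty candidates is counted against the data, which is why this only works for very small item sets.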

22
Q

The brute force method

A

Explores the entire itemset search space, regardless of minsup

23
Q

Goal of Association Rule Mining

A

Given a set of transactions T, find all the rules having:
support >= minsup
confidence >= minconf

24
Q

Apriori principle

A

If an itemset is frequent, then all of its subsets must be frequent as well

25
Q

Apriori principle 2

A

If an itemset is infrequent, then all of its supersets must be infrequent as well
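
A simplified sketch of how the two forms of the Apriori principle can drive a level-wise search: a candidate k-itemset is counted only if all of its (k-1)-subsets were frequent. This is an illustration under the assumptions that the dataset is a list of sets and minsup is an absolute count, not an optimized implementation; the function name apriori and the data are hypothetical.

```python
from itertools import combinations


def apriori(dataset, minsup):
    """Level-wise frequent itemset search with subset-based pruning."""

    def support(candidate):
        return sum(1 for t in dataset if candidate <= t)

    # Level 1: frequent single items.
    level = {}
    for item in sorted(set().union(*dataset)):
        single = frozenset([item])
        count = support(single)
        if count >= minsup:
            level[single] = count

    frequent = dict(level)
    k = 2
    while level:
        previous = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        next_level = {}
        for candidate in candidates:
            # Prune step: an infrequent subset makes the candidate infrequent.
            if all(frozenset(s) in previous for s in combinations(candidate, k - 1)):
                count = support(candidate)
                if count >= minsup:
                    next_level[candidate] = count
        frequent.update(next_level)
        level = next_level
        k += 1
    return frequent


# Hypothetical toy dataset with four transactions.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

for itemset, count in apriori(D, minsup=2).items():
    print(sorted(itemset), count)
```

In this example the 3-itemset {bread, butter, milk} is never counted, because its subset {butter, milk} is already known to be infrequent.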

26
Q

A rule is frequent if

A

the itemset XY is frequent, sup(XY) >= minsup

27
Q

A rule is strong if

A

conf >= minconf

28
Q

Rules are pruned using

A

confidence

29
Q

confidence(X -> Y)

A

sup(XY) / sup(X)
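
A short sketch of the confidence computation, assuming the dataset is a list of sets and that the antecedent occurs at least once (otherwise the division fails). The helper names sup and conf and the data are hypothetical.

```python
def sup(itemset, dataset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for transaction in dataset if itemset <= transaction)


def conf(antecedent, consequent, dataset):
    """conf(X -> Y) = sup(XY) / sup(X); assumes sup(X) > 0."""
    return sup(antecedent | consequent, dataset) / sup(antecedent, dataset)


# Hypothetical toy dataset with four transactions.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(conf({"butter"}, {"bread"}, D))  # 1.0
print(conf({"bread"}, {"butter"}, D))
```

The two rules share the same itemset {bread, butter} but have different confidence, because the denominator is the support of the antecedent only.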

30
Q

Unlike support, confidence does not exhibit

A

the monotone property

31
Q

If a rule X -> Y \ X does not satisfy the confidence threshold, then

A

any rule X' -> Y \ X', where X' is a subset of X, must not satisfy the confidence threshold either
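
The reason is that sup(X') >= sup(X) for any subset X' of X, so sup(Y) / sup(X') <= sup(Y) / sup(X). A rough sketch of rule generation that uses this to skip antecedents, assuming the dataset is a list of sets and that Y occurs in at least one transaction; the function name rules_from_itemset and the data are hypothetical.

```python
from itertools import combinations


def rules_from_itemset(Y, dataset, minconf):
    """Generate rules X -> Y \\ X from one itemset Y, largest antecedents first."""

    def sup(itemset):
        return sum(1 for t in dataset if itemset <= t)

    Y = frozenset(Y)
    sup_Y = sup(Y)
    failed = []  # antecedents whose rule fell below minconf
    rules = []
    for size in range(len(Y) - 1, 0, -1):
        for combo in combinations(sorted(Y), size):
            X = frozenset(combo)
            # Skip X if it is a proper subset of an antecedent that already failed.
            if any(X < bad for bad in failed):
                continue
            confidence = sup_Y / sup(X)
            if confidence >= minconf:
                rules.append((X, Y - X, confidence))
            else:
                failed.append(X)
    return rules


# Hypothetical toy dataset with four transactions.
D = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

for X, rest, confidence in rules_from_itemset({"bread", "milk", "butter"}, D, 0.6):
    print(sorted(X), "->", sorted(rest), confidence)
```

Here all three single-item antecedents are skipped without computing their confidence, since each is a subset of a two-item antecedent that already failed.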

32
Q

What happens if minsup is too high?

A

We may miss interesting low-support itemsets. For example, such items may correspond to expensive products that are rarely purchased by customers but whose patterns are still interesting for the retailer to mine

33
Q

What happens if minsup is too low?

A

We get information overload: too many frequent itemsets and too many spurious rules

34
Q

How can some high confidence rules be misleading?

A

High confidence might not imply a meaningful relationship if the consequent is already common in the dataset, irrespective of the antecedent

35
Q

Confidence measure ignores

A

the support of the itemset appearing in the rule consequent

36
Q

Which metric accounts for the support of the consequent?

A

lift

37
Q

lift

A

lift(X -> Y) = conf(X -> Y) / rsup(Y)
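
A small worked sketch of lift, computed directly from counts. The numbers are invented for illustration: 1000 transactions, 200 containing tea, 800 containing coffee, and 150 containing both. The function name lift and its parameters are hypothetical.

```python
def lift(sup_xy, sup_x, sup_y, n):
    """lift(X -> Y) = conf(X -> Y) / rsup(Y), computed from absolute counts."""
    confidence = sup_xy / sup_x
    return confidence / (sup_y / n)


# Hypothetical counts: n = 1000, sup(tea) = 200, sup(coffee) = 800, sup(tea, coffee) = 150.
confidence = 150 / 200
print(confidence)                    # 0.75
print(lift(150, 200, 800, 1000))     # slightly below 1
```

The rule tea -> coffee has a confidence of 0.75, which looks strong, yet coffee already appears in 80% of all transactions, so the lift is below 1.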

38
Q

A lift value close to 1 implies

A

that the support of the rule is what would be expected given the supports of its antecedent and consequent

39
Q

Good lifts, bad lifts

A

Good: lift > 1. Bad: lift << 1
