MBA Flashcards

1
Q

Items

A

objects we are identifying associations between
- group of items : item set

2
Q

Transactions

A

instances of groups of items CO-OCCURRING together

3
Q

Basket Data

A

collection of transaction IDs and the items bought in each transaction
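A minimal sketch of building basket data, assuming the raw data arrives as hypothetical (transaction ID, item) rows with one row per item bought. The function name `to_baskets` and the sample rows are illustrative only.

```python
from collections import defaultdict

# Hypothetical long-format rows: (transaction_id, item), one row per item bought.
rows = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "diapers"), (2, "beer"), (2, "eggs"),
    (3, "milk"), (3, "diapers"), (3, "beer"), (3, "cola"),
]


def to_baskets(rows):
    """Group (transaction_id, item) rows into {transaction_id: frozenset of items}."""
    baskets = defaultdict(set)
    for tid, item in rows:
        baskets[tid].add(item)  # a set ignores repeat rows for the same item
    return {tid: frozenset(items) for tid, items in baskets.items()}


baskets = to_baskets(rows)
print(len(baskets))         # 3
print(sorted(baskets[2]))   # ['beer', 'bread', 'diapers', 'eggs']
```

Storing each basket as a frozenset turns "does this transaction contain item set X?" into a single subset test, `X <= basket`.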

4
Q

Rules

A

statements of the form {item1, item2, …} => {itemk}
- implies that if a transaction contains the items on the LHS, visitors are likely to be interested in the items on the RHS as well
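A rule can be sketched as a pair of disjoint item sets. The class `Rule` and its methods are hypothetical names: `applies_to` checks whether a transaction contains the whole LHS, and `holds_in` checks whether it contains both sides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical helper for an association rule LHS => RHS."""
    lhs: frozenset
    rhs: frozenset

    def __post_init__(self):
        # The LHS and RHS item sets of an association rule are disjoint.
        if self.lhs & self.rhs:
            raise ValueError("LHS and RHS must not share items")

    def applies_to(self, transaction):
        """True if the transaction contains every item on the LHS."""
        return self.lhs <= transaction

    def holds_in(self, transaction):
        """True if the transaction contains every item on both sides."""
        return (self.lhs | self.rhs) <= transaction


rule = Rule(frozenset({"diapers"}), frozenset({"beer"}))
basket = frozenset({"bread", "diapers", "milk"})
print(rule.applies_to(basket), rule.holds_in(basket))  # True False
```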

5
Q

Support

A

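of an item (or item set): fraction of transactions in our data set that contain that item (or item set)

of a rule: fraction of transactions containing all items in both the LHS and RHS item sets (symmetric: support(A->B) = support(B->A))

support (A) = #A / #all
support (A->B) = #A&B / #all
(#X = number of transactions containing X; #all = total number of transactions)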
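A minimal sketch of both support formulas, assuming each transaction is stored as a Python set. The five transactions are a hypothetical toy data set, and `support` / `rule_support` are illustrative names.

```python
# Hypothetical toy data set: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """support(A) = #A / #all: share of transactions containing every item in A."""
    itemset = set(itemset)
    count = sum(1 for t in transactions if itemset <= t)
    return count / len(transactions)


def rule_support(lhs, rhs, transactions):
    """support(A->B) = #A&B / #all: share of transactions containing A and B."""
    return support(set(lhs) | set(rhs), transactions)


print(support({"beer"}, transactions))                    # 0.6
print(rule_support({"diapers"}, {"beer"}, transactions))  # 0.6
print(rule_support({"beer"}, {"diapers"}, transactions))  # 0.6 (symmetric)
```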

6
Q

high support / low support

A

The higher the support, the more frequently the item set occurs.
High support is desirable:
- the rule applies to a large number of transactions, so it affects a lot of people

7
Q

Confidence

A

of a rule: probability that a transaction containing the item set on the LHS also contains the item set on the RHS
- proportion of transactions containing A that also contain B [P(B|A)]

confidence (A->B) = #A&B / #A
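A minimal sketch of the confidence formula on a hypothetical five-transaction toy data set; `count` and `confidence` are illustrative names. Raising an error when the LHS never occurs is a design choice of this sketch, since #A = 0 leaves the ratio undefined.

```python
# Hypothetical toy data set: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, transactions):
    """#X: number of transactions containing every item in X."""
    return sum(1 for t in transactions if set(itemset) <= t)


def confidence(lhs, rhs, transactions):
    """confidence(A->B) = #A&B / #A, an estimate of P(B|A)."""
    n_lhs = count(lhs, transactions)
    if n_lhs == 0:
        raise ValueError("LHS never occurs, so confidence is undefined")
    return count(set(lhs) | set(rhs), transactions) / n_lhs


print(confidence({"diapers"}, {"beer"}, transactions))  # 0.75
print(confidence({"beer"}, {"diapers"}, transactions))  # 1.0
```

Unlike support, confidence is not symmetric: the two directions of the same item pair give different values because the denominators (#diapers = 4, #beer = 3) differ.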

8
Q

high confidence / low confidence

A

The higher the confidence, the greater the likelihood that the item on the RHS will be purchased; in other words, the greater the return rate you can expect for a given rule.

9
Q

Association rule

A

Statement of the form X->Y = “X implies Y”

10
Q

Lift

A

ratio by which the confidence of a rule exceeds the expected confidence, assuming that the item sets are independent

lift (A->B) = confidence(A->B) / expected confidence

(expected confidence = support(B) = #B / #all)
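A minimal sketch of lift on a hypothetical five-transaction toy data set, assuming both the LHS and RHS occur at least once (otherwise the divisions fail). Dividing confidence(A->B) by support(B) is the same as support(A&B) / (support(A) × support(B)).

```python
# Hypothetical toy data set: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """#X / #all for item set X."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)


def lift(lhs, rhs, transactions):
    """lift(A->B) = confidence(A->B) / expected confidence, with expected = support(B).

    Assumes LHS and RHS each occur in at least one transaction.
    """
    conf = support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)
    expected = support(rhs, transactions)  # confidence expected if A and B were independent
    return conf / expected


print(round(lift({"diapers"}, {"beer"}, transactions), 3))  # 1.25
print(round(lift({"bread"}, {"beer"}, transactions), 3))    # 0.833
```

In this toy data, diapers => beer has lift above 1, while bread => beer has lift below 1.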

11
Q

lift value

A

lift > 1: positive association (items not independent, complementary)
lift = 1: no association (independent)
lift < 1: negative association (items are not independent, substitutes)
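A small sketch mapping a lift value to the three cases above. The function `interpret_lift` and its `tol` parameter are hypothetical: with real data and floating-point arithmetic an exact lift of 1 is rare, so the sketch treats values within `tol` of 1 as independent.

```python
import math


def interpret_lift(lift, tol=1e-9):
    """Classify a lift value; `tol` is an assumed tolerance around 1."""
    if math.isclose(lift, 1.0, abs_tol=tol):
        return "no association (independent)"
    if lift > 1:
        return "positive association (complementary)"
    return "negative association (substitutes)"


for value in (1.25, 1.0, 0.833):
    print(value, "->", interpret_lift(value))
```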

12
Q

Rule Notation (using confidence and support)

A

LHS -> RHS [confidence, support]
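A small formatting sketch for this notation, keeping the [confidence, support] order used on this card. `format_rule` is a hypothetical helper, and the confidence and support values passed in the usage line are illustrative.

```python
def format_rule(lhs, rhs, confidence, support):
    """Render a rule as 'LHS -> RHS [confidence, support]' with percentages."""
    def fmt(items):
        return "{" + ", ".join(sorted(items)) + "}"  # sorted for stable output
    return f"{fmt(lhs)} -> {fmt(rhs)} [{confidence:.0%}, {support:.0%}]"


print(format_rule({"diapers"}, {"beer"}, 0.75, 0.6))
# {diapers} -> {beer} [75%, 60%]
```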

13
Q

Association Rule Types

A
  1. Actionable rules: high quality, actionable information
  2. Trivial rules: information already well-known by those familiar with the business
  3. Inexplicable rules: have no apparent explanation and do not suggest an action

*trivial and inexplicable rules occur most often

14
Q

Threshold

A
  1. Support threshold
    - e.g. set a minimum support; all item sets with support at or above it are frequent item sets
  2. Confidence threshold
    - e.g. set a minimum confidence that an identified rule must meet
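A minimal sketch of applying both thresholds, restricted for brevity to rules with one item on each side. The toy data set and the function name `rules_above_thresholds` are hypothetical.

```python
from itertools import permutations

# Hypothetical toy data set: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def rules_above_thresholds(transactions, min_support, min_confidence):
    """Return (lhs_item, rhs_item, support, confidence) for single-item rules
    that meet both the support threshold and the confidence threshold."""
    n = len(transactions)

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted(set().union(*transactions))
    kept = []
    for a, b in permutations(items, 2):
        both = count({a, b})
        sup = both / n
        if sup < min_support:        # support threshold
            continue
        conf = both / count({a})     # every item occurs at least once, so no division by zero
        if conf >= min_confidence:   # confidence threshold
            kept.append((a, b, sup, conf))
    return kept


print(rules_above_thresholds(transactions, 0.6, 0.8))
# [('beer', 'diapers', 0.6, 1.0)]
```

Lowering the confidence threshold to 0.75 keeps eight rules in this toy data, showing how the two thresholds together control how many rules survive.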
15
Q

Apriori principle

A

if an item set is frequent, all of its subsets must also be frequent
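The principle holds because any transaction containing an item set also contains each of its subsets, so adding an item can never raise support. The sketch below is a brute-force check of that property on a hypothetical toy data set, not a proof.

```python
from itertools import combinations

# Hypothetical toy data set: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
items = sorted(set().union(*transactions))


def support(itemset):
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)


# For every item set and every extra item, the larger set is never more frequent.
principle_holds = all(
    support(set(subset) | {extra}) <= support(subset)
    for size in range(len(items) + 1)
    for subset in combinations(items, size)
    for extra in items
)
print(principle_holds)  # True

# {diapers, beer} is frequent at min support 0.6, and so are both of its subsets.
print(support({"diapers", "beer"}), support({"diapers"}), support({"beer"}))  # 0.6 0.8 0.6
```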

16
Q

Apriori Algorithm

A
  • starts from single items
  • progressively identifies larger item sets, one size at a time
  • exploits the Apriori principle that any subset of a frequent item set is also a frequent item set
  • equivalently, any superset of an infrequent item set is also infrequent

example: Algorithm to generate frequent item sets
1. Start with all item sets with a single item and compute their support (or count)
2. Remove the ones that do not have minimum support
3. Generate all two-item item sets using the results retained in the previous step and compute their support
4. Remove the ones that do not have minimum support
5. Continue, building each larger size from the item sets retained at the previous size, until you have item sets of all sizes (or the desired max size) with the required minimum support (or count)
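The steps above can be sketched as a level-wise search. This is a minimal, unoptimized sketch, not a reference implementation: the join step uses a simple variant (union of any two frequent item sets that differ by one item), the prune step applies the Apriori principle, and the toy data set and the `apriori` function signature are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support, max_size=None):
    """Return {frozenset(itemset): support} for every item set whose support
    is at least `min_support` (optionally only up to `max_size` items)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_only(candidates):
        # Compute each candidate's support and keep those meeting the threshold.
        result = {}
        for c in candidates:
            sup = sum(1 for t in transactions if c <= t) / n
            if sup >= min_support:
                result[c] = sup
        return result

    # Steps 1-2: single-item item sets, filtered by minimum support.
    current = frequent_only({frozenset([item]) for t in transactions for item in t})
    frequent = dict(current)
    size = 1
    # Steps 3-5: grow item sets one item at a time until none survive.
    while current and (max_size is None or size < max_size):
        size += 1
        prev = list(current)
        # Join: combine two frequent (size-1)-item sets that differ by one item.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == size}
        # Prune (Apriori principle): every (size-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = frequent_only(candidates)
        frequent.update(current)
    return frequent


# Hypothetical toy data set.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(transactions, min_support=0.6)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With min_support = 0.6 the toy data yields four frequent single items and four frequent pairs. The only three-item candidate that survives pruning, {bread, milk, diapers}, has support 0.4 and is dropped, so the search stops.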

17
Q

Apriori Parameters

A

Examples:
- maximum number of items to process
- support threshold
- confidence threshold
- maximum size of item set
- maximum number of rules to keep
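One way to bundle these parameters is sketched below. The class `AprioriParams`, its field names, its default values, and the two helper functions are all hypothetical; real tools expose similar settings under their own names. The helpers show how two of the parameters (maximum number of items, maximum number of rules) might be applied.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AprioriParams:
    """Hypothetical parameter bundle; names and defaults are assumptions."""
    max_items: Optional[int] = None         # maximum number of items to process
    min_support: float = 0.1                # support threshold
    min_confidence: float = 0.5             # confidence threshold
    max_itemset_size: Optional[int] = None  # maximum size of item set
    max_rules: Optional[int] = None         # maximum number of rules to keep

    def __post_init__(self):
        for name in ("min_support", "min_confidence"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1")


def items_to_process(transactions, params):
    """Keep only the `max_items` most frequent items (all items if None)."""
    counts = Counter(item for t in transactions for item in t)
    return [item for item, _ in counts.most_common(params.max_items)]


def rules_to_keep(rules, params):
    """Apply the confidence threshold, then keep at most `max_rules` rules.
    `rules` is a list of (lhs, rhs, support, confidence) tuples."""
    kept = [r for r in rules if r[3] >= params.min_confidence]
    kept.sort(key=lambda r: r[3], reverse=True)  # highest confidence first
    return kept if params.max_rules is None else kept[: params.max_rules]


params = AprioriParams(max_items=2, min_confidence=0.7, max_rules=1)
transactions = [{"bread", "milk"}, {"bread", "milk", "beer"}, {"bread"}]
print(items_to_process(transactions, params))  # ['bread', 'milk']

# Illustrative, hand-written rule scores (not computed from data).
rules = [("milk", "bread", 0.67, 1.0), ("bread", "milk", 0.67, 0.67), ("beer", "bread", 0.33, 1.0)]
print(rules_to_keep(rules, params))  # [('milk', 'bread', 0.67, 1.0)]
```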