MBA Flashcards

Question 1

Q

Items

Answer

A

objects we are identifying associations between
- group of items : item set

Question 2

Q

Transactions

Answer

A

instances of groups of items CO-OCCURRING

Question 3

Q

Basket Data

Answer

A

collection of transaction IDs and items bought in a transaction

Question 4

Q

Rules

Answer

A

statements of the form {item1, item2, …} => {itemk}
- implies that if a transaction contains the items on the LHS, visitors will also be interested in the item on the RHS

Question 5

Q

Support

Answer

A

of an item: fraction of transactions in our data set that contain that item/item set

of a rule: fraction of transactions containing all items in both the LHS and RHS item sets (symmetric: support(A->B) = support(B->A))

support (A) = #A / #all
support (A->B) = #A&B / #all
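
A minimal sketch of the two support formulas above. The basket data and the helper name `support` are made up for illustration.

```python
# Hypothetical basket data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)  # #A (or #A&B)
    return hits / len(transactions)                      # divided by #all

# support(A): bread appears in 3 of 4 transactions.
support_bread = support({"bread"}, transactions)

# support(A->B): bread and milk appear together in 2 of 4 transactions.
# The order of LHS and RHS does not matter, which is why support is symmetric.
support_bread_milk = support({"bread", "milk"}, transactions)
```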

Question 6

Q

high support / low support

Answer

A

The higher the support the more frequently the item set occurs.
high support is good
- apply to a large number of transactions; a lot of people affected

Question 7

Q

Confidence

Answer

A

of a rule: probability that a transaction containing the item set on the LHS also contains the item set on the RHS
- proportion of transactions containing A that also contain B [P(B|A)]

confidence (A->B) = #A&B / #A
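
A small sketch of the confidence formula, assuming made-up basket data and hypothetical helper names (`count`, `confidence`). Unlike support, confidence depends on the direction of the rule.

```python
# Hypothetical basket data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

def confidence(lhs, rhs, transactions):
    """confidence(A->B) = #A&B / #A, i.e. an estimate of P(B|A)."""
    n_lhs = count(lhs, transactions)
    if n_lhs == 0:
        raise ValueError("LHS never occurs, so confidence is undefined")
    return count(set(lhs) | set(rhs), transactions) / n_lhs

# bread occurs 4 times, milk 3 times, both together 2 times.
conf_bread_milk = confidence({"bread"}, {"milk"}, transactions)  # 2 / 4
conf_milk_bread = confidence({"milk"}, {"bread"}, transactions)  # 2 / 3
```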

Question 8

Q

high confidence / low confidence

Answer

A

The higher the confidence, the greater the likelihood that the item on RHS will be purchased or, in other words, the greater the return rate you can expect for a given rule.

Question 9

Q

Association rule

Answer

A

Statement of the form X->Y = “X implies Y”

Question 10

Q

Lift

Answer

A

ratio by which the confidence of a rule exceeds the expected confidence, assuming that the item sets are independent

lift (A->B) = confidence(A->B) / expected confidence

expected confidence = support(B) = #B / #all
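
A short sketch of the lift calculation, again on made-up basket data with hypothetical helper names. It divides the rule's confidence by the support of the RHS, which is the confidence we would expect if the item sets were independent.

```python
# Hypothetical basket data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

def confidence(lhs, rhs, transactions):
    """confidence(A->B) = support(A and B) / support(A)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

def lift(lhs, rhs, transactions):
    """lift(A->B) = confidence(A->B) / expected confidence, where
    expected confidence = support(B)."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)

# confidence(bread->milk) = 2/4 = 0.5 and support(milk) = 3/5 = 0.6,
# so the lift is 0.5 / 0.6, which is below 1 in this toy data.
lift_bread_milk = lift({"bread"}, {"milk"}, transactions)
```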

Question 11

Q

lift value

Answer

A

lift > 1: positive association (items not independent, complementary)
lift = 1: no association (independent)
lift < 1: negative association (items are not independent, substitutes)

Question 12

Q

Rule Notation (using confidence and support)

Answer

A

LHS -> RHS [confidence, support]

Question 13

Q

Association Rule Types

Answer

A

Actionable rules: high quality, actionable information
Trivial rules: information already well-known by those familiar with the business
Inexplicable rules: no explanation and do not suggest action

*trivial and inexplicable occur the most often

Question 14

Q

Threshold

Answer

A

Support threshold
- e.g. set a minimum support threshold => all item sets with support at or above it are frequent item sets
Confidence threshold
- e.g. set a minimum confidence that an identified rule must reach to be kept

Question 15

Q

Apriori principle

Answer

A

if an item set is frequent, all of its subsets must also be frequent

Question 16

Q

Apriori Algorithm

Answer

A

starts from single items
progressively identifies larger frequent item sets, one size at a time
exploits the Apriori principle that any subset of a frequent item set is also a frequent item set
equivalently, any superset of an infrequent item set is also infrequent

example: Algorithm to generate frequent item sets
1. Start with all item sets with a single item and compute their support (or count)
2. Remove the ones that do not have minimum support
3. Generate all two-item item sets using the results retained in the previous step and compute their support
4. Remove the ones that do not have minimum support
5. Continue till you have item sets of all sizes (or desired max size) with the required minimum support (or count)
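
The steps above can be sketched roughly as follows. This is an illustrative, unoptimized version, not a reference implementation; the function name, parameters and basket data are assumptions.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support, max_size=None):
    """Return {item set: support} for every item set meeting `min_support`."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Steps 1-2: single-item sets, keeping only those with minimum support.
    items = sorted({item for t in transactions for item in t})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        if max_size is not None and k >= max_size:
            break
        k += 1
        # Steps 3 and 5: build k-item candidates from the retained (k-1)-item sets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Apriori principle: drop a candidate if any (k-1)-subset is infrequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        # Step 4: keep only the candidates that meet minimum support.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent

# Hypothetical basket data; min_support=0.4 means at least 2 of 5 transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread"},
]
frequent = apriori_frequent_itemsets(transactions, min_support=0.4)
```

On this toy data the three single items and the pairs {bread, milk} and {bread, butter} are retained. The three-item candidate is pruned without counting, because its subset {milk, butter} is infrequent.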

Question 17

Q

Apriori Parameters

Answer

A

Examples:
- maximum number of items to process
- support threshold
- confidence threshold
- maximum size of item set
- maximum number of rules to keep