MBA Flashcards
Items
objects we are identifying associations between
- group of items : item set
Transactions
instances of groups of items CO-OCCURRING together
Basket Data
collection of transaction IDs and items bought in a transaction
Rules
statements of the form {item1, item2, …} => {itemk}
- implies that visitors whose item set contains the LHS items will also be interested in the RHS item
Support
of an item: fraction of transactions in our data set that contain that item/item set
of a rule: fraction of transactions containing all items in the LHS and RHS item sets (symmetric)
support (A) = #A / #all
support (A->B) = #A&B / #all
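Worked example (a minimal sketch, not a reference implementation): the two support formulas computed over a made-up basket data set. The `transactions` list and the function names `support` and `rule_support` are assumptions for illustration.

```python
# Hypothetical basket data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset, transactions):
    """support(A) = #A / #all: fraction of transactions containing every item in A."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def rule_support(lhs, rhs, transactions):
    """support(A->B) = #A&B / #all: transactions containing all LHS and RHS items."""
    return support(set(lhs) | set(rhs), transactions)

print(support({"bread"}, transactions))                  # bread is in 4 of 5 transactions
print(rule_support({"bread"}, {"milk"}, transactions))   # bread and milk together in 3 of 5
```

Swapping LHS and RHS in `rule_support` gives the same value, which is the "symmetric" note on the card.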
high support / low support
The higher the support the more frequently the item set occurs.
high support is good
- apply to a large number of transactions; a lot of people affected
Confidence
of a rule: probability that a transaction containing the item set on the LHS also contains the item set on the RHS
- proportion of B appearing in transactions that contain A [P(B|A)]
confidence (A->B) = #A&B / #A
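Worked example (a sketch under assumed data): confidence computed as #A&B / #A. The `transactions` list and the helper names `count` and `confidence` are hypothetical, chosen only to mirror the formula.

```python
# Hypothetical basket data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def count(itemset, transactions):
    """Number of transactions containing every item in `itemset` (the #A in the formula)."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def confidence(lhs, rhs, transactions):
    """confidence(A->B) = #A&B / #A, an estimate of P(B|A)."""
    lhs, rhs = set(lhs), set(rhs)
    n_lhs = count(lhs, transactions)
    if n_lhs == 0:
        raise ValueError("LHS never occurs, so confidence is undefined")
    return count(lhs | rhs, transactions) / n_lhs

print(confidence({"butter"}, {"bread"}, transactions))  # 2 butter baskets, both contain bread
print(confidence({"bread"}, {"butter"}, transactions))  # 4 bread baskets, 2 contain butter
```

Unlike support, confidence is not symmetric: the two rules above share the same item set but are divided by different LHS counts.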
high confidence / low confidence
The higher the confidence, the greater the likelihood that the item on RHS will be purchased or, in other words, the greater the return rate you can expect for a given rule.
Association rule
Statement of the form X->Y = “X implies Y”
Lift
ratio by which the confidence of a rule exceeds the expected confidence, assuming that the item sets are independent
lift (A->B) = confidence(A->B) / expected confidence
(expected confidence = support(B) = #B / #all)
lift value
lift > 1: positive association (items are not independent; complements)
lift = 1: no association (items are independent)
lift < 1: negative association (items are not independent; substitutes)
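Worked example (a sketch, assuming made-up basket data): lift as confidence divided by expected confidence, where expected confidence is support of the RHS. The `transactions` list and all function names are hypothetical.

```python
# Hypothetical basket data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs, transactions):
    """confidence(A->B) = support(A and B together) / support(A)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

def lift(lhs, rhs, transactions):
    """lift(A->B) = confidence(A->B) / support(B)."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)

# Confidence 3/4 against expected confidence 4/5: lift below 1 (negative association).
print(lift({"bread"}, {"milk"}, transactions))
# Confidence 2/2 against expected confidence 4/5: lift above 1 (positive association).
print(lift({"butter"}, {"bread"}, transactions))
```

With so few transactions the values are only illustrative; on real data a lift close to 1 would usually be read as "no association" rather than compared for exact equality.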
Rule Notation (using confidence and support)
LHS -> RHS [confidence, support]
Association Rule Types
- Actionable rules: high quality, actionable information
- Trivial rules: information already well-known by those familiar with the business
- Inexplicable rules: no explanation and do not suggest action
* trivial and inexplicable rules occur most often
Threshold
- Support threshold
- e.g. set a threshold for minimum support => all item sets with support at or above it are frequent item sets
- Confidence threshold
- e.g. set a minimum confidence for an identified rule
Apriori principle
if an item set is frequent, all of its subsets must also be frequent
- equivalently: if an item set is infrequent, all of its supersets are infrequent too
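Worked example (a simplified sketch, not an optimized or canonical Apriori implementation): a level-wise search for frequent item sets that uses the principle above to discard candidates before counting them. The `transactions` list, the function name `frequent_itemsets`, and the 0.4 minimum support are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical basket data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def frequent_itemsets(transactions, min_support):
    """Return {item set: support} for every item set with support >= min_support."""
    n = len(transactions)

    def sup(items):
        return sum(1 for t in transactions if items <= t) / n

    singles = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in singles if sup(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = sup(s)
        k += 1
        # Build size-k candidates by joining frequent item sets of size k-1.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: a candidate with any infrequent (k-1)-subset cannot be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = [c for c in candidates if sup(c) >= min_support]
    return frequent

result = frequent_itemsets(transactions, min_support=0.4)
for items, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(items), s)
```

In this data {bread, butter, milk} is pruned without being counted, because its subset {butter, milk} is not frequent at the chosen threshold.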