Chapter 3 Flashcards
What is association rule mining?
An unsupervised data mining technique that finds interesting associations and/or correlation relationships among a large set of data items.
What do association rules show?
Attribute-value conditions that occur frequently together in a given dataset
What is a use of association rule mining?
Market basket analysis
What is a transaction?
A single visit to the store.
Each transaction is associated with a purchase date and items purchased.
What does basket data refer to?
Transaction data
What does the market-basket problem assume?
We have a large number of items (e.g. all items available in a store). Customers fill their baskets with some subset of the items, and we learn which items people buy together, even if we don’t know who they are.
What can retail organisations gain by analysing basket data?
They can extract information to drive their marketing strategy. They would want to put effort into items that are frequently purchased.
- Targeted marketing
- Plan store layouts - use the information to position items and control the way a typical customer traverses the store. Items that are commonly purchased together may be placed close to each other to encourage buying them together, or at opposite ends of the store to entice customers to pick up other items on the way.
- Help retailers plan which items to put on sale at reduced prices
What is a boolean vector in market basket analysis?
Each basket can be replaced by a Boolean vector if we consider that each item has a Boolean variable representing the presence or absence of the item in the basket.
The boolean vector can be analysed for buying patterns that reflect items that are frequently associated or purchased together. These patterns can be represented in the form of association rules.
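A minimal sketch of this encoding, assuming a small made-up catalogue and made-up baskets (the item names and the helper to_boolean_vector are hypothetical, not from the flashcards):

```python
# Hypothetical item catalogue and baskets, made up for illustration.
items = ["bread", "butter", "milk", "beer"]
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer"},
]

def to_boolean_vector(basket, items):
    """Return one Boolean per catalogue item: True if the basket contains it."""
    return [item in basket for item in items]

# One Boolean vector per basket, in the same order as the catalogue.
vectors = [to_boolean_vector(basket, items) for basket in baskets]

for vector in vectors:
    print(vector)
```

Each position in a vector corresponds to one item in the catalogue, so two baskets can be compared position by position.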
Aside from the marketing application, where else can association rules be used?
- Baskets = documents, items = words
Words appearing frequently together in documents may represent phrases or linked concepts. Can be used for intelligence gathering, predictive text etc.
- Baskets = documents, items = sentences
Two documents with many of the same sentences could represent plagiarism or mirror sites on the web
What form do association rules take?
Association rules are statements of the form
{X1, X2, …, Xn} => Y
Meaning if we find all of X1, X2, …, Xn in the market basket, then we have a good chance of finding Y.
What is the confidence of the association rule and how can it be determined?
The strength of implication in the rule
Determine the probability of finding Y given X1, X2, …, Xn occur.
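A rough sketch of estimating this probability from basket data, assuming baskets are stored as Python sets (the function name confidence and the sample baskets are hypothetical):

```python
def confidence(baskets, lhs, rhs):
    """Fraction of the baskets containing every item in lhs that also contain rhs."""
    lhs = set(lhs)
    with_lhs = [basket for basket in baskets if lhs <= basket]
    if not with_lhs:
        return 0.0  # assumption: report 0 when the left-hand side never occurs
    return sum(1 for basket in with_lhs if rhs in basket) / len(with_lhs)

# Made-up baskets for illustration only.
baskets = [
    {"milk", "butter", "bread"},
    {"milk", "butter"},
    {"milk", "butter", "bread", "eggs"},
    {"bread"},
]

# Three baskets contain both milk and butter; two of those also contain bread.
conf = confidence(baskets, {"milk", "butter"}, "bread")
print(f"{conf:.0%}")
```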
Which rules are we interested in?
Rules with a confidence above a certain threshold.
We may also ask that the confidence be significantly higher than it would be if items were placed at random into baskets.
e.g. a rule like {milk, butter} => bread may show high confidence simply because a lot of people buy bread, not because milk and butter make bread any more likely.
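One way to check this, sketched under the assumption that "placed at random" means Y would appear with its overall frequency regardless of the other items: compare the rule's confidence with the fraction of all baskets that contain Y. The ratio of the two is often called lift, a term the flashcards do not use. The baskets and function names below are made up.

```python
def fraction_containing(baskets, itemset):
    """Fraction of all baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)

def confidence(baskets, lhs, rhs):
    """Fraction of the baskets containing lhs that also contain rhs."""
    lhs = set(lhs)
    with_lhs = [basket for basket in baskets if lhs <= basket]
    if not with_lhs:
        return 0.0  # assumption: report 0 when the left-hand side never occurs
    return sum(1 for basket in with_lhs if rhs in basket) / len(with_lhs)

# Made-up baskets: bread is popular everywhere, with or without milk and butter.
baskets = [
    {"milk", "butter", "bread"},
    {"milk", "butter", "bread"},
    {"milk", "butter", "bread"},
    {"milk", "butter"},
    {"bread"},
    {"bread", "eggs"},
    {"bread", "beer"},
    {"eggs"},
]

conf = confidence(baskets, {"milk", "butter"}, "bread")  # 3 of 4 baskets
base_rate = fraction_containing(baskets, {"bread"})      # 6 of 8 baskets
ratio = conf / base_rate

print(conf, base_rate, ratio)
```

Here the confidence equals the base rate of bread, so the ratio is 1: the rule says nothing beyond "many people buy bread".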
What are the properties of market-basket analysis?
- Association rules
- Causality
- Frequent itemsets
Why do we want to consider causality?
Correlation does not equal causation. From the marketing strategy perspective, it is important to understand where the causation is coming from.
We want to know whether, in an association rule, the presence of X1, X2, …, Xn actually “causes” Y to be bought, e.g. diapers and beer.
Why do we want to consider frequent itemsets?
In most situations, we only care about association rules or causalities involving sets of items that appear frequently in baskets.
We can’t run a good marketing strategy involving items that no one buys.
Data mining starts with the assumption that we only care about sets of items with high support, i.e. they appear together in many baskets. Sets of items must appear in at least a certain percentage of the baskets, called the support threshold.
What is the support threshold?
The minimum percentage of baskets in which a set of items must appear to be considered.
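A minimal brute-force sketch of applying a support threshold, assuming baskets are Python sets. It simply counts every candidate set up to a small size and is not an optimised mining algorithm. The names support, frequent_itemsets and max_size are hypothetical.

```python
from itertools import combinations

def support(baskets, itemset):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)

def frequent_itemsets(baskets, threshold, max_size=2):
    """Return {itemset: support} for all itemsets up to max_size meeting the threshold."""
    items = sorted(set().union(*baskets))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(baskets, candidate)
            if s >= threshold:
                result[candidate] = s
    return result

# Made-up baskets for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer"},
]

# Keep only itemsets that appear in at least 50% of the baskets.
frequent = frequent_itemsets(baskets, threshold=0.5)
for itemset, s in frequent.items():
    print(itemset, f"{s:.0%}")
```

With a 50% threshold, beer (1 of 4 baskets) and the pair butter and milk (1 of 4 baskets) are dropped.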
What is the symbolised version of “transactions can be considered to be a subset of the set of all possible items”?
T ⊆ I
What is the formula for the support s?
[See flashcard]
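The formula on the card is not reproduced in this text. For reference only, a commonly used definition of the support of a rule A => B is shown here, assuming D denotes the set of all transactions (D is a symbol introduced for illustration and may differ from the card's notation):

```latex
s(A \Rightarrow B) = \frac{\left|\{\, T \in D : A \cup B \subseteq T \,\}\right|}{|D|} = P(A \cup B)
```

Here P(A ∪ B) is read as the probability that a transaction contains every item in A and every item in B.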
What is the formula for the confidence c?
[See flashcard]
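The formula on the card is not reproduced in this text. For reference only, a commonly used definition of the confidence of a rule A => B is shown here, assuming D denotes the set of all transactions (D is a symbol introduced for illustration and may differ from the card's notation):

```latex
c(A \Rightarrow B) = \frac{\left|\{\, T \in D : A \cup B \subseteq T \,\}\right|}{\left|\{\, T \in D : A \subseteq T \,\}\right|} = P(B \mid A)
```

That is, among the transactions that contain A, the fraction that also contain B.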
What does a higher confidence of the association rule A => B mean?
The greater the probability that if a customer buys product A, they will also buy product B.
What way do we write support and confidence values?
As percentages between 0% and 100% (rather than as values between 0 and 1)