Association rules Flashcards
What is the goal of association rules in data mining?
To identify item clusters or dependencies in transaction-type databases
What are the two stages of association rule discovery?
Rule generation and rule assessment.
What are the two main measures of rule strength?
Confidence and lift ratio.
What is the name of the popular algorithm for generating frequent itemsets?
Apriori algorithm
What is the goal of “market basket analysis” in the context of association rules?
The goal is discovering which groups of products tend to be purchased together
These items can then be displayed together, offered in post-transaction coupons, or recommended in online shopping
What type of learning methods are association rules built on?
Unsupervised learning methods
What is the two-stage process involved in association rule discovery?
Rule generation and then assessment of rule strength
In the context of association rules, what is assumed about the data type?
Assume all data are categorical.
Why is association rule analysis also referred to as market basket analysis?
Because it originated with the study of customer transaction databases to determine dependencies between purchases of different items.
Which algorithm is mentioned for rule generation in association rule discovery?
Apriori algorithm
What is another name for association rules, emphasizing the study of “what goes with what”?
Affinity analysis
What has the availability of detailed customer transaction information led to?
Development of techniques that automatically look for associations between items that are stored in the database
An example is data collected using bar-code scanners in supermarkets
Provide an example of data collection for market basket databases mentioned in the text.
Data collected using bar-code scanners in supermarkets
What information are managers interested in when analyzing customer transactions?
Managers want to know whether certain groups of items are consistently purchased together
Why is handling customer transactions in stores, like supermarkets, considered a big data problem?
Stores like supermarkets handle a very large number of transactions and carry many different products, and each transaction can include a fairly large number of items
Name some decisions that managers can make based on information about consistently purchased item groups.
Decisions on store layouts and item placement, cross-selling, promotions, catalog design, and identifying customer segments based on buying patterns
If a store sells 20 items, how many possible combinations exist for associations between just 2 items?
There are 190 possible combinations.
20 C 2 = 190
What is the potential number of associations when considering all possible associations (not just two-way) among 20 items?
The number of possible associations is greater than one million.
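The two counts above can be checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and "all possible associations" is interpreted here (as an assumption) as every itemset containing at least two of the 20 items.

```python
from math import comb

P = 20  # number of distinct items sold, as in the cards above

# Two-item combinations: "20 choose 2".
pairs = comb(P, 2)

# Assumption: count every itemset with at least two items,
# i.e. the sum of "20 choose k" for k = 2..20.
multi_item_sets = sum(comb(P, k) for k in range(2, P + 1))

print(pairs)            # 190
print(multi_item_sets)  # 1048555
```

The second figure equals 2^20 minus the empty set and the 20 single-item sets, which is why it passes one million with only 20 items.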
In which industry are association rules heavily used to learn about items purchased together?
In retail for learning about items that are purchased together
Apart from retail, where else are association rules commonly encountered?
Online recommendation systems, where customers examining an item or items for possible purchase are shown other items that are often purchased in conjunction with the first item
Give an example of the application of association rules in Amazon.com’s online shopping system.
“Frequently bought together.”
How are association rules applied in online recommendation systems?
Customers examining an item or items for possible purchase are shown other items that are often purchased in conjunction with the first item
Provide an example of a scenario in which a medical researcher might use association rules.
A medical researcher might want to learn which symptoms appear together
In the context of law, what might the frequent appearance of certain word combinations indicate?
Word combinations that appear too often might indicate plagiarism
What type of information do association rules provide, and in what form are they presented?
Association rules provide information about "what goes with what" in the form of “if–then” statements
What does the “IF” part represent in association rules?
“IF” part = antecedent
How do association rules differ from traditional if–then rules of logic?
These rules are computed from the data; unlike the if–then rules of logic, association rules are probabilistic in nature
What does the “THEN” part represent in association rules?
“THEN” part = consequent
How are the antecedent and consequent related in association rules?
Antecedent and consequent are disjoint (i.e., have no items in common)
What is the promotional offer mentioned in Example 1 related to phone faceplates?
Customers who purchase multiple faceplates from a choice of six different colors get a discount.
In the context of association rules, what do the terms “antecedent” and “consequent” refer to?
We use the term antecedent to describe the IF part, and consequent to describe the THEN part
In association analysis, the antecedent and consequent are sets of items (called itemsets) that are disjoint (do not have any items in common).
What is an itemset in the context of association rules, and how is it different from records of what people buy?
Itemsets are not records of what people buy; they are simply possible combinations of items, including single items.
What is the first step in association rules, and what does it involve in terms of item combinations?
The first step in association rules is to generate all the rules that would be candidates for indicating associations between items
Ideally, we might want to look at all possible combinations of items in a database with p distinct items
Why is considering all possible combinations of items in a database often impractical?
Generating all these combinations requires computation time that grows exponentially with the number of items p
What practical solution is mentioned for generating association rules, given the exponential growth in computation time?
Consider only combinations that occur with higher frequency in the database.
These are called frequent itemsets.
What is the concept of “frequent itemsets” and how is it related to the determination of valid rules?
Determining what qualifies as a frequent itemset is related to the concept of support.
How is the support of a rule defined, and what does it measure in the context of association rules?
The support of a rule is simply the number of transactions that include both the antecedent and consequent itemsets
How is support sometimes expressed, and what does it indicate in the phone faceplate example?
The support is sometimes expressed as a percentage of the total number of records in the database. For example, the support for the itemset {red, white} in the phone faceplate example is 4 (or, 100 × 4/10 = 40%).
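Support as a count and as a percentage can be sketched as follows. The transactions below are hypothetical, invented for illustration (they are not the faceplate data from the example), and the function names are assumptions of this sketch.

```python
# Hypothetical transactions, invented for illustration only.
transactions = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"white", "blue"},
    {"red", "white", "orange"},
    {"red", "blue"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def support_percent(itemset, transactions):
    """Support expressed as a percentage of all transactions."""
    return 100 * support_count(itemset, transactions) / len(transactions)

count = support_count({"red", "white"}, transactions)
percent = support_percent({"red", "white"}, transactions)
print(count, percent)  # 2 40.0
```

In this made-up data, {red, white} appears in 2 of the 5 transactions, so its support is 2, or 40%.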
What is the significance of a frequent itemset in association rules, and how is it defined by the user?
A frequent itemset is therefore defined as an itemset whose support exceeds a selected minimum support, determined by the user
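A level-wise search in the spirit of the Apriori algorithm can be sketched as below. This is a simplified illustration, not a full implementation: the transactions are hypothetical, the function name is an assumption, and the threshold is treated as "at least min_support" (a common convention, also an assumption here).

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
transactions = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"white", "blue"},
    {"red", "white", "orange"},
    {"red", "blue"},
]

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for itemsets with count >= min_support.

    Size-k candidates are built only from frequent size-(k-1) itemsets,
    so infrequent combinations are never extended.
    """
    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = sorted({item for t in transactions for item in t})
    current = {}
    for item in items:
        single = frozenset([item])
        n = count(single)
        if n >= min_support:
            current[single] = n

    frequent = dict(current)
    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Keep only the candidates that meet the minimum support.
        counted = {c: count(c) for c in candidates}
        current = {c: n for c, n in counted.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent

result = frequent_itemsets(transactions, min_support=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With a minimum support of 2, the made-up data yields four frequent single items and two frequent pairs, {red, white} and {white, orange}. The three-item candidate {red, white, orange} is pruned without being counted because its subset {red, orange} is not frequent.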
Support = number (or percent) of transactions that include both the antecedent and the consequent in the current database
An association rule is valid if it satisfies some evaluation measures.
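The two measures of rule strength named earlier, confidence and lift ratio, can be computed from support counts. The cards do not spell out the formulas, so the sketch below assumes the standard definitions: confidence is the support of antecedent and consequent together divided by the support of the antecedent, and lift ratio is confidence divided by the share of all transactions that contain the consequent. The data and function names are hypothetical.

```python
# Hypothetical transactions, invented for illustration only.
transactions = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"white", "blue"},
    {"red", "white", "orange"},
    {"red", "blue"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also have the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    if antecedent & consequent:
        raise ValueError("antecedent and consequent must be disjoint")
    both = support_count(antecedent | consequent, transactions)
    # Assumes the antecedent occurs at least once; otherwise this divides by zero.
    return both / support_count(antecedent, transactions)

def lift_ratio(antecedent, consequent, transactions):
    """Confidence divided by the consequent's overall share of transactions."""
    benchmark = support_count(consequent, transactions) / len(transactions)
    return confidence(antecedent, consequent, transactions) / benchmark

conf = confidence({"red"}, {"white"}, transactions)
lift = lift_ratio({"red"}, {"white"}, transactions)
print(round(conf, 3), round(lift, 3))  # 0.667 0.833
```

For the rule IF red THEN white in this made-up data, the lift ratio is below 1: white appears in 80% of all transactions but in only about 67% of those containing red, so red does not raise the chance of white.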