Association rules Flashcards

1
Q

What is the goal of association rules in data mining?

A

To identify item clusters or dependencies in transaction-type databases

2
Q

What are the two stages of association rule discovery?

A

Rule generation and rule assessment.

3
Q

What are the two main measures of rule strength?

A

Confidence and lift ratio.

4
Q

What is the name of the popular algorithm for generating frequent itemsets?

A

Apriori algorithm

5
Q

What is the goal of “market basket analysis” in the context of association rules?

A

The goal is to discover which groups of products tend to be purchased together.

These items can then be displayed together, offered in post-transaction coupons, or recommended in online shopping.

6
Q

What type of learning method are association rules built on?

A

Unsupervised learning methods

6
Q

What is the two-stage process involved in association rule discovery?

A

Rule generation and then assessment of rule strength

7
Q

In the context of association rules, what is assumed about the data type?

A

All data are assumed to be categorical.

7
Q

Why is association rule analysis also referred to as market basket analysis?

A

Because it originated with the study of customer transaction databases to determine dependencies between purchases of different items.

7
Q

Which algorithm is mentioned for rule generation in association rule discovery?

A

Apriori algorithm

7
Q

What is another name for association rules, emphasizing the study of “what goes with what”?

A

Affinity analysis

8
Q

What has the availability of detailed customer transaction information led to?

A

The development of techniques that automatically look for associations between items that are stored in the database.

An example of such transaction data is data collected using bar-code scanners in supermarkets.

8
Q

Provide an example of data collection for market basket databases mentioned in the text.

A

Data collected using bar-code scanners in supermarkets.

9
Q

What information are managers interested in when analyzing customer transactions?

A

Managers are interested in knowing whether certain groups of items are consistently purchased together.

10
Q

Why is handling customer transactions in stores, like supermarkets, considered a big data problem?

A

Stores such as supermarkets handle a very large number of transactions and carry many different products, and a fairly large number of items can be bought in each transaction.

10
Q

Name some decisions that managers can make based on information about consistently purchased item groups.

A

Decisions on store layout and item placement, cross-selling, promotions, and catalog design, as well as identifying customer segments based on buying patterns.

11
Q

If a store sells 20 items, how many possible combinations exist for associations between just 2 items?

A

There are 190 possible combinations:
C(20, 2) = (20 × 19) / 2 = 190
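As a quick check (an added illustration, not part of the original deck), Python's standard-library function math.comb computes the same binomial coefficient:

```python
from math import comb

n_items = 20
# Unordered pairs {i, j} with i != j drawn from n_items distinct items
n_pairs = comb(n_items, 2)
print(n_pairs)  # 190
```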

12
Q

What is the potential number of associations when considering all possible associations (not just two-way) among 20 items?

A

The number of possible associations is greater than a million.
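The card does not say precisely what is being counted, so this added sketch counts two plausible readings for p = 20 items: nonempty itemsets (2^p − 1) and rules X → Y with nonempty, disjoint X and Y (3^p − 2^(p+1) + 1). The function names are hypothetical helpers; under either reading the count exceeds a million.

```python
def count_itemsets(p: int) -> int:
    """Nonempty itemsets that can be formed from p distinct items."""
    return 2 ** p - 1


def count_rules(p: int) -> int:
    """Rules X -> Y with X and Y nonempty and disjoint.

    Each item goes to X, to Y, or to neither (3**p assignments); subtract the
    2**p assignments with X empty and the 2**p with Y empty, then add back the
    single assignment where both are empty (it was subtracted twice).
    """
    return 3 ** p - 2 ** (p + 1) + 1


print(count_itemsets(20))  # 1048575
print(count_rules(20))     # 3484687250
```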

13
Q

In which industry are association rules heavily used to learn about items purchased together?

A

In retail, for learning which items are purchased together.

13
Q

Apart from retail, where else are association rules commonly encountered?

A

Online recommendation systems, where customers examining an item or items for possible purchase are shown other items that are often purchased in conjunction with the first item.

14
Q

Give an example of the application of association rules in Amazon.com’s online shopping system.

A

“Frequently bought together.”

14
Q

How are association rules applied in online recommendation systems?

A

Customers examining an item or items for possible purchase are shown other items that are often purchased in conjunction with the first item.

15
Q

Provide an example of a scenario in which a medical researcher might use association rules.

A

A medical researcher might want to learn which symptoms appear together.

15
Q

In the context of law, what might the frequent appearance of certain word combinations indicate?

A

Word combinations that appear too often might indicate plagiarism.

16
Q

What type of information do association rules provide, and in what form are they presented?

A

Association rules provide information about “what goes with what” in the form of “if–then” statements.

17
Q

What does the “IF” part represent in association rules?

A

“IF” part = antecedent

17
Q

How do association rules differ from traditional if–then rules of logic?

A

These rules are computed from the data; unlike the if–then rules of logic, association rules are probabilistic in nature

18
Q

What does the “THEN” part represent in association rules?

A

“THEN” part = consequent

19
Q

How are the antecedent and consequent related in association rules?

A

Antecedent and consequent are disjoint (i.e., have no items in common)

20
Q

What is the promotional offer mentioned in Example 1 related to phone faceplates?

A

Customers who purchase multiple faceplates from a choice of six different colors get a discount.

21
Q

In the context of association rules, what do the terms “antecedent” and “consequent” refer to?

A

We use the term antecedent to describe the IF part, and consequent to describe the THEN part

In association analysis, the antecedent and consequent are sets of items (called itemsets) that are disjoint (do not have any items in common).

22
Q

What is an itemset in the context of association rules, and how is it different from records of what people buy?

A

An itemset is a set of items. Itemsets are not records of what people buy; they are simply possible combinations of items, including single items.

23
Q

What is the first step in association rules, and what does it involve in terms of item combinations?

A

The first step in association rules is to generate all the rules that would be candidates for indicating associations between items.

Ideally, we might want to look at all possible combinations of items in a database with p distinct items

24
Q

Why is considering all possible combinations of items in a database often impractical?

A

Generating all these combinations requires a long computation time, which grows exponentially with the number of distinct items.

25
Q

What practical solution is mentioned for generating association rules, given the exponential growth in computation time?

A

Consider only combinations that occur with higher frequency in the database.

These are called frequent itemsets.

26
Q

What is the concept of “frequent itemsets” and how is it related to the determination of valid rules?

A

Determining what qualifies as a frequent itemset is related to the concept of support.

27
Q

How is the support of a rule defined, and what does it measure in the context of association rules?

A

The support of a rule is simply the number of transactions that include both the antecedent and consequent itemsets

28
Q

How is support sometimes expressed, and what does it indicate in the phone faceplate example?

A

The support is sometimes expressed as a percentage of the total number of records in the database. For example, the support for the itemset {red, white} in the phone faceplate example is 4 (or, 100 × 4/10 = 40%).
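A minimal sketch of support as a count and as a percentage. The transactions are a hypothetical toy list invented for illustration, not the phone faceplate data, so the result (50%) differs from the 40% above; support_count is a hypothetical helper name.

```python
from typing import Iterable, Set

# Hypothetical toy transactions (NOT the original phone faceplate data)
transactions = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"red", "white"},
    {"red", "blue"},
    {"white", "blue"},
    {"red", "white", "blue"},
    {"green"},
    {"red", "white", "green"},
]


def support_count(itemset: Set[str], records: Iterable[Set[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in records if itemset <= t)


count = support_count({"red", "white"}, transactions)
percent = 100 * count / len(transactions)
print(count, f"{percent:.0f}%")  # 4 50%
```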

29
Q

What is the significance of a frequent itemset in association rules, and how is it defined by the user?

A

A frequent itemset is defined as an itemset whose support exceeds a minimum support selected by the user.

30
Q

Support=

A

Number (or percent) of transactions that include both the antecedent and the consequent in the current database.

An association rule is valid if it satisfies some evaluation measures.

31
Q

What is the key idea behind the Apriori algorithm?

A

The key idea is to begin by generating frequent itemsets with just one item (one-itemsets) and then to recursively generate frequent itemsets with two items, then with three items, and so on, until we have generated frequent itemsets of all sizes.

32
Q

How does the Apriori algorithm start the process of generating frequent itemsets?

A

Count, for each item, how many transactions in the database include the item. These transaction counts are the supports for the one-itemsets. We drop one-itemsets that have support below the desired minimum support to create a list of the frequent one-itemsets.
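The first pass can be sketched with collections.Counter. The toy transactions and the minimum support of 3 transactions are hypothetical values chosen for illustration.

```python
from collections import Counter

# Hypothetical toy transactions, invented for illustration
transactions = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"red", "white"},
    {"red", "blue"},
    {"white", "blue"},
    {"red", "white", "blue"},
    {"green"},
    {"red", "white", "green"},
]
min_support = 3  # hypothetical minimum support, as a transaction count

# One pass over the database: the support of every one-itemset
item_counts = Counter(item for t in transactions for item in t)

# Drop one-itemsets whose support is below the minimum support
frequent_one = {item: n for item, n in item_counts.items() if n >= min_support}
print(sorted(frequent_one.items()))
# [('blue', 3), ('green', 3), ('red', 5), ('white', 6)]
```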

33
Q

What criterion is applied to one-itemsets to create a list of frequent one-itemsets?

A

We drop one-itemsets that have support below the desired minimum support to create a list of the frequent one-itemsets.

34
Q

How does the Apriori algorithm proceed to generate frequent two-itemsets?

A

To generate frequent two-itemsets, we use the frequent one-itemsets.

35
Q

What is the rationale behind using frequent one-itemsets in generating frequent two-itemsets?

A

The reasoning is that if a certain one-itemset did not exceed the minimum support, any larger size itemset that includes it will not exceed the minimum support.

36
Q

How is the process of generating k-itemsets related to the preceding step in the Apriori algorithm?

A

In general, generating k-itemsets uses the frequent (k-1)-itemsets that were generated in the preceding step.

For example, if {A} and {B} are frequent one-itemsets, we combine them to create {A, B} as a candidate two-itemset; it becomes a frequent two-itemset only if its support meets the minimum support.
This process continues up to k-itemsets, where k represents the number of items in the set.
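The level-wise procedure described in the last few cards can be sketched as below. This is one common textbook formulation (join pairs of frequent (k-1)-itemsets, prune candidates that contain an infrequent subset, then count the survivors in one pass over the data); the function name apriori, the toy transactions, and min_support=2 are assumptions for illustration, not code from the source.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[Itemset, int]:
    """Return {itemset: support count} for every itemset with support >= min_support."""
    txns = [frozenset(t) for t in transactions]

    # k = 1: count single items and keep the frequent ones
    counts: Dict[Itemset, int] = {}
    for t in txns:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join: merge pairs of frequent (k-1)-itemsets whose union has k items
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune: drop candidates that contain an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count: one pass over the transactions for this level
        counts = {c: 0 for c in candidates}
        for t in txns:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy transactions, invented for illustration
transactions = [
    {"red", "white", "green"}, {"white", "orange"}, {"red", "white"},
    {"red", "blue"}, {"white", "blue"}, {"red", "white", "blue"},
    {"green"}, {"red", "white", "green"},
]
result = apriori(transactions, min_support=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With these toy data, {red, white, blue} is generated as a candidate but dropped because it occurs in only one transaction, and {green, blue} is never frequent, so no three-itemset containing both green and blue is even counted.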

37
Q

What contributes to the speed of the Apriori algorithm, especially when dealing with a large number of unique items in a database?

A

Each step requires a single run through the database, and therefore the Apriori algorithm is very fast even for a large number of unique items in a database.

38
Q

Besides support, what is the additional measure mentioned that expresses uncertainty about the if-then rule?

A

This is known as the confidence of the rule.

39
Q

What is the term used to describe the measure that compares the co-occurrence of antecedent and consequent itemsets?

A

confidence of the rule

40
Q

How is confidence defined in the context of association rules?

A

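Confidence is defined as the ratio of the number of transactions that include all antecedent and consequent itemsets (namely, the support) to the number of transactions that include all the antecedent itemsets:
Confidence = (no. of transactions with both antecedent and consequent itemsets) / (no. of transactions with antecedent itemset)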
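A minimal sketch of the confidence ratio on hypothetical toy transactions; the helper names are illustrative. Swapping antecedent and consequent generally changes the confidence, because the denominator changes.

```python
from typing import List, Set

# Hypothetical toy transactions, invented for illustration
transactions: List[Set[str]] = [
    {"red", "white", "green"}, {"white", "orange"}, {"red", "white"},
    {"red", "blue"}, {"white", "blue"}, {"red", "white", "blue"},
    {"green"}, {"red", "white", "green"},
]


def count_containing(itemset: Set[str], records: List[Set[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in records if itemset <= t)


def confidence(antecedent: Set[str], consequent: Set[str], records: List[Set[str]]) -> float:
    """no. of transactions with antecedent AND consequent / no. with antecedent."""
    n_antecedent = count_containing(antecedent, records)
    if n_antecedent == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    n_both = count_containing(antecedent | consequent, records)
    return n_both / n_antecedent


print(confidence({"red"}, {"white"}, transactions))  # 0.8 (4 of 5)
print(confidence({"white"}, {"red"}, transactions))  # about 0.667 (4 of 6)
```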

41
Q

What does the numerator of the confidence ratio represent?

A

no. of transactions with both antecedent and consequent itemsets

42
Q

What does the denominator of the confidence ratio represent?

A

no. of transactions with antecedent itemset

43
Q

What is the goal, given the abundance of rules generated?

A

The goal is to find only the rules that indicate a strong dependence between the antecedent and consequent itemset

44
Q

To measure the strength of association implied by a rule, we use the measures of

A

confidence and lift ratio

45
Q

Suppose that a supermarket database has 100,000 point-of-sale transactions. Of these transactions, 2000 include both orange juice and flu medication, and 800 of these include soup purchases.
Find the support and confidence of the association rule “IF orange juice and flu medication are purchased THEN soup is purchased on the same trip”.

A

The association rule has a support of 800 transactions (alternatively, 0.8% = 800/100,000) and a confidence of 40% (=800/2000).
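The same arithmetic in a few lines of Python, using only the counts given in the card:

```python
n_transactions = 100_000
n_antecedent = 2_000  # transactions with orange juice AND flu medication
n_both = 800          # of those, transactions that also include soup

support_fraction = n_both / n_transactions  # share of all transactions
confidence = n_both / n_antecedent          # share of antecedent transactions

print(f"support = {n_both} transactions ({support_fraction:.1%})")  # 0.8%
print(f"confidence = {confidence:.0%}")                             # 40%
```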

46
Q

What does support measure or estimate in the context of association rules?

A

One way to think of support is that it is the (estimated) probability that a transaction selected randomly from the database will contain all items in the antecedent and the consequent.

47
Q

How is confidence defined in terms of probability, and what does it represent?

A

Confidence is the (estimated) conditional probability that a transaction selected randomly will include all the items in the consequent given that the transaction includes all the items in the antecedent:

Confidence = P(antecedent AND consequent) / P(antecedent) = P(consequent | antecedent)

A high value of confidence suggests a strong association rule.

48
Q

What does a high value of confidence suggest about an association rule?

A

a strong association rule

49
Q

Why can a high confidence value be deceptive in evaluating association rules?

A

Because if the antecedent and/or the consequent has a high level of support, we can have a high value for confidence even when the antecedent and consequent are independent.

For example, if nearly all customers buy bananas and nearly all customers buy ice cream, the confidence level of a rule such as “IF bananas THEN ice-cream” will be high regardless of whether there is an association between the items.
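To make the point concrete, here is a sketch with hypothetical counts (invented for illustration) in which bananas and ice cream are exactly independent, yet the rule still shows high confidence. Dividing the confidence by the benchmark P(ice cream), which gives the lift ratio, exposes the lack of association.

```python
n_total = 1_000      # hypothetical number of transactions
n_bananas = 950      # transactions with bananas
n_ice_cream = 920    # transactions with ice cream
n_both = 874         # = 950 * 920 / 1000, exactly what independence predicts

confidence = n_both / n_bananas    # P(ice cream | bananas)
benchmark = n_ice_cream / n_total  # P(ice cream)
lift = confidence / benchmark

print(f"confidence = {confidence:.2f}")  # 0.92: looks like a strong rule
print(f"lift ratio = {lift:.2f}")        # 1.00: no association beyond chance
```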

50
Q

How is the strength of an association rule assessed using the Lift Ratio?

A

By comparing the confidence of the rule with a benchmark value computed under the assumption that the occurrence of the consequent itemset in a transaction is independent of the occurrence of the antecedent for each rule.

51
Q

How is the support calculated under the independence assumption for the Lift Ratio?

A

P (antecedent AND consequent) = P (antecedent) x P (consequent)

52
Q

What independence assumption is made when comparing confidence with a benchmark value?

A

The antecedent and consequent itemsets are independent.

53
Q

How is the benchmark confidence value for a rule computed from the data?

A

Benchmark confidence = P(antecedent) × P(consequent) / P(antecedent) = P(consequent).

54
Q

What is the ratio used to compare confidence with benchmark confidence, and what is it called?

A

The lift ratio is the confidence of the rule divided by the confidence assuming independence of consequent from antecedent (the benchmark confidence).

55
Q

How is the lift ratio of a rule calculated?

A

Lift ratio = confidence / benchmark confidence
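A minimal sketch of the lift ratio computed from transaction counts; the function name and the example counts are hypothetical.

```python
def lift_ratio(n_both: int, n_antecedent: int, n_consequent: int, n_total: int) -> float:
    """Lift = confidence / benchmark confidence.

    n_both:        transactions containing antecedent AND consequent
    n_antecedent:  transactions containing the antecedent
    n_consequent:  transactions containing the consequent
    n_total:       all transactions in the database
    """
    confidence = n_both / n_antecedent
    benchmark_confidence = n_consequent / n_total  # P(consequent)
    return confidence / benchmark_confidence


# Hypothetical counts: 8 transactions, antecedent in 5, consequent in 6, both in 4
lift = lift_ratio(n_both=4, n_antecedent=5, n_consequent=6, n_total=8)
print(round(lift, 3))  # 1.067, slightly above 1
```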

56
Q

According to the text, what does a lift ratio greater than 1 indicate about the association rule?

A

the level of association between the antecedent and consequent item sets is higher than would be expected if they were independent.

The larger the lift ratio, the greater the strength of the association

57
Q

How is the lift ratio related to the level of association between antecedent and consequent item sets?

A

If lift ratio > 1, the level of association between the antecedent and consequent item sets is higher than would be expected if they were independent.

58
Q

How is the lift ratio formula expressed in terms of confidence and support?

A

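Lift ratio = Confidence / Support(consequent), where the support of the consequent is expressed as a proportion of all transactions (this proportion is the benchmark confidence).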

59
Q

Benchmark confidence Formula Derivation

A

Since we are assuming independence,
P(consequent | antecedent) = P(antecedent) × P(consequent) / P(antecedent) = P(consequent).
Consequently, the benchmark confidence equals
(no. of transactions with consequent itemset) / (no. of transactions in database)

60
Q

What is the first stage in the process of rule selection, and what is its goal?

The process of selecting strong rules is based on generating all association rules that meet stipulated support and confidence requirements. This is done in two stages.

A

It consists of finding all “frequent” itemsets, those itemsets that have the requisite support. It is aimed at removing item combinations that are rare in the database.

61
Q

Which stage of rule selection is related to the Apriori algorithm?

A

The first stage: for most association analysis data, the computational challenge is the first stage, as described in the discussion of the Apriori algorithm.

62
Q

What is the purpose of finding “frequent” itemsets in the first stage of rule selection?

A

To remove item combinations that are rare in the database.

63
Q

What does the second stage of rule selection focus on, and what is its objective?

A

We generate, from the frequent itemsets, association rules that meet a confidence requirement.

64
Q

In the computation of confidence in the second stage, what is the relationship between a subset and the set it belongs to?

A

Because any subset (e.g., {red} in the phone faceplate example) must occur at least as frequently as the set it belongs to (e.g., {red, white}), each subset will also be in the list of frequent itemsets.

65
Q

How is confidence calculated in the second stage of rule selection?

A

Confidence is computed as the ratio of the support for the itemset to the support for each subset of the itemset, with that subset serving as the antecedent.

66
Q

What criteria are used to retain association rules in the second stage?

A

We retain the corresponding association rule only if its confidence exceeds the desired cutoff value for confidence.

For example, from the itemset {red, white, green} in the phone faceplate purchases, we get the single-consequent association rules {red, white} → {green}, {red, green} → {white}, and {white, green} → {red}, each with its own confidence and lift values.
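A sketch of this second stage for a single frequent itemset, assuming hypothetical toy transactions (not the faceplate data) and a hypothetical confidence cutoff of 0.7. Each rule's confidence is the itemset's support divided by the antecedent's support, and the lift divides that by the consequent's share of all transactions; the function names are illustrative.

```python
from typing import List, Set, Tuple

# Hypothetical toy transactions (NOT the original phone faceplate data)
transactions: List[Set[str]] = [
    {"red", "white", "green"}, {"white", "orange"}, {"red", "white"},
    {"red", "blue"}, {"white", "blue"}, {"red", "white", "blue"},
    {"green"}, {"red", "white", "green"},
]


def count(itemset: frozenset, records: List[Set[str]]) -> int:
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in records if itemset <= t)


def single_consequent_rules(
    itemset: Set[str], records: List[Set[str]], min_conf: float
) -> List[Tuple[List[str], str, float, float]]:
    """(antecedent, consequent, confidence, lift) for rules meeting min_conf."""
    full = frozenset(itemset)
    n_full = count(full, records)
    if n_full == 0:
        return []
    rules = []
    for item in sorted(full):
        antecedent = full - {item}
        conf = n_full / count(antecedent, records)  # support(itemset) / support(antecedent)
        lift = conf / (count(frozenset([item]), records) / len(records))
        if conf >= min_conf:
            rules.append((sorted(antecedent), item, conf, lift))
    return rules


for ante, cons, conf, lift in single_consequent_rules({"red", "white", "green"}, transactions, 0.7):
    print(f"{ante} -> {cons}: confidence={conf:.2f}, lift={lift:.2f}")
```

All three candidate rules share the same support (the 2 transactions containing {red, white, green}) but have different confidence; only the two rules with confidence 1.0 pass the 0.7 cutoff.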

67
Q

In interpreting results, why is it useful to consider the support for a rule?

A

The support for the rule indicates its impact in terms of overall size: How many transactions are affected? If only a small number of transactions are affected, the rule may be of little use.

68
Q

What does the lift ratio indicate about the efficiency of a rule?

A

The lift ratio indicates how efficient the rule is in finding consequents, compared to random selection.

69
Q

An efficient rule is preferred over an inefficient one. However, what must be considered alongside efficiency?

A

we must still consider support

70
Q

Does support influence the desirability of a rule, especially when comparing efficient rules?

A

Yes, a very efficient rule that has very low support may not be as desirable as a less efficient rule with much greater support.

71
Q

What does confidence tell us about a rule, and why is it important for determining the business or operational usefulness of a rule?

A

It tells us at what rate consequents will be found and is useful in determining the business or operational usefulness of a rule

72
Q

In the context of promoting consequents, why might a rule with low confidence be considered less desirable?

A

A rule with low confidence may find consequents at too low a rate to be worth the cost of (say) promoting the consequent in all the transactions that involve the antecedent

73
Q

What is the primary goal of the Apriori algorithm?

A

Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf).

74
Q

What are the two user-specified criteria for finding rules using the Apriori algorithm?

A

Support & Confidence

75
Q

Describe the objective of the first stage of the Apriori algorithm.

A

This stage is aimed at removing item combinations that are rare in the database by using the Apriori algorithm to find all “frequent” itemsets, those itemsets that have the requisite support.

76
Q

What is the purpose of finding “frequent” item sets in the first stage of the Apriori algorithm?

A

This stage is aimed at removing item combinations that are rare in the database.

77
Q

In the second stage of the Apriori algorithm, what kind of rules are generated from the frequent item sets?

A

Association rules that meet a confidence requirement.

That is, the second stage filters the candidate rules and selects only those with high confidence.

78
Q

In Step 1 of the Apriori algorithm, what is the initial criterion set by the user?

A

User sets a minimum support criterion

79
Q

What is the first action in generating frequent item sets, according to Step 1?

A

Generate a list of one-item sets that meet the support criterion.

80
Q

After generating one-item sets, what is the next step in the process, according to Step 1?

A

Use the list of one-item sets to generate a list of two-item sets that meet the support criterion.

81
Q

In practical terms, what limits the value of k in the generation of item sets?

A

In practice, k is bounded by 10.

82
Q

What is the general process for generating k-item sets, according to Step 1?

A

After using the list of one-item sets to generate a list of two-item sets, we use the list of two-item sets to generate a list of three-item sets, and continue up through k-item sets.

83
Q

Consider the itemset {Milk, Diaper, Vitamin}:

How many candidate rules can be obtained?

A

6 candidate rules

If you have k elements in the itemset, there will be 2^k − 2 candidate association rules.
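A short sketch that enumerates the candidate rules by pairing every nonempty, proper subset of the itemset (as antecedent) with the remaining items (as consequent); the function name is hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple


def candidate_rules(itemset: Set[str]) -> List[Tuple[Tuple[str, ...], Tuple[str, ...]]]:
    """All rules antecedent -> consequent that split `itemset` into two nonempty parts."""
    items = sorted(itemset)
    rules = []
    for size in range(1, len(items)):  # antecedent sizes 1 .. k-1
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules


rules = candidate_rules({"Milk", "Diaper", "Vitamin"})
for antecedent, consequent in rules:
    print("{" + ", ".join(antecedent) + "} -> {" + ", ".join(consequent) + "}")
print(len(rules), "==", 2 ** 3 - 2)  # 6 == 6
```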

84
Q

What does a high value of confidence suggest about an association rule?

A

Suggests a strong association rule

85
Q

T or F:

Rules originating from the same item set have identical confidence but may have different support

A

False

Vice versa: rules originating from the same itemset have identical support but may have different confidence.

86
Q

A → B is an association rule if

List the conditions that apply.

A
  • Confidence(A → B) ≥ minconf,
  • Confidence(A → B) ≥ Confidence(B → A)
  • Lift Ratio (A → B) > 1
87
Q

Is association rule mining a supervised or unsupervised data mining technique?

A

Unsupervised

88
Q

Are association rules applied to categorical or numerical variables?

A

Categorical

89
Q

If the lift ratio of {Bread, Butter} → {Yogurt} = 1, what does that mean?

A

It means that buying yogurt is independent of buying {Bread, Butter}.

90
Q

If the confidence of the rule {Yogurt} → {Bread, Butter} is high, does it mean that buying yogurt will most probably lead to buying bread and butter?

A

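That depends on the lift ratio value: a high confidence can be deceptive if {Bread, Butter} is itself purchased very frequently, so the lift ratio should be greater than 1.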