Data Science using Python and R - 14 Flashcards

1
Q

What is the form of association rules?

A

If antecedent, then consequent

2
Q

What are the two key measures associated with an association rule?

A
  • Support
  • Confidence
3
Q

What does support represent in an association rule?

A

The proportion of transactions that contain both A and B

4
Q

Define confidence in the context of association rules.

A

The percentage of transactions containing A that also contain B
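
To make the support and confidence definitions concrete, here is a minimal Python sketch. It assumes each transaction is a Python set of items; the toy transactions and the helper names support() and confidence() are hypothetical, not taken from the book.

```python
# Toy transactions (invented for illustration), each a set of items
transactions = [
    {"beans", "squash", "corn"},
    {"beans", "squash"},
    {"beans", "corn"},
    {"squash", "corn"},
    {"beans", "squash", "tomatoes"},
]

def support(antecedent, consequent, transactions):
    """Proportion of all transactions that contain both A and B."""
    both = antecedent | consequent
    return sum(both <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Proportion of transactions containing A that also contain B."""
    with_a = [t for t in transactions if antecedent <= t]
    if not with_a:
        return 0.0  # A never occurs, so the rule has no support base
    return sum(consequent <= t for t in with_a) / len(with_a)

a, b = {"beans"}, {"squash"}
print(support(a, b, transactions))     # 0.6
print(confidence(a, b, transactions))  # 0.75
```

Three of the five transactions hold both beans and squash (support 0.6), and three of the four transactions holding beans also hold squash (confidence 0.75).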

5
Q

What is the curse of dimensionality?

A

The exponential growth of possible association rules with the number of attributes
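
One way to see the exponential growth is to count rules directly. This Python sketch assumes k binary items and counts every rule whose antecedent and consequent are non-empty, disjoint sets of items; under that assumed counting convention the total is 3^k - 2^(k+1) + 1. The book may count rules differently (for example, with single-item consequents), so treat the formula as illustrative.

```python
from itertools import product

def count_rules_brute_force(k):
    """Count rules X => Y where X and Y are non-empty, disjoint subsets of k items."""
    count = 0
    # Each item goes to the antecedent (0), the consequent (1), or neither (2)
    for assignment in product(range(3), repeat=k):
        if 0 in assignment and 1 in assignment:
            count += 1
    return count

def count_rules_closed_form(k):
    """Inclusion-exclusion: all assignments minus those lacking X or lacking Y."""
    return 3**k - 2**(k + 1) + 1

for k in range(1, 7):
    print(k, count_rules_brute_force(k), count_rules_closed_form(k))

print(count_rules_closed_form(6))  # 602
```

The brute-force loop itself visits 3^k assignments, which is exactly why exhaustive rule search becomes impractical as attributes are added.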

6
Q

What is the a priori algorithm used for?

A

Mining association rules more efficiently by reducing the search space
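
A compact sketch of the a priori idea in Python: a candidate itemset is counted only if all of its smaller subsets are already frequent, which is how the search space shrinks. This is a generic textbook version of frequent-itemset generation written for illustration (function and variable names are hypothetical), not the implementation behind R's apriori().

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

# Toy transactions (invented for illustration)
transactions = [
    {"beans", "squash", "corn"},
    {"beans", "squash"},
    {"beans", "corn"},
    {"squash", "corn"},
    {"beans", "squash", "tomatoes"},
]
frequent = apriori_frequent_itemsets(transactions, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Here "tomatoes" is dropped at level 1, so no pair containing it is ever generated, and the single three-item candidate is counted once and rejected.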

7
Q

What is an example of a trivial rule that is excluded from association rules?

A

If beans and squash, then beans

8
Q

How is lift defined in association rules?

A

The ratio of the confidence of the rule to the prior probability of the consequent
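
Lift follows directly from that definition: rule confidence divided by the prior proportion of the consequent. The Python sketch below assumes set-valued transactions; the toy data and the helper name lift() are hypothetical.

```python
# Toy transactions (invented for illustration)
transactions = [
    {"beans", "squash", "corn"},
    {"beans", "squash"},
    {"beans", "corn"},
    {"squash", "corn"},
    {"beans", "squash", "tomatoes"},
]

def lift(antecedent, consequent, transactions):
    """Confidence of A => B divided by the prior proportion of B."""
    n = len(transactions)
    with_a = [t for t in transactions if antecedent <= t]
    n_b = sum(consequent <= t for t in transactions)
    if not with_a or n_b == 0:
        return 0.0  # undefined in these edge cases; 0.0 is an arbitrary choice
    conf = sum(consequent <= t for t in with_a) / len(with_a)
    prior = n_b / n
    return conf / prior

print(round(lift({"tomatoes"}, {"squash"}, transactions), 4))  # 1.25
print(round(lift({"beans"}, {"squash"}, transactions), 4))     # 0.9375
```

A lift above 1 (tomatoes => squash) means the antecedent raises the chance of the consequent relative to all transactions; a lift below 1 (beans => squash) means it lowers it.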

9
Q

True or False: Lift indicates how much more likely the consequent is given the antecedent compared to the general population.

A

True

10
Q

What is the significance of a lift value greater than 1?

A

It indicates that the antecedent increases the likelihood of the consequent

11
Q

Fill in the blank: The support for an association rule A ⇒ B is calculated as _______.

A

number of transactions containing both A and B / total number of transactions

12
Q

What are the three measures of goodness for an association rule?

A
  • Support
  • Confidence
  • Lift
13
Q

What is the minimum support threshold set for rule mining in the example?

A

0.01 (1%)

14
Q

What does the term ‘antecedent’ refer to in an association rule?

A

The item or set of items that implies the consequent

15
Q

What is the minimum confidence threshold set for rule mining in the example?

A

0.4 (40%)

16
Q

What does ‘max number of antecedents’ specify in the context of mining rules?

A

The maximum number of items that can be in the antecedent
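
The three rule-mining settings (minimum support, minimum confidence, maximum number of antecedents) can be seen working together in this brute-force Python sketch. It is illustrative only: it does not use the a priori pruning, the function name mine_rules() is hypothetical, and restricting consequents to a single item is an assumption chosen to resemble the rules R's apriori() reports.

```python
from itertools import combinations

def mine_rules(transactions, min_support, min_confidence, max_antecedents):
    """Brute-force rule search with single-item consequents (illustrative only)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for consequent in items:
        others = [i for i in items if i != consequent]  # keeps A and B disjoint
        for size in range(1, max_antecedents + 1):      # cap on antecedent size
            for combo in combinations(others, size):
                antecedent = frozenset(combo)
                s = supp(antecedent | {consequent})
                if s == 0 or s < min_support:
                    continue                             # fails minimum support
                conf = s / supp(antecedent)
                if conf >= min_confidence:               # passes minimum confidence
                    rules.append((set(antecedent), consequent, s, conf))
    return rules

# Toy transactions (invented for illustration)
transactions = [
    {"beans", "squash", "corn"},
    {"beans", "squash"},
    {"beans", "corn"},
    {"squash", "corn"},
    {"beans", "squash", "tomatoes"},
]
for antecedent, consequent, s, conf in mine_rules(transactions, 0.4, 0.6, 1):
    print(sorted(antecedent), "=>", consequent, round(s, 2), round(conf, 2))
```

Raising max_antecedents widens the search to multi-item antecedents, which is where the number of candidate rules starts to grow quickly.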

17
Q

How do you convert a variable to an ordinal factor in R?

A

Wrap as.factor() in ordered(), e.g., ordered(as.factor(x))

18
Q

What is the purpose of the apriori() function in R?

A

To generate association rules based on specified parameters

19
Q

What command is used to inspect the top rules sorted by lift in R?

A

inspect(head(all.rules, by = "lift", n = 10))
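
The R command above lists the ten rules with the highest lift. The same idea in plain Python, assuming each rule is stored as a dictionary with a hypothetical "lift" key, is a descending sort followed by a slice. The rules and lift values below are invented placeholders, not results from the book.

```python
def top_by_lift(rules, n=10):
    """Return up to n rules with the highest lift, in descending order of lift."""
    return sorted(rules, key=lambda rule: rule["lift"], reverse=True)[:n]

# Hypothetical rules; item names and lift values are placeholders only
all_rules = [
    {"lhs": "{X}", "rhs": "{Y}", "lift": 3.2},
    {"lhs": "{Z}", "rhs": "{Y}", "lift": 2.7},
    {"lhs": "{W}", "rhs": "{V}", "lift": 1.04},
    {"lhs": "{X, W}", "rhs": "{Y}", "lift": 1.9},
]

for rule in top_by_lift(all_rules, n=3):
    print(rule["lhs"], "=>", rule["rhs"], "lift:", rule["lift"])
```

If fewer than n rules exist, the slice simply returns all of them, which matches the behavior one would want when only a handful of rules survive the thresholds.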

20
Q

What should be done to rules containing Churn in the antecedent?

A

Delete those rules

21
Q

What is the first step in mining association rules using R?

A

Read in the data set and subset the desired variables

22
Q

What does the term ‘mutually exclusive’ refer to in the context of association rules?

A

Antecedent A and consequent B cannot contain the same items

23
Q

How is the lift interpreted for the rule ‘If buy diapers, then buy beer’ with a lift value of 2.5?

A

Customers who buy diapers are 2.5 times as likely to buy beer as the general population

24
Q

What do zeros and ones represent in the context of antecedents?

A

Zero means the antecedent did not meet the condition and one means that it did.

25
Q

What command is used to take the absolute value of t1+t2-1?

A

abs(t1 + t2 - 1)

26
Q

What is the result of using abs() on t1+t2-1?

A

A single vector of zeros and ones.

27
Q

What does the vector non.churn.ant indicate?

A

It indicates antecedents that do not contain Churn.

28
Q

How do you subset rules that do not have Churn in the antecedent?

A

good.rules <- all.rules[non.churn.ant == 1]
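
The zero/one indicator approach behind non.churn.ant can be sketched in Python as follows. It assumes t1 and t2 flag antecedents containing Churn=True and Churn=False respectively; the item labels and the four antecedents are hypothetical. Because an antecedent can hold at most one Churn value, t1 + t2 is 0 or 1, so abs(t1 + t2 - 1) maps "no Churn" to 1 and "contains Churn" to 0.

```python
# Hypothetical antecedents of four mined rules
antecedents = [
    {"Plan=yes"},
    {"Churn=True"},
    {"Calls=4"},
    {"Churn=False", "Plan=no"},
]

# t1 / t2: 1 if the antecedent contains Churn=True / Churn=False, else 0
t1 = [int("Churn=True" in a) for a in antecedents]
t2 = [int("Churn=False" in a) for a in antecedents]

# abs(t1 + t2 - 1) is 1 exactly when neither Churn flag is set
non_churn_ant = [abs(x + y - 1) for x, y in zip(t1, t2)]

# Keep only rules whose antecedent does not mention Churn
good_rules = [a for a, keep in zip(antecedents, non_churn_ant) if keep == 1]

print(non_churn_ant)  # [1, 0, 1, 0]
```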

29
Q

What command is used to sort good.rules by descending lift values?

A

inspect(head(good.rules, by = "lift")), since head() with by = "lift" returns the rules in descending order of lift

30
Q

What is the purpose of creating a contingency table of Churn and Customer Service Calls?

A

It is used to confirm by hand the support, confidence, and lift reported for rules involving these two variables.

31
Q

What does support measure in the context of association rules?

A

The proportion of transactions in the intersection of the two events (antecedent and consequent), i.e., P(A and B).

32
Q

What is the formula for calculating support?

A

(Number of transactions with the given number of customer service calls and Churn = True) / (total number of transactions)

33
Q

What is the confidence of a rule equivalent to?

A

The conditional probability P(B | A).

34
Q

What is lift in association rules?

A

It measures how much more likely the consequent is given the antecedent compared to the general population.

35
Q

What does the confidence difference criterion evaluate?

A

The absolute difference between the prior probability of the consequent and the confidence of the rule.

36
Q

What conditions must be met for a rule to be included using the confidence difference criterion?

A

|Prior probability of consequent - Rule confidence| > 0.40 (the absolute difference must exceed 0.40)
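
The confidence difference criterion reduces to one comparison. This Python sketch uses hypothetical function names and invented confidence/prior values; it shows that, because the difference is absolute, a rule that strongly lowers the consequent's likelihood can pass as well as one that strongly raises it.

```python
def confidence_difference(rule_confidence, prior_consequent):
    """Absolute difference between rule confidence and the consequent's prior proportion."""
    return abs(rule_confidence - prior_consequent)

def passes_conf_diff(rule_confidence, prior_consequent, threshold=0.40):
    """Keep the rule only if the confidence difference exceeds the threshold."""
    return confidence_difference(rule_confidence, prior_consequent) > threshold

# Invented values for illustration
print(passes_conf_diff(0.60, 0.15))  # True:  |0.60 - 0.15| = 0.45
print(passes_conf_diff(0.50, 0.15))  # False: |0.50 - 0.15| = 0.35
print(passes_conf_diff(0.05, 0.50))  # True:  |0.05 - 0.50| = 0.45
```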

37
Q

What does the confidence quotient criterion measure?

A

The absolute ratio between the prior probability of the consequent and the confidence of the rule.

38
Q

What is the rule for including rules based on the confidence quotient criterion?

A

Rule confidence / Prior proportion of consequent > 0.40

39
Q

What R command is used to apply the confidence difference criterion?

A

apriori() with arem = "diff" in its parameter list (the 0.40 threshold is passed as minval).

40
Q

What does the parameter ‘arem’ specify in the apriori() command?

A

It selects the additional rule evaluation measure; arem = "diff" specifies that the confidence difference criterion should be used.

41
Q

What is the significance of the value ‘0.59016’ in the context of Rule 1?

A

It indicates the confidence of Rule 1.

42
Q

What does the confidence difference statistic help to weed out?

A

Obvious rules that do not provide new insights.

43
Q

What R command is used to generate rules with the confidence quotient criterion?

A

apriori() with arem = "quot" in its parameter list instead of arem = "diff".

44
Q

What is one application of the confidence quotient criterion?

A

Finding rules that predict rare events.

45
Q

What does the command ‘addmargins()’ do in the context of a contingency table?

A

It adds totals to the margins of the table.

46
Q

What is the total number of transactions used to calculate support?

47
Q

What does a lift value of 4.061 indicate?

A

Customers who have made five calls to customer service are 4.061 times as likely to churn as customers overall.

48
Q

How is the confidence for a rule calculated using a contingency table?

A

By dividing the number of transactions with both events by the number of transactions containing the antecedent.
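
Confirming a rule from a contingency table can be sketched in Python: add the margins (as R's addmargins() does), then read support, confidence, and lift off the totals. The 2x2 layout and the counts below are hypothetical, and add_margins() and rule_measures() are illustrative names, not R functions.

```python
def add_margins(table):
    """Append a row-total column and a column-total row, like R's addmargins()."""
    with_row_totals = [row + [sum(row)] for row in table]
    column_totals = [sum(col) for col in zip(*with_row_totals)]
    return with_row_totals + [column_totals]

def rule_measures(table):
    """Support, confidence, and lift of A => B from a 2x2 table of counts.

    Assumed layout: table[0] = A present, table[1] = A absent;
    column 0 = B present, column 1 = B absent.
    """
    m = add_margins(table)
    grand_total = m[2][2]
    both = m[0][0]              # transactions containing A and B
    with_antecedent = m[0][2]   # row total for A present
    with_consequent = m[2][0]   # column total for B present
    support = both / grand_total
    confidence = both / with_antecedent
    lift = confidence / (with_consequent / grand_total)
    return support, confidence, lift

# Hypothetical counts, invented for illustration
counts = [[40, 10],
          [60, 390]]
print(add_margins(counts))  # [[40, 10, 50], [60, 390, 450], [100, 400, 500]]
support, confidence, lift = rule_measures(counts)
print(round(support, 3), round(confidence, 3), round(lift, 3))  # 0.08 0.8 4.0
```

Confidence uses the antecedent's row total as the denominator, while support and the consequent's prior use the grand total.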

49
Q

What does ‘rules.confdiff’ represent in the R code?

A

The output of association rules generated with the confidence difference criterion.

50
Q

What should be done to the rules after obtaining them with the confidence quotient criterion?

A

Subset only those rules which do not have Churn in the antecedent.

51
Q

What is the purpose of the exercises mentioned at the end of the content?

A

To reinforce understanding of the concepts discussed.

52
Q

What should be included in the tables for each of the variables?

A

Counts and proportions

These tables will be used to obtain the prior proportions of various values.

53
Q

What are the minimum support and confidence values for generating association rules?

A

Minimum support of 5%, minimum confidence of 5%

Maximum antecedents of 1 for initial rule generation.

54
Q

How should the generated rules be displayed?

A

Sorted by descending lift value

Lift value is an important metric for evaluating the strength of rules.

55
Q

What is the next step after generating the association rules?

A

Select the rule with the greatest lift and interpret it

This is crucial for understanding the significance of the association.

56
Q

What quantities need to be confirmed by hand for the selected rule?

A

Support, Confidence, Lift

These values help validate the association rule.

57
Q

What is the maximum number of antecedents for generating a second set of association rules?

A

Maximum antecedents of 2

This allows for more complex relationships to be explored.

58
Q

What criteria should be used for the confidence difference in association rules?

A

Confidence difference lower bound of 30% (0.30), minimum support of 5%, minimum confidence of 5%, maximum antecedents of 1

This criterion helps in finding significant rules based on confidence differences.

59
Q

What should be done after obtaining the rules with the confidence difference criterion?

A

Select the rule with the greatest lift and confirm the confidence difference by hand

This verification process ensures the reliability of the rule.

60
Q

What is the maximum number of antecedents for the confidence quotient criterion?

A

Maximum antecedents of 3

This allows for evaluating more complex rules.

61
Q

What data set will be used for validating the association rules found earlier?

A

AR_Test data set

Validation is necessary to ensure the robustness of the rules.

62
Q

What should be done to compare the rules from the AR_Test data set with the training data set?

A

Evaluate whether the association rules have been validated

This comparison is critical for assessing the effectiveness of the rules.
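
A hedged sketch of the validation step in Python: recompute the same rule's support, confidence, and lift on the training data and on the test data, then compare them side by side. The two tiny transaction lists and the rule A => B are hypothetical stand-ins for the training set and AR_Test, and rule_stats() is an illustrative name.

```python
def rule_stats(antecedent, consequent, transactions):
    """Support, confidence, and lift of antecedent => consequent on one data set."""
    n = len(transactions)
    n_a = sum(antecedent <= t for t in transactions)
    n_b = sum(consequent <= t for t in transactions)
    n_ab = sum((antecedent | consequent) <= t for t in transactions)
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    lift = confidence / (n_b / n) if n_b else 0.0
    return support, confidence, lift

# Hypothetical training and test transactions
train = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"C"}]
test_set = [{"A", "B"}, {"A"}, {"B", "C"}, {"C"}]

rule = ({"A"}, {"B"})
for name, data in [("train", train), ("test", test_set)]:
    s, c, l = rule_stats(*rule, data)
    print(f"{name}: support={s:.3f} confidence={c:.3f} lift={l:.3f}")
```

Here lift falls from about 1.11 on the training data to 1.00 on the test data, so this hypothetical rule would not look validated; how much drift still counts as "validated" is a judgment call not fixed by this sketch.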