Data Science using Python and R - 14 Flashcards
What is the form of association rules?
If antecedent, then consequent
What are the two key measures associated with an association rule?
- Support
- Confidence
What does support represent in an association rule?
For a rule A ⇒ B, the proportion of transactions that contain both A and B
Define confidence in the context of association rules.
The percentage of transactions containing A that also contain B
What is the curse of dimensionality?
The exponential growth of possible association rules with the number of attributes
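A minimal Python sketch of how fast the rule space grows. It rests on a simplifying assumption that is not on the card: items are binary, and a rule is any pair of non-empty, disjoint item sets A ⇒ B. Brute-force enumeration is checked against the closed form 3^d - 2^(d+1) + 1 for d items.

```python
from itertools import product

def count_rules_brute_force(d: int) -> int:
    """Count rules A => B over d items, with A and B non-empty and disjoint."""
    count = 0
    # Each item goes to the antecedent (0), the consequent (1), or neither (2).
    for assignment in product((0, 1, 2), repeat=d):
        if 0 in assignment and 1 in assignment:
            count += 1
    return count

def count_rules_closed_form(d: int) -> int:
    # 3^d assignments, minus those with an empty antecedent or an empty
    # consequent, plus the one assignment counted twice (inclusion-exclusion).
    return 3**d - 2**(d + 1) + 1

for d in range(1, 9):
    print(d, count_rules_closed_form(d))
```

With 6 items this gives 602 candidate rules; with 20 items it is already about 3.5 billion, which is why exhaustive search is impractical.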
What is the a priori algorithm used for?
Mining association rules more efficiently by reducing the search space
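The pruning rests on the a priori property: every subset of a frequent itemset must itself be frequent. Below is a minimal, unoptimized Python sketch of the frequent-itemset phase only, run on made-up grocery baskets. It rescans the data for every support count and omits rule generation, so it illustrates the idea rather than the implementation behind R's apriori().

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in sets if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in sets for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (a priori property): every (k-1)-subset of a candidate
        # must itself be frequent, otherwise the candidate cannot be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

# Made-up baskets for illustration.
baskets = [
    {"beans", "squash", "corn"},
    {"beans", "squash"},
    {"beans", "corn"},
    {"squash", "tomatoes"},
]
result = apriori_frequent_itemsets(baskets, 0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), s)
```

Here {beans, squash, corn} is never counted: its subset {squash, corn} is infrequent, so the prune step discards it before any support is computed.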
What is an example of a trivial rule that is excluded from association rules?
If beans and squash, then beans
How is lift defined in association rules?
The ratio of the confidence of the rule to the prior probability of the consequent
True or False: Lift indicates how much more likely the consequent is given the antecedent compared to the general population.
True
What is the significance of the lift value greater than 1?
It indicates that the antecedent increases the likelihood of the consequent
Fill in the blank: The support for an association rule A ⇒ B is calculated as _______.
number of transactions containing both A and B / total number of transactions
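A small Python sketch tying support, confidence, and lift together for a rule A ⇒ B over a list of transactions. The baskets are made-up toy data, and the function assumes the antecedent appears in at least one transaction (otherwise confidence is undefined).

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent => consequent."""
    a, b = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_b = sum(1 for t in transactions if b <= set(t))
    n_ab = sum(1 for t in transactions if (a | b) <= set(t))
    support = n_ab / n              # P(A and B)
    confidence = n_ab / n_a         # P(B | A)
    lift = confidence / (n_b / n)   # P(B | A) / P(B)
    return support, confidence, lift

# Made-up baskets for illustration.
baskets = [
    {"diapers", "beer"},
    {"diapers", "beer", "milk"},
    {"diapers", "milk"},
    {"milk"},
    {"beer"},
    {"milk", "bread"},
    {"bread"},
    {"milk", "bread", "eggs"},
]
# support 0.25, confidence 2/3, lift 16/9 (about 1.78)
print(rule_metrics(baskets, {"diapers"}, {"beer"}))
```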
What are the three measures of goodness for an association rule?
- Support
- Confidence
- Lift
What is the minimum support threshold set for rule mining in the example?
0.01 (1%)
What does the term ‘antecedent’ refer to in an association rule?
The item or set of items that imply the consequent
What is the minimum confidence threshold set for rule mining in the example?
0.4 (40%)
What does ‘max number of antecedents’ specify in the context of mining rules?
The maximum number of items that can be in the antecedent
How do you convert a variable to an ordinal factor in R?
Apply ordered() to the result of as.factor(), e.g. ordered(as.factor(x))
What is the purpose of the apriori() function in R?
To generate association rules based on specified parameters
What command is used to inspect the top rules sorted by lift in R?
inspect(head(all.rules, by = "lift", n = 10))
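Python has no inspect(); a rough equivalent of "sort by lift and show the top n" is a plain sort. This sketch assumes each rule is a dict with a "lift" key, a hypothetical representation chosen for illustration, and the lift values are invented.

```python
def top_by_lift(rules, n=10):
    """Return up to n rules in descending order of lift (ties keep input order)."""
    return sorted(rules, key=lambda r: r["lift"], reverse=True)[:n]

# Hypothetical rules with invented lift values.
rules = [
    {"rule": "{X} => {Y}", "lift": 1.1},
    {"rule": "{diapers} => {beer}", "lift": 2.5},
    {"rule": "{Z} => {W}", "lift": 0.9},
]
for r in top_by_lift(rules, n=2):
    print(r["rule"], r["lift"])
```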
What should be done to rules containing Churn in the antecedent?
Delete those rules
What is the first step in mining association rules using R?
Read in the data set and subset the desired variables
What does the term ‘mutually exclusive’ refer to in the context of association rules?
Antecedent A and consequent B cannot contain the same items
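The mutual-exclusivity requirement is easy to check mechanically. A minimal Python sketch (the function name is hypothetical) that also rejects the trivial beans-and-squash rule:

```python
def is_valid_rule(antecedent, consequent):
    """True if antecedent and consequent are non-empty and share no items."""
    a, b = set(antecedent), set(consequent)
    return bool(a) and bool(b) and a.isdisjoint(b)

print(is_valid_rule({"beans", "squash"}, {"beans"}))  # trivial rule -> False
print(is_valid_rule({"beans", "squash"}, {"corn"}))   # -> True
```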
How is the lift interpreted for the rule ‘If buy diapers, then buy beer’ with a lift value of 2.5?
Customers who buy diapers are 2.5 times as likely to buy beer as the general population
What do zeros and ones represent in the context of antecedents?
Zero means the antecedent did not meet the condition and one means that it did.
What command is used to take the absolute value of t1 + t2 - 1?
abs()
What is the result of using abs() on t1 + t2 - 1?
A single vector of zeros and ones.
What does the vector non.churn.ant indicate?
It flags, with a 1, the rules whose antecedents do not contain Churn.
How do you subset rules that do not have Churn in the antecedent?
good.rules <- all.rules[non.churn.ant == 1]
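The R steps above build a 0/1 mask and subset with it. The same filtering can be sketched in Python with a direct membership test instead of indicator arithmetic. This assumes rule records are dicts whose "antecedent" is a set of item labels of the form "variable=value" (an assumed format); all variable names other than Churn are invented.

```python
def drop_target_antecedents(rules, target="Churn"):
    """Keep only rules whose antecedent contains no item for the target variable.

    Items are assumed to be labeled "variable=value", e.g. "Churn=True".
    """
    prefix = target + "="
    return [
        r for r in rules
        if not any(item.startswith(prefix) for item in r["antecedent"])
    ]

# Hypothetical rules; variable names other than Churn are invented.
rules = [
    {"antecedent": {"CustServ_Calls=5"}, "consequent": {"Churn=True"}},
    {"antecedent": {"Churn=False"}, "consequent": {"Intl_Plan=no"}},
    {"antecedent": {"Intl_Plan=yes", "Churn=True"}, "consequent": {"VMail_Plan=no"}},
]
good_rules = drop_target_antecedents(rules)
print(len(good_rules))
```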
What command is used to sort good.rules by descending lift values?
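inspect(head(good.rules, by = "lift", n = 10)): head() with by = "lift" sorts the rules in decreasing order of lift, and inspect() displays them.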
What is the purpose of creating a contingency table of Churn and Customer Service Calls?
It lets you confirm the rule's support, confidence, and lift by hand.
What does support measure in the context of association rules?
The proportion of transactions in the intersection of the two events, that is, P(A and B).
What is the formula for calculating support?
Number of transactions with both CSC and Churn = True / total number of transactions
What is the confidence of a rule equivalent to?
The conditional probability P(B | A).
What is lift in association rules?
It measures how much more likely the consequent is given the antecedent compared to the general population.
What does the confidence difference criterion evaluate?
The absolute difference between the prior probability of the consequent and the confidence of the rule.
What conditions must be met for a rule to be included using the confidence difference criterion?
|Rule confidence - Prior probability of consequent| > 0.40
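A minimal Python sketch of the confidence difference test, using the absolute difference between rule confidence and the consequent's prior probability, with the card's 0.40 threshold as the default. The confidence and prior values in the example are invented.

```python
def confidence_difference(confidence, prior):
    """Absolute difference between the rule's confidence and the consequent's prior."""
    return abs(confidence - prior)

def passes_confidence_difference(confidence, prior, threshold=0.40):
    """True if the rule's confidence differs from the prior by more than threshold."""
    return confidence_difference(confidence, prior) > threshold

# Invented values: a rule that moves the consequent far from its base rate...
print(passes_confidence_difference(0.70, 0.15))  # True
# ...and a rule whose confidence is close to an already-high prior.
print(passes_confidence_difference(0.92, 0.90))  # False
```

The second case is the kind of "obvious" rule the criterion is meant to weed out: high confidence, but only because the consequent is common anyway.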
What does the confidence quotient criterion measure?
The absolute ratio between the prior probability of the consequent and the confidence of the rule.
What is the rule for including rules based on the confidence quotient criterion?
Rule confidence / Prior proportion of consequent > 0.40
What R command is used to apply the confidence difference criterion?
apriori() with arem = "diff" (plus a minval threshold) in its parameter list.
What does the parameter ‘arem’ specify in the apriori() command?
It selects an additional rule evaluation measure; arem = "diff" specifies the confidence difference criterion.
What is the significance of the value ‘0.59016’ in the context of Rule 1?
It indicates the confidence of Rule 1.
What does the confidence difference statistic help to weed out?
Obvious rules that do not provide new insights.
What R command is used to generate rules with the confidence quotient criterion?
apriori() with arem = "quot" in its parameter list.
What is one application of the confidence quotient criterion?
Finding rules that predict rare events.
What does the command ‘addmargins()’ do in the context of a contingency table?
It adds totals to the margins of the table.
What is the total number of transactions used to calculate support?
3000
What does a lift value of 4.061 indicate?
Customers who have made five calls to customer service are 4.061 times as likely to churn as customers in general.
How is the confidence for a rule calculated using a contingency table?
By dividing the number of transactions with both events by the number of transactions containing the antecedent.
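The hand calculation from a 2x2 contingency table can be sketched in Python. The margins play the role of addmargins(): the row total for the antecedent and the column total for the consequent. The counts below are hypothetical, not the book's data set.

```python
def metrics_from_table(both, antecedent_only, consequent_only, neither):
    """Support, confidence, and lift of A => B from a 2x2 contingency table.

    both            -- transactions with A and B
    antecedent_only -- transactions with A but not B
    consequent_only -- transactions with B but not A
    neither         -- transactions with neither
    """
    total = both + antecedent_only + consequent_only + neither
    row_a = both + antecedent_only   # margin: transactions containing A
    col_b = both + consequent_only   # margin: transactions containing B
    support = both / total
    confidence = both / row_a
    lift = confidence / (col_b / total)
    return support, confidence, lift

# Hypothetical counts (not from the book's data set).
print(metrics_from_table(30, 10, 120, 840))
```

With these counts: support = 30/1000 = 0.03, confidence = 30/40 = 0.75, and lift = 0.75 / (150/1000) = 5.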
What does ‘rules.confdiff’ represent in the R code?
The output of association rules generated with the confidence difference criterion.
What should be done to the rules after obtaining them with the confidence quotient criterion?
Subset only those rules which do not have Churn in the antecedent.
What is the purpose of the exercises mentioned at the end of the content?
To reinforce understanding of the concepts discussed.
What should be included in the tables for each of the variables?
Counts and proportions
These tables will be used to obtain the prior proportions of various values.
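The prior proportion of each value is its count divided by the number of records. A standard-library Python sketch (pandas users might reach for value_counts instead); the Churn values are invented.

```python
from collections import Counter

def counts_and_proportions(values):
    """Map each distinct value to (count, proportion), most common first."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, count / n) for value, count in counts.most_common()}

# Hypothetical Churn column: 17 non-churners and 3 churners.
churn = ["False"] * 17 + ["True"] * 3
for value, (count, prop) in counts_and_proportions(churn).items():
    print(value, count, round(prop, 3))
```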
What are the minimum support and confidence values for generating association rules?
Minimum support of 5%, minimum confidence of 5%
Maximum antecedents of 1 for initial rule generation.
How should the generated rules be displayed?
Sorted by descending lift value
Lift value is an important metric for evaluating the strength of rules.
What is the next step after generating the association rules?
Select the rule with the greatest lift and interpret it
This is crucial for understanding the significance of the association.
What quantities need to be confirmed by hand for the selected rule?
Support, Confidence, Lift
These values help validate the association rule.
What is the maximum number of antecedents for generating a second set of association rules?
Maximum antecedents of 2
This allows for more complex relationships to be explored.
What criteria should be used for the confidence difference in association rules?
Confidence difference lower bound of 30% (0.30), minimum support of 5%, minimum confidence of 5%, maximum antecedents of 1
This criterion helps in finding significant rules based on confidence differences.
What should be done after obtaining the rules with the confidence difference criterion?
Select the rule with the greatest lift and confirm the confidence difference by hand
This verification process ensures the reliability of the rule.
What is the maximum number of antecedents for the confidence quotient criterion?
Maximum antecedents of 3
This allows for evaluating more complex rules.
What data set will be used for validating the association rules found earlier?
AR_Test data set
Validation is necessary to ensure the robustness of the rules.
What should be done to compare the rules from the AR_Test data set with the training data set?
Evaluate if the association rules have been validated
This comparison is critical for assessing the effectiveness of the rules.
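A Python sketch of the comparison: compute the same rule's metrics on the training and test transactions and inspect them side by side. The data are hypothetical. The card does not say what counts as "validated", so no pass/fail threshold is assumed here.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of antecedent => consequent (antecedent must occur)."""
    a, b = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)
    n_b = sum(1 for t in transactions if b <= t)
    n_ab = sum(1 for t in transactions if (a | b) <= t)
    confidence = n_ab / n_a
    return {"support": n_ab / n, "confidence": confidence, "lift": confidence / (n_b / n)}

# Hypothetical training and test transactions (sets of items).
train = [{"a", "b"}, {"a", "b"}, {"a"}, {"b"}]
test = [{"a", "b"}, {"a"}, {"b"}, {"c"}]

train_m = rule_metrics(train, {"a"}, {"b"})
test_m = rule_metrics(test, {"a"}, {"b"})
for name in ("support", "confidence", "lift"):
    print(f"{name:>10}: train {train_m[name]:.3f}  test {test_m[name]:.3f}")
```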