Chapter 3 Code Flashcards
What do you put at the start of each code block so that results involving randomness are reproducible?
set.seed(123)
How do you investigate the class of a data set containing transactions?
class(data)
How do you view a summary of a data set containing transactions?
summary(data)
How do you look at the first five transactions of a data set?
inspect(data[1:5])
How do you examine the frequency of items in a transaction data set?
itemFrequency(data)
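The inspection calls above can be tried on a tiny example. This is a minimal sketch, assuming the arules package is installed; the baskets are hypothetical toy data, not from the chapter.

```r
library(arules)  # assumed to be installed

# Hypothetical toy baskets: one character vector per transaction
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("butter", "jam"),
  c("bread", "butter"),
  c("milk")
)
data <- as(baskets, "transactions")  # coerce the list to a transactions object

class(data)          # class of the object
summary(data)        # number of transactions and items, most frequent items
inspect(data[1:5])   # the first five transactions
itemFrequency(data)  # share of transactions that contain each item
```

itemFrequency() returns relative frequencies (support) by default; passing type = "absolute" returns counts instead.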
How do you plot the frequency of items that appear in at least 10% of transactions?
itemFrequencyPlot()
itemFrequencyPlot(data, support = 0.1)
Increasing the support shows fewer items, because fewer items meet the threshold; decreasing the support shows more items
In practice, the support threshold can be much smaller than in the examples we saw, because real transaction databases are large
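One way to see the effect of the support threshold is to count how many items pass it. This is a sketch that assumes the Groceries data set bundled with arules as a stand-in for data; the thresholds are arbitrary.

```r
library(arules)  # assumed to be installed
data("Groceries")  # example data shipped with arules, used as a stand-in

freq <- itemFrequency(Groceries)  # support of every single item

# Number of items that would appear in the plot at each threshold
n_at_10 <- sum(freq >= 0.10)
n_at_05 <- sum(freq >= 0.05)
n_at_01 <- sum(freq >= 0.01)
c(n_at_10, n_at_05, n_at_01)  # counts grow as the threshold drops
```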
How do you plot the frequency of top 20 items in transactions?
itemFrequencyPlot(data, topN = 20)
How do you create apriori rules?
rules <- apriori(data)
With no parameters, apriori() uses a minimum support of 0.1 and a minimum confidence of 0.8
apriori(data, parameter = list(support = 0.006, confidence = 0.25,
minlen = 2))
Add parameter = list() to set the thresholds explicitly
Set minlen = 2 so that every rule has the form A -> B, with at least one item on each side
Adjust support and confidence depending on how many rules you want and how specific they should be
High support = few or no rules
Low support = many rules, but they may not be useful or meaningful
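The trade-off between support and the number of rules can be checked directly. This is a sketch, again assuming the Groceries data from arules as a stand-in; count_rules is a hypothetical helper written for this card.

```r
library(arules)  # assumed to be installed
data("Groceries")

# Hypothetical helper: mine rules at a given support and return how many were found
count_rules <- function(supp) {
  rules <- apriori(Groceries,
                   parameter = list(support = supp, confidence = 0.25, minlen = 2),
                   control = list(verbose = FALSE))  # suppress progress output
  length(rules)
}

supports <- c(0.1, 0.01, 0.006)
counts <- sapply(supports, count_rules)
counts  # number of rules at each support, from highest to lowest threshold
```

Lowering the support can only keep or increase the number of rules, since every rule that meets a higher threshold also meets a lower one.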
How do you investigate the rules created?
summary(rules)
inspect(rules[1:3]) - e.g. to inspect the first three rules
How do you sort the rules by lift, and display first 10 rules?
inspect(sort(rules, by = "lift")[1:10])
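Indexing with [1:10] can fail when fewer than ten rules were found. A hedged variant uses head(), which returns at most the requested number. This sketch assumes the Groceries data and the parameter values from the earlier card.

```r
library(arules)  # assumed to be installed
data("Groceries")

rules <- apriori(Groceries,
                 parameter = list(support = 0.006, confidence = 0.25, minlen = 2),
                 control = list(verbose = FALSE))

# sort() orders rules in decreasing order of the chosen measure by default
top <- head(sort(rules, by = "lift"), n = 10)
inspect(top)

quality(top)$lift  # the lift values alone, highest first
```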
What is the lift?
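Lift is a metric used to evaluate association rules, such as those produced by the Apriori algorithm; it measures the strength of the association between two items in a dataset:
Lift is the factor by which the likelihood of item B given item A is higher than the likelihood of item B on its own: lift(A -> B) = conf(A -> B) / supp(B). A lift above 1 means A and B occur together more often than expected if they were independent.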
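A worked example of lift in base R. The counts are hypothetical and chosen only to make the arithmetic easy.

```r
# Hypothetical counts from 1000 transactions
n_total <- 1000  # all transactions
n_A     <- 200   # transactions containing A
n_B     <- 250   # transactions containing B
n_AB    <- 100   # transactions containing both A and B

support_B  <- n_B / n_total   # P(B) = 0.25
confidence <- n_AB / n_A      # P(B | A) = 0.5
lift       <- confidence / support_B

lift  # 2: B is twice as likely when A is in the basket
```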
How do you check if any rules are redundant?
is.redundant(rules)
What is a more general rule?
A rule is more general if it has the same RHS but one or more items removed from the LHS.
What is a redundant rule?
A rule is redundant if a more general rule with the same or a higher confidence exists.
That is, a more specific rule is redundant if it is only equally or even less predictive than a more general rule.
Formally, a rule X -> Y is redundant if for some proper subset X' of X, conf(X' -> Y) >= conf(X -> Y).
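is.redundant() returns one logical value per rule, so it can be used to drop the redundant ones. This is a sketch assuming the Groceries data and the parameter values from the earlier card.

```r
library(arules)  # assumed to be installed
data("Groceries")

rules <- apriori(Groceries,
                 parameter = list(support = 0.006, confidence = 0.25, minlen = 2),
                 control = list(verbose = FALSE))

redundant <- is.redundant(rules)  # TRUE where a more general rule is at least as confident
n_redundant <- sum(redundant)     # how many rules are flagged

pruned <- rules[!redundant]       # keep only the non-redundant rules
length(pruned)
```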