Chapter 3 Code Flashcards by Louise Rodgers

What is at the start of each code block to prevent random differences between answers?

set.seed(123)

How do you investigate the class of a data set containing transactions?

class(data)
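
A minimal sketch, assuming the `Groceries` transactions that ship with the arules package stand in for `data`. It shows the class check and the dimensions of the underlying item matrix.

```r
library(arules)      # provides the transactions class and the Groceries data
data("Groceries")    # assumed stand-in for `data`

class(Groceries)     # "transactions", defined in package "arules"
dim(Groceries)       # number of transactions x number of distinct items
```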

How do you view a summary of a data set containing transactions?

summary(data)

How do you look at the first five transactions of a data set?

inspect(data[1:5])
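
A short sketch, again assuming `Groceries` from arules stands in for `data`. Subsetting with `[1:5]` keeps the transactions class. `size()` and the coercion to a plain list are alternative ways to look at the same five baskets.

```r
library(arules)
data("Groceries")                  # assumed stand-in for `data`

first_five <- Groceries[1:5]       # still a transactions object
inspect(first_five)                # prints each basket as {item1, item2, ...}
size(first_five)                   # number of items in each of the five baskets
as(first_five, "list")             # the same baskets as character vectors
```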

How do you examine the frequency of items in a transaction data set?

itemFrequency(data)
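
By default `itemFrequency()` returns relative frequencies, which are the support of each item. A sketch, assuming `Groceries` stands in for `data`, compares these with the absolute counts. It confirms that relative frequency is the count divided by the number of transactions.

```r
library(arules)
data("Groceries")                                    # assumed stand-in for `data`

rel <- itemFrequency(Groceries)                      # default type = "relative": item support
abs_counts <- itemFrequency(Groceries, type = "absolute")  # transactions containing each item

head(sort(rel, decreasing = TRUE), 5)                # the five most frequent items
```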

How do you plot the frequency of items that appear in at least 10% of transactions?

itemFrequencyPlot(data, support = 0.1)

Increasing the support shows fewer items, and decreasing it shows more, because more items meet the threshold.
In practice, a useful support can be quite small compared with the examples we saw: in a large transaction database, even a small proportion still corresponds to many transactions.
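
The effect of the support threshold can also be checked numerically. A minimal sketch, assuming `Groceries` stands in for `data`, counts how many items reach each minimum support. The assumption here is that these counts match the number of bars `itemFrequencyPlot(data, support = s)` would draw.

```r
library(arules)
data("Groceries")                     # assumed stand-in for `data`

freq <- itemFrequency(Groceries)      # relative support of each item

# Count the items that clear each minimum support (illustrative thresholds)
thresholds <- c(0.01, 0.05, 0.10, 0.20)
n_items <- sapply(thresholds, function(s) sum(freq >= s))
names(n_items) <- thresholds
n_items                               # shrinks as the threshold rises
```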

How do you plot the frequency of top 20 items in transactions?

itemFrequencyPlot(data, topN = 20)
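
`topN = 20` plots the 20 most frequent items. A sketch, assuming `Groceries` stands in for `data`, lists the same items as numbers by sorting the item frequencies.

```r
library(arules)
data("Groceries")                     # assumed stand-in for `data`

# The 20 most frequent items, as a named vector of supports
top20 <- sort(itemFrequency(Groceries), decreasing = TRUE)[1:20]
round(top20, 3)
```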

How do you create apriori rules?

rules <- apriori(data)
The defaults are a support of 0.1 and a confidence of 0.8.

rules <- apriori(data, parameter = list(support = 0.006, confidence = 0.25, minlen = 2))
Add parameter = list() to set your own thresholds and create more specific rules.

Set the minimum length to 2 (minlen = 2) so that every rule has the form A -> B. Otherwise, rules with an empty left-hand side ({} -> B) can appear.

Change support and confidence depending on how many rules you want and how specific they should be:
High support = few or no rules
Low support = many rules, but they may not be helpful or meaningful
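
A sketch of these trade-offs, assuming `Groceries` stands in for `data` and using the thresholds from this card. `control = list(verbose = FALSE)` only silences the progress output. The second, stricter support value is illustrative.

```r
library(arules)
data("Groceries")                     # assumed stand-in for `data`

rules <- apriori(Groceries,
                 parameter = list(support = 0.006, confidence = 0.25, minlen = 2),
                 control = list(verbose = FALSE))
length(rules)                         # number of rules found
summary(size(lhs(rules)))             # minlen = 2: every LHS has at least one item

# A higher minimum support keeps only a subset of those rules
rules_strict <- apriori(Groceries,
                        parameter = list(support = 0.02, confidence = 0.25, minlen = 2),
                        control = list(verbose = FALSE))
length(rules_strict)
```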

How do you investigate the rules created?

summary(rules)

inspect(rules[1:3])   # e.g. inspect the first three rules

How do you sort the rules by lift, and display first 10 rules?

inspect(sort(rules, by = "lift")[1:10])
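
A sketch, assuming `Groceries` stands in for `data`, that confirms `sort()` puts the highest-lift rules first. `decreasing = TRUE` is the default.

```r
library(arules)
data("Groceries")                     # assumed stand-in for `data`

rules <- apriori(Groceries,
                 parameter = list(support = 0.006, confidence = 0.25, minlen = 2),
                 control = list(verbose = FALSE))

top10 <- sort(rules, by = "lift")[1:10]   # highest lift first
inspect(top10)
quality(top10)$lift                       # lift values, largest first
```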

What is the lift?

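Lift measures the strength of the association in a rule A -> B. It is one of the quality measures reported for the rules that Apriori finds:

Lift is the factor by which the likelihood of B, given that A is present, is higher than the overall likelihood of B:
lift(A -> B) = conf(A -> B) / supp(B) = supp(A ∪ B) / (supp(A) × supp(B))
A lift above 1 means A and B appear together more often than expected if they were independent. A lift of 1 means there is no association.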
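
A worked example of lift(A -> B) = conf(A -> B) / supp(B) on hypothetical toy baskets, chosen so the arithmetic is easy to follow. The value computed by hand is then cross-checked against the lift arules reports for {bread} => {milk}.

```r
library(arules)

# Hypothetical toy baskets (5 transactions)
baskets <- list(c("bread", "milk"), c("bread", "milk"), c("bread"),
                c("milk"), c("eggs"))

# Lift by hand
has_bread  <- sapply(baskets, function(b) "bread" %in% b)
has_milk   <- sapply(baskets, function(b) "milk" %in% b)
supp_bread <- mean(has_bread)                 # 3/5
supp_milk  <- mean(has_milk)                  # 3/5
supp_both  <- mean(has_bread & has_milk)      # 2/5
conf_bm    <- supp_both / supp_bread          # 2/3
lift_bm    <- conf_bm / supp_milk             # (2/3) / (3/5) = 10/9, about 1.11

# Cross-check with the lift that apriori reports
trans <- as(baskets, "transactions")
rules <- apriori(trans, parameter = list(support = 0.2, confidence = 0.1, minlen = 2),
                 control = list(verbose = FALSE))
bm <- rules[lhs(rules) %in% "bread" & rhs(rules) %in% "milk"]
quality(bm)$lift
```

A lift of about 1.11 means that buying bread makes milk slightly more likely than its baseline rate of 3 in 5.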

How do you check if any rules are redundant?

is.redundant(rules)

What is a more general rule?

A rule is more general if it has the same RHS but one or more items removed from the LHS.

What is a redundant rule?

A rule is redundant if a more general rule with the same or a higher confidence exists.

That is, a more specific rule is redundant if it is only equally or even less predictive than a more general rule.
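
A sketch of this definition on hypothetical toy baskets. Every basket that contains A also contains C, so {A} => {C} has confidence 1. The more specific rule {A, B} => {C} also has confidence 1, so it adds nothing and should be flagged by `is.redundant()`.

```r
library(arules)

# Hypothetical toy baskets (4 transactions)
baskets <- list(c("A", "B", "C"), c("A", "C"), c("A", "B", "C"), c("B"))
trans <- as(baskets, "transactions")

rules <- apriori(trans, parameter = list(support = 0.25, confidence = 0.5, minlen = 2),
                 control = list(verbose = FALSE))

# Pick out the specific rule {A, B} => {C}
specific <- lhs(rules) %ain% c("A", "B") & size(lhs(rules)) == 2 & rhs(rules) %in% "C"
inspect(rules[specific])
is.redundant(rules)[specific]   # TRUE: {A} => {C} is as confident and more general
```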

How is a redundant rule defined formally?

Formally, a rule X -> Y is redundant if, for some proper subset X' ⊂ X, conf(X' -> Y) >= conf(X -> Y).

How do you see which rules are redundant?

inspect(rules[is.redundant(rules)])

How do you remove rules which are redundant?

rules_non_redundant <- rules[!is.redundant(rules)]
inspect(rules_non_redundant)
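
A sketch, assuming `Groceries` stands in for `data`, that splits a rule set into its redundant and non-redundant parts. It then checks that nothing in the kept set is still flagged.

```r
library(arules)
data("Groceries")                     # assumed stand-in for `data`

rules <- apriori(Groceries,
                 parameter = list(support = 0.006, confidence = 0.25, minlen = 2),
                 control = list(verbose = FALSE))

redundant <- is.redundant(rules)      # one TRUE/FALSE per rule
rules_redundant     <- rules[redundant]
rules_non_redundant <- rules[!redundant]

c(all = length(rules), redundant = sum(redundant), kept = length(rules_non_redundant))
```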

How do you find subsets of rules containing a specific item?

specific_rules <- subset(rules, subset = items %in% "example")   # "example" stands for the item label you want
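
Inside `subset()`, `items` matches the item anywhere in a rule, while `lhs` or `rhs` restrict the match to one side. A sketch, assuming `Groceries` stands in for `data` and "whole milk" stands in for "example":

```r
library(arules)
data("Groceries")                     # assumed stand-in for `data`

rules <- apriori(Groceries,
                 parameter = list(support = 0.006, confidence = 0.25, minlen = 2),
                 control = list(verbose = FALSE))

milk_any <- subset(rules, subset = items %in% "whole milk")   # LHS or RHS
milk_rhs <- subset(rules, subset = rhs %in% "whole milk")     # rules that predict whole milk
c(any_side = length(milk_any), rhs_only = length(milk_rhs))
```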

How do you write rules to a csv file?

write(rules, file = "rules.csv", sep = ",", quote = TRUE, row.names = FALSE)
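
A sketch, assuming `Groceries` stands in for `data`, that writes to a temporary file instead of the working directory and reads the CSV back. `quote = TRUE` matters here because rule labels such as {a,b} => {c} contain commas.

```r
library(arules)
data("Groceries")                                 # assumed stand-in for `data`

rules <- apriori(Groceries,
                 parameter = list(support = 0.006, confidence = 0.25, minlen = 2),
                 control = list(verbose = FALSE))

csv_path <- file.path(tempdir(), "rules.csv")     # temporary location for this sketch
write(rules, file = csv_path, sep = ",", quote = TRUE, row.names = FALSE)

rules_read <- read.csv(csv_path)                  # one row per rule
head(rules_read, 3)
```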

How do you convert the rule set to a data frame?

rules_df <- as(rules, "data.frame")

str(rules_df)
head(rules_df)
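
Once the rules are a data frame, ordinary data-frame tools apply. A sketch, assuming `Groceries` stands in for `data`, that picks the five highest-lift rules with base R indexing:

```r
library(arules)
data("Groceries")                     # assumed stand-in for `data`

rules <- apriori(Groceries,
                 parameter = list(support = 0.006, confidence = 0.25, minlen = 2),
                 control = list(verbose = FALSE))

rules_df <- as(rules, "data.frame")   # one row per rule, with a "rules" label column
rules_df[order(rules_df$lift, decreasing = TRUE)[1:5],
         c("rules", "support", "confidence", "lift")]
```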

How do you plot the first 10 rules?

library(arulesViz)   # the graph plot for rules comes from arulesViz
subrules <- rules[1:10]
plot(subrules, method = "graph")

How do you export rules to graphml format so that they can be further analysed by igraph/Gephi?

library(arulesViz)   # provides saveAsGraph()
saveAsGraph(rules, file = "rules.graphml")

For a database formatted as a data frame, how can you initially investigate it?

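library(skimr)   # skim() comes from the skimr package
data("dataI")    # loads the data set as an object named dataI
head(dataI)
skim(dataI)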

How do you transform “data frame” to “transactions”?

Transform each variable in the data frame to a factor, then coerce the result to the "transactions" class with as().

How do you transform each variable in the data frame to a factor?

Using map_df() (from purrr) and as.factor(), then coercing to transactions:
library(purrr)
library(arules)
data_factor <- map_df(data, function(x) { as.factor(x) })
transactional_data <- as(data_factor, "transactions")
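
A sketch of the same conversion on a hypothetical two-column data frame. It swaps in base R's `lapply()` for purrr's `map_df()` so that only arules is needed. Each column=level pair becomes one item.

```r
library(arules)

# Hypothetical categorical data frame
survey <- data.frame(colour = c("red", "blue", "red"),
                     size   = c("S", "M", "M"),
                     stringsAsFactors = FALSE)

# Base-R equivalent of map_df(data, as.factor): make every column a factor
survey_factor <- survey
survey_factor[] <- lapply(survey_factor, as.factor)

transactional_data <- as(survey_factor, "transactions")
itemLabels(transactional_data)   # items are named "column=level"
inspect(transactional_data)
```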

What is data of class "transactions" (from the arules package)?

It is not a regular rectangular flat file. It is stored as a sparse matrix recording which items appear in each transaction, set up for association rule mining.
