Chapter 3 Code Flashcards

1
Q

What do you put at the start of each code block so that random results are reproducible between runs?

A

set.seed(123)
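
A minimal sketch of why the seed matters, using only base R. The seed value 123 comes from the card; the use of runif() as the random step is an assumption chosen for illustration.

```r
# Fix the seed, draw some random numbers, then repeat with the same seed
set.seed(123)
first_draw <- runif(3)

set.seed(123)
second_draw <- runif(3)

# Same seed, same sequence of random numbers
same_result <- identical(first_draw, second_draw)
print(same_result)
```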

2
Q

How do you investigate the class of a data set containing transactions?

A

class(data)

3
Q

How do you view a summary of a data set containing transactions?

A

summary(data)

4
Q

How do you look at the first five transactions of a data set?

A

inspect(data[1:5])

5
Q

How do you examine the frequency of items in a transaction data set?

A

itemFrequency(data)

6
Q

How do you plot the frequency of items that appear in at least 10% of transactions?

A

itemFrequencyPlot()

itemFrequencyPlot(data, support = 0.1)

Increasing the support shows fewer items (fewer meet the threshold); decreasing the support shows more items (more meet the threshold)
In practice, the support threshold can be quite small (compared with the examples we saw) because real transaction databases are large
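
A minimal base R sketch of how the support threshold changes which items qualify. The toy transactions and the helper frequent_at() are hypothetical and only illustrate the idea; this is not how itemFrequencyPlot() is implemented.

```r
# Toy transactions (hypothetical data, not the course data set)
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("milk"),
  c("bread", "milk", "butter"),
  c("eggs")
)

# Support of each item: share of transactions that contain it
items <- sort(unique(unlist(transactions)))
item_support <- sapply(items, function(item) {
  mean(sapply(transactions, function(tr) item %in% tr))
})

# Hypothetical helper: names of the items that meet a minimum support
frequent_at <- function(min_support) {
  names(item_support)[item_support >= min_support]
}

print(frequent_at(0.5))  # higher threshold, fewer items
print(frequent_at(0.3))  # lower threshold, more items
```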

7
Q

How do you plot the frequency of the top 20 items in transactions?

A

itemFrequencyPlot(data, topN = 20)

8
Q

How do you create apriori rules?

A

rules <- apriori(data)
The defaults are support = 0.1 and confidence = 0.8

apriori(data, parameter = list(support = 0.006, confidence = 0.25,
minlen = 2))
Add parameter = list() to control which rules are generated

Set minlen = 2 so that every rule has the form A -> B (with minlen = 1, rules with an empty left-hand side can appear)

Change support and confidence depending on how many rules you want and how specific they should be
Support too high = few or no rules
Support too low = a lot of rules, but many may not be helpful or meaningful

9
Q

How do you investigate the rules created?

A

summary(rules)

inspect(rules[1:3])  # e.g. to inspect the first three rules

10
Q

How do you sort the rules by lift and display the first 10 rules?

A

inspect(sort(rules, by = "lift")[1:10])

11
Q

What is the lift?

A

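Lift is a metric used to measure the strength of the association between two items in a rule A -> B found by the Apriori algorithm:

lift(A -> B) = confidence(A -> B) / support(B)

Lift is the factor by which item B is more likely in transactions that contain item A than in transactions overall. A lift above 1 means A and B occur together more often than expected if they were independent.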
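
A minimal base R sketch of the calculation, using the standard definition lift(A -> B) = confidence(A -> B) / support(B). The toy transactions and the helper functions support(), confidence() and lift() are hypothetical and written only for illustration; they are not arules functions.

```r
# Toy transactions (hypothetical data)
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("milk"),
  c("bread", "milk", "butter"),
  c("eggs")
)

# Support: share of transactions containing every item in the item set
support <- function(itemset) {
  mean(sapply(transactions, function(tr) all(itemset %in% tr)))
}

# Confidence of lhs -> rhs: how often rhs appears when lhs is present
confidence <- function(lhs, rhs) support(c(lhs, rhs)) / support(lhs)

# Lift of lhs -> rhs: confidence relative to the baseline support of rhs
lift <- function(lhs, rhs) confidence(lhs, rhs) / support(rhs)

print(lift("butter", "bread"))
```

In this toy data every transaction with butter also contains bread, so the confidence is 1, while bread appears in 3 of 5 transactions overall, which gives a lift of 1 / 0.6.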

12
Q

How do you check if any rules are redundant?

A

is.redundant(rules)

13
Q

What is a more general rule?

A

A rule is more general if it has the same RHS but one or more items removed from the LHS.

14
Q

What is a redundant rule?

A

A rule is redundant if a more general rule with the same or a higher confidence exists.

That is, a more specific rule is redundant if it is only equally or even less predictive than a more general rule.

15
Q

What is a more general rule?

A

A rule is more general if it has the same RHS but one or more items removed from the LHS.

Formally, a rule X -> Y is redundant if, for some proper subset X' of X, conf(X' -> Y) >= conf(X -> Y).
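
A minimal base R sketch of the formal definition, applied to a hypothetical set of three rules with made-up confidences. It only illustrates the subset-and-confidence check and is not the implementation behind is.redundant(); for simplicity it ignores rules with an empty LHS.

```r
# Hypothetical rules: LHS item sets, RHS items, and confidences
lhs <- list("bread", c("bread", "butter"), c("bread", "eggs"))
rhs <- c("milk", "milk", "milk")
conf <- c(0.60, 0.55, 0.70)

# TRUE if item set a is a proper subset of item set b
is_proper_subset <- function(a, b) all(a %in% b) && length(a) < length(b)

# Rule i is redundant if some rule j has the same RHS, a LHS that is a
# proper subset of rule i's LHS, and the same or a higher confidence
redundant <- sapply(seq_along(lhs), function(i) {
  any(sapply(seq_along(lhs), function(j) {
    j != i &&
      rhs[j] == rhs[i] &&
      is_proper_subset(lhs[[j]], lhs[[i]]) &&
      conf[j] >= conf[i]
  }))
})

print(redundant)
```

Here {bread, butter} -> milk is flagged because the more general rule {bread} -> milk has a higher confidence, while {bread, eggs} -> milk is kept because it is more predictive than the general rule.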

16
Q

How do you see which rules are redundant?

A

inspect(rules[is.redundant(rules)])

17
Q

How do you remove rules which are redundant?

A

rules_non_redundant <- rules[!is.redundant(rules)]
inspect(rules_non_redundant)

18
Q

How do you find subsets of rules containing a specific item?

A

specific_rules <- subset(rules, subset = items %in% "example")

19
Q

How do you write rules to a csv file?

A

write(rules, file = "rules.csv", sep = ",", quote = TRUE, row.names = FALSE)

20
Q

How do you convert the rule set to a data frame?

A

rules_df <- as(rules, "data.frame")

str(rules_df)
head(rules_df)

21
Q

How do you plot the first 10 rules?

A

library(arulesViz)  # provides the plot() method for rules
subrules <- rules[1:10]
plot(subrules, method = "graph")

22
Q

How do you export rules to graphml format so that they can be further analysed by igraph/Gephi?

A

saveAsGraph(rules, file = "rules.graphml")

23
Q

For a database formatted as a data frame, how can you initially investigate it?

A

data("dataI")
head(data)
skim(data)  # skim() comes from the skimr package

24
Q

How do you transform “data frame” to “transactions”?

A

Transform each variable in the data frame to a factor, then coerce the result to the transactions class

25
Q

How do you transform each variable in the dataframe to a factor?

A

Using map_df() (from the purrr package) and as.factor()

data_factor <- map_df(data, function(x) as.factor(x))
transactional_data <- as(data_factor, "transactions")
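
A minimal base R sketch of the factor-conversion step, for cases where purrr is not loaded. The toy data frame df and its column names are hypothetical; the final coercion is left as a comment because it needs the arules package.

```r
# Hypothetical toy data frame with character columns
df <- data.frame(
  weather = c("sunny", "rainy", "sunny"),
  bought_umbrella = c("no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Convert every column to a factor while keeping the data frame structure
df_factor <- df
df_factor[] <- lapply(df_factor, as.factor)

str(df_factor)

# With arules loaded, the coercion step from the card would then be:
# transactional_data <- as(df_factor, "transactions")
```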

26
Q

What is data of the class arules?

A

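Not a regular rectangular flat file. It is a transactions object from the arules package (strictly, arules is the package and transactions is the class), stored as a sparse item matrix that is set up for association rule mining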