DSE R CODE Flashcards

1
Q

Reading data from a CSV file

A

Advertising = read.csv("Data/Advertising.csv", header = TRUE)
head(Advertising)
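A minimal, self-contained sketch of the same call. It assumes a small made-up CSV written to a temporary file; the column names and values are hypothetical, not the real Advertising data.

```r
# Write a tiny made-up CSV to a temporary file (hypothetical data)
tmp = tempfile(fileext = ".csv")
writeLines(c("TV,radio,sales",
             "100,20,12",
             "50,10,8",
             "200,30,20"), tmp)

# header = TRUE: the first line holds the column names
toy = read.csv(tmp, header = TRUE)
head(toy)  # first rows (up to 6)
str(toy)   # column types
```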

2
Q

Generate lm model

A

lm1 = lm(sales ~ TV, data = Advertising)
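To see what lm() estimates, here is a sketch on simulated data. The names sim and lm_sim and the true coefficients (intercept 7, slope 0.05) are made up for illustration; the fitted coefficients should land close to them.

```r
set.seed(1)
n = 200
TV = runif(n, 0, 300)                     # hypothetical ad budgets
sales = 7 + 0.05 * TV + rnorm(n, sd = 1)  # true intercept 7, slope 0.05
sim = data.frame(TV = TV, sales = sales)

lm_sim = lm(sales ~ TV, data = sim)  # response ~ predictor
coef(lm_sim)                         # estimates near 7 and 0.05
```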

3
Q

Generate the R output (summary) table

A

summary(lm1)

4
Q

Generate confidence intervals for the coefficients of the variable and the constant (95% or 90%)

A

confint(lm1)                # 95% CI (default level)
confint(lm1, level = 0.9)   # 90% CI
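A sketch on simulated data (x, y and fit are hypothetical) showing that confint() is estimate ± t-quantile × standard error, and that the 90% interval is narrower than the 95% one:

```r
set.seed(2)
x = rnorm(50)
y = 1 + 2 * x + rnorm(50)
fit = lm(y ~ x)

ci95 = confint(fit)               # default level = 0.95
ci90 = confint(fit, level = 0.9)  # 90% interval
ci95
ci90

# By hand for the slope: estimate +/- t quantile * standard error
est = coef(summary(fit))["x", "Estimate"]
se = coef(summary(fit))["x", "Std. Error"]
est + c(-1, 1) * qt(0.975, df = fit$df.residual) * se
```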

5
Q

How to plot a scatter plot of sales against TV?

label the axes
colour red
solid dots
with the regression line, line width 3 and colour blue

A

plot(x = Advertising$TV, y = Advertising$sales,
     xlab = "TV", ylab = "Sales", col = "red", pch = 19)
abline(lm1, col = "blue", lwd = 3)
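The same plot on simulated data, as a runnable sketch. It assumes an off-screen device (pdf(NULL)) so it runs without a display; drop that line and dev.off() when plotting interactively. The data frame sim is hypothetical.

```r
pdf(NULL)  # off-screen device; omit in RStudio
set.seed(3)
sim = data.frame(TV = runif(100, 0, 300))
sim$sales = 7 + 0.05 * sim$TV + rnorm(100)
fit = lm(sales ~ TV, data = sim)

plot(x = sim$TV, y = sim$sales,
     xlab = "TV", ylab = "Sales",
     col = "red",  # point colour
     pch = 19)     # solid filled circles
abline(fit, col = "blue", lwd = 3)  # fitted line taken from the lm object
dev.off()
```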

6
Q

How to get the coefficients from the R output (summary) table?

A

summary(lm4)$coefficients
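The coefficients table is a matrix with one row per term, so single entries can be picked out by name. A sketch on simulated data (x, y, fit and cf are hypothetical names):

```r
set.seed(4)
x = rnorm(40)
y = 3 - 1.5 * x + rnorm(40)
fit = lm(y ~ x)

cf = summary(fit)$coefficients  # matrix: one row per term
colnames(cf)                    # "Estimate" "Std. Error" "t value" "Pr(>|t|)"
cf["x", "Estimate"]             # slope estimate
cf["x", "Pr(>|t|)"]             # p-value for H0: slope = 0
```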

7
Q

How to check whether variables are (linearly) related? Show the correlation matrix to 4 d.p.

A

round(cor(Advertising), digits = 4)
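A sketch on simulated columns (the data frame df is hypothetical). Note that cor() only accepts numeric columns, so any text or factor columns would need dropping first.

```r
set.seed(5)
df = data.frame(TV = runif(30), radio = runif(30))
df$sales = 2 * df$TV + rnorm(30, sd = 0.1)  # sales driven by TV only

cm = cor(df)
round(cm, digits = 4)  # correlation matrix to 4 d.p.
# With mixed column types, keep only the numeric ones:
# cor(df[, sapply(df, is.numeric)])
```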

8
Q

How to get the adjusted R-squared?

A

summary(lm_mpg1)$adj.r.squared
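As a cross-check, adjusted R-squared equals 1 - (1 - R^2)(n - 1)/(n - p - 1) for a model with an intercept and p predictors. A sketch on simulated data (d, fit and s are hypothetical names):

```r
set.seed(6)
n = 60
d = data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y = 1 + d$x1 + rnorm(n)
fit = lm(y ~ x1 + x2, data = d)

s = summary(fit)
s$adj.r.squared
# Same value by hand, with p = number of predictors
p = 2
1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)
```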

9
Q

How to load data from the ISLR package?
Get the Auto data

A

library(ISLR)
data(Auto)

10
Q

How to generate the residual diagnostic plots (e.g. the Q-Q plot)?

A

par(mfrow = c(2, 2))
plot(lm_mpg1)
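For reference, plot() on an lm object draws which = c(1, 2, 3, 5) by default, and single panels can be requested by number. A sketch on simulated data, assuming an off-screen pdf(NULL) device so it runs without a display:

```r
pdf(NULL)  # off-screen device; omit when plotting interactively
set.seed(7)
d = data.frame(x = rnorm(80))
d$y = 2 + d$x + rnorm(80)
fit = lm(y ~ x, data = d)

par(mfrow = c(2, 2))  # 2 x 2 grid of panels
plot(fit)             # default which = c(1, 2, 3, 5)
# which = 1 residuals vs fitted, 2 Q-Q, 3 scale-location,
# 4 Cook's distance, 5 residuals vs leverage, 6 Cook's distance vs leverage
plot(fit, which = 2)  # the Q-Q plot on its own
dev.off()
```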

11
Q

How to generate the scale-location plot?

A

plot(lm_mpg2, which = 3)

12
Q

Exclude the name variable from a linear model (mpg regressed on every variable except name)

A

lm_mpg4 = lm(mpg ~ . - name, data = Auto)
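A sketch with a small made-up data frame (toy and its columns are hypothetical stand-ins for Auto): "." expands to every other column, and "- name" then removes the ID column from the model.

```r
set.seed(8)
toy = data.frame(mpg = rnorm(20, 25, 4),
                 weight = rnorm(20, 3000, 400),
                 horsepower = rnorm(20, 100, 20),
                 name = paste("car", 1:20))  # hypothetical ID column

fit = lm(mpg ~ . - name, data = toy)  # all columns except mpg and name
names(coef(fit))                      # "(Intercept)" "weight" "horsepower"
```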

13
Q

How to add labels to a nominal categorical variable (factor)?

A

Auto$origin = factor(Auto$origin, labels = c("American", "European", "Japanese"))
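The labels are matched to the sorted unique codes, so 1 becomes "American", 2 "European" and 3 "Japanese". A sketch with a few hypothetical codes:

```r
origin = c(1, 3, 2, 1, 1)  # hypothetical codes
origin_f = factor(origin, labels = c("American", "European", "Japanese"))
origin_f
levels(origin_f)  # labels follow the sorted codes 1, 2, 3
table(origin_f)   # counts per label
```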

14
Q

How to fit a logistic regression model?

A

glm_fit = glm(default ~ balance, data = Default, family = binomial)

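Need to put family = binomial; without it, glm() fits an ordinary linear (Gaussian) model.

binomial is normally written unquoted (the family function), although glm() also accepts the string "binomial".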
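A sketch on simulated data standing in for Default (toy, glm_toy and the true coefficients -10 and 0.006 are made up). With a factor response, glm() models the probability of the second level, here "Yes"; the quoted family name gives the same fit.

```r
set.seed(9)
n = 500
balance = runif(n, 0, 2500)            # hypothetical balances
p_true = plogis(-10 + 0.006 * balance)  # made-up true P(default = "Yes")
default = factor(ifelse(runif(n) < p_true, "Yes", "No"), levels = c("No", "Yes"))
toy = data.frame(default, balance)

glm_toy = glm(default ~ balance, data = toy, family = binomial)
coef(glm_toy)  # log-odds scale

# The quoted family name gives the same fit
glm_str = glm(default ~ balance, data = toy, family = "binomial")
all.equal(coef(glm_toy), coef(glm_str))
```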

15
Q

How to get predicted probabilities for new data you create yourself?

A

df_new = data.frame(student = c("Yes", "No"),
                    balance = c(1500, 1500), income = c(40000, 40000))

predict(glm_fit, newdata = df_new, type = "response")
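Without type = "response", predict() returns log-odds (the "link" scale); applying the inverse logit, plogis(), turns them into the same probabilities. A sketch on simulated data (toy, fit and the new balances are hypothetical); the column names in the new data frame must match the model's predictors.

```r
set.seed(10)
n = 300
toy = data.frame(balance = runif(n, 0, 2500))
toy$default = factor(ifelse(runif(n) < plogis(-10 + 0.006 * toy$balance), "Yes", "No"),
                     levels = c("No", "Yes"))
fit = glm(default ~ balance, data = toy, family = binomial)

df_new = data.frame(balance = c(1000, 2000))                # your own rows
p_resp = predict(fit, newdata = df_new, type = "response")  # probabilities
eta = predict(fit, newdata = df_new)                        # default: log-odds
all.equal(unname(p_resp), unname(plogis(eta)))              # inverse logit of the link
```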

16
Q

How to generate a confusion matrix yourself? START FROM THE PREDICTION STEP

A

STEP1: PREDICT FIRST
glm_prob = predict(glm_fit, type = "response")

STEP 2: TABLE GENERATION
table(glm_prob >= 0.5, Default$default)
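The two steps on simulated data, with labelled margins and the overall accuracy read off the diagonal. toy, fit and the 0.5 cut-off mirror the card; the data are made up. Making pred a factor with both levels keeps the table 2 x 2 even if one class is never predicted.

```r
set.seed(11)
n = 400
toy = data.frame(balance = runif(n, 0, 2500))
toy$default = factor(ifelse(runif(n) < plogis(-10 + 0.006 * toy$balance), "Yes", "No"),
                     levels = c("No", "Yes"))
fit = glm(default ~ balance, data = toy, family = binomial)

prob = predict(fit, type = "response")  # step 1: fitted probabilities
pred = factor(ifelse(prob >= 0.5, "Yes", "No"), levels = c("No", "Yes"))
cm = table(Predicted = pred, Actual = toy$default)  # step 2: confusion matrix
cm
sum(diag(cm)) / sum(cm)  # overall accuracy
```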

17
Q

How to plot the ROC curve and compute the AUC? (Start from prediction)

CURVE: LINE WIDTH 3
COLOUR BLACK

PASTE THE AUC AS TEXT
Add a diagonal line (intercept 0, slope 1), dotted, line width 1

A

STEP 1: PREPARE DATA
library(ROCR)  # provides prediction() and performance()
pred = prediction(glm_prob, Default$default)  # performed on training data

perf = performance(pred, measure = "tpr", x.measure = "fpr")

auc_perf = performance(pred, measure = "auc")@y.values[[1]]

STEP 2: Plot ROC curve
plot(perf, lwd = 3, col = "black")

abline(0, 1, lwd = 1, lty = 2)  # dashed diagonal; lty = 3 gives a dotted line

text(0.4, 0.8, paste("AUC =", round(auc_perf, 2)))  # AUC text at (0.4, 0.8)
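As a cross-check without ROCR, the AUC can be computed from ranks: it is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half. This is a swapped-in base-R technique, not ROCR's code; auc_rank is a hypothetical helper and the scores are simulated. On the same data it should agree with ROCR's value up to rounding.

```r
# Hypothetical helper: rank-based (Mann-Whitney) AUC
auc_rank = function(score, label, positive = "Yes") {
  pos = label == positive
  r = rank(score)  # average ranks handle ties
  n1 = sum(pos)
  n0 = sum(!pos)
  (sum(r[pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(12)
label = rep(c("No", "Yes"), each = 50)
score = c(rnorm(50, 0), rnorm(50, 1.5))  # positives score higher on average
auc_rank(score, label)

# Sanity checks: perfect separation gives 1, reversed scores give 0
auc_rank(1:4, c("No", "No", "Yes", "Yes"))  # 1
auc_rank(4:1, c("No", "No", "Yes", "Yes"))  # 0
```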

18
Q

How to generate a validation set (random train/test split)?

A

set.seed(1101)

train = sample(10000, 5000, replace = FALSE)

Default_train = Default[train, ]

Default_test = Default[-train, ]
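The same split on a ten-row stand-in data frame (toy is hypothetical). Using nrow() instead of hard-coding 10000 keeps the code valid for any data set; the two halves share no rows and together cover all of them.

```r
set.seed(1101)
toy = data.frame(id = 1:10, x = rnorm(10))                 # stand-in for Default
train = sample(nrow(toy), nrow(toy) / 2, replace = FALSE)  # half the row indices
toy_train = toy[train, ]  # rows listed in train
toy_test = toy[-train, ]  # every other row
```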

19
Q

How to generate confusion matrix from training set?

A

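glm_prob_train = predict(glm_fit2, type = "response")  # glm_fit2 must be fitted on Default_train; with no newdata, predict() returns fitted probabilities for the fitting data

table(glm_prob_train > 0.5, Default_train$default)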

20
Q

How to get the training error after generating the training predictions?

A

glm_pred = ifelse(glm_prob_train > 0.5, "Yes", "No")

mean(glm_pred != Default_train$default)
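A sketch contrasting training and test error on simulated data (toy, fit and the split are hypothetical). The model is fitted on the training rows only; the test rows go through newdata.

```r
set.seed(13)
n = 1000
toy = data.frame(balance = runif(n, 0, 2500))
toy$default = factor(ifelse(runif(n) < plogis(-10 + 0.006 * toy$balance), "Yes", "No"),
                     levels = c("No", "Yes"))
train = sample(n, n / 2)
toy_train = toy[train, ]
toy_test = toy[-train, ]

fit = glm(default ~ balance, data = toy_train, family = binomial)  # training rows only

prob_train = predict(fit, type = "response")                     # training rows
prob_test = predict(fit, newdata = toy_test, type = "response")  # held-out rows

train_err = mean(ifelse(prob_train > 0.5, "Yes", "No") != toy_train$default)
test_err = mean(ifelse(prob_test > 0.5, "Yes", "No") != toy_test$default)
c(train = train_err, test = test_err)
```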

21
Q

How to plot the residuals vs leverage plot?

A

plot(lm1, which = 5)

which = 5!!!! not which = 4 (which = 4 is the Cook's distance plot)

22
Q

What is the syntax for knn?

A

library(kknn)  # package name is kknn, not knn

fit = kknn(model, train, test, k = k, kernel = "rectangular")  # name k: positionally, the 4th argument is na.action

23
Q

How to turn off standardization of the variables in kknn?

A

fit = kknn(model, train, test, k = k, kernel = "rectangular", scale = FALSE)  # default is scale = TRUE
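To see what kernel = "rectangular" (every neighbour gets an equal vote) and standardization mean, here is a base-R sketch. knn_vote is a hypothetical helper that mirrors the idea with Euclidean distance; it is not kknn's implementation, and the toy data are made up.

```r
# Hypothetical helper, not part of kknn: plain majority-vote kNN (Euclidean distance)
knn_vote = function(train_x, train_y, test_x, k, standardize = TRUE) {
  train_x = as.matrix(train_x)
  test_x = as.matrix(test_x)
  if (standardize) {
    mu = colMeans(train_x)
    s = apply(train_x, 2, sd)
    train_x = scale(train_x, center = mu, scale = s)  # training means/sds...
    test_x = scale(test_x, center = mu, scale = s)    # ...reused for test rows
  }
  apply(test_x, 1, function(pt) {
    d = sqrt(colSums((t(train_x) - pt)^2))  # distance to every training row
    nn = order(d)[1:k]                      # the k nearest rows
    names(which.max(table(train_y[nn])))    # most frequent class among them
  })
}

train_x = data.frame(x1 = c(1, 2, 1, 8, 9, 8), x2 = c(100, 110, 90, 800, 900, 850))
train_y = c("A", "A", "A", "B", "B", "B")
test_x = data.frame(x1 = c(1.5, 8.5), x2 = c(105, 870))
knn_vote(train_x, train_y, test_x, k = 3)
knn_vote(train_x, train_y, test_x, k = 3, standardize = FALSE)
```

Without standardization, x2 (in the hundreds) dominates the distance; scaling gives both variables equal say.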

24
Q

How to fit a naive Bayes model?

A

library(e1071)
fit = naiveBayes(model, data)

25
Q

What assumption does the naiveBayes() function in R rely on? (slide 19)

A

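It assumes a Normal distribution for each quantitative predictor within each class (on top of the naive assumption that predictors are independent given the class).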
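A hand-rolled sketch of that Normal assumption for one quantitative predictor: estimate each class's prior, mean and standard deviation, then combine Normal densities with Bayes' rule. The data and names are hypothetical; this mirrors the idea rather than reproducing e1071's code.

```r
set.seed(14)
x = c(rnorm(100, mean = 0), rnorm(100, mean = 3))  # one quantitative predictor
y = rep(c("A", "B"), each = 100)                   # class labels

# "Training": class priors plus a Normal (mean, sd) for x within each class
prior = c(A = mean(y == "A"), B = mean(y == "B"))
mu = c(A = mean(x[y == "A"]), B = mean(x[y == "B"]))
s = c(A = sd(x[y == "A"]), B = sd(x[y == "B"]))

# Posterior for a new value via Bayes' rule
x0 = 1.2
lik = dnorm(x0, mean = mu, sd = s)
post = prior * lik / sum(prior * lik)
post
```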

26
Q

Do you need to select tuning parameters for naive Bayes in R? (slide 19)

A

no

27
Q

What is the basic syntax for tree()?

A

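library(tree)
tree.fit = tree(y ~ x, data = data, control = tree.control(nobs = nrow(data)))  # name control: the 3rd positional argument is weights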

28
Q

What is the basic syntax for rpart()?

A

library(rpart)
tree.fit = rpart(y ~ x, data = data, control = rpart.control())  # name control: the 3rd positional argument is weights
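A runnable classification example on the built-in iris data, assuming rpart is installed (it is a recommended package shipped with most R installations). The control values shown are rpart's defaults, spelled out; tree() works analogously with control = tree.control(nobs = nrow(data)), not run here.

```r
library(rpart)

fit = rpart(Species ~ ., data = iris, method = "class",
            control = rpart.control(minsplit = 20, cp = 0.01))  # default values
fit                                                  # printed split rules
pred = predict(fit, newdata = iris, type = "class")  # predicted class labels
mean(pred == iris$Species)                           # training accuracy
```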

29
Q

What is the syntax for kmeans? Explain the parameters.

A

kmeans(x, centers=k, nstart=n)

“centers” specifies K, the number of clusters; “nstart” tells R how many random initializations to perform, and kmeans() keeps the run with the lowest total within-cluster sum of squares.
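A sketch on two simulated, well-separated groups (x and km are hypothetical names), using K = 2 and 20 random starts:

```r
set.seed(15)
x = rbind(matrix(rnorm(100, mean = 0), ncol = 2),
          matrix(rnorm(100, mean = 5), ncol = 2))  # two groups of 50 points

km = kmeans(x, centers = 2, nstart = 20)  # K = 2, best of 20 random starts
km$size          # points per cluster
km$centers       # cluster means
km$tot.withinss  # total within-cluster sum of squares (minimised)
```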

30
Q

What is the function for pca?

A

prcomp(x, scale = TRUE)
scale = TRUE standardizes each variable to unit variance before the PCA (centering is already on by default)
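A sketch on the built-in USArrests data. With standardized variables, the component variances (sdev^2) add up to the number of variables, and the proportion of variance explained (PVE) is each variance divided by that total.

```r
pca = prcomp(USArrests, scale = TRUE)  # center (default) and scale each column
pca$rotation                           # loadings: one column per component
summary(pca)                           # proportion of variance explained
pve = pca$sdev^2 / sum(pca$sdev^2)
pve
```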

31
Q

What is the syntax of pcr?
What to do if we want to use LOOCV?

A

library(pls)
fit = pcr(y ~ ., data = x, scale = TRUE, validation = "CV")

validation = "CV" computes the 10-fold CV error for each number of components, which is used to select M. Use validation = "LOO" for LOOCV.

32
Q

How to plot CV MSE for pcr?

A

validationplot(fit, val.type = "MSEP")

33
Q

How to predict values with pcr?

A

pred = predict(fit, x[test, ], ncomp = M)  # M = number of components chosen by CV
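What pcr() does can be sketched by hand: standardize and rotate the predictors with prcomp(), regress y on the first M component scores, and push new rows through the same rotation before predicting. This is a hand-rolled illustration on simulated data (X, y, pc, pcr_fit and X_new are hypothetical), not the pls implementation, and M is fixed here rather than chosen by CV.

```r
set.seed(16)
n = 100
X = matrix(rnorm(n * 4), ncol = 4, dimnames = list(NULL, paste0("x", 1:4)))
y = drop(X %*% c(1, -1, 0.5, 0)) + rnorm(n)

pc = prcomp(X, scale. = TRUE)  # standardize, then rotate onto the PCs
M = 2                          # components kept (pcr() would pick this by CV)
Z = pc$x[, 1:M, drop = FALSE]  # training scores on the first M PCs
pcr_fit = lm(y ~ Z)            # least squares on the scores

# New rows: reuse the training centering, scaling and rotation
X_new = matrix(rnorm(3 * 4), ncol = 4, dimnames = list(NULL, paste0("x", 1:4)))
Z_new = predict(pc, newdata = X_new)[, 1:M, drop = FALSE]
pred_new = drop(cbind(1, Z_new) %*% coef(pcr_fit))
pred_new
```

Keeping all components (M = 4 here) reproduces the ordinary least squares fit; fewer components trade some bias for lower variance.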

34
Q
A