DSE R CODE Flashcards
Reading data
Advertising = read.csv("Data/Advertising.csv", header = TRUE)
head(Advertising)
Generate lm model
lm1 = lm(sales ~ TV, data = Advertising)
Generate R output table
summary(lm1)
Generate confidence intervals for the coefficient of the variable and the constant (95% or 90%)
confint(lm1)
confint(lm1,level=0.9)
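A minimal runnable sketch of the same two calls. It assumes the built-in mtcars data as a stand-in for Advertising.csv (the names fit, ci95 and ci90 are illustrative only), so it can be tried without the course files:

```r
# Assumption: mtcars stands in for the Advertising data set
fit <- lm(mpg ~ wt, data = mtcars)

ci95 <- confint(fit)                # default level is 0.95
ci90 <- confint(fit, level = 0.90)  # narrower interval at 90%

print(ci95)
print(ci90)
```

Each row is a coefficient (the constant appears as "(Intercept)") and the two columns are the lower and upper bounds.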
How to plot a scatter plot of sales against TV?
label the axes
colour red
solid dots
with a regression line of line width 3 and colour blue
plot(x = Advertising$TV, y = Advertising$sales,
xlab = "TV", ylab = "Sales", col = "red", pch = 19)
abline(lm1, col = "blue", lwd = 3)
How to get the coefficients from the R output table (summary table)?
summary(lm4)$coefficients
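The coefficient table is an ordinary matrix, so single values can be pulled out by row and column name. A small sketch, assuming the built-in mtcars data in place of the course model (fit and coefs are illustrative names):

```r
# Assumption: mtcars stands in for the course data
fit <- lm(mpg ~ wt, data = mtcars)
coefs <- summary(fit)$coefficients

print(colnames(coefs))               # "Estimate" "Std. Error" "t value" "Pr(>|t|)"
slope   <- coefs["wt", "Estimate"]   # estimated slope
p_value <- coefs["wt", "Pr(>|t|)"]   # p-value for the slope
print(c(slope = slope, p_value = p_value))
```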
How to determine whether there is a relationship between variables? (correlations to 4 d.p.)
round(cor(Advertising), digits = 4)
How to get adjusted R-squared?
summary(lm_mpg1)$adj.r.squared
How to load data from the ISLR package?
Get the Auto data
library(ISLR)
data(Auto)
How to generate residual diagnostic plots (such as the Q-Q plot)?
par(mfrow = c(2, 2))
plot(lm_mpg1)
How to generate scale-location plot?
plot(lm_mpg2, which = 3)
Exclude the name variable from the linear model (mpg ~ everything except name)
lm_mpg4 = lm(mpg ~ . - name, data = Auto)
How to add labels to nominal categorical data?
Auto$origin = factor(Auto$origin, labels = c("American", "European", "Japanese"))
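To see how the labels are matched, here is a small sketch on a made-up vector (origin and origin_f are hypothetical names, not part of the Auto data). The labels are assigned to the sorted unique values, so 1, 2, 3 become American, European, Japanese in that order:

```r
# Hypothetical numeric codes, as in Auto$origin
origin <- c(1, 3, 2, 1, 1)

# Labels are matched to the sorted levels 1, 2, 3
origin_f <- factor(origin, labels = c("American", "European", "Japanese"))

print(origin_f)
print(table(origin_f))
```

If a code might be missing from the data, passing levels = 1:3 as well keeps the label count and the level count in agreement.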
How to generate a logistic model?
glm_fit = glm(default ~ balance, data = Default, family = binomial)
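Need to put family = binomial
binomial is a family object, NOT a string: write it without quotes (glm also accepts "binomial", but the unquoted form is the usual convention)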
How to get predicted probabilities for your own data?
df_new = data.frame(student = c("Yes", "No"),
balance = c(1500, 1500), income = c(40000, 40000))
predict(glm_fit, newdata = df_new, type = "response")
How to generate a confusion matrix yourself? START FROM THE PREDICTION STEP
STEP 1: PREDICT FIRST
glm_prob = predict(glm_fit, type = "response")
STEP 2: TABLE GENERATION
table(glm_prob >= 0.5, Default$default)
How to plot the ROC curve and report the AUC? (Start from the prediction)
CURVE: LINE WIDTH 3
COLOUR BLACK
PASTE TEXT
Add a line with intercept 0 and slope 1, dashed, line width 1
STEP 1: PREPARE DATA
library(ROCR) # provides prediction() and performance()
pred = prediction(glm_prob, Default$default) # performed on training data
perf = performance(pred, measure = "tpr", x.measure = "fpr")
auc_perf = performance(pred, measure = "auc")@y.values[[1]]
STEP 2: Plot ROC curve
plot(perf, lwd = 3, col = "black")
abline(0, 1, lwd = 1, lty = 2) # add dashed diagonal line
text(0.4, 0.8, paste("AUC =", round(auc_perf, 2))) # add text
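As a cross-check on what the AUC number means, it can also be computed in base R without ROCR. This is a swapped-in technique (the rank formula), not what ROCR does internally, and it runs on simulated data (all names below are illustrative). The AUC equals the probability that a randomly chosen positive case gets a higher predicted probability than a randomly chosen negative case:

```r
# Simulated binary data, a stand-in for the Default data set
set.seed(1)
n <- 200
x <- rnorm(n)
y <- ifelse(runif(n) < plogis(2 * x), 1, 0)

fit  <- glm(y ~ x, family = binomial)
prob <- predict(fit, type = "response")

# Rank-based AUC: sum of ranks of the positives, shifted and scaled
r  <- rank(prob)
n1 <- sum(y == 1)
n0 <- sum(y == 0)
auc <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
print(round(auc, 2))
```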
How to generate validation set?
set.seed(1101)
train = sample(10000, 5000, replace = FALSE)
Default_train = Default[train, ]
Default_test = Default[-train, ]
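The same split pattern, sketched on the built-in mtcars data so it runs without the ISLR package (train_set and test_set are illustrative names). Negative indexing with -train keeps exactly the rows that were not sampled:

```r
# Assumption: mtcars stands in for Default; half the rows go to training
set.seed(1101)
n <- nrow(mtcars)
train <- sample(n, floor(n / 2), replace = FALSE)

train_set <- mtcars[train, ]
test_set  <- mtcars[-train, ]   # everything not in the training sample

print(c(train = nrow(train_set), test = nrow(test_set)))
```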
How to generate confusion matrix from training set?
glm_prob_train = predict(glm_fit2, type = "response")
table(glm_prob_train > 0.5, Default_train$default)
How to get the training error after generating the training predictions?
glm_pred = ifelse(glm_prob_train > 0.5, "Yes", "No")
mean(glm_pred != Default_train$default)
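A self-contained sketch of the whole chain (fit, predict, confusion matrix, training error) on simulated data, since glm_fit2 and Default_train are not defined here. All names below are illustrative. It also shows that the error rate is one minus the diagonal share of the confusion matrix:

```r
# Simulated stand-in for Default_train
set.seed(1)
n <- 200
x <- rnorm(n)
y <- factor(ifelse(runif(n) < plogis(2 * x), "Yes", "No"), levels = c("No", "Yes"))
dat <- data.frame(x = x, y = y)

fit  <- glm(y ~ x, data = dat, family = binomial)
prob <- predict(fit, type = "response")   # P(y = "Yes")

# Fix the levels so the table is always 2 x 2
pred <- factor(ifelse(prob > 0.5, "Yes", "No"), levels = c("No", "Yes"))
conf <- table(Predicted = pred, Actual = dat$y)
err  <- mean(pred != dat$y)               # training error rate

print(conf)
print(err)
```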
How to plot the residuals vs leverage plot?
plot(lm1, which = 5)
which = 5, NOT which = 4 (Cook's distance)
What is the syntax for kNN?
library(kknn) # not knn
fit = kknn(model, train, test, k = k, kernel = "rectangular") # name k: the 4th positional argument is na.action
How to choose not to standardize variables for kNN?
fit = kknn(model, train, test, k = k, kernel = "rectangular", scale = FALSE)
How to generate a naive Bayes model?
library(e1071)
fit = naiveBayes(model, data)
What assumption does the naive Bayes function in R rely on? (slide 19)
Assumption of a Normal distribution for quantitative predictors (within each class).
Do you need to tune any parameters for naive Bayes in R? (slide 19)
No
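What is the basic syntax for tree?
tree.fit = tree(y ~ x, data = data, control = tree.control(nobs = nrow(data))) # name control: the 3rd positional argument is weights
What is the basic syntax for rpart?
tree.fit = rpart(y ~ x, data = data, control = rpart.control()) # name control here too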
What is the syntax for kmeans? Explain the parameters
kmeans(x, centers = k, nstart = n)
"centers" specifies K; "nstart" tells R how many random initializations to perform (the best one is kept).
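A short runnable sketch of the call, assuming the four numeric columns of the built-in iris data as input (x and km are illustrative names). The variables are standardized first, which is a choice made for this example rather than something kmeans requires:

```r
# Assumption: iris measurements stand in for the course data
set.seed(1)
x <- scale(iris[, 1:4])                    # standardize each column

km <- kmeans(x, centers = 3, nstart = 20)  # K = 3, 20 random starts

print(km$size)                             # cluster sizes
print(km$tot.withinss)                     # total within-cluster sum of squares
print(table(km$cluster, iris$Species))     # clusters vs known species
```

With nstart greater than 1, kmeans returns the run with the smallest total within-cluster sum of squares.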
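What is the function for PCA?
prcomp(x, scale. = TRUE) # scale = TRUE also works through partial argument matching
scale. = TRUE standardizes each variable to unit variance (centering is on by default)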
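A small sketch of what comes back from prcomp, using the built-in USArrests data as an assumed example (pca and pve are illustrative names). The proportion of variance explained is computed from the component standard deviations:

```r
# Assumption: USArrests stands in for the course data
pca <- prcomp(USArrests, scale. = TRUE)

print(pca$rotation)                   # loadings, one column per component
print(head(pca$x))                    # scores of the first few observations

pve <- pca$sdev^2 / sum(pca$sdev^2)   # proportion of variance explained
print(round(pve, 3))
```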
What is the syntax of pcr?
What to do if we want to use LOOCV?
library(pls)
fit = pcr(y ~ ., data = x, scale = TRUE, validation = "CV")
validation = "CV" runs 10-fold CV, which is used to choose M. Use validation = "LOO" for LOOCV.
How to plot the CV MSE for pcr?
validationplot(fit, val.type = "MSEP")
How to predict values with pcr?
pred = predict(fit, x[test,], ncomp = M)