R Sampling Methods Flashcards
How to pull a random sample in R
sample() draws random values and can be used to subset data. Say a data frame has 392 rows: sample(392, 196) returns 196 distinct random integers in the range 1-392 (sampling is without replacement by default).
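A minimal sketch of a train/test split built with sample(). The built-in mtcars data (32 rows) stands in for the 392-row data set mentioned above, and the seed and the names train, train_set and test_set are illustrative assumptions:

```r
set.seed(1)                     # arbitrary seed so the draw is reproducible
n <- nrow(mtcars)               # 32 rows in the built-in mtcars data
train <- sample(n, n / 2)       # 16 distinct row indices drawn from 1:32
train_set <- mtcars[train, ]    # rows selected for training
test_set  <- mtcars[-train, ]   # negative indexing keeps the remaining rows
```

Calling set.seed() first is optional, but without it every run draws a different split.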
Subsetting within predictive model
lm(y ~ ., data = DF, subset = train)
where train is either (1) a logical vector or (2) a numeric vector of row indices identifying the rows of the data frame to fit on.
Combine this with sample() to build the vector of row indices yourself.
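A short sketch of both forms of subset, again assuming mtcars as stand-in data and mpg ~ hp + wt as an arbitrary example formula. The held-out error at the end is one common way to use the split and is not part of the card above:

```r
set.seed(1)                                   # arbitrary seed
train <- sample(nrow(mtcars), 16)             # numeric vector of row indices

# Form 2: numeric row indices
fit <- lm(mpg ~ hp + wt, data = mtcars, subset = train)

# Form 1: logical vector, TRUE for rows in the training set
in_train <- seq_len(nrow(mtcars)) %in% train
fit2 <- lm(mpg ~ hp + wt, data = mtcars, subset = in_train)

# Mean squared error on the rows that were not used for fitting
pred <- predict(fit, newdata = mtcars[-train, ])
val_mse <- mean((mtcars$mpg[-train] - pred)^2)
```

Both fits use the same 16 rows, so their coefficients agree.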
How to perform cross validation
library(boot) # has cv functions in it
cv.glm() function is part of this library
glm_fit = glm(mpg~horsepower, data = Auto)
glm_err= cv.glm(Auto, glm_fit)
glm_err$delta # cross-validation error: delta[1] is the raw estimate, delta[2] is the bias-adjusted estimate; with LOOCV (the default when K is omitted) the two are nearly identical
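To illustrate what the first delta value measures, here is a sketch that compares cv.glm() against a hand-written leave-one-out loop. It assumes mtcars in place of Auto so that it runs without extra data packages; the name manual is hypothetical:

```r
library(boot)

fit <- glm(mpg ~ hp, data = mtcars)   # gaussian family by default, same fit as lm()
loocv <- cv.glm(mtcars, fit)          # K omitted, so K = n (leave-one-out)
loocv$delta                           # raw and bias-adjusted estimates

# Hand-rolled LOOCV: refit without row i, then predict row i
n <- nrow(mtcars)
manual <- mean(sapply(seq_len(n), function(i) {
  fit_i <- glm(mpg ~ hp, data = mtcars[-i, ])
  (mtcars$mpg[i] - predict(fit_i, newdata = mtcars[i, ]))^2
}))
```

With the default squared-error cost, manual should match loocv$delta[1].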
How to generate a polynomial term in a formula
poly(x, degree), for example mpg ~ poly(horsepower, 2)
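A small sketch of poly() inside a formula, assuming mtcars and the hp column as stand-ins. By default poly() builds orthogonal polynomials; raw = TRUE gives plain powers, which matches writing the terms out with I():

```r
fit_poly <- lm(mpg ~ poly(hp, 2), data = mtcars)              # orthogonal basis (default)
fit_raw  <- lm(mpg ~ poly(hp, 2, raw = TRUE), data = mtcars)  # raw hp and hp^2
fit_I    <- lm(mpg ~ hp + I(hp^2), data = mtcars)             # same model written by hand
```

The three fits produce the same fitted values; only the coefficients of the orthogonal version are on a different scale.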
How to perform k-fold cross validation
library(boot)
glm_fit = glm(mpg~horsepower, data = Auto)
glm_err = cv.glm(Auto, glm_fit, K=10) # K sets the number of folds
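For intuition about what K-fold cross-validation does, here is a hand-written sketch that assigns fold labels with sample(). It is an illustration under assumed data (mtcars) and K = 8, not the internal code of cv.glm(); the names folds, fold_mse and cv_estimate are hypothetical:

```r
set.seed(1)                                       # arbitrary seed
K <- 8
n <- nrow(mtcars)
folds <- sample(rep(seq_len(K), length.out = n))  # random fold label for each row

fold_mse <- numeric(K)                            # one error per fold
for (k in seq_len(K)) {
  held_out <- folds == k                          # rows in the current fold
  fit_k <- glm(mpg ~ hp, data = mtcars[!held_out, ])
  pred_k <- predict(fit_k, newdata = mtcars[held_out, ])
  fold_mse[k] <- mean((mtcars$mpg[held_out] - pred_k)^2)
}
cv_estimate <- mean(fold_mse)                     # average held-out error
```

Each row is held out exactly once, and the model is fit K times rather than n times as in LOOCV.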
For loop in R
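cv.error.10 = rep(0, 10) # preallocate one slot per polynomial degree
for (i in 1:10){
  glm.fit = glm(mpg~poly(horsepower, i), data= Auto) # degree-i polynomial fit
  cv.error.10[i] = cv.glm(Auto, glm.fit, K=10)$delta[1] # raw 10-fold CV error
}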
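A runnable variant of the loop above, assuming mtcars instead of Auto, degrees 1 to 5, and K = 8 so the 32 rows split evenly. The seed and the names cv_error and best_degree are illustrative:

```r
library(boot)

set.seed(17)                              # folds are random, so fix the seed
degrees <- 1:5
cv_error <- numeric(length(degrees))      # preallocate the result vector

for (i in degrees) {
  fit <- glm(mpg ~ poly(hp, i), data = mtcars)
  cv_error[i] <- cv.glm(mtcars, fit, K = 8)$delta[1]
}

best_degree <- which.min(cv_error)        # degree with the lowest CV error
```

Preallocating cv_error before the loop avoids an "object not found" error on the first assignment.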
Omit Missing Values From Data Frame
na.omit(DataFrame)
How to count missing values in a column
sum(is.na(DataFrame$column))
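A tiny worked example of the two missing-value cards above, using a made-up data frame df. The colSums() line is an added convenience for counting every column at once:

```r
df <- data.frame(x = c(1, NA, 3, 4),
                 y = c("a", "b", NA, "d"))

n_missing_x <- sum(is.na(df$x))     # TRUE counts as 1, so this counts the NAs in x
missing_by_col <- colSums(is.na(df))  # NA count for every column
clean <- na.omit(df)                # drops any row that contains an NA
```

Here rows 2 and 3 each contain an NA, so clean keeps only rows 1 and 4.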