R Misc Flashcards
Which package lets you do ridge regression and lasso regression?
library(glmnet)
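How to implement CV when you are doing linear or logistic regression?
cv.glmnet() performs k-fold cross-validation (10 folds by default) to choose the tuning parameter lambda. Pass family = "binomial" for logistic regression.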
Does glmnet library have predict function? What is glmnet for?
Yes, it does. However, the regsubsets() function from the leaps package (used for best subset, forward, and backward stepwise selection) does not have one.
glmnet itself is for lasso and ridge regression.
How to conduct lasso regression
library(glmnet)
# Do CV to get the best tuning parameter
# Use alpha = 1 for lasso
cv.out = cv.glmnet(x[train, ], y[train], alpha = 1)
bestlam = cv.out$lambda.min
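A fuller sketch of the cross-validated lasso, assuming the glmnet package is installed. The mtcars data, the 22-row training split, and nfolds = 5 are illustrative choices, not part of the original card.

```r
library(glmnet)

set.seed(1)
# glmnet needs a numeric predictor matrix; drop the intercept column
x <- model.matrix(mpg ~ ., data = mtcars)[, -1]
y <- mtcars$mpg
train <- sample(seq_len(nrow(x)), size = 22)

# alpha = 1 selects the lasso penalty (alpha = 0 would be ridge)
cv.out <- cv.glmnet(x[train, ], y[train], alpha = 1, nfolds = 5)
bestlam <- cv.out$lambda.min

# Predict on the held-out rows at the best lambda and compute test MSE
lasso.pred <- predict(cv.out, s = bestlam, newx = x[-train, ])
test.mse <- mean((lasso.pred - y[-train])^2)

# Coefficients at the best lambda; the lasso sets some exactly to zero
coef(cv.out, s = "lambda.min")
```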
Print out a character vector one letter at a time by looping
x = c("a", "b", "c")
for(letter in x) {
print(letter)
}
if control structure in R
if(<condition1>) {
## do something
} else if(<condition2>) {
## do something different
} else {
## do something different
}
Easiest way to start an infinite loop (on purpose)
repeat{
if(condition) {
break
}
}
how to skip an iteration in a loop
Use next
for(i in 1:100) {
if(i <= 20) {
## Skip the first 20 iterations
next
}
## Do something here
}
while loops
count <- 0
while(count < 10) {
print(count)
count <- count + 1
}
How to make a function
f <- function(<arguments>) {
## Do something interesting
}
get list of arguments for a function
args()
ex: args(lm)
Describe lazy evaluation of function arguments
Arguments to functions are evaluated lazily, so they are evaluated only as needed.
f <- function(a, b) {
a^2
}
f(2)
This function never actually uses the argument b, so calling f(2) will not produce an
error because the 2 gets positionally matched to a.
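A small sketch of where lazy evaluation does fail: the error appears only at the point where the missing argument is actually used. The function names f.ok and f.bad are hypothetical, chosen for illustration.

```r
# b is never touched, so a missing b is harmless
f.ok <- function(a, b) {
  a^2
}
ok.result <- f.ok(2)   # 4

# Here b is used, so the call fails, but only after print(a) has run
f.bad <- function(a, b) {
  print(a)
  print(b)
}
err.msg <- tryCatch(f.bad(45), error = function(e) conditionMessage(e))
err.msg   # the message reports that argument "b" is missing
```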
What does ... mean in a function definition?
- Used when extending another function, so you don’t have to copy its entire argument list
- Lets extra arguments be passed on to methods
- Necessary when the number of arguments cannot be known in advance, as in paste() and cat()
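A minimal sketch of both uses of the dots argument. The names my.mean and count.args are hypothetical helpers made up for this example.

```r
# Forward extra arguments to another function without listing them
my.mean <- function(x, ...) {
  mean(x, ...)
}
m <- my.mean(c(1, 2, NA), na.rm = TRUE)   # 1.5

# Accept any number of arguments; list(...) captures them
count.args <- function(...) {
  length(list(...))
}
n.args <- count.args(1, "a", TRUE)   # 3
```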
Display the list of environment variables and packages R will iterate through to find a variable, in the order they will be searched
search()
When a user loads a library, where does it go on the search list?
It goes into second position, and everything else moves down by one. The global environment (the user’s workspace) is always first.
what are free variables?
f <- function(x, y) {
x^2 + y / z
}
This function has 2 formal arguments x and y. In the body of the function there is
another symbol z. In this case z is called a free variable.
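How do free variables get values?
They are neither formal arguments nor local variables, so they must be defined outside the function. R uses lexical scoping: it looks for the free variable in the environment where the function was defined, then in each parent environment. For a function defined at the top level, that search starts in the global environment and continues down the search list.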
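A short sketch of how the lookup depends on where the function was defined. The wrapper make.f is a hypothetical name used only to create a second defining environment.

```r
z <- 10

# Defined at the top level: z is found in the global environment
f <- function(x, y) {
  x^2 + y / z
}
top.level <- f(2, 20)   # 4 + 20/10 = 6

# Defined inside another function: z is found in that function's environment
make.f <- function() {
  z <- 2
  function(x, y) {
    x^2 + y / z
  }
}
g <- make.f()
nested <- g(2, 20)      # 4 + 20/2 = 14
```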
Does R let you define a function inside another function?
Yes, lots of languages don’t let you do this
Example of defining a function inside another function
make.power <- function(n) {
  pow <- function(x) {
    x^n
  }
  pow
}
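A runnable sketch of this closure pattern, with the outer argument named n and the inner function named pow. The names cube and square are illustrative.

```r
make.power <- function(n) {
  pow <- function(x) {
    x^n
  }
  pow
}

# Each call returns a new function that remembers its own n
cube <- make.power(3)
square <- make.power(2)

cube.of.3 <- cube(3)       # 27
square.of.3 <- square(3)   # 9

# The remembered value lives in the function's enclosing environment
n.in.cube <- get("n", envir = environment(cube))   # 3
```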
lapply vs sapply
lapply will apply a function over a list, whereas sapply will do the same thing, except simplify the output if possible
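A small sketch of the difference, using a made-up two-element list:

```r
x <- list(a = 1:4, b = c(10, 20, 30))

# lapply always returns a list
l <- lapply(x, mean)

# sapply simplifies to a named vector when every result has length 1
s <- sapply(x, mean)

# ...and to a matrix when every result has the same length greater than 1
r <- sapply(x, range)
```

If the results have different lengths, sapply cannot simplify and returns a list, just like lapply.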
apply
str(apply)
function (X, MARGIN, FUN, ...)
X is an array (for example, a matrix)
MARGIN is which margin should be retained: 1 for rows, 2 for columns
FUN is the function you want to apply
... is for other arguments to be passed to FUN
shortcut functions equivalent to apply
rowSums = apply(x, 1, sum)
rowMeans = apply(x, 1, mean)
colSums = apply(x, 2, sum)
colMeans = apply(x, 2, mean)
Example of apply using the ... arguments
apply(x, 1, quantile, probs = c(0.25, 0.75))
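A worked sketch of apply on a small matrix, assuming x is a 2 x 3 matrix made up for illustration:

```r
x <- matrix(1:6, nrow = 2)   # row 1 is 1, 3, 5; row 2 is 2, 4, 6

row.sums <- apply(x, 1, sum)    # 9 12
col.means <- apply(x, 2, mean)  # 1.5 3.5 5.5

# The shortcut functions give the same values
same.rows <- isTRUE(all.equal(rowSums(x), row.sums))
same.cols <- isTRUE(all.equal(colMeans(x), col.means))

# probs is passed through ... to quantile(); one column of results per row
q <- apply(x, 1, quantile, probs = c(0.25, 0.75))
```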
tapply
Applies a function over subsets of a vector defined by a factor, like a GROUP BY statement.
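A minimal sketch with made-up values and groups:

```r
values <- c(10, 20, 30, 40, 50, 60)
group <- factor(c("a", "a", "b", "b", "c", "c"))

# One mean per level of the factor: a = 15, b = 35, c = 55
group.means <- tapply(values, group, mean)
```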
split
can use to split a dataframe into pieces
s <- split(airquality, airquality$Month)
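A sketch of the usual follow-up: apply a function to each piece. The choice of columns and the use of colMeans here are illustrative.

```r
s <- split(airquality, airquality$Month)
names(s)   # one data frame per month: "5" "6" "7" "8" "9"

# Column means for each month, ignoring missing values
monthly <- sapply(s, function(df) {
  colMeans(df[, c("Ozone", "Solar.R", "Wind")], na.rm = TRUE)
})
```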
return only complete cases of data frame
DataFrame[complete.cases(DataFrame), ]
na.omit(DataFrame)
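A small sketch showing that both forms keep the same rows, using a made-up data frame df:

```r
df <- data.frame(
  id = 1:4,
  score = c(90, NA, 75, 60),
  grade = c("A", "B", NA, "D")
)

# Rows 2 and 3 each contain an NA, so only ids 1 and 4 remain
kept <- df[complete.cases(df), ]
kept.omit <- na.omit(df)
```

na.omit() also records which rows were dropped in an "na.action" attribute on its result.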
Add column to data frame
Carseats = data.frame(Carseats, High)
convert a table of counts (such as a confusion matrix) into proportions
prop.table(table(trainClass))
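A sketch on made-up labels. The vectors trainClass, actual, and predicted are hypothetical data for illustration.

```r
trainClass <- factor(c("yes", "no", "yes", "yes"))

# Proportions of each class: no = 0.25, yes = 0.75
props <- prop.table(table(trainClass))

# Multiply by 100 for percentages
pcts <- 100 * props

# For a two-way confusion matrix, margin = 1 gives row-wise proportions
actual <- c("yes", "yes", "no", "no", "yes")
predicted <- c("yes", "no", "no", "no", "yes")
cm <- table(actual, predicted)
row.props <- prop.table(cm, margin = 1)
```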
Steps to follow when using the caret package
- partition data *createDataPartition()
- test for near-zero-variance variables *nearZeroVar()
- test for multicollinearity *findCorrelation()
- preprocess - center/scale and derive PCA *preProcess()
- build and tune model *train()
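The steps above can be sketched end to end as follows, assuming the caret package is installed. The iris data, the 0.9 correlation cutoff, the k-nearest-neighbours model, and 5-fold CV are all illustrative choices, not part of the original card.

```r
library(caret)
set.seed(1)

# 1. Partition the data, stratified on the outcome
idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
training <- iris[idx, ]
testing <- iris[-idx, ]
predictors <- training[, 1:4]

# 2. Drop near-zero-variance predictors (the result may be empty)
nzv <- nearZeroVar(predictors)
if (length(nzv) > 0) predictors <- predictors[, -nzv, drop = FALSE]

# 3. Drop predictors that are highly correlated with others
high.cor <- findCorrelation(cor(predictors), cutoff = 0.9)
if (length(high.cor) > 0) predictors <- predictors[, -high.cor, drop = FALSE]

# 4. Preprocessing can be run on its own to inspect the result...
pp <- preProcess(predictors, method = c("center", "scale", "pca"))
transformed <- predict(pp, predictors)

# 5. ...or handed to train(), which applies it inside each resample
fit <- train(
  x = predictors, y = training$Species,
  method = "knn",
  preProcess = c("center", "scale"),
  trControl = trainControl(method = "cv", number = 5)
)

pred <- predict(fit, newdata = testing[, names(predictors), drop = FALSE])
accuracy <- mean(pred == testing$Species)
```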
count occurrences of each factor level in a column
table(dataframe$columnname)
profile data
use describe() in psych package
install packages using command
install.packages("psych")
return predicted values for linear regression
fitted(modelobject)
get the 95% confidence interval for regression coefficients
confint(modelobject)
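A short sketch of both calls on a fitted model. The mtcars regression of mpg on wt is an illustrative choice.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Fitted values for the rows the model was trained on
fv <- fitted(fit)

# 95% intervals by default; use level to change the coverage
ci95 <- confint(fit)
ci99 <- confint(fit, level = 0.99)

# For new observations, use predict() with newdata instead of fitted()
new.pred <- predict(fit, newdata = data.frame(wt = 3))
```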