R Trees Flashcards
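Create Categorical Variable From Continuous Variable
High = as.factor(ifelse(Sales <= 8, "No", "Yes")) # assumed cutoff: "Yes" when Sales exceeds 8; as.factor so tree() treats High as a class label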
Add Column to data frame
Carseats = data.frame(Carseats, High) #Add column to data frame
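A minimal sketch of the two cards above, using a made-up data frame in place of Carseats. The `toy` values and the cutoff of 8 are assumptions for illustration only.

```r
# Toy data frame standing in for Carseats (values are invented for illustration)
toy <- data.frame(Sales = c(4.2, 9.5, 8.0, 11.3, 6.7),
                  Price = c(120, 83, 105, 72, 131))

# Assumed cutoff: Sales above 8 is labelled "Yes", otherwise "No"
High <- factor(ifelse(toy$Sales <= 8, "No", "Yes"))

# Append the new factor as a column; the column takes the name "High"
toy <- data.frame(toy, High)

print(table(toy$High))
```

Wrapping the result in `factor()` matters in R 4.0 and later, where `data.frame()` no longer converts character columns to factors automatically.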
Fitting tree model
For a categorical response (classification tree):
tree.carseats = tree(High~.-Sales, Carseats) # Want to fit the model to all variables besides Sales, which we converted to categorical earlier
summary(tree.carseats)
Pruning Decision Trees
cv.carseats = cv.tree(tree.carseats, FUN = prune.misclass)
cv.carseats
Pick the tree size that has the lowest dev (with FUN = prune.misclass, dev is the cross-validated number of misclassifications)
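A sketch of the full pruning workflow, assuming the Carseats data comes from the ISLR package and that High uses a cutoff of 8. Both are assumptions, as are the names `cs`, `best.size`, and `prune.carseats`.

```r
library(tree)
library(ISLR)   # assumed source of the Carseats data

set.seed(1)     # cv.tree assigns folds at random

# Work on a copy so the packaged data set is left unchanged
cs <- Carseats
cs$High <- factor(ifelse(cs$Sales <= 8, "No", "Yes"))  # cutoff of 8 is an assumption

tree.carseats <- tree(High ~ . - Sales, data = cs)

# Cross-validate over the sequence of subtrees, scored by misclassification
cv.carseats <- cv.tree(tree.carseats, FUN = prune.misclass)

# size and dev are parallel vectors; take the smallest size among those with minimal dev
best.size <- min(cv.carseats$size[cv.carseats$dev == min(cv.carseats$dev)])

# Prune the original tree back to that many terminal nodes
prune.carseats <- prune.misclass(tree.carseats, best = best.size)
summary(prune.carseats)
```

Choosing the smallest size among ties is a design choice that favors the simpler tree; taking any size with the minimal dev would also match the card.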
library for decision trees
library(tree)
library for regression trees
library(tree) # same package and tree() function; a regression tree is like a classification tree except the outcome is a continuous variable instead of 0/1
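A short sketch of a regression tree, assuming the ISLR Carseats data with the continuous Sales column as the response. The train/test split and the names `tree.sales` and `test.mse` are illustrative choices, not part of the original cards.

```r
library(tree)
library(ISLR)   # assumed source of the Carseats data

set.seed(1)
train <- sample(nrow(Carseats), nrow(Carseats) / 2)   # random half for training

# A numeric response makes tree() fit a regression tree
tree.sales <- tree(Sales ~ ., data = Carseats, subset = train)
summary(tree.sales)

# Evaluate on the held-out half with mean squared error
pred <- predict(tree.sales, newdata = Carseats[-train, ])
test.mse <- mean((pred - Carseats$Sales[-train])^2)
print(test.mse)
```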
library for random forest
library(randomForest)
how to get variable importance in random forest
importance(modeloutput)
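A sketch of fitting a random forest and reading its variable importance, assuming the ISLR Carseats data with Sales as the response. `rf.sales` plays the role of `modeloutput` on the card. Setting `importance = TRUE` at fit time is what makes the permutation-based measure available.

```r
library(randomForest)
library(ISLR)   # assumed source of the Carseats data

set.seed(1)
rf.sales <- randomForest(Sales ~ ., data = Carseats,
                         ntree = 500, importance = TRUE)

# For regression, the matrix has columns %IncMSE and IncNodePurity
imp <- importance(rf.sales)

# Sort predictors by the permutation-based measure, most important first
print(imp[order(imp[, "%IncMSE"], decreasing = TRUE), ])

# Dot chart of both importance measures
varImpPlot(rf.sales)
```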
library and function for boosted trees (gradient boosting)
library(gbm)
boosted.forest = gbm(y~., data = DF, distribution…etc)
what are the tuning parameters for a boosted regression or classification tree?
# Parameters to a boosting model: really 3 tuning parameters, plus the distribution, which is fixed by the problem type
# 1) Number of trees (n.trees): boosting can overfit if this is too large, so choose it with CV
# 2) distribution = 'gaussian' for a regression model, 'bernoulli' for a classification model (set by the problem, not tuned)
# 3) Shrinkage (shrinkage): can usually be left at the default but can be tuned with CV; the default was .001 in older gbm releases and is .1 in recent ones, so check ?gbm
# 4) Interaction depth (d, interaction.depth): d = 1 usually works well; the idea is to keep this small, and it can also be selected with CV
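A sketch of setting these parameters explicitly and choosing the number of trees by cross-validation, assuming the ISLR Carseats data with Sales as a continuous response. The specific values (2000 trees, shrinkage 0.01, 5 folds) are illustrative assumptions, not recommendations from the cards.

```r
library(gbm)
library(ISLR)   # assumed source of the Carseats data

set.seed(1)
boost.sales <- gbm(Sales ~ ., data = Carseats,
                   distribution = "gaussian",   # regression; use "bernoulli" for a 0/1 outcome
                   n.trees = 2000,              # upper bound; CV picks how many to keep
                   shrinkage = 0.01,            # learning rate
                   interaction.depth = 1,       # d = 1 gives stumps (additive model)
                   cv.folds = 5)

# Number of trees that minimizes the cross-validated error
best.iter <- gbm.perf(boost.sales, method = "cv", plot.it = FALSE)
print(best.iter)

# Predict using only the first best.iter trees
pred <- predict(boost.sales, newdata = Carseats, n.trees = best.iter)
```

A smaller shrinkage generally needs more trees, so the two are usually tuned together.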