Tree-based methods Flashcards
Can tree based methods be used for regression or classification problems?
BOTH
Regression Tree Process
- Divide the predictor space into J non-overlapping regions
- For every observation that falls into one of these regions, make the same prediction: the mean of the training responses in that region.
- Uses a top-down, greedy approach that does only what's best at the NEXT step. Doesn't necessarily find the optimal tree.
- Apply regularization (cost-complexity pruning) to prune the tree, and use CV to choose the regularization parameter. See the sketch below.
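A minimal sketch of this process in Python with scikit-learn (the cards reference R, but the idea is the same; the toy data, names, and alpha grid are illustrative assumptions, not part of the original cards):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (illustrative only)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Steps 1-2 happen inside fit(): the tree greedily picks top-down binary
# splits, and each leaf predicts the mean response of its region.
# Steps 3-4: cost-complexity pruning; ccp_alpha is the regularization
# parameter, chosen here by cross-validation.
alphas = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y).ccp_alphas
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"ccp_alpha": alphas},
    scoring="neg_mean_squared_error",
    cv=5,
).fit(X, y)
print("CV-chosen alpha:", search.best_params_["ccp_alpha"])
```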
Classification Tree Process
Same as Regression, except the prediction is just the most commonly occurring class in the terminal node / region.
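Same sketch adapted for classification, where each terminal node predicts its majority class (again a toy example with assumed names):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Each prediction is the most commonly occurring class in that
# observation's terminal node / region
print(clf.predict(X[:3]))
```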
How to handle categorical variables with trees
You don't need to dummy code them; tree implementations such as R's tree()/rpart() split on the factor levels directly!
Pros/Cons of Decision Trees
Pros: interpretability, most closely mirrors human decision making, nice graphical display, and you don't have to dummy code categorical variables.
Cons: low predictive power, but you can use methods that aggregate many decision trees, like bagging, boosting, and random forests.
Trees also suffer from high variance.
Difference between Bagging and Bootstrap
Bagging = Bootstrap AGGregation: build many predictive models on separate bootstrapped training sets and average the results.
You want to do this with algorithms that are susceptible to high variance, like trees, although it can be applied to almost any method.
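A minimal bagging sketch (scikit-learn's BaggingRegressor handles the resampling and averaging; the data and settings are assumptions for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)

# 100 trees, each fit on its own bootstrap resample of the training set;
# the default base learner is a decision tree (a high-variance method).
bag = BaggingRegressor(n_estimators=100, random_state=0).fit(X, y)

# Each prediction is the average of the 100 trees' predictions
print(bag.predict(X[:3]))
```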
What happens when Y is qualitative/categorical when you use bagging?
Same bootstrap-and-aggregate process, but the aggregation becomes a majority vote across the models instead of an average.
Test Error with bagging
You don't need a separate hold-out set: each bootstrapped sample leaves out roughly 1/3 of the observations on average. These Out-Of-Bag (OOB) observations act as a built-in test set, since each one can be predicted using only the trees that never saw it.
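Sketch of the OOB estimate (oob_score=True asks scikit-learn to score each observation with only the trees that didn't train on it; toy data again):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
bag = BaggingRegressor(n_estimators=200, oob_score=True, random_state=0).fit(X, y)

# Test-error estimate with no validation split: each observation is
# predicted only by the ~1/3 of trees for which it was out-of-bag
print("OOB R^2:", bag.oob_score_)
```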
Disadvantages of Bagging
Can be difficult to interpret the resulting model, because you have lots of trees. HOWEVER, you can obtain an overall summary of the importance of each predictor using the total decrease in RSS (bagged regression trees) or the Gini index (bagged classification trees).
Variable importance in Bagging vs. Non Bagged Trees
With a single (non-bagged) tree, the splits near the top of the tree are the most important and the lowest-level splits matter least, so you can read importance straight off the diagram.
Bagged trees - since you have so many trees, there's no single diagram to read, so you have R output a variable importance metric instead (sketched below).
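A sketch of the analogous output in scikit-learn (its impurity-based feature_importances_ corresponds to the RSS/Gini decrease from the previous card; the data and names are illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=6, n_informative=2, random_state=0)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Mean decrease in RSS attributable to splits on each predictor,
# averaged over all trees and normalized to sum to 1
for i in np.argsort(forest.feature_importances_)[::-1]:
    print(f"feature {i}: {forest.feature_importances_[i]:.3f}")
```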
How does a random forest work
Like bagging, except that at each split the tree is only allowed to consider a random sample of m predictors from the full set of p predictors, usually about m = sqrt(p). A fresh sample of m predictors is chosen at each split.
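Minimal random-forest sketch (max_features plays the role of m; the toy data and parameter values are assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=16, noise=5.0, random_state=0)

# Bootstrapped trees as in bagging, but each split only considers
# m = sqrt(p) randomly chosen predictors (max_features="sqrt")
rf = RandomForestRegressor(
    n_estimators=500,     # large number of trees so the error rate settles down
    max_features="sqrt",  # the m << p restriction that decorrelates the trees
    random_state=0,
).fit(X, y)
print(rf.predict(X[:3]))
```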
Rationale of Random Forest
Counteracts the disadvantage of the top-down, greedy approach of decision trees: forcing each split to consider only m of the p predictors decorrelates the trees, so averaging them reduces variance more than bagging does. Usually use a large number of trees to allow the error rate to settle down.
Random forest vs. Bagging
Random forest is generally superior to bagging/bootstrap; bagging is just the special case of a random forest with m = p.
What is boosting?
Generally, an iterative learning process whereby models are successively fit to the previous model's residuals. This lets the model iteratively improve accuracy in the areas where it performs poorly. There are 3 tuning parameters: (1) the number of trees B, (2) the shrinkage/regularization parameter lambda, (3) the number of splits d in each tree. Unlike bagging, boosting can overfit if B is too large, so choose B by CV.
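A minimal boosting sketch mapping those three parameters onto scikit-learn's gradient boosting (the parameter values are illustrative, and max_depth stands in for the tree size d):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)

boost = GradientBoostingRegressor(
    n_estimators=500,    # (1) number of trees B
    learning_rate=0.01,  # (2) shrinkage / regularization parameter
    max_depth=2,         # (3) keeps each tree small (tree size d)
    random_state=0,
).fit(X, y)  # each tree is fit to the residuals of the ensemble so far
print(boost.predict(X[:3]))
```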
Bagging vs. Random Forest
Bagging - the trees can be highly correlated because they all use the same greedy top-down approach, even though they're built on different bootstrapped training sets. In a random forest, we force each tree to choose its splits from a random subset of the predictors, which decorrelates the trees.