Chapter 8 - Bagging, Random Forest, Boosting Flashcards
Bagging - what is it? when do we do it?
Bootstrap Aggregating. In the bootstrap, we replicate our dataset by sampling from it with replacement. In bagging, we average the predictions of a model fit to many bootstrap samples.
when do we bag?
- when a regression method or classifier has a tendency to overfit, bagging reduces the variance of the prediction.
- when n is large, the empirical distribution is similar to the true distribution
- bootstrap samples are like independent realizations of the data
- bagging amounts to averaging the fits from many (approximately) independent datasets, which would reduce the variance by a factor of 1/B if the fits were truly independent (sketch below)
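A minimal sketch of this averaging on synthetic data, assuming scikit-learn decision trees as the high-variance base learner (all names, sizes, and values below are illustrative):

```python
# Bagging by hand: fit a tree to each bootstrap sample, average the predictions.
# Synthetic data; n, p, B are illustrative choices, not from the notes.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, p, B = 200, 5, 100
X = rng.normal(size=(n, p))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=n)

preds = np.zeros((B, n))
for b in range(B):
    idx = rng.integers(0, n, size=n)              # bootstrap sample: n draws with replacement
    tree = DecisionTreeRegressor().fit(X[idx], y[idx])
    preds[b] = tree.predict(X)                    # b-th tree's prediction for every observation

y_bagged = preds.mean(axis=0)                     # bagged prediction: average over the B trees
```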
Bagging Decision Trees
disadvantage: each tree built from a BS sample can look completely different, so we lose the interpretability of a single tree. Variable importance gives a summary (sketch after the steps below):
1. for each predictor, add up the total amount by which the RSS (or Gini index) decreases every time we split on that predictor in the tree T_b
2. average this total over all B bootstrap trees
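A sketch of this importance measure, assuming scikit-learn's BaggingRegressor (each fitted tree's feature_importances_ stores its normalized total RSS decrease per predictor, which we then average over the B trees); data are synthetic placeholders:

```python
# Variable importance for bagged trees: average each tree's per-predictor
# RSS (impurity) decrease over the B bootstrap fits.
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.5, size=200)

# the default base estimator is a decision tree, so this is bagged trees
bag = BaggingRegressor(n_estimators=100, random_state=0).fit(X, y)

# each tree's feature_importances_ = its normalized total RSS decrease per predictor
importances = np.mean([t.feature_importances_ for t in bag.estimators_], axis=0)
print(importances)    # one importance score per predictor
```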
Out of Bag Error
to estimate the test error of a bagged model, we could use CV, but each bootstrap sample contains only ~63% of the distinct observations.
Idea: use the rest of the observations as a test set
- OOB error: for each observation x_i, find the predictions yhat_b_i from all bootstrap samples b which do not contain x_i (there should be around 0.37B of them), and average these predictions to get yhat_oob_i
- compute the squared error (y_i - yhat_oob_i)^2
- average the error over all observations
- when B is sufficiently large, this is roughly equivalent to LOOCV error
- the OOB/test error decreases and then levels off as we increase B (a large B does not cause overfitting)
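A hand-rolled sketch of this OOB computation on synthetic data (B, n, and the data-generating step are illustrative):

```python
# OOB error by hand: predict each observation only with the trees whose
# bootstrap sample did not contain it, then average the squared errors.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, B = 200, 200
X = rng.normal(size=(n, 3))
y = X[:, 0] + rng.normal(scale=0.5, size=n)

oob_sum = np.zeros(n)      # running sum of OOB predictions per observation
oob_count = np.zeros(n)    # number of trees for which observation i is out of bag

for b in range(B):
    idx = rng.integers(0, n, size=n)            # bootstrap sample
    oob = np.setdiff1d(np.arange(n), idx)       # the ~37% of observations left out
    tree = DecisionTreeRegressor().fit(X[idx], y[idx])
    oob_sum[oob] += tree.predict(X[oob])
    oob_count[oob] += 1

has_oob = oob_count > 0                         # with large B, essentially all observations
y_oob = oob_sum[has_oob] / oob_count[has_oob]   # yhat_oob_i
oob_mse = np.mean((y[has_oob] - y_oob) ** 2)    # average of (y_i - yhat_oob_i)^2
print(oob_mse)
```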
The problem with bagging trees
the trees built from different BS samples can be highly correlated (e.g., one strong predictor dominates the top split in most trees), so averaging them reduces variance less than averaging independent trees would
Random Forest
- as in bagging, fit a decision tree to each of many BS samples
- when growing each tree, at every split we consider only a random sample of m < p predictors
- this leads to very different (or “uncorrelated”) trees for each sample
- finally, average the predictions of the trees (slightly increasing bias but substantially decreasing variance)
- choosing m: around sqrt(p) is the usual default (p/3 is common for regression), but m can be treated as a tuning parameter (sketch below)
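A minimal random-forest sketch, assuming scikit-learn's RandomForestRegressor, where the m above corresponds to max_features (data and settings are illustrative):

```python
# Random forest: each split considers only m of the p predictors,
# which decorrelates the trees before averaging.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 9))                         # p = 9 predictors
y = X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(scale=0.5, size=300)

m = int(np.sqrt(X.shape[1]))                          # m ~ sqrt(p) = 3; treat as tunable
rf = RandomForestRegressor(n_estimators=500, max_features=m,
                           oob_score=True, random_state=0).fit(X, y)
print(rf.oob_score_)                                  # OOB R^2, handy for tuning m
```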
Boosting (don’t really understand this)
set fhat(x) = 0 and r_i = y_i for i = 1, ..., n
for b = 1 to B iterate:
- fit a decision tree fhat_b with d splits to the current residuals r_1, ..., r_n
- update the model: fhat(x) <- fhat(x) + lambda * fhat_b(x)
- update the residuals: r_i <- r_i - lambda * fhat_b(x_i)
output the final model fhat(x) = sum over b of lambda * fhat_b(x)
boosting learns slowly: each new tree is fit to the residuals of the current model, so observations that are already predicted well contribute small residuals and later trees concentrate on the harder cases; the shrinkage lambda slows the learning further
tuning parameters: the shrinkage lambda, the number of splits d, and the number of trees B (unlike bagging, boosting can overfit if B is too large, so B is chosen by CV); see the sketch below
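A direct sketch of the boosting algorithm above, fitting a small tree to the current residuals at each step and shrinking it by lambda (scikit-learn trees; B, d, lambda, and the data are illustrative):

```python
# Boosting by hand: repeatedly fit a d-split tree to the residuals,
# add a shrunken copy to the model, and update the residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 4))
y = np.sin(2 * X[:, 0]) + rng.normal(scale=0.3, size=n)

B, d, lam = 500, 2, 0.01
trees = []
r = y.copy()                                     # r_i = y_i initially
for b in range(B):
    # a tree with d splits has d + 1 terminal nodes
    tree = DecisionTreeRegressor(max_leaf_nodes=d + 1).fit(X, r)
    trees.append(tree)
    r -= lam * tree.predict(X)                   # r_i <- r_i - lambda * fhat_b(x_i)

def f_hat(X_new):
    """Boosted model: fhat(x) = sum over b of lambda * fhat_b(x)."""
    return lam * np.sum([t.predict(X_new) for t in trees], axis=0)

print(np.mean((y - f_hat(X)) ** 2))              # training MSE of the boosted model
```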