5. Trees Flashcards
What is the general algorithm for creating a regression/classification tree? How are recursive binary splitting, cost complexity pruning, and cross-validation used together?
- Construct a tree using recursive binary splitting
- Obtain a sequence of best subtrees, as a function of lambda, using cost complexity pruning
- Choose the value of lambda by applying K-fold CV; we select the lambda that results in the lowest CV error
- The best subtree is the subtree created in step 2 with the chosen value of lambda
With recursive binary splitting, we aim to minimize _____ for regression and ______ for classification.
- Regression: the RSS, i.e. the sum over all regions of Σ_{i: x_i ∈ R_j} (y_i − ŷ_{R_j})²
- Classification: a criterion that contains an impurity measure, such as the Gini index or cross-entropy
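The regression criterion can be illustrated with a minimal, hypothetical pure-Python search for the best single split on one predictor (one step of recursive binary splitting); the data and function names are illustrative, not from any library.

```python
def rss(ys):
    # Residual sum of squares around the region's mean response
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(xs, ys):
    # Try every cutpoint s on a single predictor and keep the one
    # that minimizes RSS(left region) + RSS(right region)
    best = None
    for s in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x < s]
        right = [y for x, y in zip(xs, ys) if x >= s]
        cost = rss(left) + rss(right)
        if best is None or cost < best[1]:
            best = (s, cost)
    return best

# Two clear clusters: the best cutpoint separates x < 10 from x >= 10
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
s, cost = best_split(xs, ys)  # s == 10
```

A full tree would apply the same search recursively within each resulting region.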
What are the 3 purity measures for classification trees? Draw the graph of all three.
Gini index, (cross-)entropy, and classification error rate
On the graph, entropy is always the greatest
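The three measures can be sketched for a two-class node with class-1 proportion p (a minimal stdlib sketch; with entropy in nats, the ordering entropy ≥ Gini ≥ classification error holds on (0, 1)):

```python
import math

def gini(p):
    # Gini index for a two-class node: 2p(1 - p)
    return 2 * p * (1 - p)

def entropy(p):
    # Cross-entropy, here in nats: -(p ln p + (1-p) ln(1-p))
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def class_error(p):
    # Classification error rate: 1 - max(p, 1-p)
    return 1 - max(p, 1 - p)

# At p = 0.25: entropy > Gini > classification error
vals = (entropy(0.25), gini(0.25), class_error(0.25))
```

Because Gini and entropy are more sensitive to node purity than the error rate, they are preferred when growing the tree.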
With cost complexity pruning, in both regression and classification, what are we trying to minimize?
The penalized cost; for regression this is the total RSS over the terminal nodes plus a penalty on tree size: Σ_m Σ_{i: x_i ∈ R_m} (y_i − ŷ_{R_m})² + λ|T|, where |T| is the number of terminal nodes
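The trade-off in this criterion can be shown numerically. Below is a hedged sketch in which each subtree is summarized only by the RSS in its terminal nodes (the subtrees and numbers are invented for illustration):

```python
def penalized_cost(terminal_node_rss, lam):
    # Cost complexity criterion: total RSS over the terminal nodes
    # plus lambda times the number of terminal nodes |T|
    return sum(terminal_node_rss) + lam * len(terminal_node_rss)

# Hypothetical subtrees, each summarized by per-terminal-node RSS
big_tree = [1.0, 1.2, 0.8, 1.1]   # 4 terminal nodes, total RSS 4.1
small_tree = [2.5, 2.6]           # 2 terminal nodes, total RSS 5.1

# With lambda = 0 the larger tree has lower cost; a large enough
# lambda makes the pruned (smaller) subtree preferable
cost_big_0 = penalized_cost(big_tree, 0.0)
cost_big_1 = penalized_cost(big_tree, 1.0)
```

This is why sweeping lambda traces out a sequence of nested subtrees of different sizes.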
Recursive binary splitting is referred to as a _____ approach.
- Top-down: begins with the predictor space as one large region
- Greedy: at each split, the best split is selected without accounting for future splits that could be better
Does cost complexity pruning increase or decrease the variance? How is flexibility measured for trees?
Decreases the variance, because it decreases the number of terminal nodes, which is how flexibility is measured for trees
Cost complexity pruning results in a group of sub trees that are _____.
Nested.
For cost complexity pruning, does increasing the tuning parameter increase or decrease the variance of the method?
Decreases variance, because the number of terminal nodes is a decreasing function of the tuning parameter.
When will trees outperform linear models?
When the relationship between the explanatory variables and the response is far more complicated than a linear equation can capture
What are 4 advantages of trees?
- Easy to interpret and explain
- Can be presented visually
- Handle categorical predictors without the need for dummy variables
- Mimic human decision making
What are the 2 disadvantages of trees?
- Not robust: small changes in the data can cause large changes in the fitted tree
- Do not have the same degree of predictive accuracy as other statistical methods
What is the procedure for bagging? 3
- Create b bootstrap samples from the original training set.
- Create a decision tree for each bootstrap sample using recursive binary splitting
- Predict the response of a new observation by either averaging the responses (regression) or by using the most frequent category (classification) across all b trees
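The aggregation step above can be sketched with a minimal stdlib helper (the function name and inputs are hypothetical; `tree_preds` stands for the b per-tree predictions for one new observation):

```python
from statistics import fmean, mode

def bagged_predict(tree_preds, task):
    # Combine the b per-tree predictions for one new observation
    if task == "regression":
        return fmean(tree_preds)  # average the predicted responses
    return mode(tree_preds)       # most frequent predicted category

reg = bagged_predict([2.0, 2.4, 1.6], "regression")      # -> 2.0
cls = bagged_predict(["A", "B", "A"], "classification")  # -> "A"
```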
If we increase the value of b in bagging, does this cause overfitting?
No
Does bagging reduce the variance? If yes, why?
Yes, because the variance of the average of a set of independent observations is less than the variance of a single observation
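This can be checked numerically: for independent observations with variance σ², the average of b of them has variance σ²/b. A small stdlib simulation (the seed, sample sizes, and trial counts are arbitrary choices for illustration):

```python
import random
import statistics

random.seed(0)
sigma2 = 4.0  # variance of a single observation

def var_of_mean(b, trials=2000):
    # Empirical variance of the average of b independent draws
    means = [
        statistics.fmean(random.gauss(0.0, sigma2 ** 0.5) for _ in range(b))
        for _ in range(trials)
    ]
    return statistics.pvariance(means)

v1 = var_of_mean(1)    # roughly sigma^2 = 4
v25 = var_of_mean(25)  # roughly sigma^2 / 25 = 0.16
```

This σ²/b reduction is exactly what bagging exploits by averaging many trees.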
What is a disadvantage to bagging?
Difficult to interpret the entire bagged model.
The out of bag error is used as an estimate of _____.
The test error (e.g., the test MSE for regression)
On average, how many observations are used to train each bagged tree? What are the observations called that were not used to train the trees?
About 2/3. The observations not used to train a given tree are called the out-of-bag (OOB) observations.
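The 2/3 figure follows from sampling with replacement: each observation is left out of a bootstrap sample with probability (1 − 1/n)^n ≈ e⁻¹ ≈ 0.368, so about 63.2% of observations are used per tree. A quick stdlib check (seed and n are arbitrary):

```python
import math
import random

random.seed(1)
n = 1000
# One bootstrap sample: n draws with replacement from {0, ..., n-1}
sample = [random.randrange(n) for _ in range(n)]
used_fraction = len(set(sample)) / n  # near 1 - 1/e ~ 0.632

# Theory: P(an observation is used) = 1 - (1 - 1/n)^n -> 1 - 1/e
expected = 1 - (1 - 1 / n) ** n
```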
If the training dataset has a very strong predictor variable, what will happen to the bagged trees? What happens if the trees are similar to one another?
The trees will be monopolized by this predictor (most will use it in the top split), so they will look similar to one another. If the trees are similar, their predictions will be highly correlated, and averaging highly correlated predictions does not reduce the variance by much.
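The effect of correlation can be made precise: for b identically distributed predictions with variance σ² and pairwise correlation ρ, the variance of their average is ρσ² + (1 − ρ)σ²/b, which stays near ρσ² no matter how large b grows. A small sketch of that formula (the parameter values are illustrative):

```python
def var_of_average(rho, sigma2, b):
    # Variance of the mean of b correlated, identically
    # distributed predictions: rho*sigma^2 + (1 - rho)*sigma^2/b
    return rho * sigma2 + (1 - rho) * sigma2 / b

uncorrelated = var_of_average(0.0, 1.0, 100)  # -> 0.01, shrinks with b
correlated = var_of_average(0.9, 1.0, 100)    # -> 0.901, stuck near rho*sigma^2
```

This floor of ρσ² is the motivation for the next card: random forests reduce ρ itself.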
Random forests aim to _______ trees.
Decorrelate