Bootstrapped models Flashcards
If we had B independent training sets and we fit a model on each, taking the average predicted value, what happens to the bias and variance of the prediction?
The bias remains constant, the variance decreases.
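This can be sketched numerically. In the toy example below (an assumption for illustration, not from the cards), each "model" is just the sample mean of its own independent training set, so averaging B of them keeps the bias at zero while shrinking the variance by roughly a factor of B:

```python
import numpy as np

# Sketch: averaging predictions from models fit on B independent
# training sets. Each "model" is the sample mean of its training set,
# so the averaged prediction stays unbiased while its variance shrinks
# by roughly a factor of B.
rng = np.random.default_rng(0)
B, n = 50, 100
true_mean = 2.0

single_preds = []    # prediction from a single training set
averaged_preds = []  # average of B independent predictions
for _ in range(1000):
    sets = rng.normal(true_mean, 1.0, size=(B, n))
    preds = sets.mean(axis=1)      # one "model" per training set
    single_preds.append(preds[0])
    averaged_preds.append(preds.mean())

# Both estimators are unbiased; the variance ratio is roughly B.
print(np.var(single_preds) / np.var(averaged_preds))
```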
In practice, why might we not be able to achieve this?
In practice we rarely have B independent training sets. And if we could obtain them, we would likely achieve better accuracy by pooling them into one large training set and fitting a single model, rather than averaging B separate ones.
Explain the bootstrapping algorithm
Aim to simulate new training sets by treating the observed data set as an approximation of the population: repeatedly draw samples of the same size from the data set, with replacement.
What are bagged tree models and what hyperparameters do we need to tune?
Bagged trees are prediction functions formed by training a decision tree on each of B bootstrapped data sets and aggregating their predictions (e.g. by averaging). We need to tune the number of trees as well as the usual tree hyperparameters.
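A hedged sketch of bagging, assuming scikit-learn is available (names like `max_depth` are its API, the data is made up): fit one tree per bootstrap resample, then average the predictions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Bagged-trees sketch: B trees, each fit on a bootstrap resample,
# aggregated by averaging. B and the tree hyperparameters
# (e.g. max_depth) are the knobs to tune.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=300)

B = 100  # number of trees (a hyperparameter)
trees = []
for _ in range(B):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap indices
    tree = DecisionTreeRegressor(max_depth=4)   # tree hyperparameter
    tree.fit(X[idx], y[idx])
    trees.append(tree)

def bagged_predict(X_new):
    # Aggregate: average the individual tree predictions.
    return np.mean([t.predict(X_new) for t in trees], axis=0)

print(bagged_predict(np.array([[0.0]])))
```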
What are random forests and what hyperparameters do we need to tune?
Same as bagged trees, except that at each stage of the recursive binary splitting algorithm we only consider splits on a randomly chosen subset of the features. We need to tune the number of splitting candidates in addition to all the bagged-tree hyperparameters.
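In scikit-learn terms (an assumption about tooling, not part of the card), the number of splitting candidates is the `max_features` hyperparameter, tuned alongside the number of trees and the tree hyperparameters:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Random-forest sketch: bagged trees plus random feature subsetting
# at each split. max_features = number of splitting candidates.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.2, size=400)

forest = RandomForestRegressor(
    n_estimators=200,  # number of trees
    max_features=2,    # splitting candidates considered at each split
    max_depth=6,       # usual tree hyperparameter
    random_state=0,
).fit(X, y)

print(forest.predict(np.zeros((1, 5))))
```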
Do random forests tend to lead to better or worse prediction functions than bagged trees, and why?
Better: restricting the splitting candidates makes the trees more different from one another, so their predictions are less correlated, and averaging less correlated predictions gives a greater reduction in variance.
How do we calculate the importance of a feature in the context of a tree?
We can sum the improvements in the splitting criterion over all splits made on that feature.
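This is what impurity-based importances compute; scikit-learn's `feature_importances_` (an assumed tool here) normalizes those summed improvements. In the synthetic example below only feature 0 drives the target, so it should dominate:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Importance sketch: sum of split-criterion improvements per feature,
# normalized to sum to 1. Feature 0 generates y, so it should dominate.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + rng.normal(0, 0.1, size=500)

tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)
print(tree.feature_importances_)  # normalized sums of split improvements
```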