Bootstrapped models Flashcards

1
Q

If we had B independent training sets and we fit a model on each, taking the average predicted value, what happens to the bias and variance of the prediction?

A

The bias remains constant, the variance decreases.
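A sketch of the standard calculation behind this, assuming the B predictions at a point x are i.i.d. with mean μ(x) and variance σ²(x) (the symbols are illustrative, not from the card):

```latex
% B i.i.d. predictions \hat{f}_1(x), \dots, \hat{f}_B(x),
% each with mean \mu(x) and variance \sigma^2(x):
\mathbb{E}\Big[\tfrac{1}{B}\textstyle\sum_{b=1}^{B}\hat{f}_b(x)\Big] = \mu(x)
  \quad \text{(bias unchanged)}
\qquad
\mathrm{Var}\Big[\tfrac{1}{B}\textstyle\sum_{b=1}^{B}\hat{f}_b(x)\Big]
  = \frac{\sigma^2(x)}{B}
  \quad \text{(variance shrinks like } 1/B\text{)}
```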

2
Q

In practice, why might we not be able to achieve this?

A

In practice we typically have only one training set, so B independent training sets are not available. (And if we could obtain B independent training sets, we would likely achieve better accuracy by combining them into one larger set and fitting a single model.)

3
Q

Explain the bootstrapping algorithm

A

We aim to simulate new training sets by approximating the population with the empirical distribution of our data set: each bootstrap sample is drawn from the original data set with replacement, usually with the same size as the original.
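A minimal sketch of the resampling step in Python with NumPy; the data values, n, and B here are illustrative, not from the card:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded for reproducibility

data = np.array([2.3, 1.7, 3.1, 0.9, 2.8])  # illustrative data set
n = len(data)

# One bootstrap sample: draw n indices uniformly with replacement,
# so some observations repeat and others are left out entirely.
indices = rng.integers(low=0, high=n, size=n)
bootstrap_sample = data[indices]

# Repeating B times simulates B training sets drawn from the
# empirical distribution of the original data.
B = 1000
bootstrap_means = np.array([data[rng.integers(0, n, size=n)].mean()
                            for _ in range(B)])
print(bootstrap_means.std())  # bootstrap estimate of the SE of the mean
```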

4
Q

What are bagged tree models and what hyperparameters do we need to tune?

A

Bagged trees are prediction functions formed by training decision trees on bootstrapped data sets and aggregating their predictions (e.g., averaging them for regression). We need to tune the number of trees as well as the usual tree hyperparameters.
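As a sketch, scikit-learn's BaggingRegressor implements this recipe directly (trees fit on bootstrap samples, predictions averaged); the synthetic data and hyperparameter values here are illustrative:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)  # noisy synthetic target

# n_estimators is the number of trees B; max_depth stands in for the
# "tree hyperparameters" the card says we also need to tune.
bagged = BaggingRegressor(
    estimator=DecisionTreeRegressor(max_depth=4),  # `base_estimator` in sklearn < 1.2
    n_estimators=100,
    bootstrap=True,  # resample the training rows with replacement
    random_state=0,
)
bagged.fit(X, y)
print(bagged.predict([[0.5]]))
```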

5
Q

What are random forests and what hyperparameters do we need to tune?

A

Same as bagged trees, except that at each stage of the recursive binary splitting algorithm we only consider splits on a randomly chosen subset of the features. We need to tune the number of splitting candidates in addition to all the bagged-tree hyperparameters.
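A sketch with scikit-learn's RandomForestRegressor on synthetic data; max_features is the number of splitting candidates the card refers to, and every value below is illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                    # 10 features, synthetic
y = X[:, 0] + 2 * X[:, 1] + rng.normal(size=300)  # only 2 are informative

forest = RandomForestRegressor(
    n_estimators=300,  # same "number of trees" knob as bagging
    max_features=3,    # splitting candidates per node: the forest-specific knob
    random_state=0,
)
forest.fit(X, y)
print(forest.predict(X[:2]))
```

A common rule of thumb is max_features ≈ p/3 for regression and √p for classification, but it is still worth tuning.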

6
Q

Do random forests tend to lead to better or worse prediction functions than bagged trees, and why?

A

Better: restricting each split to a random subset of features makes the trees less correlated with one another, and averaging decorrelated trees gives a greater reduction in variance.
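The variance formula behind this (a sketch, assuming the B tree predictions each have variance σ² and pairwise correlation ρ):

```latex
\mathrm{Var}\Big[\tfrac{1}{B}\textstyle\sum_{b=1}^{B}\hat{f}_b(x)\Big]
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

As B grows the second term vanishes, leaving a floor of ρσ²; randomizing the splitting candidates lowers ρ and hence lowers that floor.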

7
Q

How do we calculate the importance of a feature in the context of a tree?

A

We can sum the improvements in the splitting criterion (e.g., the reduction in RSS or Gini impurity) over all splits made on that feature; in an ensemble, this sum is averaged across the trees.
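scikit-learn exposes an impurity-based measure of this kind (summed improvements, averaged over trees and normalized to sum to 1) as feature_importances_; a minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3 * X[:, 0] + X[:, 2] + rng.normal(size=300)  # features 0 and 2 matter

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Each entry sums a feature's impurity improvements over all splits that
# use it, averaged across trees and normalized to sum to 1.
for j, imp in enumerate(forest.feature_importances_):
    print(f"feature {j}: {imp:.3f}")
```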
