Part II: Resampling + bias/variance Flashcards

1
Q

What is variance?

A

The amount by which the fitted model would change if we estimated it on a different training dataset; a high-variance fit changes a lot from one training set to the next.

2
Q

What is bias?

A

The error that comes from approximating the real relationship with a model that is too simple, e.g. fitting a straight line (least squares) to data that are not actually linear.

3
Q

Does overfitting have high variance or bias?

A

High variance: the model fits the noise in the training data, so its fit changes a lot between datasets and it generalizes poorly.

4
Q

Does underfitting have high variance or bias?

A

High bias: the model is too simple to capture the relationship between the predictors and the response.
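
A minimal sketch of this trade-off (not from the cards; the data and polynomial degrees are made up): a degree-1 fit underfits (high bias) and a very high-degree fit overfits (high variance), which shows up as a gap between training and test error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.3, size=60)  # noisy non-linear signal

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 15):  # degree 1: underfits (bias), degree 15: overfits (variance)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```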

5
Q

What is the minimum number of observations?

A

At least as many observations as dimensions (observations ≥ dimensions).
Basically, if we go up in dimensions we need to go up exponentially in observations to keep the same flexibility (the curse of dimensionality). Therefore we prefer fewer variables.

6
Q

Which resampling methods can be used?

A

Leave-one-out cross-validation (LOOCV)
k-fold cross-validation
The bootstrap (a sketch follows below)
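
The first two are covered on the next cards; here is a minimal bootstrap sketch (not from the cards, data made up), estimating the standard error of a sample mean by resampling with replacement.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)  # a hypothetical sample

# Draw 1000 bootstrap samples (same size, with replacement) and record the mean of each
boot_means = [rng.choice(data, size=len(data), replace=True).mean() for _ in range(1000)]

print("bootstrap estimate of the standard error of the mean:", np.std(boot_means))
```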

7
Q

What is leave-one-out cross-validation?

A

We split the data n times: each time we hold out 1 observation, train on the remaining n-1 observations, and test on the held-out observation. The n test errors are then averaged.
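
A minimal sketch, assuming scikit-learn and a made-up regression dataset: LeaveOneOut yields n train/test splits, one per observation, and the test errors are averaged.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=30)

# One negative MSE per held-out observation; average them for the LOOCV estimate
scores = cross_val_score(LinearRegression(), X, y,
                         cv=LeaveOneOut(), scoring="neg_mean_squared_error")
print("LOOCV estimate of test MSE:", -scores.mean())
```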

8
Q

What is k-fold cross-validation?

A

We split the data into k folds (groups). Each fold acts as the test set once and is part of the training set the remaining times. It is less computationally expensive than leave-one-out, and the results are pretty much as good. Often k = 5 or k = 10 is used.
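
A minimal sketch, assuming scikit-learn and made-up data: 5-fold cross-validation, where each fold serves as the test set exactly once.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=100)

cv = KFold(n_splits=5, shuffle=True, random_state=0)  # k = 5 folds
scores = cross_val_score(LinearRegression(), X, y, cv=cv,
                         scoring="neg_mean_squared_error")
print("per-fold MSE:", -scores)
print("5-fold estimate of test MSE:", -scores.mean())
```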

9
Q

What is stepwise selection?

A

Forward: Start from an empty model and add one predictor at a time, each time the predictor that improves the model the most (i.e. test all remaining predictors one at a time and keep the one that improves the model most).
Backward: Start from the full model with all predictors and drop the least useful predictor one at a time.
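
A minimal sketch, assuming scikit-learn's greedy SequentialFeatureSelector as a stand-in for stepwise selection (it scores candidates by cross-validation rather than a classical criterion such as AIC, and the number of features to keep is fixed here by hand).

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(size=200)  # only 3 predictors matter

for direction in ("forward", "backward"):
    selector = SequentialFeatureSelector(LinearRegression(),
                                         n_features_to_select=3,
                                         direction=direction)
    selector.fit(X, y)
    print(direction, "selection keeps columns:", np.flatnonzero(selector.get_support()))
```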

10
Q

What are the drawbacks of forward and backward selection?

A

Forward: because we always look at predictors one at a time, we never see whether some variables only work well together, so we can miss the optimal model.
Backward: by contrast, it can keep predictors that work well together but not apart, but it requires fitting the full model first, so it cannot be used when there are more predictors than observations.

11
Q

What are shrinkage/regularization methods?

A

Methods that add a penalty based on the complexity of the model (the size of the coefficients). We fit the model containing all p predictors but constrain the coefficient estimates towards 0. Shrinking the coefficient estimates can significantly reduce their variance. So we keep all predictors but lower the impact of less important variables by constraining their coefficient estimates.
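
A minimal sketch, assuming scikit-learn and made-up data: fitting ridge regression (one shrinkage method) with an increasing penalty weight, so the coefficient estimates are pulled further towards 0.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([4.0, -3.0, 2.0, 0.5]) + rng.normal(size=100)

# scikit-learn calls the penalty weight alpha; it plays the role of lambda
for alpha in (0.01, 1.0, 100.0, 10000.0):
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"lambda = {alpha:8g}  coefficients = {np.round(coefs, 2)}")
```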

12
Q

What is Lasso?

A

A shrinkage/regularization method that can shrink coefficients exactly to 0, effectively removing those variables and so performing automatic variable selection.
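
A minimal sketch, assuming scikit-learn and made-up data where only two of six predictors matter: lasso sets the irrelevant coefficients exactly to 0.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_coef = np.array([3.0, 0.0, 0.0, 1.5, 0.0, 0.0])  # only columns 0 and 3 matter
y = X @ true_coef + rng.normal(size=200)

lasso = Lasso(alpha=0.5).fit(X, y)  # alpha is the penalty weight (lambda)
print("lasso coefficients:", np.round(lasso.coef_, 2))  # irrelevant ones are exactly 0.0
```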

13
Q

What is ridge?

A

A shrinkage/regularization method that shrinks coefficients towards, but never exactly to, 0.
It makes less important variables matter less but never removes them entirely.
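
A minimal sketch, assuming scikit-learn and the same kind of made-up data as above: ridge shrinks the irrelevant coefficients towards 0 but keeps every variable in the model.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_coef = np.array([3.0, 0.0, 0.0, 1.5, 0.0, 0.0])  # only columns 0 and 3 matter
y = X @ true_coef + rng.normal(size=200)

ridge = Ridge(alpha=10.0).fit(X, y)
print("ridge coefficients:", np.round(ridge.coef_, 3))  # small but not exactly 0
```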

14
Q

When to use lasso over ridge?

A

Lasso produces simpler and more interpretable models. Predictive performance depends on the data: if many of the variables have no (independent) association with the response, lasso will work better than ridge; if not, ridge will tend to work better.

15
Q

What does lambda refer to in shrinkage methods?

A

The tuning parameter that scales the penalty term: the larger lambda, the stronger the shrinkage (lambda = 0 gives the ordinary least-squares fit).
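
A minimal sketch, assuming scikit-learn (which calls lambda "alpha"): the penalty weight is usually chosen by cross-validation, tying this card back to the resampling methods above.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0, 0.0]) + rng.normal(size=200)

# Try a grid of lambda values and pick the one with the best 5-fold CV error
model = LassoCV(alphas=np.logspace(-3, 1, 50), cv=5).fit(X, y)
print("lambda chosen by 5-fold CV:", model.alpha_)
print("coefficients at that lambda:", np.round(model.coef_, 2))
```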
