Part II: Resampling + bias/variance Flashcards
What is variance?
How much the model's fit changes when it is trained on different datasets (the sensitivity of the fit to the particular training set).
What is bias?
The error introduced by approximating the true relationship with a simpler model, e.g. fitting a straight line by least squares when the true relationship is curved.
Does overfitting have high variance or bias?
High variance. The model also fits the noise in the training data, so its fit changes a lot from one dataset to another.
Does underfitting have high variance or bias?
High bias. The model is too simple to capture the relationship between the predictors and the response.
What is the minimum number of observations?
At the very least, as many observations as dimensions (predictors).
If we go up in dimensions, the number of observations needed to keep the same flexibility grows roughly exponentially (the curse of dimensionality), so we prefer fewer variables.
Which resampling methods can be used?
Leave 1 out cross validation
k-fold cross validation
bootstrapping
What is Leave 1 out cross validation?
We fit the model n times: each time we hold out 1 observation, train on the remaining n-1, and test on the held-out observation. The n test errors are then averaged. It is accurate but computationally expensive, since the model is fit n times.
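A minimal sketch of leave-one-out CV in Python with scikit-learn (the data below is made-up toy data, just for illustration):

```python
# Leave-one-out cross validation sketch (assumes scikit-learn; the toy data is invented).
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                              # 50 observations, 3 predictors
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=50)

loo = LeaveOneOut()                                       # n splits: each observation is the test set once
scores = cross_val_score(LinearRegression(), X, y, cv=loo,
                         scoring="neg_mean_squared_error")
print("LOOCV estimate of test MSE:", -scores.mean())
```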
What is k-fold cross validation?
We split the data into k folds (groups). Each fold acts as the test set once and as part of the training set the remaining times, and the k test errors are averaged. It is less computationally expensive than leave one out, and the results are usually about as good. Often K=5 or K=10 is used.
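The same idea with K = 5 folds, again a sketch with scikit-learn and invented toy data:

```python
# 5-fold cross validation sketch (assumes scikit-learn; the toy data is invented).
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=50)

kf = KFold(n_splits=5, shuffle=True, random_state=0)      # K = 5 folds
scores = cross_val_score(LinearRegression(), X, y, cv=kf,
                         scoring="neg_mean_squared_error")
print("5-fold estimate of test MSE:", -scores.mean())
```

Instead of n model fits (as in leave one out), only k fits are needed.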
What is stepwise selection?
Forward: start from an empty model and add one predictor at a time, each time the one that improves the model the most (i.e. test all remaining predictors one at a time and keep the one that improves the model most).
Backward: start from the full model with all predictors and drop them one at a time, each time removing the least useful one.
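A sketch of greedy forward/backward selection using scikit-learn's SequentialFeatureSelector (note: it scores candidate subsets by cross-validation rather than by RSS/AIC as in the textbook procedure, so this is only an approximation; the toy data is invented):

```python
# Forward/backward greedy selection sketch (assumes scikit-learn >= 0.24; toy data is invented).
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=100)   # only predictors 0 and 3 matter

for direction in ("forward", "backward"):
    sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=2,
                                    direction=direction, cv=5)
    sfs.fit(X, y)
    print(direction, "selected columns:", np.flatnonzero(sfs.get_support()))
```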
What are the drawbacks of forward and backward selection?
Forward: we never see whether some variables only work well together, because candidate predictors are always evaluated one at a time on top of the current model.
Backward: because it starts from the full model it does consider predictors jointly, but it is still greedy: once a predictor is dropped it is never reconsidered, and fitting the full model requires more observations than predictors.
What are shrinkage/regularization methods?
Methods that add a penalty based on the size of the coefficients (the complexity of the model). We fit a model containing all p predictors, but the penalty constrains the coefficient estimates towards 0. Shrinking the coefficient estimates can significantly reduce their variance, so we keep all predictors while lowering the impact of the less important ones.
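As a formula (standard notation, not taken from these cards), the shrinkage fit minimizes the usual residual sum of squares plus lambda times a penalty on the coefficient sizes:

$$
\hat{\beta} = \arg\min_{\beta}\left[\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{p} P(\beta_j)\right]
$$

where P(\beta_j) = |\beta_j| gives the lasso and P(\beta_j) = \beta_j^2 gives ridge (both defined below).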
What is Lasso?
A shrinkage/regularization method (L1 penalty) that can shrink some coefficients exactly to 0, effectively removing those variables and so performing automatic variable selection.
What is ridge?
A shrinkage/regularization method (L2 penalty) that shrinks coefficients towards, but never exactly to, 0.
It makes variables less important but never removes them entirely.
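A small sketch showing the difference (assumes scikit-learn, where lambda is called alpha; the toy data is invented):

```python
# Lasso vs ridge shrinkage sketch (assumes scikit-learn; toy data is invented).
# Lasso can set coefficients exactly to 0; ridge only shrinks them toward 0.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = 4 * X[:, 0] + rng.normal(scale=0.5, size=200)         # only the first predictor matters

lasso = Lasso(alpha=0.5).fit(X, y)                        # alpha plays the role of lambda
ridge = Ridge(alpha=0.5).fit(X, y)
print("lasso coefficients:", np.round(lasso.coef_, 3))    # irrelevant ones become exactly 0
print("ridge coefficients:", np.round(ridge.coef_, 3))    # irrelevant ones are small but nonzero
```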
When to use lasso over ridge?
Lasso produces simpler and more interpretable models. Predictive performance depends on the data: if many variables have no (independent) association with the response, lasso will work better than ridge; otherwise ridge tends to work better.
What does lambda refer to in shrinkage methods?
The tuning parameter that scales the penalty: the larger lambda, the stronger the shrinkage; lambda = 0 gives the ordinary least squares fit.
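In practice lambda is usually chosen by cross-validation; a sketch with scikit-learn's LassoCV (which calls lambda alpha; the toy data is invented):

```python
# Choosing lambda (called alpha in scikit-learn) by cross-validation; toy data is invented.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = 4 * X[:, 0] + rng.normal(scale=0.5, size=200)

model = LassoCV(cv=5).fit(X, y)                           # tries a grid of penalty strengths
print("lambda (alpha) chosen by 5-fold CV:", model.alpha_)
```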