Resampling Methods Flashcards

1
Q

Validation Set

A

Randomly split the data 50/50 into a training set and a validation (test) set, fit on the training half, and estimate the test error on the validation half. The most basic approach: cheap and simple, but the error estimate can vary a lot depending on which observations land in each half, and the model sees only half the data.
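
A minimal sketch of the validation-set approach, assuming scikit-learn; the toy data and the linear model are placeholders, not from the card:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Toy data: y = 2x + noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 2 * X[:, 0] + rng.normal(size=200)

# The basic 50/50 split: half for training, half held out for validation
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("Validation MSE:", mean_squared_error(y_val, model.predict(X_val)))
```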

2
Q

LOOCV

A

Leave One Out Cross Validation
Fit the model on n-1 observations and test it on the single held-out observation; repeat n times so every observation is held out once, then average the n errors. This is computationally expensive, since the model is fit n times, but it is deterministic (no random splitting, so it always yields the same result) and it makes nearly all of the data available for training.
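
A sketch of LOOCV, again assuming scikit-learn and toy data; note the model is refit n times, once per observation:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1))
y = 2 * X[:, 0] + rng.normal(size=50)

# n fits: each fold holds out exactly one observation
scores = cross_val_score(LinearRegression(), X, y,
                         cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
print(f"LOOCV MSE estimate: {-scores.mean():.3f} from {len(scores)} fits")
```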

3
Q

K-fold CV

A

K-fold Cross Validation. An alternative to LOOCV: divide the data into k groups (folds), use one fold as the validation set, and train on the remaining k-1 folds. Repeat k times, once per fold, and average the k error estimates. This is usually the preferred method (k = 5 or 10 is typical), as it gives a more realistic estimate of the test error at far lower computational cost than LOOCV.
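
A sketch of k-fold CV with k = 10 (a common choice, not prescribed by the card); data and model are placeholders:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 2 * X[:, 0] + rng.normal(size=200)

# k = 10 folds: train on 9, validate on 1, rotate, then average
kf = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=kf,
                         scoring="neg_mean_squared_error")
print(f"10-fold CV MSE estimate: {-scores.mean():.3f}")
```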

4
Q

LOOCV vs. K Fold

A

K-fold gives a more realistic error estimate because the k fitted models are not as highly correlated with each other. With LOOCV, each model is trained on nearly identical data (any two training sets share n-2 observations), so the n error estimates are highly correlated and their average is more variable.
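
To make the contrast concrete, here is a sketch running both estimators on the same toy data; everything here is illustrative:

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2 * X[:, 0] + rng.normal(size=100)
model = LinearRegression()

# LOOCV: 100 fits on nearly identical training sets
loo_mse = -cross_val_score(model, X, y, cv=LeaveOneOut(),
                           scoring="neg_mean_squared_error").mean()
# 10-fold: 10 fits on more distinct training sets
kf_mse = -cross_val_score(model, X, y,
                          cv=KFold(n_splits=10, shuffle=True, random_state=0),
                          scoring="neg_mean_squared_error").mean()
print(f"LOOCV MSE: {loo_mse:.3f}   10-fold MSE: {kf_mse:.3f}")
```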

5
Q

K-fold CV error vs Real Test Error

A

K-fold CV can be off on the absolute level of the test error (it tends to underestimate it), but it still finds the minimum point well when tuning model flexibility (e.g., polynomial order, or k in KNN), which is what matters for model selection.
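
A sketch of using 10-fold CV to choose model flexibility; the quadratic ground truth and the degree grid are invented for illustration, but the CV minimum should land near the true degree:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)  # true relationship is quadratic

kf = KFold(n_splits=10, shuffle=True, random_state=0)
for degree in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=kf,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree {degree}: CV MSE = {mse:.3f}")  # expect the minimum near degree 2
```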

6
Q

Bootstrap

A

Take repeated samples of size n from your dataset, drawn with replacement; replacement means the same observation can appear several times within one bootstrap sample. Refit the model (or recompute the statistic of interest) on each bootstrap sample; the spread of those results estimates the standard error (variability) of your model output.
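
A sketch of bootstrapping the standard error of a regression slope; the data, the choice of B = 1000 resamples, and the statistic are all illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2 * X[:, 0] + rng.normal(size=100)
n, B = len(y), 1000

slopes = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)  # draw n row indices with replacement
    slopes[b] = LinearRegression().fit(X[idx], y[idx]).coef_[0]

# Spread of the bootstrap replicates estimates the slope's standard error
print(f"Bootstrap SE of slope: {slopes.std(ddof=1):.4f}")
```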
