Resampling Methods Flashcards
Validation Set
Split the data 50/50 into a training set and a validation (test) set; fit on the training half and estimate the test error on the validation half. The most basic approach.
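A minimal sketch of the validation set approach, assuming scikit-learn is available (the synthetic data is hypothetical; the 50/50 split follows the card):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data for illustration
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=1.0, size=200)

# 50/50 split into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0)

model = LinearRegression().fit(X_train, y_train)
val_mse = mean_squared_error(y_val, model.predict(X_val))
print(f"Validation MSE: {val_mse:.3f}")
```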
LOOCV
Leave-One-Out Cross-Validation
Fit the model on n-1 observations and validate on the single held-out observation; repeat n times so each observation is held out once. This is computationally expensive, as you fit the model n times. However, there is no randomness in the splits, so it always yields the same result, and it makes nearly all of the data available for training each fit.
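A minimal LOOCV sketch using scikit-learn's `LeaveOneOut` splitter (synthetic data is again hypothetical; with n observations this fits the model n times, which is the computational cost the card mentions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = 2 * X[:, 0] + rng.normal(scale=1.0, size=50)

# One model fit per observation: n fits in total
loo = LeaveOneOut()
scores = cross_val_score(LinearRegression(), X, y,
                         cv=loo, scoring="neg_mean_squared_error")

# Average the n held-out squared errors to estimate test MSE
print(f"LOOCV estimate of test MSE: {-scores.mean():.3f}")
```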
K-fold CV
K-fold Cross-Validation. An alternative to LOOCV: divide the data into k groups (folds), use one fold as the validation set, and train on the other k-1 folds; repeat k times so each fold serves as validation once. This is the preferred method, as it gives a more realistic estimate of the test error.
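A k-fold sketch with scikit-learn's `KFold`; the choice k = 5 is an assumption not stated on the card (k = 5 or 10 is conventional):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 2 * X[:, 0] + rng.normal(scale=1.0, size=200)

# k = 5: each fold serves once as the validation set,
# and the model trains on the other k-1 folds
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y,
                         cv=kfold, scoring="neg_mean_squared_error")
print(f"5-fold CV estimate of test MSE: {-scores.mean():.3f}")
```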
LOOCV vs. K Fold
K-fold gives a more realistic error estimate because its k fitted models are not as highly correlated with each other. With LOOCV, each model is trained on nearly identical data, and averaging many highly correlated error estimates gives a higher-variance result.
K-fold CV error vs Real Test Error
K-fold CV tends to underestimate the true test error, but it still finds the minimum point when used to choose model flexibility (e.g., the order of a polynomial, or k in KNN). See the sketch below.
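A sketch of using k-fold CV to choose model flexibility, here the polynomial degree (the degree grid and synthetic data are illustrative assumptions; the minimum of the CV curve picks the degree):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=1.0, size=200)  # true relationship is quadratic

# Estimate CV error for each candidate polynomial degree
cv_mse = {}
for degree in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, X, y, cv=10,
                             scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()

# The flexibility minimizing the CV error curve is the one chosen
best = min(cv_mse, key=cv_mse.get)
print(f"CV MSE by degree: {cv_mse}")
print(f"Chosen degree (minimum CV error): {best}")
```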
Bootstrap
Take repeated samples of size n from your dataset, with replacement; replacement means the same observation can appear multiple times within a single bootstrap sample. Used to estimate the standard error (variability) of a model output or statistic.
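A minimal bootstrap sketch estimating the standard error of a sample mean using only numpy (the statistic, sample, and number of resamples B are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)  # hypothetical sample, n = 100

# Draw B bootstrap samples of size n with replacement,
# recomputing the statistic on each one
B = 1000
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(B)
])

# Bootstrap estimate of the standard error = std dev of the B statistics
print(f"Bootstrap SE of the mean: {boot_means.std(ddof=1):.4f}")
print(f"Theoretical SE (sigma/sqrt(n)): {2.0 / np.sqrt(100):.4f}")
```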