Resampling Methods Flashcards
Validation Set
Split the data 50/50 into a training set and a validation (test) set; fit on the training half and estimate the test error on the validation half. The most basic approach.
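A minimal sketch of the validation set approach, assuming scikit-learn is available (the synthetic data is hypothetical; the 50/50 split follows the card):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data for illustration
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=1.0, size=200)

# 50/50 split into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0)

model = LinearRegression().fit(X_train, y_train)
val_mse = mean_squared_error(y_val, model.predict(X_val))
print(f"Validation MSE: {val_mse:.3f}")
```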
LOOCV
Leave-One-Out Cross-Validation
Fit the model on n-1 observations and validate on the single held-out observation; repeat n times so each observation is held out once. This is computationally expensive, as you fit the model n times. However, there is no randomness in the splits, so it always yields the same result, and it makes nearly all of the data available for training each fit.
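A minimal LOOCV sketch using scikit-learn's `LeaveOneOut` splitter (synthetic data is again hypothetical; with n observations this fits the model n times, which is the computational cost the card mentions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = 2 * X[:, 0] + rng.normal(scale=1.0, size=50)

# One model fit per observation: n fits in total
loo = LeaveOneOut()
scores = cross_val_score(LinearRegression(), X, y,
                         cv=loo, scoring="neg_mean_squared_error")

# Average the n held-out squared errors to estimate test MSE
print(f"LOOCV estimate of test MSE: {-scores.mean():.3f}")
```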
K-fold CV
K-fold Cross-Validation. An alternative to LOOCV: divide the data into k groups (folds), use one fold as the validation set, and train on the other k-1 folds; repeat k times so each fold serves as validation once. This is the preferred method, as it gives a more realistic estimate of the test error.
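A k-fold sketch with scikit-learn's `KFold`; the choice k = 5 is an assumption not stated on the card (k = 5 or 10 is conventional):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 2 * X[:, 0] + rng.normal(scale=1.0, size=200)

# k = 5: each fold serves once as the validation set,
# and the model trains on the other k-1 folds
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y,
                         cv=kfold, scoring="neg_mean_squared_error")
print(f"5-fold CV estimate of test MSE: {-scores.mean():.3f}")
```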
LOOCV vs. K Fold
K-fold gives a more realistic error estimate because its k fitted models are not as highly correlated with each other. With LOOCV, each model is trained on nearly identical data, and averaging many highly correlated error estimates gives a higher-variance result.
K-fold CV error vs Real Test Error
K-fold CV tends to underestimate the true test error, but it still finds the minimum point when used to choose model flexibility (e.g., the order of a polynomial, or k in KNN). See the sketch below.
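A sketch of using k-fold CV to choose model flexibility, here the polynomial degree (the degree grid and synthetic data are illustrative assumptions; the minimum of the CV curve picks the degree):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=1.0, size=200)  # true relationship is quadratic

# Estimate CV error for each candidate polynomial degree
cv_mse = {}
for degree in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, X, y, cv=10,
                             scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()

# The flexibility minimizing the CV error curve is the one chosen
best = min(cv_mse, key=cv_mse.get)
print(f"CV MSE by degree: {cv_mse}")
print(f"Chosen degree (minimum CV error): {best}")
```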
Bootstrap
Take repeated samples of size n from your dataset, with replacement; replacement means the same observation can appear multiple times within a single bootstrap sample. Used to estimate the standard error (variability) of a model output or statistic.
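A minimal bootstrap sketch estimating the standard error of a sample mean using only numpy (the statistic, sample, and number of resamples B are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)  # hypothetical sample, n = 100

# Draw B bootstrap samples of size n with replacement,
# recomputing the statistic on each one
B = 1000
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(B)
])

# Bootstrap estimate of the standard error = std dev of the B statistics
print(f"Bootstrap SE of the mean: {boot_means.std(ddof=1):.4f}")
print(f"Theoretical SE (sigma/sqrt(n)): {2.0 / np.sqrt(100):.4f}")
```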