L5: Resampling Methods Flashcards

Question 1

Q

Explain the cross validation approach

Answer

A

Question 2

Q

What are some drawbacks of CV?

Answer

A

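These drawbacks apply to the simple validation set (hold-out) approach:
The validation estimate of the test error rate can be highly variable, because it depends on exactly which observations land in the training set and which in the validation set.
The model is trained on only a subset of the data points, so it tends to perform worse than if it were trained on the whole dataset, and the validation error tends to overestimate the test error.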
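
To make the variability point concrete, here is a minimal sketch that repeats a 50/50 hold-out split several times on the same data. The synthetic data, the straight-line model and the helper name `validation_mse` are assumptions chosen for illustration only.

```python
import numpy as np

def validation_mse(x, y, rng, train_frac=0.5):
    """Fit a straight line on a random training split; return the MSE on the held-out split."""
    n = len(x)
    idx = rng.permutation(n)                 # random order of the observations
    n_train = int(train_frac * n)
    train, val = idx[:n_train], idx[n_train:]
    slope, intercept = np.polyfit(x[train], y[train], deg=1)
    pred = slope * x[val] + intercept
    return float(np.mean((y[val] - pred) ** 2))

# Synthetic data (an assumption): a noisy linear relationship.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 2.0, size=100)

# Same data, ten different random splits: the estimate changes every time.
estimates = [validation_mse(x, y, rng) for _ in range(10)]
print(min(estimates), max(estimates))
```

The spread between the smallest and largest estimate comes only from which points happened to be held out.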

Question 3

Q

Explain Leave-One-Out Cross-Validation

Answer

A

This is where the validation set consists of only one data point.

We train the model on the remaining n-1 data points and repeat this n times, until every data point has been used as the validation set exactly once.

The LOOCV estimate of the test MSE is then the average of the n validation errors: CV(n) = (1/n) * (MSE_1 + ... + MSE_n).
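
The procedure above can be sketched in a few lines. This is an illustrative version only: the straight-line model fitted with `np.polyfit` and the function name `loocv_mse` are assumptions, not part of the original card.

```python
import numpy as np

def loocv_mse(x, y):
    """Leave-one-out CV estimate of the test MSE for a straight-line fit."""
    n = len(x)
    errors = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i             # keep every point except point i
        slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
        pred = slope * x[i] + intercept      # predict the single held-out point
        errors[i] = (y[i] - pred) ** 2
    return float(errors.mean())              # average of the n squared errors

# Synthetic data (an assumption) for a quick check.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 2.0, size=50)
print(loocv_mse(x, y))
```

Unlike a single random hold-out split, this estimate is the same every time it is run on the same data, because there is no randomness in how the points are left out.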

Question 4

Q

Explain k-fold CV

Answer

A

This involves randomly splitting the dataset into K subsets (folds) of roughly equal size, holding out one fold as the test dataset and training the model on the other K-1 folds.

This is repeated K times, until each of the K folds has been used as the test set once.

The test error estimate is the average of the K MSE estimates.

K is typically 5 or 10.
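
A minimal sketch of the K-fold procedure, using only NumPy. The straight-line model, the function name `kfold_mse` and the `seed` parameter are assumptions made for illustration.

```python
import numpy as np

def kfold_mse(x, y, k=5, seed=0):
    """K-fold CV estimate of the test MSE for a straight-line fit."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))            # shuffle before splitting
    folds = np.array_split(idx, k)           # k folds of roughly equal size
    fold_mse = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        slope, intercept = np.polyfit(x[train], y[train], deg=1)
        pred = slope * x[test] + intercept
        fold_mse.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(fold_mse))          # average of the k fold MSEs

# Synthetic data (an assumption) for a quick check.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 2.0, size=100)
print(kfold_mse(x, y, k=5), kfold_mse(x, y, k=10))
```

Setting `k` equal to the number of data points reduces this to leave-one-out cross-validation, at the cost of fitting the model n times instead of 5 or 10.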

Question 5

Q

Which point, A, B or C, marks the best model complexity, based on the training and testing error? Why?

Answer

A

A. We want the testing error to be as low as possible. This is where the model generalises best to new, unseen data.

Question 6

Q

What is the bootstrapping method?

Answer

A

Start with a sample of size n from the population.
Draw a sample of size n from the original sample data with replacement, and repeat this B times. Each resample is called a Bootstrap Sample, so there will be B Bootstrap Samples in total.
Evaluate the statistic θ on each Bootstrap Sample, giving B estimates of θ in total.
Construct a sampling distribution from these B Bootstrap statistics and use it to make further statistical inference, such as estimating the standard error of θ or building a confidence interval for it.
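
The bootstrap steps above can be sketched as follows. The function name `bootstrap_statistics`, the choice of the mean as the statistic θ, the synthetic sample and the percentile interval are all assumptions picked for illustration.

```python
import numpy as np

def bootstrap_statistics(sample, statistic, B=1000, seed=0):
    """Return B bootstrap estimates of `statistic` computed from `sample`."""
    rng = np.random.default_rng(seed)
    n = len(sample)
    stats = np.empty(B)
    for b in range(B):
        # Resample n points from the original sample, with replacement.
        resample = rng.choice(sample, size=n, replace=True)
        stats[b] = statistic(resample)
    return stats

# Synthetic original sample (an assumption): n = 200 draws from a normal population.
rng = np.random.default_rng(1)
sample = rng.normal(loc=5.0, scale=2.0, size=200)

boot = bootstrap_statistics(sample, np.mean, B=1000)
se = boot.std(ddof=1)                                  # bootstrap standard error
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])     # 95% percentile interval
print(se, ci_low, ci_high)
```

Any other statistic, such as `np.median`, can be passed in place of `np.mean` without changing the rest of the code.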

- Understand the principle of CV and the Bootstrap
- Apply cross-validation methods to estimate the test error associated with a learning method, and to improve the estimates
- Apply the bootstrap to quantify the uncertainty associated with a given estimate or a learning model