Chapter 5 - Cross Validation Flashcards

1
Q

Validation Set Approach

A

Randomly split the data into two parts, a training set and a validation (hold-out) set. Fit the supervised learning method on the training part and estimate its test error on the validation part. The split can be repeated many times, averaging the error estimates over the different splits.
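
A minimal sketch of the idea in Python with scikit-learn (the simulated dataset, the 50/50 split, and the choice of a linear model are illustrative assumptions, not part of the card):

```python
# Validation-set approach: repeat a random train/validation split and average the errors.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

errors = []
for seed in range(10):  # repeat the random split several times
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=seed)
    model = LinearRegression().fit(X_tr, y_tr)
    errors.append(mean_squared_error(y_val, model.predict(X_val)))

print(f"validation-set MSE estimate: {np.mean(errors):.2f}")
```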

2
Q

LOOCV

A

Leave-one-out cross-validation: maximally uneven splits. Hold out a single observation, fit the model on the remaining n - 1 observations, and compute the test error on the held-out point; repeat for every observation and average the n errors.

PROBLEM: this is expensive, because the model must be refit n times.

SOLUTION: for least-squares linear (and polynomial) regression there is a shortcut. Fit the model only once on all the data and compute CV(n) = (1/n) * sum_i ((y_i - yhat_i) / (1 - h_ii))^2, where h_ii is the leverage statistic (the i-th diagonal element of the hat matrix).
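
A sketch comparing brute-force LOOCV with the leverage shortcut for least squares; the simulated data are an assumption used only for illustration:

```python
# LOOCV for linear regression: brute force vs. the one-fit leverage shortcut.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])          # design matrix with intercept
beta = np.linalg.lstsq(X, y, rcond=None)[0]   # fit once on all the data
resid = y - X @ beta
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
h = np.diag(H)                                # leverage statistics h_ii

# Shortcut: CV(n) = (1/n) * sum_i ((y_i - yhat_i) / (1 - h_ii))^2
cv_shortcut = np.mean((resid / (1.0 - h)) ** 2)

# Brute force: refit n times, each time leaving one observation out
errs = []
for i in range(n):
    mask = np.arange(n) != i
    b_i = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    errs.append((y[i] - X[i] @ b_i) ** 2)
cv_brute = np.mean(errs)

print(cv_shortcut, cv_brute)  # the two estimates agree
```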

3
Q

k-fold CV

A

Randomly split the data into K roughly equal-sized folds. For i = 1, ..., K, fit the model on the other K - 1 folds, compute the test error on the i-th fold, and average the K error estimates.
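
A minimal k-fold CV sketch with scikit-learn (dataset and model are again illustrative assumptions):

```python
# 10-fold cross-validation of a linear model's mean squared error.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=kf,
                         scoring="neg_mean_squared_error")
print(f"10-fold CV MSE estimate: {-scores.mean():.2f}")
```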

4
Q

compare LOOCV with k-fold CV

A

The k-fold CV estimate depends on the particular split, and each fit uses less data than is available, which biases the estimate upward (the estimated test error is higher than the true test error). LOOCV has almost no such bias, but its n training sets are nearly identical, so the n error estimates are highly correlated and the variance of the LOOCV estimate is higher. k = 5 or 10 is a common compromise between bias and variance.

5
Q

Choosing an optimal model

A

Even if the CV error estimates are off in absolute terms, choosing the model with the minimum CV error often also identifies the model with (close to) the minimum true test error. In classification problems the picture looks much the same, with CV applied to the misclassification error rate.

6
Q

The one-standard error rule

A

Setting: forward stepwise selection, where we must decide how many variables to include in the model (this is the CV-error-curve-with-error-bars diagram). Choose the simplest model whose CV error is no more than one standard error above that of the model with the lowest CV error; e.g., if the minimum occurs at 10 variables, pick a model with 9 or fewer variables provided its CV error stays within one standard error of that minimum.
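
A sketch of the rule under assumed CV results (the cv_errors and cv_se arrays are made-up numbers standing in for the output of forward stepwise selection with CV):

```python
# One-standard-error rule: pick the smallest model within one SE of the minimum CV error.
import numpy as np

# CV error and its standard error for models with 1..10 predictors (illustrative values)
cv_errors = np.array([9.0, 7.5, 6.4, 5.9, 5.6, 5.5, 5.5, 5.6, 5.7, 5.8])
cv_se     = np.array([0.4, 0.4, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3])

best = np.argmin(cv_errors)                            # model with the lowest CV error
threshold = cv_errors[best] + cv_se[best]              # one SE above the minimum
chosen = np.min(np.where(cv_errors <= threshold)[0])   # simplest model under the threshold

print(f"minimum-CV model size: {best + 1}, one-SE-rule model size: {chosen + 1}")
```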

7
Q

The wrong way to do CV

A

WRONG: first select the 20 most important predictors using a z-test on the full dataset, then run 10-fold CV with logistic regression on those predictors. The computed CV error comes out around 3% when the true error should be ~50%, because every fold only ever sees predictors that were chosen using all of the data, including the observations held out in that fold.
RIGHT: do the variable selection separately within each fold, after the folds have been assigned, and refit the model each time (see the sketch below).

Every aspect of the fitting procedure that involves the data must be cross-validated.
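
A sketch of the "right way" using a scikit-learn Pipeline, so the screening step runs inside every fold; SelectKBest is an assumed stand-in for the card's z-test filter, and the random data mimic the no-signal setting:

```python
# Feature selection inside the CV loop: labels are pure noise, so CV error should be ~50%.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1000))        # many noise predictors, few samples
y = rng.integers(0, 2, size=50)        # labels unrelated to X: true error rate is ~50%

pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=10)   # selection is refit inside each fold
print(f"10-fold CV error with selection inside the folds: {1 - scores.mean():.2f}")
```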

8
Q

The learning curve and choosing k

A

Learning curve: the test performance of a learning method plotted against the size of the training set; its shape depends on both the type of data and the method.

In K-fold CV, increasing K decreases the bias of the CV error estimate but increases its variance. If the learning curve has flattened out, the bias is small: with 5-fold CV on a dataset of 200 observations, the test error of a model trained on 160 observations is already close to that of a model trained on all 200. LOOCV (K = n) has the least bias.
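
A sketch of a learning curve computed with scikit-learn (the simulated dataset is an assumption; the point is that once the curve has flattened, training on 160 rather than 200 observations barely changes the test error):

```python
# Learning curve: CV-estimated test error as a function of training-set size.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

sizes, _, test_scores = learning_curve(
    LinearRegression(), X, y, cv=5,
    train_sizes=np.linspace(0.2, 1.0, 5),
    scoring="neg_mean_squared_error")

for n_train, mse in zip(sizes, -test_scores.mean(axis=1)):
    print(f"train size {n_train:3d}: CV MSE {mse:.1f}")
```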

9
Q

CV vs bootstrap

A

CV gives an estimate of the test error of a learning method.

The bootstrap gives the standard error (variability) of an estimator.

10
Q

Standard Errors in Linear Regression (classical assumptions)

A

Assume x_1, ..., x_n are drawn from a normal distribution, and plug in the sample estimates: the true variance is approximated by sigma_hat^2 and the true mean by x_bar. The sampling distribution of the estimator then has a known form, and the standard deviation of that sampling distribution is the standard error; for the sample mean, SE(x_bar) = sigma_hat / sqrt(n).

11
Q

Limits of the classical approach

A

If x_1, ..., x_n are not normally distributed, or if the estimator does not have a simple closed-form sampling distribution, the classical formulas no longer apply.

SOLUTION: the bootstrap!

12
Q

Bootstrap Standard error

A

The bootstrap standard error is the standard deviation of the estimator computed across the bootstrap samples (i.e., the standard deviation of the bootstrap replicates).
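
A minimal non-parametric bootstrap sketch; the sample and the choice of the median as the estimator are illustrative assumptions:

```python
# Bootstrap SE: resample with replacement, recompute the estimator, take the SD of the replicates.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)   # assumed sample for illustration

B = 2000
boot_stats = np.empty(B)
for b in range(B):
    resample = rng.choice(data, size=data.size, replace=True)  # sample with replacement
    boot_stats[b] = np.median(resample)

print(f"bootstrap SE of the median: {boot_stats.std(ddof=1):.3f}")
```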

13
Q

Why do we sample with replacement?

A

Sampling with replacement gives the non-parametric bootstrap (the version used with supervised learning methods). We sample with replacement so that each bootstrap sample consists of n independent draws from the original data (the empirical distribution), mimicking repeatedly drawing fresh datasets of the same size from the population; the models fit to these samples can then be treated as approximately independent replicates.
