Cross-validation Flashcards

1
Q

Prediction accuracy estimation

A

“Two problems present themselves: how to construct an effective
prediction rule, and how to estimate the accuracy of its predictions.

Prediction, perhaps because of its model-free nature, is an area where algorithmic developments have run far ahead of their inferential justification.

Quantifying the prediction error of a rule r_d(x) requires specification of the discrepancy D(y, y-hat) between a prediction y-hat and the actual response y.

The two most common choices are squared error and classification error.”
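
In standard notation, squared error is D(y, y-hat) = (y - y-hat)^2, and classification error is D(y, y-hat) = 1 if y ≠ y-hat and 0 otherwise.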

2
Q

True error rate

A

“Ideally, we would measure the true error rate, namely the error rate corresponding to observations from the true probability distribution F.
That is, it is the expected discrepancy between y-hat and y for an unseen pair drawn from the true probability distribution F:”
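
In standard notation, with (X, Y) a new pair drawn from F independently of the training set d:

    Err = E_F[ D(Y, r_d(X)) ].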

3
Q

Cross-validation

A

“We want to estimate the true error, but the apparent (training) error is biased downward because the prediction rule has been adjusted to fit the observed responses (overfitting). Instead, we introduce a validation set of unseen data to provide an unbiased estimate of Err:”
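
In standard notation, with held-out validation pairs (x_i, y_i), i = 1, ..., N_val, not used in fitting:

    Err_val = (1 / N_val) * sum_{i=1}^{N_val} D(y_i, r_d(x_i)).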

4
Q

Leave-One-Out Cross-Validation

A

“Cross-validation attempts to mimic Err_val without the need for a separate validation set.
We leave out one sample (or a group of samples), fit the rule on the remaining data, predict the held-out response, and average the discrepancies.

One can leave out several pairs at a time!

The book has two examples where the cross-validation error exceeds the training error; in both, the CV error is only slightly higher (4-8%).

Does Err_CV actually estimate the true error?
A simulation in which the true error is known reveals a negative correlation between CV error and true error: large values of CV error go with smaller values of true prediction error, and vice versa.”
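
A minimal leave-one-out sketch (the OLS rule, squared-error discrepancy, and the name loo_cv_error are illustrative assumptions, not the book's code):

    import numpy as np

    def loo_cv_error(X, y):
        # Leave-one-out CV estimate of prediction error for an OLS rule,
        # with squared-error discrepancy D(y, y_hat) = (y - y_hat)**2.
        n = len(y)
        errs = np.empty(n)
        for i in range(n):
            keep = np.arange(n) != i                               # drop the i-th pair
            beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
            y_hat = X[i] @ beta                                    # predict the held-out response
            errs[i] = (y[i] - y_hat) ** 2                          # discrepancy on the held-out pair
        return errs.mean()                                         # Err_CV

    # Toy usage with simulated data
    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(50), rng.normal(size=(50, 3))])
    y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(size=50)
    print(loo_cv_error(X, y))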

5
Q

Covariance penalties

A

“Cross-validation does its work nonparametrically and without the need for probabilistic modeling. Covariance penalty procedures require probability models, but within their ambit they provide less noisy estimates of prediction error.

Purpose:

Correct the bias of training error as a predictor of test error.
Provide more accurate estimates of out-of-sample prediction error.
Key Idea:

Prediction error is underestimated in training because the model is tailored to the observed data.
Covariance penalties account for the overfitting by incorporating the dependence between the data used for fitting and the predictions.”
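
The central identity behind covariance penalties, for squared-error discrepancy in standard notation, is

    E[Err] = E[err] + (2/N) * sum_{i=1}^{N} cov(y-hat_i, y_i),

so an honest estimate adds the covariance penalty (2/N) * sum_i cov(y-hat_i, y_i) to the apparent error err.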

6
Q

Mallows’ Cp estimate of prediction error (linear prediction model)

A

“For a linear rule y-hat = M y, the degrees of freedom are df = trace(M); in a linear regression model trace(M) = p, so the covariance penalty counts the number of fitted parameters.

This definition provides common ground for comparing different types of regression rules. Rules with larger df are more flexible and tend toward better apparent fits to the data, but require bigger covariance penalties for fair comparison.”
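
In one standard form, for OLS with p parameters and residual variance sigma^2:

    C_p = (1/N) * ||y - y-hat||^2 + (2 * sigma^2 / N) * p,

i.e. the apparent error plus the covariance penalty 2 * sigma^2 * p / N.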

7
Q

Covariance penalties: Lasso

A

“For lasso problems, setting p to the number of nonzero regression coefficients yields a good approximation.
This allows us to use the previous OLS covariance penalty formula for the overall prediction error.”
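
In the same form, with k the number of nonzero lasso coefficients:

    Err-hat_lasso ≈ (1/N) * ||y - y-hat||^2 + (2 * sigma^2 / N) * k.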

8
Q

Covariance penalties: SURE (Stein’s unbiased risk estimator)

A

If we are willing to add multivariate normality to the model, we no longer need the assumption that the fitting rule is linear.
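
In standard form, assuming y ~ N(mu, sigma^2 * I) and a (possibly nonlinear) fitting rule y-hat = m(y):

    Err-hat_SURE = (1/N) * ||y - m(y)||^2 + (2 * sigma^2 / N) * sum_{i=1}^{N} d m_i(y) / d y_i,

where the divergence term plays the role of trace(M) from the linear case.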

9
Q

Covariance Penalties: Parametric Bootstrap

A

“The advantage of parametric bootstrap estimates (12.64) of covariance
penalties is their applicability to any prediction rule no matter how exotic.”
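
A minimal sketch of the idea, assuming Gaussian noise around the original fit; param_boot_penalty, fit, and the toy OLS rule are made-up names for illustration, not the book's code:

    import numpy as np

    def param_boot_penalty(X, y, fit, B=200, seed=0):
        # Parametric-bootstrap estimate of the covariance penalty
        # (2/N) * sum_i cov(y-hat_i, y_i) for an arbitrary rule `fit`,
        # where fit(X, y) returns the vector of fitted values.
        rng = np.random.default_rng(seed)
        n = len(y)
        mu_hat = fit(X, y)
        sigma2 = np.mean((y - mu_hat) ** 2)                   # plug-in noise variance
        Y_star = mu_hat + rng.normal(scale=np.sqrt(sigma2), size=(B, n))
        Mu_star = np.array([fit(X, ys) for ys in Y_star])     # refit on each simulated data set
        # empirical covariance, per coordinate i, across the B bootstrap replications
        cov_i = np.mean((Mu_star - Mu_star.mean(axis=0)) * (Y_star - Y_star.mean(axis=0)), axis=0)
        return 2.0 * cov_i.sum() / n

    # Toy usage with an OLS rule; the estimate should be close to 2 * sigma^2 * p / N
    ols = lambda X, y: X @ np.linalg.lstsq(X, y, rcond=None)[0]
    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(40), rng.normal(size=(40, 2))])
    y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=40)
    print(param_boot_penalty(X, y, ols))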

10
Q

Covariance Penalties: Classification

A
11
Q

Covariance Penalties: Akaike Information Criterion

A

“The term in brackets is the Akaike information criterion (AIC): if the statistician is comparing possible prediction rules r_j(y) for a given data set y, the AIC says to select the rule maximizing the penalized maximum likelihood (the maximized log likelihood penalized by the number of parameters).

The Akaike Information Criterion (AIC) is a measure used for model selection in statistics. It quantifies the trade-off between the goodness of fit of a statistical model and the complexity of the model. Lower AIC values indicate a better model.”
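
In the usual form, with log L(theta-hat) the maximized log likelihood and p the number of free parameters:

    AIC = 2p - 2 log L(theta-hat),

so maximizing the penalized log likelihood log L(theta-hat) - p is the same as minimizing AIC.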

12
Q

Training, Validation, and Ephemeral Predictors (time split of train and val)

A

“Good practice suggests splitting the full set of observed predictor–response pairs (x, y) into a training set d of size N (12.1) and a validation set d_val of size N_val (12.19). The validation set is put into a vault while the training set is used to develop an effective prediction rule r_d(x). Finally, d_val is removed from the vault and used to calculate Err_val (12.20), an honest estimate of the predictive error rate of r_d.

This is a good idea, and seems foolproof, at least if one has enough data to afford setting aside a substantial portion for a validation set during the training process.

There remains some peril of underestimating the true error rate, arising from ephemeral predictors, those whose predictive powers fade away over time.

A notorious cautionary tale of fading correlations concerns Google Flu Trends.”
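
A minimal sketch of a time-ordered train/validation split, which holds out the most recent observations and so helps expose ephemeral predictors; time_ordered_split and the 20% fraction are illustrative choices:

    import numpy as np

    def time_ordered_split(X, y, timestamps, val_fraction=0.2):
        # Hold out the most recent observations as the validation set d_val,
        # so the training set d sees only the past, as a deployed rule would.
        order = np.argsort(timestamps)
        n_val = max(1, int(len(y) * val_fraction))
        train_idx, val_idx = order[:-n_val], order[-n_val:]
        return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx])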
