Week 5 Flashcards
Cp, AIC, BIC, and Adjusted R²
• These techniques adjust the training error for the model size, and can be used to select among a set of models with different numbers of variables.
Mallow’s Cp
• Cp = (1/n)(RSS + 2dσ̂²),
where d is the total # of parameters used and σ̂² is an estimate of the variance of the error associated with each response measurement.
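A minimal sketch of computing Cp for a least-squares fit; the helper name mallows_cp and the idea that sigma2_hat is supplied externally (e.g. estimated from the full model's residuals) are illustrative assumptions, not from the source.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def mallows_cp(X, y, sigma2_hat):
    """Cp = (1/n) * (RSS + 2 * d * sigma2_hat)."""
    n, p = X.shape
    fit = LinearRegression().fit(X, y)
    rss = np.sum((y - fit.predict(X)) ** 2)
    d = p + 1  # total # of parameters: p slopes plus the intercept
    return (rss + 2 * d * sigma2_hat) / n
```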
AIC
• The AIC criterion is defined for a large class of models fit by maximum likelihood:
AIC = −2 log L + 2 · d,
where L is the maximized value of the likelihood function for the estimated model.
Gaussian errors
• In the case of the linear model with Gaussian errors, maximum likelihood and least squares are the same thing, and Cp and AIC are equivalent.
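A sketch of AIC for the Gaussian linear model, using the common convention of dropping additive constants so that −2 log L reduces to n log(RSS/n); the function name and the choice of d are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def gaussian_aic(X, y):
    """AIC = -2 log L + 2 * d; with Gaussian errors and additive
    constants dropped, -2 log L reduces to n * log(RSS / n)."""
    n, p = X.shape
    fit = LinearRegression().fit(X, y)
    rss = np.sum((y - fit.predict(X)) ** 2)
    d = p + 1  # slopes plus the intercept
    return n * np.log(rss / n) + 2 * d
```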
BIC
• BIC = (1/n)(RSS + log(n)dσ̂²)
• Like Cp, the BIC will tend to take on a small value for a model with a low test error, and so generally we select the model that has the lowest BIC value. Since log n > 2 for n > 7, BIC places a heavier penalty on models with many variables than Cp does.
Adjusted R²
• Adjusted R² = 1 − [RSS/(n − d − 1)]/[TSS/(n − 1)]
• Unlike Cp, AIC, and BIC, for which a small value indicates a model with a low test error, a large value of adjusted R² indicates a model with a small test error.
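A minimal sketch computing BIC and adjusted R² for one fitted model. The d conventions here are assumptions: p slopes plus the intercept for the BIC penalty, while the adjusted-R² denominator uses n − p − 1 with d taken as the number of predictors p.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def bic_and_adjusted_r2(X, y, sigma2_hat):
    n, p = X.shape
    fit = LinearRegression().fit(X, y)
    rss = np.sum((y - fit.predict(X)) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    d = p + 1  # total # of parameters for the BIC penalty
    bic = (rss + np.log(n) * d * sigma2_hat) / n
    adj_r2 = 1 - (rss / (n - p - 1)) / (tss / (n - 1))  # d = p predictors here
    return bic, adj_r2
```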
Validation and Cross-Validation
• Each of the procedures returns a sequence of models M_k indexed by model size k = 0, 1, 2, .... Our job here is to select k̂. Once selected, we will return the model M_k̂.
• We compute the validation set error or the cross-validation error for each model M_k under consideration, and then select the k for which the resulting estimated test error is smallest.
• This procedure provides a direct estimate of the test error and doesn’t require an estimate of the error variance σ².
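A sketch of selecting k̂ by 10-fold cross-validation. The models mapping from size k to chosen predictor columns is hypothetical; it would come from, e.g., forward stepwise selection.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def select_k_by_cv(X, y, models, cv=10):
    """models: hypothetical dict mapping size k (k >= 1) to the
    predictor column indices of model M_k."""
    cv_error = {}
    for k, cols in models.items():
        scores = cross_val_score(LinearRegression(), X[:, cols], y,
                                 scoring="neg_mean_squared_error", cv=cv)
        cv_error[k] = -scores.mean()  # estimated test MSE for M_k
    return min(cv_error, key=cv_error.get)  # k-hat: smallest estimated test error
```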
ridge regression
• The ridge regression coefficient estimates β̂^R are the values that minimize RSS + λΣβ²_j, where λ ≥ 0 is a tuning parameter.
shrinkage penalty
• The second term, λΣβ²_j, is small when β₁, ..., β_p are close to zero, and so it has the effect of shrinking the estimates of β_j towards zero.
||β||₂
• denotes the ℓ₂ norm (pronounced “ell 2”) of a vector, defined as ||β||₂ = √(Σβ²_j).
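A minimal sketch of the ridge criterion via its closed-form minimizer (XᵀX + λI)⁻¹Xᵀy; this assumes standardized predictors and a centered response so that no intercept is penalized. In practice scikit-learn's Ridge does the same job.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Minimize RSS + lam * sum_j beta_j^2.  The closed-form solution is
    beta_hat_R = (X^T X + lam * I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```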
scaling of predictors
• The standard least squares coefficient estimates are scale equivariant: regardless of how the jth predictor is scaled, X_j β̂_j will remain the same. The ridge coefficient estimates, in contrast, can change substantially when a given predictor is multiplied by a constant. Therefore, it is best to apply ridge regression after standardizing the predictors.
standardizing the predictors
• x̃_ij = x_ij / √((1/n)Σ_i(x_ij − x̄_j)²)
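A one-liner sketch of that standardization; np.std with its default 1/n convention matches the formula above.

```python
import numpy as np

def standardize(X):
    """Divide each predictor by its standard deviation (the 1/n convention,
    which is np.std's default) so every column has standard deviation one."""
    return X / X.std(axis=0)
```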