Week 5 Flashcards

1
Q

Cp, AIC, BIC, and Adjusted R2

A

• These techniques adjust the training error for the model size, and can be used to select among a set of models with different numbers of variables.

2
Q

Mallow’s Cp

A

Cp = (1/n)(RSS + 2dσ̂²)

where d is the total number of parameters used and σ̂² is an estimate of the variance of the error associated with each response measurement.
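A minimal sketch of this computation in Python (not from the course materials), assuming numpy/scikit-learn arrays X and y; sigma2_hat would typically be estimated from the full model containing all predictors.

```python
# Minimal sketch: Mallow's Cp = (1/n) * (RSS + 2 * d * sigma2_hat)
# for one candidate least-squares model. X, y, sigma2_hat are assumed inputs.
import numpy as np
from sklearn.linear_model import LinearRegression

def mallows_cp(X, y, sigma2_hat):
    n, p = X.shape
    fit = LinearRegression().fit(X, y)
    rss = np.sum((y - fit.predict(X)) ** 2)
    d = p + 1                              # predictors in the model plus the intercept
    return (rss + 2 * d * sigma2_hat) / n
```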

3
Q

d

A

the total number of parameters used in the model

4
Q

AIC

A

The AIC criterion is defined for a large class of models fit by maximum likelihood:

AIC = −2 log L + 2d
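A minimal sketch for the Gaussian linear-model case, assuming numpy/scikit-learn; the maximized log-likelihood uses the MLE σ̂² = RSS/n, and the parameter-count convention (whether to count the error variance) varies between texts.

```python
# Minimal sketch: AIC = -2 * logL + 2 * d for a least-squares fit with Gaussian errors.
# At the maximum, logL = -(n/2) * (log(2*pi*RSS/n) + 1).
import numpy as np
from sklearn.linear_model import LinearRegression

def gaussian_aic(X, y):
    n, p = X.shape
    fit = LinearRegression().fit(X, y)
    rss = np.sum((y - fit.predict(X)) ** 2)
    log_lik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    d = p + 2                              # coefficients + intercept + error variance
    return -2 * log_lik + 2 * d
```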

5
Q

L

A

the maximized value of the likelihood function for the estimated model.

6
Q

Gaussian errors

A

maximum likelihood and least squares are the same thing, and Cp and AIC are equivalent.

7
Q

BIC

A

BIC = (1/n)(RSS + log(n)·dσ̂²)

The BIC will tend to take on a small value for a model with a low test error, and so generally we select the model that has the lowest BIC value.
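A matching sketch on the same (1/n) scale as the Cp card above, again assuming numpy/scikit-learn arrays X, y and an externally supplied σ̂².

```python
# Minimal sketch: BIC = (1/n) * (RSS + log(n) * d * sigma2_hat);
# log(n) penalizes model size more heavily than Cp's factor of 2 once n > 7.
import numpy as np
from sklearn.linear_model import LinearRegression

def bic_score(X, y, sigma2_hat):
    n, p = X.shape
    fit = LinearRegression().fit(X, y)
    rss = np.sum((y - fit.predict(X)) ** 2)
    d = p + 1                              # predictors plus the intercept
    return (rss + np.log(n) * d * sigma2_hat) / n
```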

8
Q

Adjusted R2 =

A

Adjusted R2 = 1 − [RSS/(n − d − 1)] / [TSS/(n − 1)]

Unlike Cp, AIC, and BIC, for which a small value indicates a model with a low test error, a large value of adjusted R2 indicates a model with a small test error.
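A minimal sketch of this formula, assuming numpy/scikit-learn arrays X (n observations, d predictors) and y.

```python
# Minimal sketch: adjusted R^2 = 1 - [RSS / (n - d - 1)] / [TSS / (n - 1)].
# Unlike Cp, AIC, and BIC, a larger value is better here.
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(X, y):
    n, d = X.shape                         # d = number of predictors
    fit = LinearRegression().fit(X, y)
    rss = np.sum((y - fit.predict(X)) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - (rss / (n - d - 1)) / (tss / (n - 1))
```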

9
Q

Validation and Cross-Validation

A

• Each of the procedures returns a sequence of models M_k indexed by model size k = 0, 1, 2, …. Our job here is to select k̂. Once selected, we will return the model M_k̂.
• We compute the validation set error or the cross-validation error for each model M_k under consideration, and then select the k for which the resulting estimated test error is smallest, as in the sketch below.
• Unlike Cp, AIC, BIC, and adjusted R2, this approach doesn't require an estimate of the error variance σ².
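A minimal sketch of the selection step, assuming scikit-learn and a hypothetical `models` mapping from size k to the predictor columns chosen by, e.g., forward stepwise selection (k ≥ 1 here).

```python
# Minimal sketch: pick k_hat as the model size whose 10-fold CV error is smallest.
# `models` is a hypothetical dict {k: list of column indices for model M_k}.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def select_k_by_cv(X, y, models, folds=10):
    cv_error = {}
    for k, cols in models.items():
        scores = cross_val_score(LinearRegression(), X[:, cols], y,
                                 scoring="neg_mean_squared_error", cv=folds)
        cv_error[k] = -scores.mean()        # mean CV estimate of test MSE
    return min(cv_error, key=cv_error.get)  # k_hat with the smallest CV error
```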

10
Q

ridge regression

A

The ridge regression coefficient estimates β̂R are the values that minimize

RSS + λ Σj βj²
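A minimal sketch using scikit-learn, where the tuning parameter λ is called `alpha`; the toy data here is purely illustrative.

```python
# Minimal sketch: ridge regression minimizes RSS + lambda * sum(beta_j^2).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                       # toy data: 100 obs, 5 predictors
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(size=100)

X_std = StandardScaler().fit_transform(X)           # standardize the predictors first
for lam in [0.1, 1.0, 100.0]:
    beta = Ridge(alpha=lam).fit(X_std, y).coef_     # alpha plays the role of lambda
    print(lam, beta)                                # larger lambda -> more shrinkage
```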

11
Q

λ

A

≥ 0 is a tuning parameter

12
Q

shrinkage penalty

A

is small when β1, …, βp are close to zero, and so it has the effect of shrinking the estimates of βj towards zero.

13
Q

||β||2

A

denotes the ℓ2 norm (pronounced “ell 2”) of a vector, defined as ||β||2 = √(Σj βj²).

14
Q

scaling of predictors

A

The least squares coefficient estimates are scale equivariant: regardless of how the jth predictor is scaled, Xj β̂j will remain the same. In contrast, the ridge regression coefficient estimates can change substantially when a given predictor is multiplied by a constant. Therefore, it is best to apply ridge regression after standardizing the predictors.
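A small illustrative check of this point, using scikit-learn with made-up data: rescaling one predictor changes the ridge fit, which is why standardizing first matters.

```python
# Minimal sketch: multiplying one predictor by a constant changes the ridge
# solution's fitted values, unlike least squares, where X_j * beta_j is unchanged.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=100)

X_rescaled = X.copy()
X_rescaled[:, 0] *= 1000                            # e.g. change the units of predictor 0
fit_a = Ridge(alpha=1.0).fit(X, y).predict(X)
fit_b = Ridge(alpha=1.0).fit(X_rescaled, y).predict(X_rescaled)
print(np.allclose(fit_a, fit_b))                    # False: the ridge fit depends on scaling
```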

15
Q

standardizing the predictors

A

x̃ij = xij / √( (1/n) Σi (xij − x̄j)² )
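A minimal numpy sketch of this formula; note it divides each column by its standard deviation computed with a 1/n factor and does not center the predictors.

```python
# Minimal sketch: x_tilde_ij = x_ij / sqrt( (1/n) * sum_i (x_ij - xbar_j)^2 ),
# i.e. divide every column by its (population) standard deviation.
import numpy as np

def standardize(X):
    sd = np.sqrt(np.mean((X - X.mean(axis=0)) ** 2, axis=0))
    return X / sd                                   # each column now has standard deviation 1
```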

16
Q

Lasso

A

is a relatively recent alternative to ridge regression that overcomes this disadvantage (that ridge regression includes all p predictors in the final model): much like best subset selection, the lasso performs variable selection. The lasso coefficient estimates minimize RSS + λ Σj |βj|.
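A minimal sketch with scikit-learn and made-up data; note that scikit-learn's Lasso scales the fit term as RSS/(2n), so its `alpha` is not numerically identical to the λ above, but the behavior (some coefficients set exactly to zero) is the same.

```python
# Minimal sketch: the lasso uses an l1 penalty, lambda * sum(|beta_j|),
# which can set some coefficients exactly to zero (variable selection).
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                       # toy data
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(size=100)

lasso = Lasso(alpha=0.5).fit(StandardScaler().fit_transform(X), y)
print(lasso.coef_)                                  # typically some entries are exactly 0.0
```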