GLM model selection Flashcards
Model selection criteria
- Parsimony
- Accuracy
In case of misspecification errors:
- Excluding relevant regressors → bias in β^
- Including irrelevant regressors → increased variability in β^
Model selection in nested models
- σ2 known: lr = (SSEB - SSEA) / σ2 ~ χ2q under model B, with q = |A| - |B|
- σ2 unknown: F = (n-|A|)/q * (SSEB - SSEA) / SSEA = (n-|A|)/q * (R2A - R2B) / (1 - R2A) ~ Fq, n-|A| under model B
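The σ2-unknown case above can be sketched numerically. A minimal numpy sketch (Python here, though the deck's software examples are in R), with made-up data; model B (intercept + x1) is nested in model A (intercept + x1 + x2):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)  # x2 is irrelevant by construction

def sse(X, y):
    # Residual sum of squares from an OLS fit
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

XB = np.column_stack([np.ones(n), x1])       # smaller model B
XA = np.column_stack([np.ones(n), x1, x2])   # larger model A (nests B)
sse_A, sse_B = sse(XA, y), sse(XB, y)
q = XA.shape[1] - XB.shape[1]                # q = |A| - |B|
F = (n - XA.shape[1]) / q * (sse_B - sse_A) / sse_A
```

Under model B, F would be compared to the Fq, n-|A| quantiles (e.g. via `scipy.stats.f`).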
Model selection in non-nested models
- Adjusted R2
- Cross validation (CV)
- Akaike Information Criterion (AIC)
- Bayesian Information Criterion (BIC)
Adjusted R2
Adj. R2 = 1 - (n-1)/(n-p) (1-R2)
It can be less than 0 if many irrelevant regressors are added; it ranks models the same way as R2 when |A| = |B|.
BEST = MAX
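The adjustment above can be checked with a minimal numpy sketch (Python here, though the deck's software examples are in R); data and model are made up, and p counts all columns of X including the intercept:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
x = rng.normal(size=n)
y = 3 + 0.5 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
p = X.shape[1]  # regressors including the intercept

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
adj_r2 = 1 - (n - 1) / (n - p) * (1 - r2)  # penalizes extra regressors
```

Since (n-1)/(n-p) ≥ 1 whenever p ≥ 1, the adjusted value never exceeds R2.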
Cross validation+deleted residuals
CV = PRESS/n = Σi (yi - y^i(M,-i))2 / n
Deleted residuals: yi - y^i(M,-i) = ε^i / (1 - hiiM)
hiiM is the i-th diagonal element of the hat matrix H = X(X'X)-1X'
BEST = MIN
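The deleted-residual shortcut means PRESS needs no refitting; a minimal numpy sketch (Python here, though the deck's software examples are in R) verifies it against explicit leave-one-out refits on made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
x = rng.normal(size=n)
y = 1 + x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                              # ordinary residuals
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)  # hat-matrix diagonal
press_shortcut = np.sum((e / (1 - h)) ** 2)    # PRESS without refitting

# Explicit leave-one-out check: refit n times, dropping one point each time
press_loo = 0.0
for i in range(n):
    mask = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    press_loo += (y[i] - X[i] @ b_i) ** 2
```

The two quantities agree up to floating-point error; CV is then PRESS/n.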
Akaike Information Criterion
AIC = -2l(θ^) + 2(p+1) =GLM= n ln(σ^2ML) + 2(p+1)
We choose the model whose fitted density is closest (in Kullback-Leibler divergence) to the true density f0.
Under regularity assumptions it can select the "best" model even when there is no test to evaluate it.
In GLMs if |A|=|B| we have AIC min ↔ SSE min ↔ R2 max
BEST = MIN
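The n ln(σ^2ML) form can be sketched directly. A minimal numpy sketch (Python here, though the deck's software examples are in R) on made-up data, comparing an intercept-only model with one that includes the true regressor:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x = rng.normal(size=n)
y = 2 - x + rng.normal(size=n)

def aic(X, y):
    # AIC in the n*ln(sigma^2_ML) + 2(p+1) form (constants dropped)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2_ml = resid @ resid / len(y)  # ML variance estimate
    return len(y) * np.log(sigma2_ml) + 2 * (X.shape[1] + 1)

X_null = np.ones((n, 1))                 # intercept only
X_full = np.column_stack([np.ones(n), x])  # intercept + true regressor
aic_null, aic_full = aic(X_null, y), aic(X_full, y)
```

With a strong signal the full model's SSE drop dwarfs the +2 penalty, so aic_full < aic_null (BEST = MIN).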
Bayesian Information Criterion
BIC = -2l(θ^) + ln(n)(|M|+1) =GLM= n ln(σ^2ML) + ln(n)(|M|+1)
Puts more weight on complexity than AIC; under regularity assumptions it can select the "best" model even when there is no test to evaluate it.
In GLMs if |A|=|B| we have BIC min ↔ SSE min ↔ R2 max
BEST = MIN
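The heavier complexity weight is visible by computing both criteria on the same fit: once n > e^2 ≈ 7.4, ln(n) > 2, so the BIC penalty per parameter exceeds AIC's. A minimal numpy sketch (Python here, though the deck's software examples are in R) on made-up data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
x = rng.normal(size=n)
y = 2 - x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2_ml = resid @ resid / n        # ML variance estimate
m = X.shape[1]                       # |M|: regressors incl. intercept

aic = n * np.log(sigma2_ml) + 2 * (m + 1)
bic = n * np.log(sigma2_ml) + np.log(n) * (m + 1)  # ln(60) ≈ 4.09 > 2
```

Both share the n ln(σ^2ML) term; only the per-parameter penalty differs.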
Nested GLM model selection in R
anova(fitmin, fitmax)
- Res.Df = n - pi for each model i
Non-nested GLM model selection in R
AIC(fit) # full AIC for GLMs; includes the constant n + n ln(2π) that extractAIC() drops
extractAIC(fit)
extractAIC(fit, k=log(nrow(data))) #BIC