Model Selection Flashcards
Cochran’s theorem?
If H_0 (all group means are equal) is true,
then an F statistic can be formed from
SSB (between-group sum of squares) and SSW (within-group sum of squares):
F = (SSB/df_between) / (SSW/df_within)
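What Cochran’s theorem gives here (a sketch; k = number of groups and N = total number of observations are assumed symbols, not from the notes):
Under H_0, SSB/σ^2 ~ χ^2_(k-1) and SSW/σ^2 ~ χ^2_(N-k), independently, so
F = (SSB/(k-1)) / (SSW/(N-k)) ~ F_(k-1, N-k)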
When to transform observations?
If model checking suggests variance is not constant
Commonly used transformations of observations
e.g. ln(y), √y, and 1/y
Box-Cox transformations
Estimate the λ that minimises the standard deviation of the standardised transformed variable
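The usual form of the transformation (standard definition, notation assumed):
y^(λ) = (y^λ - 1)/λ for λ ≠ 0, and y^(λ) = ln(y) for λ = 0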
First transformation of observations to try?
ln(y)
Important thing to remember when transforming observations
All y_i must be >0
If all other transformations fail, try?
Inverse trigonometric functions, in particular:
sin^-1 or tan^-1
F test for deletion of subset of variables:
Extra sum of squares?
Where β_q, …, β_(p-1) are the coefficients of the variables being potentially removed
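A standard definition (using SS_E for each model's residual sum of squares; this is the usual textbook form, assumed here rather than quoted from the notes):
SS_extra = SS_E(reduced model, with β_q, …, β_(p-1) removed) - SS_E(full model)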
F test for deletion of subset of variables:
How to separate the variables in vector (matrix) notation?
F test for deletion of subset of variables:
Find SS_extra in vector (matrix) notation
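One common matrix formulation covering the two cards above (the partition and hat-matrix notation are assumptions, not necessarily the notes' own):
Partition X = (X_1 | X_2) and β = (β_(1)', β_(2)')', where β_(2) = (β_q, …, β_(p-1))' holds the coefficients to be removed.
Then SS_extra = y'(H - H_1)y, with H = X(X'X)^(-1)X' and H_1 = X_1(X_1'X_1)^(-1)X_1'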
F test for deletion of subset of variables:
Null hypothesis? H_1?
H_0: β_q = … = β_(p-1) = 0; H_1: at least one of β_q, …, β_(p-1) ≠ 0,
where β_q, …, β_(p-1) are the coefficients of the variables to be removed
F test for deletion of subset of variables:
Form the F test statistic and reject H_0 at level α
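A standard form of this test (q parameters retained, p in the full model, n observations; the symbols follow the β_q, …, β_(p-1) indexing above but the exact layout is assumed):
F = (SS_extra/(p - q)) / (SS_E(full)/(n - p));
reject H_0 at level α if F > F_(p-q, n-p; α)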
When to use all subsets regression
If there is no natural ordering to explanatory variables
Given p-1 explanatory variables, how many possible models are there?
2^(p-1)
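For instance (purely illustrative arithmetic): with p-1 = 3 explanatory variables x_1, x_2, x_3 there are 2^3 = 8 candidate models, from the intercept-only model up to the model containing all three variables.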
Usual statistics used to compare models?
MS_E
R^2
C_p
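A minimal computational sketch of all-subsets regression reporting these three statistics, assuming numpy only; the function name fit_stats, the simulated data, and all variable names are illustrative, not from the notes:

```python
import itertools
import numpy as np

def fit_stats(X_full, y):
    """Fit every subset of the columns of X_full and report MS_E, R^2 and C_p."""
    n, k = X_full.shape                      # k = p - 1 candidate explanatory variables
    ss_t = np.sum((y - y.mean()) ** 2)       # total sum of squares SS_T

    # sigma^2 is estimated by MS_E from the full model (as on the later cards)
    full = np.column_stack([np.ones(n), X_full])
    resid_full = y - full @ np.linalg.lstsq(full, y, rcond=None)[0]
    sigma2_hat = np.sum(resid_full ** 2) / (n - full.shape[1])

    rows = []
    for r in range(k + 1):
        for subset in itertools.combinations(range(k), r):
            Xs = np.column_stack([np.ones(n)] + [X_full[:, j] for j in subset])
            beta = np.linalg.lstsq(Xs, y, rcond=None)[0]
            ss_e = np.sum((y - Xs @ beta) ** 2)
            p_tilde = Xs.shape[1]                        # parameters, including intercept
            ms_e = ss_e / (n - p_tilde)                  # residual mean square
            r2 = 1 - ss_e / ss_t                         # coefficient of determination
            c_p = ss_e / sigma2_hat - (n - 2 * p_tilde)  # Mallows' C_p (plug-in form)
            rows.append((subset, p_tilde, ms_e, r2, c_p))
    return rows

# Illustrative usage with simulated data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 2 + 1.5 * X[:, 0] - X[:, 1] + rng.normal(size=50)
for subset, p_tilde, ms_e, r2, c_p in fit_stats(X, y):
    print(subset, p_tilde, round(ms_e, 3), round(r2, 3), round(c_p, 2))
```

A quick sanity check: for the full model this gives C_(p̃) = p̃ exactly, since SS_E(full)/MS_E(full) = n - p.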
MS_E is?
Residual mean square
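In symbols (with p̃ = the number of parameters in the candidate model, an assumed reading of the notation used below):
MS_E = SS_E/(n - p̃)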
For full model E(MS_E) = ? And why?
σ^2
Because all candidate explanatory variables are included, so the full model is assumed to be unbiased
How to compare models using MS_E?
Plot MS_E^(p̃) against p̃ and choose the model where MS_E is smallest or levels off
R^2 is?
Coefficient of determination
Adjusted R^2?
R^2_adj = 1 - [(n - 1)/(n - p̃)](1 - R^2), which penalises the number of parameters p̃ and so can decrease when a term is added
Adding terms to a model has what effect on R^2?
It never decreases (in practice it almost always increases)
How to determine the number of parameters p̃ using R^2?
Plot R^2_(p̃) against p̃ and choose the point where the plot levels off
C_p
Mallows' statistic
E(SS_E^(p̃)) = ?
Use Mallows' statistic to estimate the MSE of prediction
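A sketch of the standard development behind these two cards (textbook notation, assumed rather than quoted from the notes):
E(SS_E^(p̃)) = (n - p̃)σ^2 + Σ_i [E(ŷ_i) - E(y_i)]^2, the second term being a squared-bias term that vanishes if the p̃-parameter model is unbiased;
then C_(p̃) = SS_E^(p̃)/σ^2 - (n - 2p̃) satisfies E(C_(p̃)) = Γ_(p̃), the scaled mean squared error of prediction E[Σ_i (ŷ_i - E(y_i))^2]/σ^2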
Testing with Mallows' statistic we should choose either
models with small C_(p̃) or models with C_(p̃) close to p̃
C_(p̃) depends on
Unknown σ^2
Estimator of Mallows' statistic
Take MS_E from the full model as the estimator of σ^2
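The resulting plug-in form (a standard version, notation assumed):
Ĉ_(p̃) = SS_E^(p̃)/MS_E(full) - (n - 2p̃)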
Expectation of the estimator of Mallows' statistic
Adjusted C_(p̃)
Derived from the estimator and the expectation of the estimator
When calculating the original R^2 to compare the predicted R^2 (pred-R^2) against, how is the original obtained?
R^2 = 1 - SS_R/SS_T,
where SS_R is the sum of squared residuals (the SS_E used above)
and SS_T is the total sum of squares.
The original R^2 is also called the multiple R-squared
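For comparison, a standard PRESS-based definition of the predicted R^2 referred to above (notation assumed):
pred-R^2 = 1 - PRESS/SS_T, where PRESS = Σ_i (e_i/(1 - h_ii))^2, with e_i the ordinary residuals and h_ii the leverages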