Multiple Regression - Model Selection Flashcards
How does a simultaneous regression model differ from hierarchical regression?
Simultaneous regression - all predictors are added at the SAME time (in a single step)
This means the shared variability is not credited to any single predictor: each predictor is tested only on its unique contribution. So if physical health and mental health overlap, physical health leaves only the left-overs for mental health, and mental health looks non-significant; remove physical health and mental health all of a sudden is significant.
In hierarchical regression, predictors are entered in a prespecified order: the most important go in at step 1, step 2 adds the next predictor(s) to those from step 1, and so on.
What affects the attribution of shared variability in simultaneous and hierarchical regression?
The order in which the variables are entered into the model.
Why would we use the hierarchical model?
If we want to see effects over and above a covariate: we enter the uninteresting predictors (the covariates) first.
ex) If we add SES (a covariate) to the model first, the later steps show what we're actually interested in, after controlling for SES.
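A minimal sketch of this covariate-first idea in R; the variables (ses, stress, wellbeing) are hypothetical and simulated here just so the code runs:

```r
# Hierarchical entry: covariate first, predictor of interest second.
# ses, stress, and wellbeing are hypothetical variables, simulated here.
set.seed(1)
dat <- data.frame(ses = rnorm(100), stress = rnorm(100))
dat$wellbeing <- 0.3 * dat$ses + 0.5 * dat$stress + rnorm(100)

m1 <- lm(wellbeing ~ ses, data = dat)           # step 1: covariate only
m2 <- lm(wellbeing ~ ses + stress, data = dat)  # step 2: add stress

anova(m1, m2)                                   # F-test of the R^2 change
summary(m2)$r.squared - summary(m1)$r.squared   # the R^2 change itself
```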
If there’s no theory about predictors, what do we do?
We run all regressions!
That is, we either evaluate all possible regressions or employ a selection algorithm.
If we compare all possible regressions (2^k of them) involving a set of predictors in a search for the "best" model, on what criteria might we compare the models? (5)
- R² (largest): problem is inflation (the model with more predictors always looks better)
- Adjusted R² (largest): penalizes us for adding predictors that aren't worth their weight in df
- PRESS (smallest): the PREdiction Sum of Squares; delete one observation at a time, refit, predict the deleted observation, and sum the squared prediction errors (see the sketch after this list)
- Mallows' Cp (smallest): compares the subset model's error to the full model's error estimate. For a model with no bias, the expected value of Cp is roughly 1 + the number of predictors we have (i.e., k + 1).
- Parsimony: the fewer predictors, the better
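A short R sketch of the PRESS and Mallows' Cp computations (the function names press and mallows_cp are ours, not from a package):

```r
# PRESS: sum of squared leave-one-out prediction errors, computed from
# the leverages (hat values) instead of refitting the model n times.
press <- function(fit) {
  sum((residuals(fit) / (1 - hatvalues(fit)))^2)
}

# Mallows' Cp for a subset model, using the full model's MSE as the
# estimate of error variance: Cp = SSE_p / MSE_full - n + 2p.
# For an unbiased subset model, Cp should be close to p (= k + 1).
mallows_cp <- function(sub_fit, full_fit) {
  sse_p    <- sum(residuals(sub_fit)^2)
  mse_full <- sum(residuals(full_fit)^2) / df.residual(full_fit)
  n <- nobs(sub_fit)
  p <- length(coef(sub_fit))  # parameters, including the intercept
  sse_p / mse_full - n + 2 * p
}

# Usage with the built-in mtcars data:
full <- lm(mpg ~ wt + hp + disp, data = mtcars)
sub  <- lm(mpg ~ wt + hp, data = mtcars)
press(sub)
mallows_cp(sub, full)
```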
What does a significant R²∆ indicate in a hierarchical regression output?
ex)
Step 1 R2∆ is significant.
Step 2 R2∆ is not significant.
Step 3 R2∆ is significant.
Step 1: physical health contributes to the model significantly.
Step 2: adding mental health does not significantly improve on step 1 (no contribution over and above physical health).
Step 3: adding stress significantly improves on step 2 (a contribution over and above physical and mental health). See the sketch below.
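A sketch of these three steps in R; physical, mental, stress, and the outcome are hypothetical, simulated variables:

```r
set.seed(2)
dat <- data.frame(physical = rnorm(120), mental = rnorm(120),
                  stress = rnorm(120))
dat$outcome <- 0.5 * dat$physical + 0.4 * dat$stress + rnorm(120)

m1 <- lm(outcome ~ physical, data = dat)
m2 <- update(m1, . ~ . + mental)  # step 2: add mental health
m3 <- update(m2, . ~ . + stress)  # step 3: add stress

# Each row of this table is the F-test of one step's R^2 change.
anova(m1, m2, m3)
```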
In regard to the denominator degrees of freedom (df2) in the hierarchical regression output, how does it change across steps?
As we add more predictors, the denominator df (n - k - 1) goes down, by one for each predictor added.
As we add more predictors to a model, what happens to the R2?
The R² value inflates: R² can never decrease when a predictor is added, even a useless one, so it makes it look like we have more of an effect than we probably do.
What do selection algorithms do?
Instead of entering everything simultaneously or in prespecified blocks, a selection algorithm adds (or removes) one predictor at a time, keeping a change only if it improves a penalized criterion… this way we aren't fooled by an inflated R² and can see where the real differences are.
What’s the formula for Adjusted R2?
Adj R² = 1 - [(n - 1)(1 - R²) ÷ (n - k - 1)]
or equivalently
Adj R² = 1 - (SSres ÷ SStotal) × (dfTotal ÷ dfRes)
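A quick check of the formula in R against the value R itself reports (the helper name adj_r2 is ours; mtcars is a built-in dataset):

```r
# Adjusted R^2 from the formula, checked against R's own value.
adj_r2 <- function(fit) {
  r2 <- summary(fit)$r.squared
  n  <- nobs(fit)
  k  <- length(coef(fit)) - 1  # number of predictors, excluding intercept
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
all.equal(adj_r2(fit), summary(fit)$adj.r.squared)  # TRUE
```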
What is the major disadvantage to the all possible regression approach?
It is time-consuming. With just 5 predictors there are already 2^5 = 32 models to evaluate, and with 12 predictors there are 4,096… daunting. And when we fish through that many models, R² becomes badly inflated, so the gap between R² and adjusted R² grows very large.
What is the function of the Model Selection Algorithm?
What does R use to compute selection algorithms?
What are the 3 types of selection algorithms?
It doesn't compute all possible models; it uses an algorithm to develop a single "best" model.
R uses the AIC.
3 Types:
- Forward selection
- Backward elimination
- Stepwise
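In R, all three are available through the built-in step() function, which adds or drops one predictor per step based on AIC. A sketch on simulated data (x1…x5 and y are hypothetical):

```r
# Simulated data: 5 hypothetical candidate predictors, 2 of them real.
set.seed(3)
d <- data.frame(matrix(rnorm(100 * 5), ncol = 5))
names(d) <- paste0("x", 1:5)
d$y <- 0.6 * d$x1 - 0.4 * d$x3 + rnorm(100)

null <- lm(y ~ 1, data = d)                       # intercept-only model
full <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = d)  # all candidates

step(null, scope = formula(full), direction = "forward")  # forward selection
step(full, direction = "backward")                        # backward elimination
step(null, scope = formula(full), direction = "both")     # stepwise
```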
What criterion does R's model selection algorithm use?
It uses the AIC, or the Akaike Information Criterion.
Why is AIC referred to as a measure of relative goodness of fit of a model?
AIC is not a free-standing measure of model fit: a single AIC value means nothing on its own. It only measures RELATIVE goodness of fit; it is interpretable only when compared to the AIC of another model fit to the same data, and the model with the lower AIC fits relatively better.
What’s the difference between R² and AIC?
R² is an absolute measure of model fit (the proportion of variance explained), while AIC can only be used to compare models of the same outcome variable; the lower AIC means relatively better fit.
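For example, in R (using the built-in mtcars data):

```r
m_a <- lm(mpg ~ wt, data = mtcars)
m_b <- lm(mpg ~ wt + hp, data = mtcars)
AIC(m_a, m_b)  # absolute values mean nothing; the lower AIC fits better
```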