Variance Partitioning and Model Selection Flashcards
• How does a hierarchical analysis differ from a simultaneous analysis? How is shared variance attributed? How would you present the results of the two types of analyses?
o Simultaneous regression – all predictors are entered at the SAME time (in a single step). Shared variance is attributed to no single predictor; each predictor is credited only with its unique variance.
o Hierarchical regression – predictors are added in a series of steps based on theory, precedence, or the role each plays in the analysis.
• Here shared variance goes to whichever variable is entered first – if we enter physical health first, it soaks up the variability it shares with mental health, leaving only the left-overs for mental health. When we remove physical health, mental health all of a sudden is significant.
o Results: present as a table – for the simultaneous analysis, a single set of coefficients; for the hierarchical analysis, the coefficients plus R² and the change in R² (ΔR²) at each step.
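o A minimal R sketch of the two approaches, assuming a hypothetical data frame dat with columns wellbeing, physhealth, and menthealth:

    # Simultaneous: both predictors entered in a single step;
    # each coefficient reflects that predictor's unique contribution
    fit_sim <- lm(wellbeing ~ physhealth + menthealth, data = dat)
    summary(fit_sim)

    # Hierarchical: physical health entered first, mental health added in step 2
    step1 <- lm(wellbeing ~ physhealth, data = dat)
    step2 <- update(step1, . ~ . + menthealth)
    anova(step1, step2)                                   # test of the change from step 1 to step 2
    summary(step2)$r.squared - summary(step1)$r.squared   # Delta R^2 credited to mental health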
• If we compare all possible regressions (2 to the k) involving a set of predictors in a search for the “best” model, on what criteria might we compare the models?
o We compare them on the following criteria (see the R sketch after this list):
- R² (largest is best) – the problem is inflation: R² can never decrease when a predictor is added, so the model with more predictors always looks at least as good.
- Adjusted R² (largest is best) – penalizes us for adding predictors that aren't worth their weight in df.
- PRESS (smallest is best) – the prediction sum of squares: delete each observation in turn, refit the model, predict the deleted observation, and sum the squared prediction errors.
- Mallows' Cp (smallest is best) – compares the candidate model's error to the full model's error variance; for a model with no bias its expected value is roughly p, the number of predictors plus 1 for the intercept, so we want Cp small and close to p.
- Parsimony – the fewest predictors that still do the job.
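o A brief R sketch of computing these criteria for one candidate model, assuming a hypothetical data frame dat with outcome y and predictors x1–x4:

    full <- lm(y ~ x1 + x2 + x3 + x4, data = dat)    # full model
    cand <- lm(y ~ x1 + x2, data = dat)              # candidate subset model

    summary(cand)$r.squared                          # R^2 (never decreases as predictors are added)
    summary(cand)$adj.r.squared                      # adjusted R^2 (penalizes useless predictors)

    # PRESS: leave-one-out prediction errors via the hat-value shortcut
    press <- sum((residuals(cand) / (1 - hatvalues(cand)))^2)

    # Mallows' Cp: candidate SSE scaled by the full model's error variance
    n  <- nrow(model.frame(cand))
    p  <- length(coef(cand))                         # parameters, including the intercept
    cp <- sum(residuals(cand)^2) / summary(full)$sigma^2 - (n - 2 * p)
    c(PRESS = press, Cp = cp, p = p)                 # want Cp small and close to p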
• What is the major disadvantage to the all possible regression approach?
o It is time-consuming: with only 5 predictors there are already 2⁵ = 32 models to evaluate, and with 12 predictors there are 4,096. Fitting that many models is daunting, and for the larger models the difference between R² and adjusted R² can become very large.
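o For reference, the exhaustive search itself is usually run with software; a sketch using the leaps package, assuming the same hypothetical dat with outcome y:

    library(leaps)
    all_subs <- regsubsets(y ~ ., data = dat)   # best subset of each size among the 2^k candidates
    summary(all_subs)$adjr2                     # adjusted R^2 for the best model of each size
    summary(all_subs)$cp                        # Mallows' Cp for the same models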
• Describe forward selection, backward elimination, and stepwise variable selection procedures. Will these different procedures always converge on a single “best” model?
o Forward Selection (+) – start with the smallest model (usually the empty or covariates-only model); at each step R adds the one predictor whose addition produces the model with the lowest AIC, stopping when no addition lowers AIC further.
o Backward Elimination (−) – start with the full model that includes all predictors; at each step remove the predictor whose removal produces the model with the lowest AIC, stopping when no removal lowers AIC further.
o Stepwise (+ & −) – combats the limitations of forward and backward by toggling: at each step it either adds or removes whichever predictor yields the model with the lowest AIC.
• These procedures will not always converge on a single "best" model because of their limitations – forward selection can only add (it can never remove a variable that turns out to be bad) and backward elimination can only remove (it can never re-add a variable that turns out to be good). Although stepwise is the most flexible, it still capitalizes on chance: the AIC values (like R²) are sample-based estimates, so the selected model is not a true reflection of the population.
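o A minimal sketch of the three procedures using R's step(), assuming a data frame dat with outcome y and predictors x1–x4:

    empty <- lm(y ~ 1, data = dat)                   # smallest (empty) model
    full  <- lm(y ~ x1 + x2 + x3 + x4, data = dat)   # full model

    # Forward selection: only additions are considered
    step(empty, scope = formula(full), direction = "forward")

    # Backward elimination: only removals are considered
    step(full, direction = "backward")

    # Stepwise: an addition or a removal is considered at every step
    step(empty, scope = formula(full), direction = "both")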
• How do regression models capitalize on chance? What is shrinkage? What can we do to adjust for the difficulties responsible for the shrinkage phenomenon?
o They capitalize on chance because the sample-based estimate of R² is an overestimate of the population R². We can address this by applying the prediction equation developed in one sample to a second sample; the resulting R² (calculated as the squared correlation between Y and ŷ) will be smaller and provides a better estimate of the population value.
o Shrinkage – the drop in R² when a prediction equation developed in one sample is applied to a new sample; the original R² "shrinks" toward the population value.
• We can adjust for shrinkage in the following ways (a brief R sketch follows this list):
• Cross-validate – collect data from a second sample and apply the first sample's prediction equation to it to see how well it holds up.
• Double cross-validate – estimate the equation in each sample and apply each one to the other sample (old to new and new to old).
• Data splitting – split a large dataset in half and cross-validate within the dataset.
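o A brief R sketch of data splitting / cross-validation, assuming a hypothetical data frame dat with outcome y and predictors x1 and x2:

    set.seed(1)
    half  <- sample(nrow(dat), nrow(dat) %/% 2)
    train <- dat[half, ]                         # "old" sample: build the equation here
    test  <- dat[-half, ]                        # "new" sample: see how well it holds up

    fit <- lm(y ~ x1 + x2, data = train)
    summary(fit)$r.squared                       # sample-based R^2 (optimistic)

    pred <- predict(fit, newdata = test)
    cor(test$y, pred)^2                          # cross-validated R^2: typically smaller (shrinkage)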