Multiple Regression - Model Selection Flashcards
How does a simultaneous regression model differ from hierarchical regression?
Simultaneous regression - all predictors are added at the SAME time (in a single step)
This means the shared variability is not credited to any single predictor: each predictor is tested only on its unique contribution. So if physical health and mental health overlap, physical health leaves only the left-overs for mental health, and mental health looks non-significant; remove physical health and mental health all of a sudden is significant.
In hierarchical regression, predictors are entered in a prespecified order: the most important go in at step 1, step 2 adds the next predictor(s) to those from step 1, and so on.
What affects the attribution of shared variability in simultaneous and hierarchical regression?
The order in which the variables are entered into the model.
Why would we use the hierarchical model?
If we want to see effects over and above a covariate: we enter the uninteresting predictors (the covariates) first.
ex) If we add SES (a covariate) to the model first, the later steps show what we're actually interested in, after controlling for SES.
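A minimal sketch of this covariate-first idea in R; the variables (ses, stress, wellbeing) are hypothetical and simulated here just so the code runs:

```r
# Hierarchical entry: covariate first, predictor of interest second.
# ses, stress, and wellbeing are hypothetical variables, simulated here.
set.seed(1)
dat <- data.frame(ses = rnorm(100), stress = rnorm(100))
dat$wellbeing <- 0.3 * dat$ses + 0.5 * dat$stress + rnorm(100)

m1 <- lm(wellbeing ~ ses, data = dat)           # step 1: covariate only
m2 <- lm(wellbeing ~ ses + stress, data = dat)  # step 2: add stress

anova(m1, m2)                                   # F-test of the R^2 change
summary(m2)$r.squared - summary(m1)$r.squared   # the R^2 change itself
```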
If there’s no theory about predictors, what do we do?
We run all regressions!
That is, we either evaluate all possible regressions or employ a selection algorithm.
If we compare all possible regressions (2^k of them) involving a set of predictors in a search for the "best" model, on what criteria might we compare the models? (5)
- R² (largest): problem is inflation (the model with more predictors always looks better)
- Adjusted R² (largest): penalizes us for adding predictors that aren't worth their weight in df
- PRESS (smallest): the PREdiction Sum of Squares; delete one observation at a time, refit, predict the deleted observation, and sum the squared prediction errors (see the sketch after this list)
- Mallows' Cp (smallest): compares the subset model's error to the full model's error estimate. For a model with no bias, the expected value of Cp is roughly 1 + the number of predictors we have (i.e., k + 1).
- Parsimony: the fewer predictors, the better
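A short R sketch of the PRESS and Mallows' Cp computations (the function names press and mallows_cp are ours, not from a package):

```r
# PRESS: sum of squared leave-one-out prediction errors, computed from
# the leverages (hat values) instead of refitting the model n times.
press <- function(fit) {
  sum((residuals(fit) / (1 - hatvalues(fit)))^2)
}

# Mallows' Cp for a subset model, using the full model's MSE as the
# estimate of error variance: Cp = SSE_p / MSE_full - n + 2p.
# For an unbiased subset model, Cp should be close to p (= k + 1).
mallows_cp <- function(sub_fit, full_fit) {
  sse_p    <- sum(residuals(sub_fit)^2)
  mse_full <- sum(residuals(full_fit)^2) / df.residual(full_fit)
  n <- nobs(sub_fit)
  p <- length(coef(sub_fit))  # parameters, including the intercept
  sse_p / mse_full - n + 2 * p
}

# Usage with the built-in mtcars data:
full <- lm(mpg ~ wt + hp + disp, data = mtcars)
sub  <- lm(mpg ~ wt + hp, data = mtcars)
press(sub)
mallows_cp(sub, full)
```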
What does a significant R²∆ indicate in a hierarchical regression output?
ex)
Step 1 R2∆ is significant.
Step 2 R2∆ is not significant.
Step 3 R2∆ is significant.
Step 1: physical health contributes to the model significantly.
Step 2: adding mental health does not significantly improve on step 1 (no contribution over and above physical health).
Step 3: adding stress significantly improves on step 2 (a contribution over and above physical and mental health). See the sketch below.
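A sketch of these three steps in R; physical, mental, stress, and the outcome are hypothetical, simulated variables:

```r
set.seed(2)
dat <- data.frame(physical = rnorm(120), mental = rnorm(120),
                  stress = rnorm(120))
dat$outcome <- 0.5 * dat$physical + 0.4 * dat$stress + rnorm(120)

m1 <- lm(outcome ~ physical, data = dat)
m2 <- update(m1, . ~ . + mental)  # step 2: add mental health
m3 <- update(m2, . ~ . + stress)  # step 3: add stress

# Each row of this table is the F-test of one step's R^2 change.
anova(m1, m2, m3)
```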
In regard to the denominator degrees of freedom (df2) in the hierarchical regression output, how does it change across steps?
As we add more predictors, the denominator df (n - k - 1) goes down, by one for each predictor added.
As we add more predictors to a model, what happens to the R2?
The R² value inflates: R² can never decrease when a predictor is added, even a useless one, so it makes it look like we have more of an effect than we probably do.
What do selection algorithms do?
Instead of entering everything simultaneously or in prespecified blocks, a selection algorithm adds (or removes) one predictor at a time, keeping a change only if it improves a penalized criterion… this way we aren't fooled by an inflated R² and can see where the real differences are.
What’s the formula for Adjusted R2?
Adj R² = 1 - [(n - 1)(1 - R²) ÷ (n - k - 1)]
or equivalently
Adj R² = 1 - (SSres ÷ SStotal) × (dfTotal ÷ dfRes)
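A quick check of the formula in R against the value R itself reports (the helper name adj_r2 is ours; mtcars is a built-in dataset):

```r
# Adjusted R^2 from the formula, checked against R's own value.
adj_r2 <- function(fit) {
  r2 <- summary(fit)$r.squared
  n  <- nobs(fit)
  k  <- length(coef(fit)) - 1  # number of predictors, excluding intercept
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
all.equal(adj_r2(fit), summary(fit)$adj.r.squared)  # TRUE
```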
What is the major disadvantage to the all possible regression approach?
It is time-consuming. With just 5 predictors there are already 2^5 = 32 models to evaluate, and with 12 predictors there are 4,096… daunting. And when we fish through that many models, R² becomes badly inflated, so the gap between R² and adjusted R² grows very large.
What is the function of the Model Selection Algorithm?
What does R use to compute selection algorithms?
What are the 3 types of selection algorithms?
It doesn't compute all possible models; it uses an algorithm to develop a single "best" model.
R uses the AIC.
3 Types:
- Forward selection
- Backward elimination
- Stepwise
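In R, all three are available through the built-in step() function, which adds or drops one predictor per step based on AIC. A sketch on simulated data (x1…x5 and y are hypothetical):

```r
# Simulated data: 5 hypothetical candidate predictors, 2 of them real.
set.seed(3)
d <- data.frame(matrix(rnorm(100 * 5), ncol = 5))
names(d) <- paste0("x", 1:5)
d$y <- 0.6 * d$x1 - 0.4 * d$x3 + rnorm(100)

null <- lm(y ~ 1, data = d)                       # intercept-only model
full <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = d)  # all candidates

step(null, scope = formula(full), direction = "forward")  # forward selection
step(full, direction = "backward")                        # backward elimination
step(null, scope = formula(full), direction = "both")     # stepwise
```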
What criterion does R's model selection algorithm use?
It uses the AIC, or the Akaike Information Criterion.
Why is AIC referred to as a measure of relative goodness of fit of a model?
AIC is not a free-standing measure of model fit: a single AIC value means nothing on its own. It only measures RELATIVE goodness of fit; it is interpretable only when compared to the AIC of another model fit to the same data, and the model with the lower AIC fits relatively better.
What’s the difference between R² and AIC?
R² is an absolute measure of model fit (the proportion of variance explained), while AIC can only be used to compare models of the same outcome variable; the lower AIC means relatively better fit.
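For example, in R (using the built-in mtcars data):

```r
m_a <- lm(mpg ~ wt, data = mtcars)
m_b <- lm(mpg ~ wt + hp, data = mtcars)
AIC(m_a, m_b)  # absolute values mean nothing; the lower AIC fits better
```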