3A Flashcards
What is a spurious correlation?
A correlation between two variables (X and Y) that is real in the data but not causal: it is produced by a third variable (Z) that influences both. X does not truly cause Y, and the association disappears once Z is controlled for.
Example of a spurious correlation?
Watching GTST (Goede Tijden, Slechte Tijden, a Dutch soap opera) and support for redistribution. The real cause of the association might be a third variable such as gender or income.
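A quick simulation can make this concrete. The sketch below uses made-up, standardized data (all coefficients are arbitrary assumptions) in which a confounder Z, standing in for something like income, drives both X and Y while X has no effect on Y at all. Regressing Y on X alone gives a clearly positive slope; adding Z as a control pushes the coefficient on X to roughly zero. The helpers `cov`, `slope_simple` and `slope_controlled` are hypothetical names written for this example with Python's standard library only.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def slope_simple(x, y):
    """OLS slope from regressing y on x alone."""
    return cov(x, y) / cov(x, x)

def slope_controlled(x, z, y):
    """OLS coefficient on x when y is regressed on both x and z (2x2 normal equations, Cramer's rule)."""
    num = cov(x, y) * cov(z, z) - cov(x, z) * cov(z, y)
    den = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return num / den

rng = random.Random(42)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]       # confounder Z (hypothetical, e.g. income)
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]  # Z -> X (e.g. watching GTST)
y = [0.8 * zi + rng.gauss(0, 1) for zi in z]  # Z -> Y (e.g. support for redistribution); no X -> Y path

b_simple = slope_simple(x, y)
b_controlled = slope_controlled(x, z, y)
print(f"Y on X only:  b_X = {b_simple:.2f}")      # clearly positive
print(f"Y on X and Z: b_X = {b_controlled:.2f}")  # close to zero
```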
What is a suppressor variable?
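A variable that hides or weakens a real relationship between X and Y when it is left out of the model: its links with X and Y push in the opposite direction of the true X-Y effect. Controlling for it reveals (or strengthens) the real relationship.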
Example of a suppressor variable?
Income and left-right identification. Education may suppress the expected relationship between them.
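The same kind of simulation can show suppression. Purely as an illustrative assumption, suppose education (Z) raises income (X) but pushes left-right identification (Y, higher = more right) toward the left, while income itself pushes identification to the right. With the made-up coefficients below the two paths cancel out, so the simple slope of Y on X is near zero even though the true effect of X is 0.5; controlling for Z recovers it. The helper functions are hypothetical and repeated so the block runs on its own.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def slope_simple(x, y):
    """OLS slope from regressing y on x alone."""
    return cov(x, y) / cov(x, x)

def slope_controlled(x, z, y):
    """OLS coefficient on x when y is regressed on both x and z (2x2 normal equations, Cramer's rule)."""
    num = cov(x, y) * cov(z, z) - cov(x, z) * cov(z, y)
    den = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return num / den

rng = random.Random(7)
n = 5000
educ = [rng.gauss(0, 1) for _ in range(n)]       # Z: education (hypothetical scale)
income = [e + rng.gauss(0, 1) for e in educ]     # assumed: education raises income
right = [0.5 * inc - 1.0 * e + rng.gauss(0, 1)   # assumed: income -> right (+0.5), education -> left (-1.0)
         for inc, e in zip(income, educ)]

b_simple = slope_simple(income, right)
b_controlled = slope_controlled(income, educ, right)
print(f"identification on income only:        b = {b_simple:.2f}")      # near zero
print(f"identification on income + education: b = {b_controlled:.2f}")  # near the true 0.5
```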
What is multiple regression?
A statistical technique used to predict a dependent variable (Y) based on multiple independent variables (X’s).
How does multiple regression differ from simple regression?
Simple regression has one X variable, while multiple regression includes several X variables to control for confounders.
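Mechanically, OLS with several X variables solves the normal equations (X'X)b = X'y, where the matrix X has one column per predictor plus a column of 1s for the intercept. The pure-Python sketch below (a hypothetical `ols` helper; real analyses would use a statistics package) fits two X variables to small, noise-free, made-up data and recovers the coefficients used to generate it.

```python
def ols(columns, y):
    """Multiple regression by OLS.

    columns: list of predictor lists [x1, x2, ...]; y: list of outcomes.
    Returns [intercept, b1, b2, ...] by solving the normal equations (X'X) b = X'y.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]  # design matrix with intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    # Gaussian elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, k):
            f = a[r][c] / a[c][c]
            for j in range(c, k + 1):
                a[r][j] -= f * a[c][j]
    # Back substitution
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 + 1.5 * u - 0.5 * v for u, v in zip(x1, x2)]  # exact linear relationship, no noise
coefs = ols([x1, x2], y)
print([round(c, 6) for c in coefs])  # [2.0, 1.5, -0.5]
```

With real data there is noise, so the estimates only approximate the coefficients that generated Y.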
Why use multiple regression?
It helps isolate the effect of each X variable by controlling for others, reducing omitted variable bias.
What is Ordinary Least Squares (OLS)?
A method to estimate regression coefficients by minimizing the sum of squared residuals.
How does OLS work?
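It finds the best-fitting line (or, with several X variables, the best-fitting plane) by choosing the coefficients that minimize the sum of the squared differences (residuals) between the actual and predicted Y values.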
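For a single X the minimization has a closed-form solution: slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2) and intercept = mean_y - slope * mean_x. The sketch below applies it to five made-up data points and then checks that nudging either coefficient away from the OLS values makes the sum of squared residuals (SSR) larger. The names `fit_line` and `ssr` are illustrative.

```python
def fit_line(x, y):
    """Closed-form OLS for one predictor; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def ssr(a, b, x, y):
    """Sum of squared residuals for the line y_hat = a + b * x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(f"intercept = {a:.2f}, slope = {b:.2f}, SSR = {ssr(a, b, x, y):.2f}")
# intercept = 2.20, slope = 0.60, SSR = 2.40

# Moving the intercept or slope away from the OLS values: each SSR is larger than the OLS SSR
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    print(f"a {da:+.1f}, b {db:+.1f}: SSR = {ssr(a + da, b + db, x, y):.2f}")
```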
Why should you include control variables?
To avoid omitted variable bias and improve the accuracy of estimated effects.
What happens if you include too many variables?
It can make interpretation harder, cause overfitting and multicollinearity, and may accidentally remove a real effect (for example, by controlling for a mediator).
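One symptom of including too many variables is that R² (the share of the variance in Y that the model explains) never goes down when a variable is added, even a useless one. The sketch below uses made-up data with a deliberately small sample of 20 cases, adds ten pure-noise "control variables" one at a time, and records R² after each step. The `ols` and `r_squared` helpers are hypothetical, written with the standard library only.

```python
import random

def ols(columns, y):
    """OLS via the normal equations (X'X) b = X'y; returns [intercept, b1, b2, ...]."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]  # intercept column first
    k = len(X[0])
    a = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    for c in range(k):  # Gaussian elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, k):
            f = a[r][c] / a[c][c]
            for j in range(c, k + 1):
                a[r][j] -= f * a[c][j]
    b = [0.0] * k
    for i in range(k - 1, -1, -1):  # back substitution
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b

def r_squared(columns, y):
    """Share of the variance in y explained by an OLS fit on the given predictors."""
    b = ols(columns, y)
    n = len(y)
    pred = [b[0] + sum(bj * col[i] for bj, col in zip(b[1:], columns)) for i in range(n)]
    my = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

rng = random.Random(0)
n = 20                                         # deliberately small sample
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]   # only x truly matters

columns = [x]
r2s = [r_squared(columns, y)]
for _ in range(10):                            # add 10 pure-noise "control variables"
    columns.append([rng.gauss(0, 1) for _ in range(n)])
    r2s.append(r_squared(columns, y))

print([round(r, 2) for r in r2s])  # never goes down, even though the extra variables are noise
```

A rising R² alone therefore says little about whether an extra variable belongs in the model.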
What is multicollinearity?
When two or more independent variables are highly correlated, making it difficult to separate their effects.
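A common diagnostic for multicollinearity is the variance inflation factor (VIF), which measures how much the variance of a coefficient estimate is inflated by the predictor's correlation with the other predictors. With exactly two predictors it reduces to VIF = 1 / (1 - r²), where r is their correlation. Values above roughly 5 or 10 are often treated as a warning sign, though those cut-offs are conventions rather than rules. The data and the helper names `corr` and `vif_two_predictors` below are made up for illustration.

```python
import math

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(a, b):
    """Variance inflation factor for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = corr(a, b)
    return 1 / (1 - r ** 2)

x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1]  # almost a copy of x1
x3 = [3, 7, 1, 8, 2, 6, 4, 5]                  # barely related to x1

print(f"VIF(x1, x2) = {vif_two_predictors(x1, x2):.1f}")  # far above 10
print(f"VIF(x1, x3) = {vif_two_predictors(x1, x3):.2f}")  # close to 1
```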
What is a mediator variable?
A variable that lies on the causal path between X and Y and explains how X affects Y (e.g., income → redistribution attitudes → left-right identification).
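Controlling for a mediator changes what the X coefficient means. In the simulation below (made-up coefficients, assumed only for illustration, with Y coded so that higher = more right) income lowers support for redistribution, and support for redistribution pulls identification to the left, with no direct path from income to identification. The simple slope of Y on X captures the total effect; adding the mediator as a control leaves only the direct effect, here close to zero. In OLS the total effect splits exactly into direct + indirect (a × b), which the final print line reports. The helper functions are hypothetical.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def slope_simple(x, y):
    """OLS slope from regressing y on x alone."""
    return cov(x, y) / cov(x, x)

def slope_controlled(x, z, y):
    """OLS coefficient on x when y is regressed on both x and z (2x2 normal equations, Cramer's rule)."""
    num = cov(x, y) * cov(z, z) - cov(x, z) * cov(z, y)
    den = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return num / den

rng = random.Random(11)
n = 5000
income = [rng.gauss(0, 1) for _ in range(n)]                # X
redist = [-0.6 * inc + rng.gauss(0, 1) for inc in income]  # M: assumed income -> less support (-0.6)
right = [-0.7 * m + rng.gauss(0, 1) for m in redist]       # Y: assumed support -> more left (-0.7); no direct X -> Y

total = slope_simple(income, right)               # total effect of X on Y
direct = slope_controlled(income, redist, right)  # effect of X holding M constant
a = slope_simple(income, redist)                  # path X -> M
b = slope_controlled(redist, income, right)       # path M -> Y, holding X constant
print(f"total = {total:.2f}, direct = {direct:.2f}, indirect a*b = {a * b:.2f}")
```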
How do you decide which variables to include?
1) If they reduce omitted variable bias (they affect Y and are related to X).
2) If they are strong predictors of Y.
Avoid irrelevant variables, variables that lead to overfitting, and highly multicollinear variables.
What are the two stepwise model specification methods?
1) Forward selection: start with few variables and add more gradually.
2) Backward elimination: start with many and remove the non-significant ones.
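The first method (forward selection) can be sketched as a greedy loop. This version is a simplification: instead of significance tests, it adds at each step the candidate that raises R² the most and stops when the best gain falls below a hypothetical cut-off (`min_gain = 0.01`). Backward elimination would run the same idea in reverse, starting from all candidates and dropping the weakest. The data, variable names, and helpers (`ols`, `r_squared`, `forward_selection`) are made up for illustration.

```python
import random

def ols(columns, y):
    """OLS via the normal equations (X'X) b = X'y; returns [intercept, b1, b2, ...]."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]  # intercept column first
    k = len(X[0])
    a = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    for c in range(k):  # Gaussian elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, k):
            f = a[r][c] / a[c][c]
            for j in range(c, k + 1):
                a[r][j] -= f * a[c][j]
    b = [0.0] * k
    for i in range(k - 1, -1, -1):  # back substitution
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b

def r_squared(columns, y):
    """Share of the variance in y explained by an OLS fit on the given predictors."""
    b = ols(columns, y)
    n = len(y)
    pred = [b[0] + sum(bj * col[i] for bj, col in zip(b[1:], columns)) for i in range(n)]
    my = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def forward_selection(candidates, y, min_gain=0.01):
    """Greedy forward selection: add the variable with the largest R^2 gain until gains fall below min_gain."""
    selected, current_r2 = [], 0.0
    remaining = dict(candidates)
    while remaining:
        gains = {name: r_squared([candidates[s] for s in selected] + [col], y) - current_r2
                 for name, col in remaining.items()}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break
        selected.append(best)
        current_r2 += gains[best]
        del remaining[best]
    return selected, current_r2

rng = random.Random(3)
n = 1000
data = {name: [rng.gauss(0, 1) for _ in range(n)] for name in ["x1", "x2", "noise1", "noise2"]}
y = [1.0 * u + 0.5 * v + rng.gauss(0, 1) for u, v in zip(data["x1"], data["x2"])]

selected, r2 = forward_selection(data, y)
print(selected, round(r2, 2))  # the two real predictors, strongest first; noise variables left out
```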
What are the strengths of multiple regression?
- Controls for confounders.
- Helps distinguish real vs. spurious relationships.
- Can be used to test causal claims, by ruling out alternative explanations that are measured and controlled for.
What are the limitations of multiple regression?
- Cannot control for variables that are unmeasured or left out of the model.
- Correlation still does not imply causation.
- Requires careful variable selection.