Week 4 Flashcards
Define Multicollinearity
High correlation between at least two independent variables.
When a (multiple) regression model has a multicollinearity issue, what happens? (3)
The goal of a multiple regression model is to measure the marginal effect of each independent variable on the dependent variable under the ceteris paribus assumption (all other variables held constant).
When two independent variables are highly correlated (such as age and experience), they move together.
This means there is no opportunity to disentangle their effects: the individual effect of each variable becomes obscured.
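To make this concrete, here is a minimal sketch with simulated data (the variable names and data-generating process are assumptions for illustration): age and experience are built to be nearly collinear, so OLS recovers their individual coefficients only with heavily inflated standard errors.

```python
# Minimal sketch: multicollinearity obscures individual effects.
# Simulated data; variable names are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
age = rng.normal(40, 10, n)
experience = age - 22 + rng.normal(0, 1, n)  # moves almost one-for-one with age
y = 2 * age + 3 * experience + rng.normal(0, 10, n)

X = sm.add_constant(np.column_stack([age, experience]))
fit = sm.OLS(y, X).fit()
print(fit.params)  # individual estimates of 2 and 3 are imprecise
print(fit.bse)     # standard errors far larger than without collinearity
```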
How do you detect a multicollinearity problem? (2)
- Check correlation coefficients: use a correlation matrix for all independent variables (to detect correlation between independent variables)
○ If a correlation coefficient > 0.7 –> signals multicollinearity (more cautious: 0.5)
- Variance inflation factor (VIF)
○ Measures the linear association between one IV and all the other IVs.
Describe the Variance Inflation Factor (VIF)
Variance inflation factor (VIF):
Measures the linear association between one IV and all the other IVs:
§ Quantifies the severity of multicollinearity
§ VIF takes a value of 1 or above (no upper limit)
§ For each coefficient, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing IV_j on all the other IVs; the value shows by what percentage the variance of that coefficient is inflated
–> i.e. a VIF of 1.7 means the coefficient's variance is 70% bigger than it would be with no multicollinearity.
How do you detect multicollinearity with a correlation matrix?
○ If a correlation coefficient between two IVs > 0.7 –> signals multicollinearity (more cautious threshold: 0.5)
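A minimal sketch of this check with pandas, using simulated data (the IV names 'age', 'experience', and 'education' are hypothetical):

```python
# Minimal sketch: correlation-matrix check for multicollinearity.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
age = rng.normal(40, 10, 100)
df = pd.DataFrame({
    'age': age,
    'experience': age - 22 + rng.normal(0, 2, 100),  # built to track age
    'education': rng.normal(16, 2, 100),
})

corr = df.corr()
print(corr)

# Keep the upper triangle only, then flag pairs above the 0.7 rule of thumb
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack()
print(pairs[pairs.abs() > 0.7])
```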
How do you detect multicollinearity with the results from the Variance Inflation Factor (VIF)? (3)
- No Multicollinearity:
VIF = 1 –> No correlation between a given IV and the other IVs in the model.
- Moderate Multicollinearity:
VIF between 1 and 5 –> Not severe; no need to pay special attention.
- Severe Multicollinearity:
a. Cautious: VIF > 5
b. Less cautious: VIF > 10
–> Multicollinearity is likely a problem in the regression model and can affect your estimates.
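A minimal sketch of the VIF computation with statsmodels, on the same kind of simulated data (hypothetical IV names):

```python
# Minimal sketch: compute VIFs with statsmodels' variance_inflation_factor.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
age = rng.normal(40, 10, 100)
df = pd.DataFrame({
    'age': age,
    'experience': age - 22 + rng.normal(0, 2, 100),  # collinear with age
    'education': rng.normal(16, 2, 100),
})

X = sm.add_constant(df)  # include the constant when computing VIFs
for i, name in enumerate(X.columns):
    if name == 'const':
        continue
    vif = variance_inflation_factor(X.values, i)
    print(f'{name}: VIF = {vif:.1f}')  # 1 = none, 1-5 moderate, >5 (or >10) severe
```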
How can you deal with multicollinearity? (3)
- Increase the sample size - however, this may not be feasible
- Drop one of the variables that causes the problem
- First estimate the model with one variable, then with the other –> Important to consider how dropping the variable may impact the study.
- Transform the highly correlated IVs
- Log transformation (if the collinear relationship is non-linear or exponential)
○ Can be checked with VIF before and after
- Create a composite variable, combining the collinear IVs (see the next card).
Describe what a composite variable is:
- A composite variable is one made of two or more variables that are highly correlated, conceptually or statistically.
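A minimal sketch of building a composite: standardize the collinear IVs so neither dominates, then average them (simulated data; 'seniority' is a hypothetical name for the composite):

```python
# Minimal sketch: composite variable from two collinear IVs.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
age = rng.normal(40, 10, 100)
df = pd.DataFrame({
    'age': age,
    'experience': age - 22 + rng.normal(0, 2, 100),  # highly correlated with age
})

z = (df - df.mean()) / df.std()   # z-score each IV
df['seniority'] = z.mean(axis=1)  # composite that replaces both IVs in the model
print(df.corr())
```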
OLS assumes homoscedasticity. Define homoscedasticity (2)
- Variance of the error term is constant over various values of the IVs.
- Dispersion of the error remains the same over the range of observations
Define heteroscedasticity (3)
- The error term does not have a constant variance.
- Variance changes in response to a change in the value of the IVs.
- Dispersion of the error changes over the range of observations.
What are some issues when using OLS with heteroscedasticity? (4)
- OLS assumption (of constant error variance) violated.
- Biased standard errors
- Unreliable t-statistics
- Unreliable significance tests
–> misleading conclusions about significance.
How can we detect heteroscedasticity? (3)
- Breusch-Pagan test
- White test
- Scatterplot of residuals (for each independent variable)
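A minimal sketch of the first two tests via statsmodels, on simulated data whose error dispersion grows with the IV:

```python
# Minimal sketch: Breusch-Pagan and White tests for heteroscedasticity.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, 300)
y = 2 + 3 * x + rng.normal(0, x, 300)  # error dispersion grows with x

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

bp_stat, bp_pval, _, _ = het_breuschpagan(res.resid, res.model.exog)
w_stat, w_pval, _, _ = het_white(res.resid, res.model.exog)
print(f'Breusch-Pagan p = {bp_pval:.4f}')  # small p –> reject constant variance
print(f'White         p = {w_pval:.4f}')
```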
How can you deal with heteroscedasticity? (3)
- Transform the dependent variable (doesn't always work)
- Use weighted regression
- Each observation is weighted
- Observations with a higher variance get a lower weight in determining the regression coefficients.
- Use 'robust' standard errors
- Adjusts the OLS standard errors for heteroscedasticity (both fixes are sketched below).
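A minimal sketch of the last two fixes with statsmodels (same simulated heteroscedastic data as above; the weights assume the error variance grows with x², an illustration rather than a general rule):

```python
# Minimal sketch: robust (HC) standard errors and weighted least squares.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(1, 10, 300)
y = 2 + 3 * x + rng.normal(0, x, 300)  # error dispersion grows with x
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()
robust = sm.OLS(y, X).fit(cov_type='HC3')  # same coefficients, adjusted SEs
print(ols.bse, robust.bse)

# WLS: weight each observation by the inverse of its (assumed) error variance,
# so high-variance observations count less toward the coefficients
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()
print(wls.params)
```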
What is Reverse Causality?
When pursuing a study, we usually assume that changes in the dependent variable are caused by changes in the
independent variable(s). However, reverse causality occurs when the dependent variable also causes a change in the independent variable.
(This is a form of endogeneity)
How can you account for Reverse Causality? (5)
- Have a model that is well-grounded in theory
- Explain the mechanism with strong reasoning behind the "how" and "why"
- Acknowledge possible endogeneity issues
- Use lagged independent variables (see the sketch below)
- Use advanced econometric techniques to mitigate endogeneity (e.g. instrumental variables).
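A minimal sketch of lagging an IV with pandas: this year's outcome cannot cause last year's predictor. The panel (firms, years, R&D spending, profit) is hypothetical.

```python
# Minimal sketch: one-year lag of an IV within each panel unit.
import pandas as pd

df = pd.DataFrame({
    'firm':      ['A', 'A', 'A', 'B', 'B', 'B'],
    'year':      [2020, 2021, 2022, 2020, 2021, 2022],
    'rnd_spend': [1.0, 1.2, 1.5, 0.8, 0.9, 1.1],
    'profit':    [10, 12, 15, 7, 8, 9],
})

df = df.sort_values(['firm', 'year'])
df['rnd_spend_lag1'] = df.groupby('firm')['rnd_spend'].shift(1)  # first year -> NaN
print(df)
# then regress profit on rnd_spend_lag1 instead of the contemporaneous value
```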