Lecture 4 Flashcards
4 things to mention when interpreting a coefficient
- Significance
- Sign
- Size
- Ceteris paribus
Stepwise regression
Include or exclude variables step by step. If there are many IVs, stepwise regression can be used to search for well-fitting models and to decide which variables to include.
Forward - Start with no variables and add one at each step.
Backward - Start with all variables and take them out step by step.
However, this is bad practice: variable selection should start from theory, not from a mechanical search for variables.
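Purely to illustrate the mechanics of the forward variant (not as a recommendation), here is a minimal Python sketch. The data, the variable names x1 to x3, the helper functions and the use of adjusted R^2 as the selection criterion are all assumptions made for this example; the lecture does not specify a criterion.

```python
def r_squared(y, columns):
    """R^2 from an OLS regression of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(y, columns):
    """Adjusted R^2, which penalises every additional IV (assumed criterion)."""
    n, k = len(y), len(columns)
    return 1 - (1 - r_squared(y, columns)) * (n - 1) / (n - k - 1)


def forward_selection(y, candidates):
    """Add one IV per step while adjusted R^2 keeps improving."""
    selected, best = [], 0.0  # the intercept-only model has adjusted R^2 = 0
    while len(selected) < len(candidates):
        scores = {
            name: adjusted_r_squared(y, [candidates[s] for s in selected + [name]])
            for name in candidates if name not in selected
        }
        name = max(scores, key=scores.get)
        if scores[name] <= best:
            break  # no remaining variable improves the model
        selected.append(name)
        best = scores[name]
    return selected


# Hypothetical data: y depends on x1 and x2; x3 is unrelated to y.
candidates = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [6, 6, 4, 4, 4, 4, 6, 6],
    "x3": [1, -1, -1, 1, 1, -1, -1, 1],
}
y = [21.5, 22.5, 19.5, 20.5, 22.5, 25.5, 32.5, 35.5]
selected = forward_selection(y, candidates)
print(selected)
```

Backward elimination would run the same loop in reverse: start from all candidates and drop the variable whose removal improves the criterion most.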
Multicollinearity
High correlation between two or more IVs. It makes it hard to identify the separate effect of each x on y.
It also affects the estimation results (the standard errors of the coefficients are inflated), whereas the model assumes that each IV has its own separate influence.
How can we check whether we have multicollinearity?
- Check the data before estimation
- Correlation coefficient matrix: correlation coefficients around or above 0.7-0.8 signal multicollinearity
- Variance inflation factor (VIF)
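As a rough illustration of the correlation-matrix check, the sketch below computes pairwise Pearson correlations and flags pairs at or above 0.7. The data, the variable names and the `pearson` helper are made up for this example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical IVs: x2 is almost a copy of x1, x3 is unrelated to both.
ivs = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1.2, 1.8, 2.8, 4.2, 5.2, 5.8, 6.8, 8.2],
    "x3": [6, 6, 4, 4, 4, 4, 6, 6],
}

THRESHOLD = 0.7  # lower end of the 0.7-0.8 rule of thumb
names = list(ivs)
flagged = []
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = pearson(ivs[a], ivs[b])
        print(f"r({a}, {b}) = {r:.3f}")
        if abs(r) >= THRESHOLD:
            flagged.append((a, b))
print("Possible multicollinearity:", flagged)
```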
How to calculate the variance inflation factor (VIF)
VIF for the IV x_k:
VIF_k = 1 / (1 - R_k^2)
where R_k^2 is the R^2 from regressing x_k on all the other IVs
Values above 10 usually indicate multicollinearity
A more conservative threshold is 5
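The VIF calculation can be sketched in plain Python as follows: each IV is regressed on the other IVs and the resulting R^2 is plugged into the formula. The data and helper names are assumptions for illustration; in practice a statistics package would do this step.

```python
def r_squared(y, columns):
    """R^2 from an OLS regression of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def vif(ivs, k):
    """VIF_k = 1 / (1 - R_k^2), with x_k regressed on all other IVs."""
    others = [col for j, col in enumerate(ivs) if j != k]
    return 1 / (1 - r_squared(ivs[k], others))


# Hypothetical IVs: x2 is almost a copy of x1, x3 is unrelated to both.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.2, 1.8, 2.8, 4.2, 5.2, 5.8, 6.8, 8.2]
x3 = [6, 6, 4, 4, 4, 4, 6, 6]
ivs = [x1, x2, x3]

vifs = [vif(ivs, k) for k in range(len(ivs))]
for name, value in zip(["x1", "x2", "x3"], vifs):
    flag = "multicollinearity" if value > 10 else "ok"
    print(f"VIF({name}) = {value:.2f} ({flag})")
```

Note how the thresholds translate: a VIF of 10 corresponds to R_k^2 = 0.9, and a VIF of 5 to R_k^2 = 0.8.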
How can you solve multicollinearity?
- Increase sample size
- Drop one of the variables (Robustness check: first estimate the model with one variable, then with the other. Do you get the same results?)
- Transform the highly correlated IVs (e.g. create a composite variable, combine collinear IVs)
Homoscedasticity
- OLS assumption
- Variance of the error term is constant over various values of the IV
- Dispersion of the error remains the same over the range of observations
Heteroscedasticity
- OLS assumption does not hold
- Error term does not have a constant variance
- Dispersion of the error changes over the range of observations
Why do you have heteroscedasticity?
Groups of observations differ and follow different processes, so their error terms have different variances
Which tests can be used to test for heteroscedasticity?
- Breusch-Pagan test
- White test
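To give an idea of what the Breusch-Pagan test does, here is a minimal sketch for a regression with a single IV. It uses the common n * R^2 (studentized, Koenker) form of the statistic, which is an assumption about the variant; the lecture does not say which one is meant. The data and helper names are made up, and the p-value shortcut via `erfc` only holds for one degree of freedom, i.e. one IV.

```python
from math import erfc, sqrt


def simple_ols(x, y):
    """Return (intercept, slope, R^2) of an OLS regression of y on one IV."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    return my - slope * mx, slope, sxy ** 2 / (sxx * syy)


# Hypothetical data: the spread of y around the line grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [5.5, 7.0, 12.5, 12.0, 19.5, 17.0, 26.5, 22.0, 33.5, 27.0]

# Step 1: estimate the model and square the residuals.
intercept, slope, _ = simple_ols(x, y)
sq_resid = [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)]

# Step 2: auxiliary regression of the squared residuals on the IV.
_, _, r2_aux = simple_ols(x, sq_resid)

# Step 3: LM = n * R^2, chi-square distributed with 1 df under homoscedasticity.
lm = len(x) * r2_aux
p_value = erfc(sqrt(lm / 2))  # upper-tail probability of chi-square(1)

print(f"LM = {lm:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject homoscedasticity: evidence of heteroscedasticity")
```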
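Graphical inspection for heteroscedasticity
- Make a scatterplot of the IV against the residuals of the regression
- You want to see the residuals spread evenly around the centre (zero) over the whole range of the IV; a funnel or fan shape signals heteroscedasticity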
How can we solve heteroscedasticity?
- Weighted least squares: each observation is weighted, and observations with a higher variance get a lower weight in determining the coefficients
- Calculate robust standard errors
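The weighting idea behind weighted least squares can be sketched for a single IV as below. The data, the `wls_fit` helper and the assumption that the error variance is proportional to x^2 (so each weight is 1 / x^2) are illustrative choices, not something the lecture prescribes.

```python
def wls_fit(x, y, weights):
    """Weighted least squares for one IV; returns (intercept, slope)."""
    sw = sum(weights)
    mx = sum(w * xi for w, xi in zip(weights, x)) / sw  # weighted mean of x
    my = sum(w * yi for w, yi in zip(weights, y)) / sw  # weighted mean of y
    sxy = sum(w * (xi - mx) * (yi - my) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: the spread of y around the line grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [5.5, 7.0, 12.5, 12.0, 19.5, 17.0, 26.5, 22.0, 33.5, 27.0]

# Equal weights reproduce ordinary least squares.
ols = wls_fit(x, y, [1.0] * len(x))

# Assumed variance proportional to x^2: noisy observations get a low weight.
wls = wls_fit(x, y, [1 / xi ** 2 for xi in x])

print(f"OLS: intercept = {ols[0]:.3f}, slope = {ols[1]:.3f}")
print(f"WLS: intercept = {wls[0]:.3f}, slope = {wls[1]:.3f}")
```

Robust standard errors take the other route: the OLS coefficients stay as they are and only the standard errors are corrected.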
What does the mean represent in the case of dummy variables?
The proportion of observations in the category coded 1, i.e. the percentage split between the categories (a mean of 0.6 means 60% of observations are coded 1)
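A tiny worked example, with a made-up dummy variable `employed` (1 = employed, 0 = not employed):

```python
# Hypothetical dummy variable: 6 of the 10 observations are coded 1.
employed = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]

share = sum(employed) / len(employed)
print(f"mean = {share:.2f}")  # prints "mean = 0.60": 60% are coded 1
```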