Multiple Linear Regression - Estimation Flashcards
When does omitted variable bias occur?
(Two conditions; both must hold)
- The omitted variable is correlated with the regressor X
- The omitted variable is a determinant of Y
How does omitted variable bias undermine OLS assumption #1?
The error term contains all factors other than X that are determinants of Y. When there is an omitted variable, it sits in the error term and is correlated with X, so the conditional expectation of the error given X is not zero.
Thus the first least squares assumption, E(u|X) = 0, fails, and OLS is biased.
How do you find the sign of the Omitted Variable Bias?
sign of OVB = sign(corr(Z, Y)) x sign(corr(X, Z))
where Z is the omitted variable. Since Z is absorbed into the error term u, this amounts to asking whether corr(X, u) is positive or negative, which determines the direction of the bias.
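A minimal simulation sketch of the sign rule (Python with numpy; the data-generating process and all names are illustrative). Here Z raises Y and is positively correlated with X, so the rule predicts upward bias:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Omitted variable Z: positively correlated with X, raises Y.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)             # corr(X, Z) > 0
y = 2.0 * x + 1.5 * z + rng.normal(size=n)   # true beta_1 = 2, Z effect > 0

# Short regression of Y on X alone: Z is absorbed into the error term.
beta_short = np.polyfit(x, y, 1)[0]

# Long regression of Y on X and Z recovers the true slope.
X_long = np.column_stack([np.ones(n), x, z])
beta_long = np.linalg.lstsq(X_long, y, rcond=None)[0][1]

print(f"short-regression slope: {beta_short:.3f}")  # ~2.7, biased upward
print(f"long-regression slope:  {beta_long:.3f}")   # ~2.0
```

Both signs positive gives a positive bias, as the printed slopes confirm.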
How can we fix the Omitted Variable Bias?
By using multiple linear regression, which allows us to estimate the effect on Y of changing one variable X while holding the other regressors constant (fixed).
We can also take a sub-sample of the data to get a more focused relationship between X and Y; this highlights the idea of holding a variable fixed when estimating a direct relationship.
Coefficient interpretation in multiple linear regression
Beta1 is the expected change in Y for a one-unit change in X1, holding the other regressors (the control variables) constant/fixed.
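To make the "holding other regressors fixed" interpretation concrete, a minimal sketch using statsmodels on synthetic data (all names and coefficients are illustrative, not from the notes):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1_000

# X2 is a control that is correlated with the regressor of interest X1.
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y = 3.0 * x1 + 2.0 * x2 + rng.normal(size=n)

# Multiple regression: include the would-be omitted variable as a control.
X = sm.add_constant(np.column_stack([x1, x2]))
results = sm.OLS(y, X).fit()

# params[1] (~3.0) is the expected change in Y for a one-unit change
# in X1, holding X2 fixed.
print(results.params)
```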
Adjusted R squared
A modified R squared that does not necessarily increase when a new regressor is added; it is always less than R squared (and hence always less than one) and can be negative.
- If the adjusted R squared is very close to one, it is often a sign that there is a logical problem with the regression model!
Measures of fit (R-squared) + Standard Error for multiple linear regression
The formula is the last one under goodness of fit on the formula sheet; the standard definitions are also sketched below.
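For reference, a sketch of the standard definitions (assuming SSR is the sum of squared residuals, TSS the total sum of squares, n observations, and k regressors), which should match the formula sheet:

```latex
SER = s_{\hat{u}}, \qquad s_{\hat{u}}^2 = \frac{SSR}{n - k - 1}, \qquad SSR = \sum_{i=1}^{n} \hat{u}_i^2
R^2 = 1 - \frac{SSR}{TSS}, \qquad \bar{R}^2 = 1 - \frac{n-1}{n-k-1} \cdot \frac{SSR}{TSS}
```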
Additional Least Squares Assumption (compared to simple linear regression)
(Just identify)
No perfect multicollinearity
Perfect multicollinearity
(Explained)
Occurs when one of the regressors is a perfect linear combination of the other regressors. When this happens, it is impossible to hold the other collinear regressors fixed while estimating the effect of one of them on the dependent variable, so OLS cannot compute the coefficients.
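A minimal numpy sketch (synthetic data, illustrative names) of why estimation breaks down: with a perfectly collinear column, the design matrix loses rank and X'X is singular, so the OLS normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50

x1 = rng.normal(size=n)
x2 = 3.0 * x1 + 1.0   # x2 is a perfect linear function of x1 and the constant

# Design matrix: constant, x1, and the perfectly collinear x2.
X = np.column_stack([np.ones(n), x1, x2])

print(np.linalg.matrix_rank(X))   # 2, not 3: one column is redundant
print(np.linalg.cond(X.T @ X))    # astronomically large: X'X is singular
```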
Dummy Variable Trap
Possible source of multicollinearity
Occurs when a full set of dummy variables (one per category) is included as regressors along with the constant: the dummies sum to one for every observation, so they are perfectly collinear with the constant regressor.
Look at the dummy variable example in the notes if struggling to understand.
How can you avoid Dummy Variable Trap?
By dropping one of the dummy variables (or by dropping the constant, although the second option is incredibly rare in practice), as in the sketch below.
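A minimal pandas sketch (hypothetical column and category names): drop_first=True drops one dummy, so the remaining dummies plus a constant are no longer perfectly collinear.

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "east", "south", "north"]})

# Full set of dummies: the columns sum to 1 in every row, so together
# with a constant they are perfectly collinear (the dummy variable trap).
trap = pd.get_dummies(df["region"])

# Dropping one category (it becomes the omitted base group) avoids the trap.
safe = pd.get_dummies(df["region"], drop_first=True)
print(safe)
```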
Imperfect Multicollinearity
When one of the regressors is highly, but not perfectly, correlated with the other regressors. When this happens, the standard errors of the affected coefficients become large, so those regression coefficients often come out statistically insignificant.
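One common diagnostic is the variance inflation factor (VIF), which grows as a regressor becomes more collinear with the others. A minimal sketch using statsmodels' variance_inflation_factor on synthetic data (names are illustrative):

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
n = 500

x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)   # highly, not perfectly, correlated
X = np.column_stack([np.ones(n), x1, x2])   # column 0 is the constant

# VIF for each slope regressor; values this large signal inflated
# standard errors on the corresponding coefficients.
for i in (1, 2):
    print(f"VIF for x{i}: {variance_inflation_factor(X, i):.1f}")
```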