Lecture 2 Flashcards
How does MLR differ from SLR?
We add further explanatory variables to improve the accuracy of the model and to reduce bias in estimating the effect of the primary variable of interest
What do the coefficients of each explanatory variable mean and how do we get them?
They represent the partial (marginal) effect of the corresponding variable on y, holding the other explanatory variables fixed; each is obtained by taking the partial derivative of the regression function with respect to that variable.
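A minimal sketch of this in Python (simulated data; the variables x1 and x2 are illustrative, not from the lecture): the fitted slopes are read as partial effects, each holding the other regressor fixed.

```python
# Sketch, not the lecture's code: fit y = b0 + b1*x1 + b2*x2 + u by OLS with numpy
# and read each slope as the partial derivative of the fitted line w.r.t. its variable.
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)   # true partial effects: 2.0, -0.5

X = np.column_stack([np.ones(n), x1, x2])             # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # [b0^, b1^, b2^]; b1^ is the effect of x1 on y holding x2 fixed
```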
In order to give a causal interpretation to the coefficients, the key assumption is:
The expected value of u given all of the explanatory variables is zero: E(u | x1, ..., xk) = 0
- i.e. once we control for these variables, there is no systematic bias in the error term
SST =
R^2 =
SST = SSE + SSR
R^2 = SSE/SST = 1 - SSR/SST
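A short sketch (simulated data; SSE = explained, SSR = residual, following the notation above) checking the decomposition and the two R^2 formulas numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

SST = np.sum((y - y.mean()) ** 2)        # total sum of squares
SSE = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
SSR = np.sum((y - y_hat) ** 2)           # residual sum of squares
print(np.isclose(SST, SSE + SSR))        # SST = SSE + SSR (holds when an intercept is included)
print(SSE / SST, 1 - SSR / SST)          # two equivalent ways of computing R^2
```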
R^2 value in relation to explaining the variation in the dependent variable
- R^2 can never decrease when adding more regressors, as adding another explanatory variable can only improve or leave unchanged the amount of variability explained by the model (see the sketch after this card)
CAREFUL - don’t overfit, as focusing on maximising R^2 may lead to the inclusion of irrelevant variables just to improve the model’s fit artificially.
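A sketch of the non-decreasing R^2 point (simulated data, with a deliberately irrelevant extra regressor).

```python
import numpy as np

def r_squared(X, y):
    """R^2 = 1 - SSR/SST for an OLS fit of y on X (X must include an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
junk = rng.normal(size=n)                              # irrelevant regressor

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, junk])
print(r_squared(X_small, y) <= r_squared(X_big, y))    # True: R^2 cannot fall
```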
MLR.1
The model is linear in parameters
MLR.2
We have a random sample of size n from the population
MLR.3
No perfect collinearity: in the sample, none of the explanatory variables is constant and there are no exact linear relationships among them
- without this, OLS would not be able to distinguish the independent contribution of each variable.
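A sketch of what perfect collinearity does numerically (hypothetical regressors, with x2 an exact linear function of x1).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = 3 * x1 + 2                          # exact linear relationship -> perfect collinearity
X = np.column_stack([np.ones(n), x1, x2])

# X'X is rank deficient (rank 2 instead of 3, up to numerical tolerance), so the
# normal equations have no unique solution and the separate effects are not identified.
print(np.linalg.matrix_rank(X.T @ X))
```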
MLR.4
Zero conditional mean
- if u is correlated with the explanatory variables, it is said to be endogenous, which makes the OLS estimates biased and inconsistent.
Steps to analyse OLS in MLR:
- Obtain a convenient representation of the estimator
- Manipulate the expression to obtain Bj^ = Bj + sampling error
- Compute mean and variance of sampling error
Use the partialling out method
Partialling out algorithm for computing OLS estimators
- Regress x1 on x2
- Compute residuals
- Regress the outcome y on the residuals obtained above
The estimator found in step 3 is numerically identical to the MLR estimator; we are essentially isolating the pure effect of x1 on y, holding x2 constant (see the sketch below)
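A sketch of the partialling-out (Frisch-Waugh-Lovell) steps in Python, on simulated data, checking that step 3 reproduces the multiple-regression coefficient on x1.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)                 # x1 correlated with x2
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

# Full multiple regression: y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
b_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Steps 1-2: regress x1 on (1, x2) and keep the residuals r1
Z = np.column_stack([np.ones(n), x2])
g, *_ = np.linalg.lstsq(Z, x1, rcond=None)
r1 = x1 - Z @ g

# Step 3: regress y on r1 (r1 has mean zero, so no intercept is needed)
b1_partial = (r1 @ y) / (r1 @ r1)

print(b_full[1], b1_partial)                       # numerically identical (up to rounding)
```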
Omitted Variable Bias
Why it is a problem can be shown by taking the expected value of the SLR estimator B1~ when u contains an omitted explanatory variable
- this makes the estimator biased:
E(B1~) = B1 + B2 · delta1
Where delta1 = Cov^(x1, x2) / Var^(x1), i.e. the slope from regressing x2 on x1
- so if B2 = 0, or the covariance is 0, the SLR will get it right
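A sketch of the omitted-variable-bias algebra on simulated data: the short-regression slope equals B1^ + B2^ · delta1, with delta1 the slope from regressing x2 on x1.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)                 # omitted variable correlated with x1
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

# Long regression: y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]           # [b0^, b1^, b2^]

# Short regression: y on (1, x1) only -> slope picks up part of x2's effect
S = np.column_stack([np.ones(n), x1])
b1_tilde = np.linalg.lstsq(S, y, rcond=None)[0][1]

# delta1: slope from regressing x2 on x1 (sample covariance over sample variance)
delta1 = np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1)

print(b1_tilde, b[1] + b[2] * delta1)              # identical up to rounding
```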
MLR.5
Homoskedasticity: the variance of the error u, conditional on the explanatory variables, is constant
Sample variance of OLS slope estimators under MLR
Var(Bj^ | X) = sigma^2 / (SSTj (1 - Rj^2)), where Rj^2 is the R^2 from regressing xj on all of the other regressors
- a lower error variance means the estimator is more precise
- higher variation in xj means the estimator is more precise
- a lower Rj^2 means xj is less correlated with the other regressors, making it easier to isolate its effect on y and giving the estimator a smaller variance
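A sketch (simulated data, true sigma^2 assumed known) checking that the component formula matches the matrix formula sigma^2 (X'X)^-1 for the slope on x1.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
sigma2 = 4.0                                       # true error variance (assumed known here)
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)                 # x1 partly correlated with x2
X = np.column_stack([np.ones(n), x1, x2])

# Matrix formula: Var(beta^ | X) = sigma^2 * (X'X)^-1
V = sigma2 * np.linalg.inv(X.T @ X)
var_b1_matrix = V[1, 1]

# Component formula: sigma^2 / (SST_1 * (1 - R_1^2))
SST1 = np.sum((x1 - x1.mean()) ** 2)
Z = np.column_stack([np.ones(n), x2])              # the "other" regressors
g = np.linalg.lstsq(Z, x1, rcond=None)[0]
R1_sq = 1 - np.sum((x1 - Z @ g) ** 2) / SST1       # R^2 from regressing x1 on them
var_b1_components = sigma2 / (SST1 * (1 - R1_sq))

print(var_b1_matrix, var_b1_components)            # the two expressions coincide
```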
Unbiased sigma^2 estimation under the GM assumptions
sigma_hat^2 = (1/(n - k - 1)) * (sum of squared residuals) = SSR/df
Standard error is computed as
se(Bj^) = sigma_hat / (SSTj (1 - Rj^2))^0.5
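A sketch putting the last two cards together on simulated data: sigma_hat^2 = SSR/(n - k - 1) and the resulting standard error of the slope on x1 (variable names are illustrative).

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 400, 2                                      # k = number of slope regressors
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=2.0, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

sigma2_hat = np.sum(resid ** 2) / (n - k - 1)      # SSR / df, unbiased under MLR.1-5

# se(b1^) = sigma_hat / sqrt(SST_1 * (1 - R_1^2))
SST1 = np.sum((x1 - x1.mean()) ** 2)
Z = np.column_stack([np.ones(n), x2])
g = np.linalg.lstsq(Z, x1, rcond=None)[0]
R1_sq = 1 - np.sum((x1 - Z @ g) ** 2) / SST1
se_b1 = np.sqrt(sigma2_hat / (SST1 * (1 - R1_sq)))
print(sigma2_hat, se_b1)                           # sigma2_hat should be near 4.0 here
```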
Why is OLS always preferred to the alternatives under GM?
It is both unbiased and efficient: with MLR.5 its variance is no larger than that of any other linear unbiased estimator, making it the most reliable estimator in that class.
BLUE (Best Linear Unbiased Estimator) - OLS minimises the sum of squared residuals and, under homoskedasticity, achieves the most precise estimates among linear unbiased estimators.