Lecture 7 Flashcards
Why may MLR.6 not be needed?
In large samples the CLT makes the OLS estimators approximately normal, so tests and CIs are approximately valid even without normality of the errors
- under MLR.1-5, OLS is BLUE and asymptotically efficient in a broad class of estimators
- if we add MLR.6, tests and CIs are exact in any sample size
But what if MLR.5 does not hold?
So Var(u|x) is not constant at σ²
- therefore heteroskedasticity is present
- OLS is still unbiased and consistent under MLR.1-4: heteroskedasticity does not cause bias or inconsistency in the β̂j
What changes if errors are heteroskedastic?
OLS is no longer the BLUE
- also the usual SEs are no longer valid: the usual formula is an inconsistent estimator of sd(β̂j), because Var(u|x) is no longer a constant σ²
What happens to the t statistics now that MLR.5 fails?
- continue to use OLS, as the estimators are still unbiased and consistent under heteroskedasticity
- but, to make inference valid in the presence of heteroskedasticity, use heteroskedasticity-robust SEs to compute corrected t statistics
How to find new versions of t, F and CIs when MLR.5 does not hold
- derive an expression for sd(β̂j) that is valid under MLR.1-MLR.4 alone
- use this to develop SEhr(β̂j), which is consistent whether or not MLR.5 holds
- use these to compute robust versions of the t and F statistics and the CIs
So why did we bother with usual standard errors at all?
The H-robust t statistics and CIs only have asymptotic justification, even if the errors are normally distributed; under MLR.1-6 the usual t statistics have exact t distributions in any sample size
Downsides of heteroskedasticity-robust stats
- with smaller sample sizes, the heteroskedasticity-robust stats may not work well
- in small samples they can be more biased than the usual statistics, e.g. when MLR.5 actually holds
- when MLR.5 is strongly violated, robust statistics typically outperform the usual statistics
SEhr(β̂1) =
Avar(β̂1) = (1/n)(σ²v/σ⁴x), where v = (x − μx)u
Take the square root to get the standard deviation; replacing population quantities with sample analogues gives the heteroskedasticity-robust standard error:
SEhr(β̂1) = √( Σi (xi − x̄)² ûi² ) / SSTx
Can derive by finding the conditional variance of the OLS estimator under heteroskedasticity, and then manipulating
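A minimal numpy sketch of this formula on simulated data (the variance function Var(u|x) = (1 + 0.5x)² and all parameter values are illustrative assumptions, not from the notes), comparing the usual SE with the heteroskedasticity-robust SE of β̂1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 10, n)
u = rng.normal(0, 1 + 0.5 * x)   # heteroskedastic: sd grows with x (illustrative)
y = 2.0 + 3.0 * x + u            # true beta1 = 3 (illustrative)

# OLS slope and residuals
xbar = x.mean()
sst_x = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * y) / sst_x
b0 = y.mean() - b1 * xbar
uhat = y - b0 - b1 * x

# usual SE: assumes Var(u|x) = sigma^2 is constant
sigma2 = np.sum(uhat ** 2) / (n - 2)
se_usual = np.sqrt(sigma2 / sst_x)

# heteroskedasticity-robust SE: sqrt( sum (xi - xbar)^2 * uhat_i^2 ) / SST_x
se_hr = np.sqrt(np.sum((x - xbar) ** 2 * uhat ** 2)) / sst_x

print(se_usual, se_hr)
```

With variance increasing in x, the two SEs typically differ noticeably, which is exactly why the usual formula is inconsistent here.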
HR t statistic:
t_hr = (β̂1 − β1) / SEhr(β̂1)
- works with or without MLR.5 as described earlier.
Large sample distribution of robust t statistic
Manipulate the standard t statistic, using v̂i = (xi − x̄)ûi:
t_hr = v̄ / (σ̂v/√n), where v̄ is the sample average of the v̂i
- by the CLT, together with the fact that σ̂²v tends to the true σ²v as n → ∞, we obtain:
t_hr →d N(0,1)
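A small Monte Carlo sketch of this result (simulated data; the variance function and parameter values are illustrative assumptions): across many samples, the robust t statistic centred at the true β1 should look approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 2000
tstats = []
for _ in range(reps):
    x = rng.uniform(0, 10, n)
    u = rng.normal(0, 1 + 0.5 * x)   # heteroskedastic errors (illustrative)
    y = 1.0 + 2.0 * x + u            # true beta1 = 2 (illustrative)
    xbar = x.mean()
    sst_x = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * y) / sst_x
    uhat = y - (y.mean() - b1 * xbar) - b1 * x
    se_hr = np.sqrt(np.sum((x - xbar) ** 2 * uhat ** 2)) / sst_x
    tstats.append((b1 - 2.0) / se_hr)   # robust t at the true value

tstats = np.array(tstats)
print(tstats.mean(), tstats.std())      # should be near 0 and 1
```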
Heteroskedasticity-robust inference
- the usual critical values can be used, the only difference is that we are using a different estimator of std(B1^)
- confidence intervals can be calculated as normal
- note that robust inference is only valid in large samples, as it relies on the CLT (asymptotic approximations)
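A sketch of robust inference in practice (simulated data; the model and variance function are illustrative assumptions): compute the robust SE, the robust t statistic for H0: β1 = 0, and a 95% CI using the usual critical value 1.96.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x = rng.uniform(0, 10, n)
y = 1.0 + 0.8 * x + rng.normal(0, 0.5 + 0.3 * x)   # heteroskedastic (illustrative)

# OLS fit and robust SE
xbar = x.mean()
sst_x = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * y) / sst_x
b0 = y.mean() - b1 * xbar
uhat = y - b0 - b1 * x
se_hr = np.sqrt(np.sum((x - xbar) ** 2 * uhat ** 2)) / sst_x

t_hr = b1 / se_hr                              # robust t for H0: beta1 = 0
ci = (b1 - 1.96 * se_hr, b1 + 1.96 * se_hr)    # usual 95% critical value
print(t_hr, ci)
```

Only the SE estimator changes; the critical values and CI construction are exactly as in the homoskedastic case.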
Testing for heteroskedasticity:
- null: all coefficients on the explanatory variables in the model for E[u²|x] are zero (homoskedasticity)
- the real error term u is not observed, so replace it with the OLS residual û
- v is the error term of the auxiliary regression: the part of u² that cannot be explained by the explanatory variables
- the error term of the estimated auxiliary regression includes both the original v and the approximation error from using û rather than u
Testing for heteroskedasticity F statistic:
F = (R²/k) / ((1 − R²)/(n − k − 1))
where R² is from the regression of û² on x1, …, xk
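The Breusch-Pagan steps can be sketched in numpy (simulated data; the variance function generating the data and the linear variance model being tested are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1 + 0.6 * x)   # variance depends on x (illustrative)

# step 1: OLS residuals from the original regression
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
uhat2 = (y - X @ b) ** 2

# step 2: auxiliary regression of uhat^2 on the explanatory variables, get R^2
g = np.linalg.lstsq(X, uhat2, rcond=None)[0]
fitted = X @ g
r2 = 1 - np.sum((uhat2 - fitted) ** 2) / np.sum((uhat2 - uhat2.mean()) ** 2)

# step 3: F = (R^2/k) / ((1 - R^2)/(n - k - 1)), here k = 1 regressor
k = 1
F = (r2 / k) / ((1 - r2) / (n - k - 1))
print(F)   # a large F leads to rejecting homoskedasticity
```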
Why is E[u²|x] = σ² + δ1x1 + … + δkxk even being tested in this Breusch-Pagan test? What does it represent?
- σ² is the constant variance under homoskedasticity (the null: δ1 = … = δk = 0)
- the presence of the other terms suggests that the variance may be changing systematically with the x's, i.e. heteroskedasticity
- v is a random error term with E[v|x] = 0 (as in SLR.4); it captures the unexplained fluctuations in the squared errors that are not predicted by x1, …, xk
Why use GLS - Generalised Least Squares?
We now know how to test for heteroskedasticity
- if we reject, then in principle we can do better than OLS, as it is no longer BLUE
- carry out GLS estimation
Run GLS with:
yi = β0 + β1xi + ui, with σ²(xi) = Var(ui|xi)
- suppose the variance function is known to us (why is that necessary?)
- we need to know the variance so we know what we are dividing through by
- first, divide through by σi = σ(xi), to remove the heteroskedasticity
-> becomes yi* = β0(1/σi) + β1xi* + ui*, where yi* = yi/σi, xi* = xi/σi, etc. - but now Var(ui*|xi) = Var(ui/σi | xi) = (1/σi²)Var(ui|xi) = σi²/σi² = 1
The transformed error has constant variance, so the heteroskedasticity is removed
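A minimal sketch of GLS with a known variance function (here Var(ui|xi) = xi is assumed known, purely for illustration, as are the parameter values):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(1, 10, n)
sigma_i = np.sqrt(x)                      # sd of u_i, assumed known: Var(u_i|x_i) = x_i
u = rng.normal(0, sigma_i)
y = 1.0 + 2.0 * x + u                     # true (beta0, beta1) = (1, 2)

# divide every term through by sigma_i, including the intercept column
y_star = y / sigma_i
X_star = np.column_stack([1.0 / sigma_i, x / sigma_i])

# OLS on the transformed model is the GLS estimator
b_gls = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
print(b_gls)
```

The transformed errors ui/σi have variance 1, so OLS on the starred variables satisfies MLR.5.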
What if the variance is not known to us in the GLS?
Use the Feasible GLS estimator - estimate the variance function
1. Estimate by standard OLS, get ûi = yi − β̂0 − β̂1xi
2. Estimate the variance function: if we suspect the variance of the error term depends linearly on x, use σ²(x) = δ0 + δ1x; regress ûi² on xi (with error term vi) and test whether the coefficients are significant
3. Compute the estimated standard deviation σ̂(xi) and transform the model as in GLS
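The three FGLS steps can be sketched as follows (simulated data; the linear variance function is the illustrative suspicion from step 2, and the clipping guard is an added practical assumption, since fitted variances must be positive):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1, 10, n)
u = rng.normal(0, np.sqrt(0.5 + 0.4 * x))   # true variance linear in x, unknown to us
y = 1.0 + 2.0 * x + u                       # true (beta0, beta1) = (1, 2)
X = np.column_stack([np.ones(n), x])

# 1. standard OLS, get squared residuals
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
uhat2 = (y - X @ b_ols) ** 2

# 2. estimate the variance function: regress uhat^2 on x
d = np.linalg.lstsq(X, uhat2, rcond=None)[0]
h = X @ d                      # fitted variances sigma_hat^2(x_i)
h = np.clip(h, 1e-3, None)     # guard against non-positive fitted values

# 3. transform by the estimated sd and re-run OLS (WLS with weights 1/h)
w = 1.0 / np.sqrt(h)
b_fgls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
print(b_fgls)
```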