IV Flashcards
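How can we express the coefficient of interest in 2SLS using population covariances in a simple linear regression?
Start from y = beta_0 + beta*x + u and take the covariance of both sides with z: cov(z,y) = beta*cov(z,x) + cov(z,u), where cov(z,u) = 0 by assumption (instrument exogeneity). So beta = cov(z,y) / cov(z,x), which is why cov(z,x) cannot equal 0 (instrument relevance).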
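A minimal simulation sketch of the covariance ratio above, using only the standard library. The data-generating process (coefficients 0.8 and 0.6, true beta of 2.0, sample size) is an assumption chosen for illustration, and `cov` is a hypothetical helper.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

random.seed(0)
n = 20000
beta = 2.0  # true coefficient (assumed for the simulation)

z = [random.gauss(0, 1) for _ in range(n)]  # instrument
u = [random.gauss(0, 1) for _ in range(n)]  # structural error, independent of z
# x depends on z (relevance) and on u (endogeneity)
x = [0.8 * zi + 0.6 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta * xi + ui for xi, ui in zip(x, u)]

beta_iv = cov(z, y) / cov(z, x)    # sample analogue of cov(z,y) / cov(z,x)
beta_ols = cov(x, y) / cov(x, x)   # sample analogue of cov(x,y) / var(x)
print("IV:", round(beta_iv, 3), "OLS:", round(beta_ols, 3))
```

In this design the IV ratio should land near the true beta, while the OLS ratio is pulled upward by cov(x,u) > 0.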
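Why is the estimated SE of an IV always larger than the estimated SE of OLS?
Under homoskedasticity both variances have the form sigma-sq / [SST * (1 - R-sq)], but 2SLS uses xhat in place of the endogenous variable (EV). The variation in xhat must be less than the variation in the EV, since SST_EV = SSE_EV + SSR_EV and SSE_EV (the explained sum of squares from the 1st stage) is the variation in xhat. In addition, the correlation between xhat and the other exogenous variables is often higher than the correlation between the EV and the other exogenous variables.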
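What is the plim of beta_OLS? Of beta_IV? What does this tell us about the (in)consistency of OLS vs. IV?
plim beta_OLS = beta + cov(x,u) / var(x); plim beta_IV = beta + cov(z,u) / cov(z,x). OLS is inconsistent whenever cov(x,u) is not 0; IV is consistent only if cov(z,u) = 0. When z and x are weakly related, cov(z,x) is small, so even a small cov(z,u) produces a large inconsistency. IV can therefore be worse than OLS.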
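As a worked arithmetic sketch of the two plim formulas, the snippet below plugs in made-up population moments (all numbers are assumptions for illustration) to show how a weak instrument magnifies a small cov(z,u). The function names are hypothetical.

```python
def plim_ols(beta, cov_xu, var_x):
    """Probability limit of the OLS slope: beta + cov(x,u) / var(x)."""
    return beta + cov_xu / var_x

def plim_iv(beta, cov_zu, cov_zx):
    """Probability limit of the IV slope: beta + cov(z,u) / cov(z,x)."""
    return beta + cov_zu / cov_zx

beta = 1.0  # true coefficient (assumed)

# Moderately endogenous x: asymptotic bias of about 0.2
ols = plim_ols(beta, cov_xu=0.2, var_x=1.0)

# Slightly invalid instrument, strong first stage: bias of about 0.04
iv_strong = plim_iv(beta, cov_zu=0.02, cov_zx=0.5)

# Same slight invalidity, weak first stage: bias of about 0.4, worse than OLS
iv_weak = plim_iv(beta, cov_zu=0.02, cov_zx=0.05)

print(ols, iv_strong, iv_weak)
```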
Why use two IVs for one EV instead of using only one of the IVs?
Any linear combination of the IVs is uncorrelated with the 2nd stage errors, both IVs are correlated with the EV, and the linear combination (via the predicted value of the EV) is more efficient than using a single IV. HOWEVER, this may also introduce further finite sample bias.
What is the key identification assumption of an overidentified IV? How do we know if this assumption is met?
The excluded instruments cannot all have coefficients of zero in the 1st stage. We check this using a test of the joint significance of the excluded instruments in the 1st stage.
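A minimal sketch of the joint significance check, assuming NumPy is available. It computes the usual F statistic from restricted and unrestricted 1st stage regressions on simulated data; the variable names (`x2` for an included exogenous variable, `z1` and `z2` for excluded instruments), the coefficients, and the helper `ssr` are all assumptions for illustration.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS regression of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid @ resid

rng = np.random.default_rng(0)
n = 2000
x2, z1, z2 = rng.normal(size=(3, n))          # included exog var, two excluded IVs
x = 0.3 * x2 + 0.5 * z1 + 0.5 * z2 + rng.normal(size=n)  # EV (simulated 1st stage)

const = np.ones(n)
X_restricted = np.column_stack([const, x2])        # excluded IVs left out
X_full = np.column_stack([const, x2, z1, z2])      # full 1st stage

q = 2  # number of excluded instruments being tested
ssr_r, ssr_ur = ssr(X_restricted, x), ssr(X_full, x)
f_stat = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - X_full.shape[1]))
print("1st stage F:", round(f_stat, 1))
```

A large F means we reject the null that all excluded instruments have zero coefficients in the 1st stage.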
What is the “structural equation” in a 2SLS model with the decomposed error?
DV = beta_0 + beta_1*xhat + beta_2*x2 + (u + beta_1*e), where e is the residual from the 1st stage (x = xhat + e).
Does estimating 2SLS by hand, using 2 equations, produce correct point estimates and SE’s?
Point estimates, yes: they are identical to those from a 2SLS routine. SE’s, no. The by-hand 2nd stage residuals estimate the composite error u + beta_1*e (the coefficient on the EV times the 1st stage residuals) rather than u, so the estimated error variance is wrong; it can be too small or too large depending on beta_1 and cov(u,e). Correct SE’s compute the residuals with the original x in place of xhat.
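The sketch below illustrates the by-hand SE problem on simulated data, using only the standard library. It runs both stages manually for one EV and one instrument, then compares the error variance computed from the 2nd stage residuals (which use xhat) with the one computed from residuals that use the original x. The data-generating process and the helper `ols` are assumptions; in this particular design the by-hand variance happens to come out smaller, but the direction depends on the sign of beta_1 and of cov(u,e).

```python
import math
import random

def ols(x, y):
    """Intercept and slope from a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

random.seed(1)
n = 20000
beta = -0.5  # true coefficient on the EV (assumed)
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.6 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + beta * xi + ui for xi, ui in zip(x, u)]

# Stage 1: regress x on z and form fitted values
a0, a1 = ols(z, x)
xhat = [a0 + a1 * zi for zi in z]

# Stage 2: regress y on xhat
b0, b1 = ols(xhat, y)

# By-hand residuals use xhat; correct 2SLS residuals use the original x
resid_naive = [yi - b0 - b1 * xh for yi, xh in zip(y, xhat)]
resid_correct = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]
sigma2_naive = sum(r * r for r in resid_naive) / (n - 2)
sigma2_correct = sum(r * r for r in resid_correct) / (n - 2)

# Both SEs divide by the variation in xhat; only sigma-sq differs
mxh = sum(xhat) / n
sst_xhat = sum((xh - mxh) ** 2 for xh in xhat)
se_naive = math.sqrt(sigma2_naive / sst_xhat)
se_correct = math.sqrt(sigma2_correct / sst_xhat)
print(round(b1, 3), round(se_naive, 4), round(se_correct, 4))
```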
What is the formula for the SE of an IV?
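Assuming homoskedasticity, Var(beta_1hat) = sigma-sq / [SST_xhat * (1 - R-sq_xhat)], where R-sq_xhat comes from regressing xhat on all the other exogenous variables in the structural equation (that is, all exogenous variables except the excluded instruments). The SE is the square root of this variance.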
How can we test whether a variable is endogenous and whether we should use IV?
Hausman test comparing coefficients from OLS to 2SLS.
If I was not going to use the “estat endog” command to test if a variable was endogenous, how would I do it by hand? What is the logic of this test?
Regress the possible EV on all the exogenous variables (including the excluded exogenous variables) and save the residuals. Then regress the original DV on the possible EV, the other exogenous variables, and the residuals from the first stage. If the coefficient on the residuals is not statistically different from 0, we cannot reject that the variable in question is exogenous. The logic: in the modified structural equation the possible EV represents the exogenous portion of the possible EV, while the 1st stage residuals are the endogenous portion of the possible EV.
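A minimal sketch of the by-hand procedure, assuming NumPy is available and using simulated data in which x is endogenous by construction. The helper `ols` and all coefficients are assumptions for illustration; the SEs are the conventional homoskedastic ones.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, conventional standard errors, and residuals."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, se, resid

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                        # excluded instrument
u = rng.normal(size=n)                        # structural error
x = 0.8 * z + 0.6 * u + rng.normal(size=n)    # possible EV (endogenous here)
y = 1.0 + 2.0 * x + u

const = np.ones(n)

# Step 1: regress the possible EV on the exogenous variables, keep residuals
_, _, v_hat = ols(np.column_stack([const, z]), x)

# Step 2: regress the DV on the possible EV and the 1st stage residuals
coef, se, _ = ols(np.column_stack([const, x, v_hat]), y)

# t statistic on the residuals: a large |t| is evidence of endogeneity
t_resid = coef[2] / se[2]
print("coef on residuals:", round(coef[2], 3), "t:", round(t_resid, 1))
```

With no other exogenous variables and one instrument, the coefficient on x in step 2 coincides with the 2SLS estimate.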
If IVs “pass” the overidentification test (fail to reject) does this mean they are definitely exogenous? If they fail (reject the null) does this mean they are definitely endogenous?
No. All the IV’s may be endogenous but produce estimates that are not statistically different from one another, which would lead to a false pass. Alternatively, each endogenous excluded instrument may explain similar portions of exogenous and endogenous variation.
Conversely, the IV’s may be exogenous, but subjects may react differently to different instruments (heterogeneity). If the structural model does not take these heterogeneous effects into account, the structural equation errors may be related to the excluded IVs, and the test may reject.
What is the logic of both overidentification tests?
One is a Hausman test comparing the beta_i, where beta_i is the estimate obtained using the ith excluded instrument; beta_i should not differ from beta_j except through sampling error.
In the 2nd test, the residuals from the 2nd stage shouldn’t be correlated with the collection of all exogenous variables (excluded and included). Regress the residuals on these variables without a constant, calculate a modified R-sq, and compare the resulting statistic to a chi-squared distribution. The collection of exogenous variables should not be statistically significant; we don’t want to reject the null.
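The residual-based test can be sketched as follows, assuming NumPy is available. This version is an assumption-laden variant: it includes a constant in the auxiliary regression and uses n times the ordinary (centered) R-sq, compared to a chi-squared with degrees of freedom equal to the number of overidentifying restrictions (one here). The simulated data, the helper names `fit` and `sargan_stat`, and the invalid-instrument scenario are all made up for illustration.

```python
import numpy as np

def fit(X, y):
    """OLS coefficients from regressing y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def sargan_stat(y, x, Z):
    """n * R-sq from regressing 2SLS residuals on a constant and all instruments."""
    n = len(y)
    W = np.column_stack([np.ones(n), Z])                 # constant + excluded IVs
    x_hat = W @ fit(W, x)                                # 1st stage fitted values
    b = fit(np.column_stack([np.ones(n), x_hat]), y)     # 2nd stage
    u_hat = y - b[0] - b[1] * x                          # residuals use x, not x_hat
    e = u_hat - W @ fit(W, u_hat)                        # auxiliary regression residuals
    r2 = 1.0 - (e @ e) / np.sum((u_hat - u_hat.mean()) ** 2)
    return n * r2

rng = np.random.default_rng(0)
n = 5000
z1, z2, u = rng.normal(size=(3, n))
x = 0.7 * z1 + 0.7 * z2 + 0.6 * u + rng.normal(size=n)
Z = np.column_stack([z1, z2])

y_valid = 1.0 + 2.0 * x + u          # both instruments excluded from y
y_invalid = y_valid + 0.5 * z2       # z2 affects y directly, so it is invalid

crit = 3.841  # 5% critical value of chi-squared with 1 degree of freedom
stat_valid = sargan_stat(y_valid, x, Z)
stat_invalid = sargan_stat(y_invalid, x, Z)
print(round(stat_valid, 2), round(stat_invalid, 2), "critical value:", crit)
```

A statistic above the critical value rejects the null that all instruments are uncorrelated with the structural error, as it should for the invalid case here.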
Is an instrumented variable (xhat) guaranteed to be uncorrelated with the structural model error?
No. The gammas from the 1st stage are estimated, so in a finite sample they likely pick up some endogenous variation in x. That estimation error ends up in the 2nd stage error, since e = x - xhat. However, 2SLS is a consistent estimator because this finite sample bias vanishes as the sample size increases.
What does the R-sq from the 2nd stage of 2SLS mean?
Nothing, because the SSE uses variation from xhat, not x.