13: Instrumental Variables Flashcards
basic idea/concept
interested in the causal effect of a particular explanatory variable of policy interest
BUT you have concerns that OLS is subject to OVB, reverse causality, and/or attenuation bias from measurement error
main idea is that the explanatory variable of interest has both good (exogenous) and bad (endogenous) variation
- instrument helps pick out the exogenous part of the variation in our independent variable of interest
2SLS regressions
first-stage regression
second-stage regression
reduced-form regression
first-stage regression
regression of X on Z (explanatory variable on instrument)
- tells you if instrument is statistically significantly related to the explanatory variable of interest
isolates the part of the variation in X that is predicted by Z: the good variation you want to use for estimation, if the assumptions hold
second-stage regression
regression of Y on predicted X (outcome of interest on the variation in X predicted by the first stage)
correctly identifies the causal effect of X on Y if the instrument pushes around X and isn’t related to other things that determine Y
reduced-form regression
regression of Y on Z
second-stage IV point estimate computation
cov(Y,Z) / cov(X,Z) = point estimate of reduced form / point estimate of first-stage
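The ratio above can be checked numerically. The following is a minimal sketch on simulated data, not a definitive implementation: the data-generating process, the coefficients (0.8, 2.0, 3.0), and the helper `slope` are all assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (every number here is an assumption).
z = rng.normal(size=n)                        # instrument Z
u = rng.normal(size=n)                        # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)          # endogenous regressor X
y = 2.0 * x + 3.0 * u + rng.normal(size=n)    # true causal effect of X on Y is 2.0


def slope(dep, reg):
    """Slope from a simple OLS regression of dep on reg (with intercept)."""
    return np.cov(dep, reg)[0, 1] / np.var(reg, ddof=1)


ols = slope(y, x)              # biased, because u drives both X and Y
first_stage = slope(x, z)      # regression of X on Z
reduced_form = slope(y, z)     # regression of Y on Z

# Two equivalent ways to write the IV point estimate.
iv_ratio = reduced_form / first_stage
iv_cov = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]

# Second stage by hand: regress Y on the first-stage fitted values of X.
x_hat = x.mean() + first_stage * (z - z.mean())
second_stage = slope(y, x_hat)

print(f"OLS: {ols:.3f}  IV (ratio): {iv_ratio:.3f}  2SLS by hand: {second_stage:.3f}")
```

Under these assumed coefficients the OLS slope should sit noticeably above 2.0, while the three IV calculations coincide and land close to 2.0. The standard errors from a hand-run second stage are not valid, so this only illustrates the point estimate.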
sampling distribution of 2SLS
approximately normal in large samples
- follows from the central limit theorem (CLT)
three essential assumptions that must hold to estimate causal effects using IVs
relevance
exogeneity
exclusion restriction
instrument relevance
instrument needs to be related strongly enough to the endogenous explanatory variable of interest
checked by running a regression of X on Z and seeing how strong the relationship is
instrument exogeneity
instrument must be uncorrelated with any unobserved variables that also affect Y (in the error term)
cov(Z, error) = 0
instrument exclusion restriction
instrument must not have a direct effect on Y itself except through its relationship with X
conditional on X, Z has no effect on Y
again cov(Z, error) = 0
F-statistic
used to test a joint hypothesis (involving more than one restriction/equation)
exploits the fact that t-statistics of individual coefficients are approximately normally distributed in large samples
rule of thumb in 2SLS is that the first-stage F-stat should be above 10 (otherwise the instrument is considered weak)
if you have one IV for an endogenous regressor, F-stat is just the square of the t-stat from the slope coefficient of the first stage regression
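The F = t² relationship for a single instrument can be verified directly. This is a small sketch assuming homoskedasticity-only standard errors and a made-up first stage (slope 0.5, n = 500); with robust standard errors the equality holds only if the F-stat uses the same robust variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical first stage: one instrument z, one endogenous regressor x.
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)

# Unrestricted first-stage regression of x on a constant and z.
Z = np.column_stack([np.ones(n), z])
beta, *_ = np.linalg.lstsq(Z, x, rcond=None)
resid = x - Z @ beta
ssr_u = resid @ resid

# Homoskedasticity-only t-stat on the slope coefficient.
sigma2 = ssr_u / (n - 2)
cov_beta = sigma2 * np.linalg.inv(Z.T @ Z)
t_stat = beta[1] / np.sqrt(cov_beta[1, 1])

# F-stat for H0: slope = 0 (restricted model has an intercept only).
ssr_r = np.sum((x - x.mean()) ** 2)
f_stat = ((ssr_r - ssr_u) / 1) / (ssr_u / (n - 2))

print(f"t^2 = {t_stat**2:.2f}, F = {f_stat:.2f}, above 10: {f_stat > 10}")
```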
over-identification test
assessing exogeneity/exclusion restriction
need at least one instrument for each endogenous regressor
- with fewer, equation is under-identified
- with more, equation is over-identified
tests whether two different IVs lead to very similar second-stage IV point estimates (they should if exogeneity/exclusion restriction holds for both)
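The intuition behind the test can be sketched by comparing the IV estimates from two instruments. This is an informal comparison on simulated data, not the formal over-identification test statistic; the instruments `z1` and `z2`, the coefficients, and the helper `iv_estimate` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical setup: two instruments for one endogenous regressor.
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
u = rng.normal(size=n)                                 # unobserved confounder
x = 0.7 * z1 + 0.5 * z2 + u + rng.normal(size=n)
y = 1.5 * x + 2.0 * u + rng.normal(size=n)             # true effect is 1.5


def iv_estimate(outcome, regressor, instrument):
    """Single-instrument IV point estimate: cov(Y,Z) / cov(X,Z)."""
    return np.cov(outcome, instrument)[0, 1] / np.cov(regressor, instrument)[0, 1]


# Both instruments valid: the two estimates should be close to each other.
b1 = iv_estimate(y, x, z1)
b2 = iv_estimate(y, x, z2)

# Break the exclusion restriction for z2 by giving it a direct effect on Y.
y_bad = y + 1.0 * z2
b2_bad = iv_estimate(y_bad, x, z2)

print(f"valid: z1 -> {b1:.3f}, z2 -> {b2:.3f}; invalid z2 -> {b2_bad:.3f}")
```

When both instruments are valid the estimates agree up to sampling noise; once `z2` affects Y directly, its estimate drifts away from the one based on `z1`. Similar estimates do not prove validity, since both instruments could be invalid in the same direction.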
IV estimation and the local average treatment effect
an instrument generally lets you estimate the local ATE (LATE), not the ATE
- causal effect of X on Y for the units whose X is shifted by the instrument (the compliers)
3 cases when the IV point estimate is the ATE
when the treatment effect of X on Y is constant
- treatment effect has no variation
when the effect of the instrument on X in the first-stage is constant
- no heterogeneity in the first stage, so IV still recovers the ATE even if the treatment effect varies
when there is no covariance between heterogeneity in the first and second stages
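The difference between the LATE and the ATE, and the constant-effect case above, can be illustrated with a simulation. This sketch assumes a binary instrument, a binary treatment, and three hypothetical unit types (never-takers, compliers, always-takers) with made-up shares and treatment effects.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000

# Hypothetical unit types: 0 = never-taker, 1 = complier, 2 = always-taker.
kind = rng.choice([0, 1, 2], size=n, p=[0.3, 0.4, 0.3])
effect = np.array([1.0, 2.0, 5.0])[kind]      # heterogeneous treatment effects

z = rng.integers(0, 2, size=n)                # randomly assigned binary instrument
# Always-takers take treatment, never-takers do not, compliers follow z.
x = np.where(kind == 2, 1, np.where(kind == 1, z, 0))
noise = rng.normal(size=n)


def iv_estimate(outcome, regressor, instrument):
    """Single-instrument IV point estimate: cov(Y,Z) / cov(X,Z)."""
    return np.cov(outcome, instrument)[0, 1] / np.cov(regressor, instrument)[0, 1]


# Heterogeneous effects: IV recovers the complier effect, not the ATE.
y = effect * x + noise
iv = iv_estimate(y, x, z)
ate = effect.mean()                           # average effect over all units
late = effect[kind == 1].mean()               # average effect among compliers

# Constant treatment effect: the IV estimate coincides with the ATE.
y_const = 2.6 * x + noise
iv_const = iv_estimate(y_const, x, z)

print(f"IV = {iv:.3f}, LATE = {late:.3f}, ATE = {ate:.3f}, IV (constant effect) = {iv_const:.3f}")
```

With heterogeneous effects the IV estimate tracks the complier effect rather than the population average; with a constant effect the two coincide.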