Selection on Unobservables: Instrumental Variables Flashcards
Basic idea of instrumental variable
- Z serves as an instrument which ‘exogenously shifts’ D
- IVs allow us to estimate the effect of that part of the variation in D which is due to Z (i.e. the exogenous part)
“estimation procedure takes variation in the explanatory variable that matches up with variation in the instrument (and so is uncorrelated with the error), and uses only this variation to compute the slope estimate” (Kennedy, p. 141)
Two IV requirements
- Z is (as strongly as possible) correlated with D (strength)
- Z does not affect Y other than through D (exclusion restriction)
Derive the Wald estimator
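$$Y = \alpha + \delta D + \varepsilon$$
This is a population equation. Taking expectations:
$$E[Y] = E[\alpha + \delta D + \varepsilon] = \alpha + \delta E[D] + E[\varepsilon].$$
Suppose now that there is a binary variable Z, and that δ is constant within the population. Taking the same expectation conditional on Z = 1 and on Z = 0 and differencing (α drops out):
$$E[Y \mid Z=1] - E[Y \mid Z=0] = \delta\,(E[D \mid Z=1] - E[D \mid Z=0]) + E[\varepsilon \mid Z=1] - E[\varepsilon \mid Z=0].$$
Divide by $E[D \mid Z=1] - E[D \mid Z=0]$:
$$\frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]} = \frac{\delta\,(E[D \mid Z=1] - E[D \mid Z=0]) + E[\varepsilon \mid Z=1] - E[\varepsilon \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]}.$$
If Z is unassociated with ε, then $E[\varepsilon \mid Z=1] - E[\varepsilon \mid Z=0] = 0$, so that
$$\frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]} = \delta.$$
This is exactly what the Wald estimator yields in a sufficiently large sample, namely
$$\hat{\delta}_{Wald} = \frac{\hat{E}[y_i \mid z_i=1] - \hat{E}[y_i \mid z_i=0]}{\hat{E}[d_i \mid z_i=1] - \hat{E}[d_i \mid z_i=0]},$$
where $\hat{E}[\cdot \mid z_i = k]$ denotes the sample mean among observations with $z_i = k$.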
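The Wald ratio can be checked on simulated data. The sketch below is only an illustration under assumed values: the effect size `delta_true`, the take-up rule for `d` and the sample size are all made up, and the unobservable `e` enters both the treatment decision and the outcome so that a naive treated-vs-untreated comparison is biased.

```python
import random
from statistics import fmean

random.seed(0)
n = 200_000
delta_true = 2.0                      # assumed constant treatment effect

z, d, y = [], [], []
for _ in range(n):
    zi = random.randint(0, 1)         # binary instrument, independent of the errors
    e = random.gauss(0, 1)            # unobservable that drives both D and Y
    # take-up is more likely when Z = 1, but it also depends on e (endogeneity)
    di = 1 if 0.8 * zi + e + random.gauss(0, 1) > 0.5 else 0
    yi = 1.0 + delta_true * di + e    # Y = alpha + delta * D + eps
    z.append(zi)
    d.append(di)
    y.append(yi)

def mean_where(values, flags, level):
    """Sample mean of values among observations whose flag equals level."""
    return fmean(v for v, f in zip(values, flags) if f == level)

# Wald estimator: difference in mean Y by Z over difference in mean D by Z
wald = (mean_where(y, z, 1) - mean_where(y, z, 0)) / (
    mean_where(d, z, 1) - mean_where(d, z, 0)
)
# Naive comparison of treated and untreated, contaminated by e
naive = mean_where(y, d, 1) - mean_where(y, d, 0)

print(f"Wald: {wald:.3f}  naive: {naive:.3f}  truth: {delta_true}")
```

In this setup the Wald ratio should land near the assumed effect, while the naive difference overstates it because people with high `e` are both more likely to be treated and have higher outcomes.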
Roadmap for IV with non-binary instrument
1. estimate a first stage Z → D
2. obtain fitted values for D, $\hat{D}$ (which contain only IV-induced variation)
3. estimate a second stage $\hat{D}$ → Y
Formal derivation of Regression
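Consider the ‘simultaneous equations’:
$$d_i = X_i'\beta_1 + \gamma z_i + \varepsilon_i, \qquad (1)$$
$$y_i = X_i'\beta_2 + \delta d_i + v_i. \qquad (2)$$
We are interested in δ but cannot estimate it without bias because $\mathrm{Cov}(d_i, v_i) \neq 0$ (because of unobserved omitted variables). Now substitute (1) into (2) and rearrange:
$$\begin{aligned} y_i &= X_i'\beta_2 + \delta\,[X_i'\beta_1 + \gamma z_i + \varepsilon_i] + v_i \\ &= X_i'[\beta_2 + \delta\beta_1] + \delta\gamma z_i + [\delta\varepsilon_i + v_i] \end{aligned}$$
- define $u_i \equiv \delta\varepsilon_i + v_i$ and rearrange to
$$y_i = X_i'\beta_2 + \delta\,[X_i'\beta_1 + \gamma z_i] + u_i. \qquad (3)$$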
Note a few things:
- this is close to a standard-form regression model (for the population)
- the expression in square brackets [•] contains the fitted population values for D from equation (1)
- these are free of any possible correlation between D and v
→ further note that
- X and Z are uncorrelated with ε
- and Z is uncorrelated with v if the exclusion restriction holds
→ so that X and Z are uncorrelated with u
⇒ thus, δ is an unbiased population coefficient.
Estimation for a sample
Even if we only have a random sample from the population, all this is handy for estimation.
Estimates of $d_i$ can be obtained by estimating (1) with OLS and calculating $\hat{d}_i = X_i'\hat{\beta}_1 + \hat{\gamma} z_i$.
Estimates of the errors $\varepsilon_i$ are then $(d_i - \hat{d}_i)$.
Decomposing $u_i$ into $\delta\varepsilon_i + v_i$ again, equation (3) for the sample then becomes (for simplicity, coefficients are called the same as above)
$$y_i = X_i'\beta_2 + \delta\hat{d}_i + [\delta(d_i - \hat{d}_i) + v_i],$$
which can be estimated using OLS, producing consistently estimated coefficients $\hat{\beta}_2$, $\hat{\delta}$ (because $\hat{d}_i$ is uncorrelated with $(d_i - \hat{d}_i)$ and $v_i$).
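The two estimation steps can be sketched with NumPy on simulated data. Everything numeric here is an assumption for illustration (coefficients, one control `x`, one instrument `z`, and a confounder `c` that makes the errors of both equations correlated); the helper `ols` is a hypothetical convenience function, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000
delta_true, gamma_true = 1.5, 0.8        # assumed values for the simulation

x = rng.normal(size=n)                   # one exogenous control
z = rng.normal(size=n)                   # instrument
c = rng.normal(size=n)                   # unobserved confounder
eps = c + rng.normal(size=n)             # first-stage error
v = c + rng.normal(size=n)               # second-stage error, correlated with eps via c

d = 0.5 * x + gamma_true * z + eps       # equation (1)
y = 1.0 + 0.3 * x + delta_true * d + v   # equation (2), here with an intercept

def ols(regressors, outcome):
    """OLS coefficients of outcome on a constant plus the given regressors."""
    X = np.column_stack([np.ones(len(outcome)), *regressors])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef

# First stage: d on (1, x, z); fitted values keep only variation due to x and z
a0, a_x, a_z = ols([x, z], d)
d_hat = a0 + a_x * x + a_z * z

# Second stage: y on (1, x, d_hat); the coefficient on d_hat is the 2SLS estimate
delta_2sls = ols([x, d_hat], y)[2]
delta_ols = ols([x, d], y)[2]            # biased benchmark using d itself

print(f"2SLS: {delta_2sls:.3f}  OLS: {delta_ols:.3f}  truth: {delta_true}")
```

With these assumed parameters OLS should be pulled upward by the confounder, while the second-stage coefficient on `d_hat` should sit close to `delta_true`.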
General remarks on IV estimation
- the coefficient on $\hat{d}_i$, here $\hat{\delta}$, is called a 2SLS estimator
- if effect δ conditional on Z is homogeneous in the population, ATE interpretation appropriate
- otherwise, only interpretation as a LATE possible (next lecture)
- the entire estimation procedure is known as two stage least squares (2SLS)
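- note: if 2SLS is estimated manually, the SEs reported by the second-stage OLS are not valid, because they treat the estimated fitted values as if they were data (same reason as in sequential estimation of Heckman/Roy models) → bootstrap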
- X-vectors in the first and second stage should include the identical variables
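One way to obtain standard errors for a manually computed 2SLS estimate is a pairs bootstrap that repeats both stages on every resample. The sketch below assumes simulated data, 300 replications and a hypothetical helper `two_stage`; it is an illustration of the idea, not a recommendation for a particular number of draws.

```python
import numpy as np

rng = np.random.default_rng(7)
n, n_boot = 2_000, 300

# Simulated data (all coefficients are assumptions made for the illustration)
x = rng.normal(size=n)
z = rng.normal(size=n)
c = rng.normal(size=n)                         # unobserved confounder
d = 0.5 * x + 0.8 * z + c + rng.normal(size=n)
y = 1.0 + 0.3 * x + 1.5 * d + c + rng.normal(size=n)

def two_stage(y, d, x, z):
    """Manual 2SLS; the same X (constant and x) enters both stages."""
    const = np.ones(len(y))
    first = np.column_stack([const, x, z])
    d_hat = first @ np.linalg.lstsq(first, d, rcond=None)[0]
    second = np.column_stack([const, x, d_hat])
    return np.linalg.lstsq(second, y, rcond=None)[0][2]

delta_hat = two_stage(y, d, x, z)

# Pairs bootstrap: resample whole observations and redo BOTH stages each time
draws = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)
    draws[b] = two_stage(y[idx], d[idx], x[idx], z[idx])

se_boot = draws.std(ddof=1)
print(f"delta_hat: {delta_hat:.3f}  bootstrap SE: {se_boot:.3f}")
```

Resampling rows (rather than residuals) keeps each observation's y, d, x and z together, so the first-stage estimation error is carried into every bootstrap draw.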
OLS versus IV results and the bias
1. attenuation bias due to measurement error
- if growth data has random measurement error → attenuation bias
- IV can correct for that
- attenuation bias is always towards 0, so here (with a negative coefficient) upward
2. endogeneity bias
- endogeneity of growth: low growth and conflict reinforce each other
- bias therefore likely to be downward (OLS should produce a more negative coefficient than IV)
→ comparing the OLS vs IV coefficients suggests that bias 1 outweighs bias 2
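The attenuation part of this comparison can be illustrated in isolation. The sketch below assumes a negative true coefficient, classical (random) measurement error in the regressor and an instrument unrelated to that error; it deliberately leaves out the endogeneity channel, and all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
delta_true = -1.0                       # assumed negative true effect

z = rng.normal(size=n)                  # instrument
d_true = z + rng.normal(size=n)         # correctly measured regressor (unobserved)
d_obs = d_true + rng.normal(size=n)     # observed regressor with random measurement error
y = delta_true * d_true + rng.normal(size=n)

def cov(a, b):
    """Sample covariance of two arrays."""
    return np.mean((a - a.mean()) * (b - b.mean()))

delta_ols = cov(d_obs, y) / cov(d_obs, d_obs)   # attenuated towards zero
delta_iv = cov(z, y) / cov(z, d_obs)            # uses only variation in d_obs due to z

print(f"OLS: {delta_ols:.3f}  IV: {delta_iv:.3f}  truth: {delta_true}")
```

Here the signal-to-total variance ratio of `d_obs` is 2/3, so OLS should be near -0.67 (closer to zero, i.e. biased upward), while IV should be near -1.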
Bootstrapping and S.E.s
Problems that IVs can remedy
- not only endogeneity (or ‘simultaneity’)
- also attenuation bias resulting from measurement error of independent variables (here growth)
- omitted variable bias / treatment selection on unobservables
- serial error correlation in models with lagged dependent variables in X
- IV estimation conceptually inspired intention to treat estimates in experimental literature
- many identification strategies relying on natural experiments use 2SLS-IV estimation
LATE
… is the effect of the treatment on those whose treatment status is changed by the instrument. It applies neither to all treated or all untreated units nor to the entire sample (as the ATE does).
If monotonicity holds such that E[di|zi = T] ≥ E[di|zi = C] for all i, the LATE applies to all compliers.
If monotonicity holds such that E[di|zi = T] ≤ E[di|zi = C] for every i, it applies to all defiers.
Important insight I:
Usually, Wald or IV estimates can only be interpreted as LATEs.
Important insight II:
Wald or IV estimates of causal effects are only informative about the ATE if the causal effects are constant across individuals.
Important insight III:
An IV and an experimental treatment with partial compliance are the same from an identification point of view.
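These insights can be illustrated with a simulated experiment with partial compliance. The population shares (50% compliers, 20% always-takers, 30% never-takers, no defiers) and the type-specific effects below are assumptions chosen so that the LATE and the ATE differ.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000

# Assumed population shares; monotonicity holds because there are no defiers
kind = rng.choice(["complier", "always", "never"], size=n, p=[0.5, 0.2, 0.3])
# Heterogeneous individual effects: 1 for compliers, 4 for always-takers, 2 for never-takers
effect = np.where(kind == "complier", 1.0, np.where(kind == "always", 4.0, 2.0))

z = rng.integers(0, 2, size=n)                       # randomly assigned instrument / offer
d = np.where(kind == "always", 1, np.where(kind == "never", 0, z))
y = rng.normal(size=n) + effect * d                  # outcome with individual-specific effect

wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
late = effect[kind == "complier"].mean()             # 1.0 by construction
ate = effect.mean()                                  # about 0.5*1 + 0.2*4 + 0.3*2 = 1.9

print(f"Wald: {wald:.3f}  LATE: {late:.3f}  ATE: {ate:.3f}")
```

The Wald ratio should recover the complier effect rather than the population average; setting all three effects equal would make the two coincide.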
Experiments vs IV approach
- exogenous: treatment vs IV
- used variation: take-up/compliance vs. 1st stage fitted values
- estimator: usually Wald vs. Wald for binary IV & otherwise usually 2SLS-IV
- interpretation: effect for compliers (LATE) vs. effect for all with treatment status changed (LATE), unless the effect is homogeneous
Three main reasons for 2SLS bias
- small samples
- ‘weak’ first stage
- mis-specification
Problems with small samples
- spuriously one might get
$\mathrm{Cov}(Z_{sample}, D_{sample}) = c + e$ with $e \neq 0$, even though
$\mathrm{Cov}(Z_{pop}, D_{pop}) = c$
→ this is simply due to ‘bad luck’ when drawing the (too small) sample
- obviously, e biases IV estimate
- if c is small, the first stage will exhibit weak correlation between Z and D → exacerbates or ‘inflates’ the bias, as will be shown next
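A small Monte Carlo sketch can make the interaction of small samples and a weak first stage visible. It assumes samples of 50 observations, a single instrument, a confounder `u` and two invented first-stage strengths; the helper names are hypothetical. Medians are reported because ratio estimates with a near-zero denominator can be extreme.

```python
import numpy as np

rng = np.random.default_rng(11)
n, reps, delta_true = 50, 2_000, 1.0     # small samples; all values are assumptions

def iv_estimates(gamma):
    """IV estimate cov(z, y) / cov(z, d) in each of `reps` simulated samples."""
    z = rng.normal(size=(reps, n))
    u = rng.normal(size=(reps, n))                   # unobserved confounder
    d = gamma * z + u + rng.normal(size=(reps, n))   # first stage with strength gamma
    y = delta_true * d + u + rng.normal(size=(reps, n))
    zc = z - z.mean(axis=1, keepdims=True)
    cov_zy = (zc * y).mean(axis=1)
    cov_zd = (zc * d).mean(axis=1)                   # sample covariance, i.e. c + e
    return cov_zy / cov_zd

def median_abs_error(estimates):
    return np.median(np.abs(estimates - delta_true))

strong = iv_estimates(gamma=1.0)
weak = iv_estimates(gamma=0.05)

print(f"strong: median {np.median(strong):.2f}, median abs. error {median_abs_error(strong):.2f}")
print(f"weak:   median {np.median(weak):.2f}, median abs. error {median_abs_error(weak):.2f}")
```

With the weak first stage, chance covariance in the sample is divided by a denominator close to zero, so the estimates should scatter far more widely and their median should drift away from the assumed `delta_true`.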
Strength of an IV
- is how much of the variation in D is explained by Z
Indicators of strength
- sizeable magnitude and statistical significance of the first-stage IV coefficient is not a sufficient indicator of strength
- the reverse is true: a first-stage IV coefficient close to zero is a sufficient indicator of weakness
- F-statistic ≥ 10 (of the test whether the first-stage coefficients on the instruments are jointly zero) often accepted as indicator of strength
- best indicator is a large difference in R² between a first-stage regression with and without the instrumental variable included
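Both indicators can be computed from two first-stage regressions, one with and one without the instrument. The sketch below assumes simulated data with one control `x` and one instrument `z`; the F-statistic shown is the one for the instrument's coefficient only, and `rss_and_r2` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000
x = rng.normal(size=n)                       # exogenous control
z = rng.normal(size=n)                       # instrument
d = 0.5 * x + 0.3 * z + rng.normal(size=n)   # assumed first stage

def rss_and_r2(columns, outcome):
    """Residual sum of squares and R-squared of an OLS fit with a constant."""
    X = np.column_stack([np.ones(len(outcome)), *columns])
    resid = outcome - X @ np.linalg.lstsq(X, outcome, rcond=None)[0]
    rss = resid @ resid
    tss = ((outcome - outcome.mean()) ** 2).sum()
    return rss, 1.0 - rss / tss

rss_without, r2_without = rss_and_r2([x], d)     # first stage without the instrument
rss_with, r2_with = rss_and_r2([x, z], d)        # first stage with the instrument

q = 1                                  # number of instruments being tested
k = 3                                  # parameters in the full first stage (const, x, z)
f_stat = ((rss_without - rss_with) / q) / (rss_with / (n - k))
r2_gain = r2_with - r2_without

print(f"F on instrument: {f_stat:.1f}  gain in R-squared: {r2_gain:.3f}")
```

The F-statistic compares the restricted and unrestricted residual sums of squares, so it and the R² gain are two views of the same quantity: how much extra variation in D the instrument explains once the controls are included.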