Selection on Unobservables: Instrumental Variables Flashcards

1
Q

Basic idea of instrumental variable

A
  • Z serves as an instrument which ‘exogenously shifts’ D
  • IVs allow us to estimate the effect of that part of the variation in D which is due to Z (i.e. the exogenous part)

“estimation procedure takes variation in the explanatory variable that matches up with variation in the instrument (and so is uncorrelated with the error), and uses only this variation to compute the slope estimate” (Kennedy, p. 141)

2
Q

Two IV requirements

A
  • Z is (as strongly as possible) correlated with D (strength)
  • Z does not affect Y other than through D (exclusion restriction)
3
Q

Derive the Wald estimator

A

Start from the population equation
$$Y = \alpha + \delta D + \varepsilon.$$
Taking expectations:
$$E[Y] = E[\alpha + \delta D + \varepsilon] = \alpha + \delta E[D] + E[\varepsilon].$$
Suppose now that there is a binary variable Z, and that δ is constant within the population. Conditioning on Z = 1 and on Z = 0 and taking the difference gives
$$E[Y|Z=1] - E[Y|Z=0] = \delta\,(E[D|Z=1] - E[D|Z=0]) + E[\varepsilon|Z=1] - E[\varepsilon|Z=0].$$
Dividing by $E[D|Z=1] - E[D|Z=0]$:
$$\frac{E[Y|Z=1] - E[Y|Z=0]}{E[D|Z=1] - E[D|Z=0]} = \frac{\delta\,(E[D|Z=1] - E[D|Z=0]) + E[\varepsilon|Z=1] - E[\varepsilon|Z=0]}{E[D|Z=1] - E[D|Z=0]}.$$
If Z is unassociated with ε, then $E[\varepsilon|Z=1] - E[\varepsilon|Z=0] = 0$, so that
$$\frac{E[Y|Z=1] - E[Y|Z=0]}{E[D|Z=1] - E[D|Z=0]} = \delta.$$
This is exactly what the Wald estimator yields in a sufficiently large sample, namely
$$\hat{\delta}_{Wald} = \frac{E[y_i|z_i=1] - E[y_i|z_i=0]}{E[d_i|z_i=1] - E[d_i|z_i=0]},$$
with the conditional expectations estimated by the corresponding sample means.
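The derivation can be checked on simulated data. The sketch below is illustrative only: the data-generating process, the coefficient values and the function name `wald_estimate` are assumptions made for this example, not part of the notes.

```python
import numpy as np

def wald_estimate(y, d, z):
    """Difference in mean outcomes divided by difference in mean treatment, across z = 1 and z = 0."""
    y, d, z = map(np.asarray, (y, d, z))
    reduced_form = y[z == 1].mean() - y[z == 0].mean()
    first_stage = d[z == 1].mean() - d[z == 0].mean()
    return reduced_form / first_stage

rng = np.random.default_rng(0)
n = 200_000
z = rng.integers(0, 2, size=n)           # binary instrument, randomly assigned
u = rng.normal(size=n)                   # unobserved confounder
d = (1.5 * z + u + rng.normal(size=n) > 0.75).astype(float)  # treatment depends on z and u
delta = 1.5                              # constant treatment effect (assumed)
eps = 2.0 * u + rng.normal(size=n)       # error term, correlated with d through u
y = 1.0 + delta * d + eps

# With a binary regressor the OLS slope equals the raw difference in means by d.
ols_slope = np.cov(d, y)[0, 1] / np.var(d, ddof=1)
wald = wald_estimate(y, d, z)

print(f"OLS slope: {ols_slope:.2f}, Wald estimate: {wald:.2f}, true delta: {delta}")
```

Because z is independent of the error here, the Wald ratio should land close to the assumed δ, while the OLS slope picks up the confounder.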

4
Q

Roadmap for IV with non-binary instrument

A
  1. estimate a first stage Z → D
  2. obtain fitted values for D, $\hat{D}$ (which contain only IV-induced variation)
  3. estimate a second stage $\hat{D}$ → Y
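The three steps can be sketched with plain least squares. This is a minimal illustration on simulated data with a continuous instrument; the helper `ols` and all parameter values are assumptions chosen for the example.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 100_000
z = rng.normal(size=n)                             # continuous instrument
u = rng.normal(size=n)                             # unobserved confounder
d = 0.7 * z + u + rng.normal(size=n)               # endogenous regressor
y = 2.0 + 1.0 * d + 1.5 * u + rng.normal(size=n)   # true effect of d is 1.0 (assumed)

const = np.ones(n)

# Step 1: first stage, regress D on Z.
Z = np.column_stack([const, z])
pi_hat = ols(Z, d)

# Step 2: fitted values, which contain only the instrument-induced variation in D.
d_hat = Z @ pi_hat

# Step 3: second stage, regress Y on the fitted values.
delta_2sls = ols(np.column_stack([const, d_hat]), y)[1]

# For comparison: naive OLS of Y on D, and the covariance-ratio form of the IV estimator.
delta_ols = ols(np.column_stack([const, d]), y)[1]
delta_ratio = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

print(f"2SLS: {delta_2sls:.3f}, OLS: {delta_ols:.3f}")
```

With a single instrument and no further controls, the second-stage slope coincides with Cov(z, y) / Cov(z, d), which is why `delta_ratio` is computed alongside it.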
5
Q

Formal derivation of Regression

A

Consider the ‘simultaneous equations’:
$$d_i = X_i'\beta_1 + \gamma z_i + \varepsilon_i, \quad (1)$$
$$y_i = X_i'\beta_2 + \delta d_i + v_i. \quad (2)$$
We are interested in δ but cannot estimate it without bias because $\mathrm{Cov}(d_i, v_i) \neq 0$
(because of unobserved omitted variables). Now substitute (1) into (2) and rearrange:
$$y_i = X_i'\beta_2 + \delta[X_i'\beta_1 + \gamma z_i + \varepsilon_i] + v_i = X_i'[\beta_2 + \delta\beta_1] + \delta\gamma z_i + [\delta\varepsilon_i + v_i].$$

  • define $u_i \equiv \delta\varepsilon_i + v_i$ and rearrange to
$$y_i = X_i'\beta_2 + \delta[X_i'\beta_1 + \gamma z_i] + u_i. \quad (3)$$
    Note a few things:
  • this is close to a standard form regression model (for the population)
  • the bracketed expression [•] is the fitted population value of D from equation (1)
  • it is free of any possible correlation between D and v
    → further note that
  • X and Z are uncorrelated with ε
  • and Z is uncorrelated with v if the exclusion restriction holds
    → so that X and Z are uncorrelated with u

⇒ thus, δ is an unbiased population coefficient.

6
Q

Estimation for a sample

A

Even if we only have a random sample from the population, all this is handy for estimation.
Estimates of $d_i$ can be obtained by estimating (1) with OLS and calculating $\hat{d}_i = X_i'\hat{\beta}_1 + \hat{\gamma} z_i$.
Estimates of the errors $\varepsilon_i$ are then $(d_i - \hat{d}_i)$.
Decomposing $u_i$ into $\delta\varepsilon_i + v_i$ again, equation (3) for the sample then becomes (for
simplicity, coefficients are called the same as above)
$$y_i = X_i'\beta_2 + \delta\hat{d}_i + [\delta(d_i - \hat{d}_i) + v_i],$$

which can be estimated using OLS, producing consistently estimated coefficients $\hat{\beta}_2$, $\hat{\delta}$
(because $\hat{d}_i$ is uncorrelated with $(d_i - \hat{d}_i)$ and $v_i$).
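A sample version with a control variable can be sketched as follows. The simulated design (one control x, one instrument z, true δ = 2) is an assumption for illustration; the point is that the same X enters both stages and that the fitted values are orthogonal to the first-stage residuals by construction.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
x = rng.normal(size=n)                     # observed control
z = 0.5 * x + rng.normal(size=n)           # instrument, exogenous conditional on x
u = rng.normal(size=n)                     # unobserved confounder
d = 0.3 * x + 0.8 * z + u + rng.normal(size=n)               # equation (1)
y = 1.0 + 0.5 * x + 2.0 * d - 1.5 * u + rng.normal(size=n)   # equation (2), delta = 2

X = np.column_stack([np.ones(n), x])       # same controls in both stages

# First stage: OLS of d on X and z, then fitted values and residuals.
W1 = np.column_stack([X, z])
coef1 = np.linalg.lstsq(W1, d, rcond=None)[0]
d_hat = W1 @ coef1
resid1 = d - d_hat                         # estimate of the first-stage error

# Second stage: OLS of y on X and the fitted values.
W2 = np.column_stack([X, d_hat])
coef2 = np.linalg.lstsq(W2, y, rcond=None)[0]
delta_hat = coef2[-1]

# Sample analogue of "d_hat is uncorrelated with (d - d_hat)".
orthogonality = float(d_hat @ resid1) / n

print(f"delta_hat: {delta_hat:.3f}, mean of d_hat * residual: {orthogonality:.2e}")
```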

7
Q

General remarks on IV estimation

A
  • the coefficient on $\hat{d}_i$, here $\hat{\delta}$, is called a 2SLS estimator
  • if the effect δ conditional on Z is homogeneous in the population, an ATE interpretation is appropriate
  • otherwise, only an interpretation as a LATE is possible (next lecture)
  • the entire estimation procedure is known as two stage least squares (2SLS)
  • note: if 2SLS is estimated manually, the second-stage SEs are too small (for the same reason as in sequential estimation of Heckman/Roy models) → bootstrap
  • the X-vectors in the first and second stage should include identical variables
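The bootstrap remedy for manually estimated 2SLS can be sketched as a pairs bootstrap that re-runs both stages on every resample. The function names, the simulated data and the number of replications are assumptions for this illustration.

```python
import numpy as np

def two_stage_least_squares(y, d, z):
    """Manual 2SLS with one instrument and an intercept; returns the slope on d_hat."""
    const = np.ones(len(y))
    Z = np.column_stack([const, z])
    d_hat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]
    coef = np.linalg.lstsq(np.column_stack([const, d_hat]), y, rcond=None)[0]
    return coef[1]

def bootstrap_se(y, d, z, n_boot=300, seed=0):
    """Pairs bootstrap: resample rows with replacement and repeat BOTH stages each time."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = two_stage_least_squares(y[idx], d[idx], z[idx])
    return draws.std(ddof=1)

rng = np.random.default_rng(3)
n = 2_000
z = rng.normal(size=n)
u = rng.normal(size=n)                              # unobserved confounder
d = 1.0 * z + u + rng.normal(size=n)
y = 1.0 + 0.5 * d + u + rng.normal(size=n)          # true effect 0.5 (assumed)

delta_hat = two_stage_least_squares(y, d, z)
boot_se = bootstrap_se(y, d, z)

print(f"2SLS estimate: {delta_hat:.3f}, bootstrap SE: {boot_se:.3f}")
```

Resampling whole observations and repeating the first stage inside the loop is what lets the bootstrap reflect the estimation error in the fitted values.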
8
Q

OLS versus IV results and the bias

A
  1. attenuation bias due to measurement error
    - if growth data has random measurement error → attenuation bias
    - IV can correct for that
    - attenuation bias is always towards 0, so here (for a negative coefficient) upward
  2. endogeneity bias
    - endogeneity of growth: low growth and conflict reinforce each other
    - bias therefore likely to be downward
    (OLS should produce a more negative coefficient than IV)
    → comparing the OLS and IV coefficients suggests that bias 1 outweighs bias 2
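The attenuation mechanism in point 1 can be illustrated with simulated data. Everything below is hypothetical (variable names, a true coefficient of -1, the noise variances) and is not the data behind the comparison discussed above; it only shows OLS shrinking towards zero under classical measurement error while IV does not.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
z = rng.normal(size=n)                       # instrument, unrelated to the measurement error
d_true = 0.6 * z + rng.normal(size=n)        # true regressor (e.g. true growth)
d_obs = d_true + rng.normal(size=n)          # observed regressor with random measurement error
y = -1.0 * d_true + rng.normal(size=n)       # true coefficient is -1 (assumed)

# OLS on the mismeasured regressor: shrunk towards zero by Var(d_true) / Var(d_obs).
ols_slope = np.cov(d_obs, y)[0, 1] / np.var(d_obs, ddof=1)

# IV: the measurement error drops out of both covariances with z.
iv_slope = np.cov(z, y)[0, 1] / np.cov(z, d_obs)[0, 1]

print(f"OLS: {ols_slope:.2f} (attenuated), IV: {iv_slope:.2f}")
```

With these assumed variances the attenuation factor is 1.36 / 2.36, so the OLS slope should sit well above (closer to zero than) the IV slope.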
9
Q

Bootstrapping and S.E.s

A
10
Q

Problems that IVs can remedy

A
  • not only endogeneity (or ‘simultaneity’)
  • also attenuation bias resulting from measurement error of independent variables (here growth)
  • omitted variable bias / treatment selection on unobservables
  • serial error correlation in models with lagged dependent variables in X
  • IV estimation conceptually inspired intention to treat estimates in experimental literature
  • many identification strategies relying on natural experiments use 2SLS-IV estimation
11
Q

LATE

A

… is the effect of the treatment on those whose treatment status is changed by the instrument. It applies neither to all treated or untreated units nor to the entire sample (as the ATE does).
If monotonicity holds such that $E[d_i|z_i = T] \geq E[d_i|z_i = C]$ for all i, the LATE applies to all compliers.
If monotonicity holds such that $E[d_i|z_i = T] \leq E[d_i|z_i = C]$ for all i, it applies to all defiers.

Important insight I:
Usually, Wald or IV estimates can only be interpreted as LATEs.
Important insight II:
Wald or IV estimates of causal effects are only informative about the ATE if the causal effects are constant across individuals.
Important insight III:
An IV and an experimental treatment with partial compliance are the same from an identification point of view.
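The first two insights can be illustrated with a simulated experiment with partial compliance. The population shares, the type-specific effects and the variable names are assumptions made for this sketch; monotonicity holds because there are no defiers.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300_000

# Compliance types (assumed shares); no defiers, so monotonicity holds.
types = rng.choice(["always", "never", "complier"], size=n, p=[0.2, 0.3, 0.5])
z = rng.integers(0, 2, size=n)                      # randomly assigned instrument / offer

# Treatment take-up: always-takers take it, never-takers do not, compliers follow z.
d = np.where(types == "always", 1, np.where(types == "never", 0, z))

# Heterogeneous individual effects (assumed): 3 for always-takers, -1 for never-takers, 1 for compliers.
effect = np.where(types == "always", 3.0, np.where(types == "never", -1.0, 1.0))

y0 = rng.normal(size=n) + 2.0 * (types == "always")  # untreated outcome differs by type
y = y0 + effect * d

wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
ate = effect.mean()                                   # average effect over everyone
late = effect[types == "complier"].mean()             # average effect among compliers

print(f"Wald: {wald:.2f}, complier effect (LATE): {late:.2f}, ATE: {ate:.2f}")
```

The Wald ratio should track the complier effect rather than the population average, since always-takers and never-takers contribute nothing to the first-stage difference.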

12
Q

Experiments vs IV approach

A
  • exogenous: treatment vs IV
  • used variation: take-up/compliance vs. 1st stage fitted values
  • estimator: usually Wald vs. Wald for binary IV & otherwise usually 2SLS-IV
  • interpretation: effect for compliers (LATE) vs. effect for all with treatment status changed (LATE, unless homogeneous effect)
13
Q

Three main reasons for 2SLS bias

A
  1. small samples
  2. ‘weak’ first stage
  3. mis-specification
14
Q

Problems with small samples

A
  • spuriously one might get

$$\mathrm{Cov}(Z_{sample}, D_{sample}) = c + e \quad \text{with } e \neq 0,$$ even though

$$\mathrm{Cov}(Z_{pop}, D_{pop}) = c$$

→ this is simply due to ‘bad luck’ when drawing the (too small) sample

  • obviously, e biases the IV estimate
  • if c is small, the first stage will exhibit weak correlation between Z and D → exacerbates or ‘inflates’ the bias, as will be shown next
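A small Monte Carlo can make the interaction of small samples and a weak first stage concrete. The design below (samples of 50, first-stage coefficients 1.0 versus 0.1, 2,000 replications, true δ = 1) is an assumption chosen for illustration.

```python
import numpy as np

def iv_estimates(first_stage_coef, n, reps, rng, delta=1.0):
    """Draw `reps` samples of size n and return the IV (covariance-ratio) estimate for each."""
    z = rng.normal(size=(reps, n))
    u = rng.normal(size=(reps, n))                        # unobserved confounder
    d = first_stage_coef * z + u + rng.normal(size=(reps, n))
    y = delta * d + u + rng.normal(size=(reps, n))
    zc = z - z.mean(axis=1, keepdims=True)                # demeaned instrument
    # Ratio of sample covariances Cov(z, y) / Cov(z, d), computed row by row.
    return (zc * y).sum(axis=1) / (zc * d).sum(axis=1)

rng = np.random.default_rng(7)
n, reps, delta = 50, 2_000, 1.0

strong = iv_estimates(1.0, n, reps, rng, delta)
weak = iv_estimates(0.1, n, reps, rng, delta)

# Median absolute error is used because weak-instrument estimates have very heavy tails.
strong_mae = np.median(np.abs(strong - delta))
weak_mae = np.median(np.abs(weak - delta))

print(f"median |error|, strong first stage: {strong_mae:.2f}, weak first stage: {weak_mae:.2f}")
```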
15
Q

Strength of an IV

A
  • is how much of the variation in D is explained by Z
Indicators of strength
  • sizeable magnitude and statistical significance of the first-stage IV coefficient are not sufficient indicators of strength
  • the reverse, however, does hold: a first-stage IV coefficient close to zero is a sufficient indicator of weakness
  • F-statistic ≥ 10 (of the test whether all first-stage coefficients are jointly zero) often accepted as indicator of strength
  • best indicator is a large difference in R² between a first-stage regression with and without the instrumental variable included
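The R² comparison can be computed directly. In the sketch below the data are simulated and the helper `r_squared` is hypothetical; note that the F-statistic shown is the one for the excluded instrument only (built from the R² difference), which is an assumption of this example and differs from a joint test of all first-stage coefficients.

```python
import numpy as np

def r_squared(W, d):
    """R-squared of an OLS regression of d on the columns of W (W includes a constant)."""
    coef = np.linalg.lstsq(W, d, rcond=None)[0]
    resid = d - W @ coef
    centered = d - d.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

rng = np.random.default_rng(6)
n = 5_000
x = rng.normal(size=n)                        # control
z = rng.normal(size=n)                        # instrument
d = 0.5 * x + 0.3 * z + rng.normal(size=n)    # first-stage relationship (assumed)

const = np.ones(n)
W_restricted = np.column_stack([const, x])    # first stage without the instrument
W_full = np.column_stack([const, x, z])       # first stage with the instrument

r2_restricted = r_squared(W_restricted, d)
r2_full = r_squared(W_full, d)

q = 1                                         # number of excluded instruments
k = W_full.shape[1]                           # number of parameters in the full first stage
f_excluded = ((r2_full - r2_restricted) / q) / ((1.0 - r2_full) / (n - k))

print(f"R2 without z: {r2_restricted:.3f}, with z: {r2_full:.3f}, F for z: {f_excluded:.1f}")
```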
16
Q

Weakness and Bias

A

$$\frac{E[Y|Z=1] - E[Y|Z=0]}{E[D|Z=1] - E[D|Z=0]} = \frac{\delta\,[E[D|Z=1] - E[D|Z=0]] + E[\varepsilon|Z=1] - E[\varepsilon|Z=0]}{E[D|Z=1] - E[D|Z=0]}$$
$$= \delta + \frac{E[\varepsilon|Z=1] - E[\varepsilon|Z=0]}{E[D|Z=1] - E[D|Z=0]}$$

The Wald estimator yields δ plus bias if $E[\varepsilon|Z=1] - E[\varepsilon|Z=0] \neq 0$.

Bias stronger if denominator → 0
⇒ weak instruments exacerbate any small bias in IV estimates
→ bias may arise from small sample or imperfectly satisfied exclusion restriction
→ intuitive reason for IVs' high sensitivity to bias is that they ‘throw away’ a lot of information (only use the variation in D systematically related to Z)
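A quick numerical illustration of how the denominator scales the bias. The function name and the numbers are hypothetical: the same small violation of the exclusion restriction (a gap of 0.05 in the mean error across instrument groups) is combined with a strong and a weak first stage.

```python
def wald_bias(error_gap, first_stage_gap):
    """Bias term of the Wald estimator: (E[eps|Z=1] - E[eps|Z=0]) / (E[D|Z=1] - E[D|Z=0])."""
    return error_gap / first_stage_gap

error_gap = 0.05                               # assumed small violation of the exclusion restriction

bias_strong = wald_bias(error_gap, 0.50)       # instrument shifts take-up by 50 percentage points
bias_weak = wald_bias(error_gap, 0.05)         # instrument shifts take-up by 5 percentage points

print(f"bias with strong first stage: {bias_strong:.2f}")
print(f"bias with weak first stage:   {bias_weak:.2f}")
```

Under these assumed numbers the identical violation produces a bias ten times larger when the first stage is ten times weaker.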

17
Q

Specification and bias warning

A
  • IVs are not a panacea and the assumptions not always easy to justify
  • the case of a randomly assigned Z, like the lottery number earlier, is rare
  • weaker assumption is random assignment of Z conditional on some controls X
  • can be implemented with the 2SLS-IV procedure
  • note: the most common type of identification with IVs is with a 2SLS-IV estimator and controls
  • in that case, the specification of the first stage is crucial because it produces the fitted values used in the second stage
  • specifically, it has to condition on all X such that Cov(Z, ε|X) = 0
18
Q

2SLS-IV estimator biased if

A
  1. first stage is mis-specified
    (e. g. important controls omitted, non-linearities or interactions ignored)
  2. a special problem arises if X1 (first stage) and X2 (second stage) differ
19
Q

Multiple endogenous variables

A

If multiple variables need instrumenting (say m)
→ at least m different instruments are needed (otherwise model not ‘identified’)
Multiple instruments per endogenous variable
Can use more than one instrument per endogenous variable (‘overidentification’). But this complicates interpretation: