Instrumental variables Flashcards
What is the conditional independence assumption (CIA)?
Conditional on observed characteristics x_i, the selection bias disappears. That is, conditional on x_i, treatment status D_i is independent of potential outcomes.
Conditional on covariates x_i, treatment status D_i is as good as randomly assigned.
If, conditional on a vector of observable covariates x_i, D_i is statistically independent of η_i, implying that E[η_i│D_i,x_i ]=E[η_i│x_i ], what assumption is satisfied?
CIA
The conditional expectation of η_i does not depend on D_i if we control for x_i. Conditional on x_i, D_i is as good as randomly assigned, so D_i becomes uncorrelated with η_i. The key assumption here is that the observable characteristics x_i are the only reason why η_i and D_i are correlated.
our instrument, z_i, should satisfy the following two criteria
Cov(z_i,η_i )=0
Cov(z_i,x_i )≠0
β_1 = Cov(?,?)/Cov(?,?)
(Cov(y,z))/(Cov(x,z))=((Cov(y,z))⁄(Var(z)))/((Cov(x,z))⁄(Var(z)))
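The covariance ratio can be checked numerically. The following is a minimal sketch on simulated data; the data-generating process, the coefficient values, and the helper `cov` are illustrative assumptions, not part of the cards.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (all numbers are illustrative).
z = rng.normal(size=n)                             # instrument
u = rng.normal(size=n)                             # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)               # endogenous regressor
y = 1.0 + 2.0 * x + 3.0 * u + rng.normal(size=n)   # true beta_1 = 2


def cov(a, b):
    """Sample covariance between two 1-D arrays."""
    return np.cov(a, b)[0, 1]


beta_ols = cov(y, x) / np.var(x, ddof=1)       # biased: x is correlated with u
beta_iv = cov(y, z) / cov(x, z)                # IV ratio
reduced_form = cov(y, z) / np.var(z, ddof=1)   # slope of y on z
first_stage = cov(x, z) / np.var(z, ddof=1)    # slope of x on z

print(f"OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}")
print(f"reduced form / first stage: {reduced_form / first_stage:.3f}")
```

Because Var(z) cancels, the ratio of the two regression slopes equals the IV ratio exactly, while the OLS slope absorbs the confounder.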
two variables in first stage regression (denominator)
reg x on z
two variables in reduced form regression (numerator)
reg y on z
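True or false: it’s OK if the first stage is 0
False: the IV estimator divides the reduced form by the first stage, so a zero first stage leaves it undefined.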
assumption for first stage?
the instrument must have a clear effect on x_i.
exclusion restriction definition
Two claims: the first is that the instrument is as good as randomly assigned (i.e., independent of potential outcomes, conditional on covariates); the second is that the instrument has no effect on outcomes other than through the first-stage channel.
reduced form model?
y_i=γ_0+γ_1 z_i+ζ_i
first stage model?
x_i=δ_0+δ_1 z_i+μ_i
IV model?
y_i=β_0+β_1 x_i+η_i
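true or false: all instruments (including controls) should be included in the first stage and the reduced form
true; in the second stage the controls stay in, but the excluded instruments are left out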
condition-instrument relevance
Cov(z_i,x_i)≠0
condition-instrument independence
Cov(z_i,η_i )=0
what happens to standard errors and causal estimates if instrument is weak?
With weak instruments, instrumental variable estimates are badly biased toward the OLS estimates.
Furthermore, weak instruments imply that your first stage predicted value will be quite noisy. That fact implies that the second stage instrumental variable estimates are likely to have very large standard errors.
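A small Monte Carlo experiment can illustrate both claims. This is a sketch under assumed values: the function `simulate_iv`, the first-stage strengths, the sample size, and the error correlation are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_iv(pi, n=100, reps=2000, beta=1.0, seed=0):
    """Return `reps` just-identified IV estimates for first-stage slope `pi`."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(reps, n))     # instrument
    v = rng.normal(size=(reps, n))     # first-stage error
    e = rng.normal(size=(reps, n))
    eta = 0.8 * v + 0.6 * e            # structural error, correlated with v
    x = pi * z + v                     # endogenous regressor
    y = beta * x + eta                 # true beta = 1
    zc = z - z.mean(axis=1, keepdims=True)
    # Cov(z, y) / Cov(z, x), computed separately for each replication
    return (zc * y).sum(axis=1) / (zc * x).sum(axis=1)


def iqr(a):
    """Interquartile range, a spread measure that is robust to outliers."""
    q75, q25 = np.percentile(a, [75, 25])
    return q75 - q25


iv_strong = simulate_iv(pi=1.0)
iv_weak = simulate_iv(pi=0.05)

print(f"strong: median {np.median(iv_strong):.2f}, IQR {iqr(iv_strong):.2f}")
print(f"weak:   median {np.median(iv_weak):.2f}, IQR {iqr(iv_weak):.2f}")
```

In this design the OLS slope converges to roughly 1.8, so a weak-instrument median well above the true value of 1 reflects the pull toward OLS. Medians and interquartile ranges are used because just-identified IV estimates have very heavy tails when the instrument is weak.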
test for weak instruments?
F-test–is it larger than 10?
if not, weak
assumptions necessary for instrument validity?
1) Random assignment of z (in first‐stage)
2) Exclusion restriction (z does not belong in the second stage)
3) Nonzero causal effect of z on x
what estimates do IV provide?
LATE: the average treatment effect for those whose treatment status is changed by the instrument (the compliers)
who are non-compliers?
always-takers and never-takers
when will the parameter identified by instrumental variables differ from the average effect of interest?
when treatment effects are heterogeneous
what do compliers do?
what they are told
x=1 if z=1 and x=0 if z=0
what do never-takers do?
never want treatment
x=0 if z=1 and x=0 if z=0
what do always-takers do?
always want treatment
x=1 if z=1 and x=1 if z=0
what do defiers do?
the opposite
x=0 if z=1 and x=1 if z=0
what is monotonicity assumption?
there are no defiers
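The four compliance types can be written as a lookup on the pair of potential treatment statuses. This is a sketch only: the function names are hypothetical, and in real data only one of the two potential treatment statuses is observed for each person, so the classification cannot be applied to individuals.

```python
def compliance_type(x_if_z1, x_if_z0):
    """Map potential treatment statuses (x when z=1, x when z=0) to a type."""
    types = {
        (1, 0): "complier",
        (1, 1): "always-taker",
        (0, 0): "never-taker",
        (0, 1): "defier",
    }
    return types[(x_if_z1, x_if_z0)]


def satisfies_monotonicity(potential_statuses):
    """Monotonicity holds when no unit in the population is a defier."""
    return all(compliance_type(a, b) != "defier" for a, b in potential_statuses)


population = [(1, 0), (1, 1), (0, 0), (1, 0)]
print([compliance_type(a, b) for a, b in population])
print(satisfies_monotonicity(population))
```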
can we identify who compliers are?
no
why can’t we estimate ATE?
We can’t estimate the effect among non-compliers: the instrument doesn’t affect their treatment status, so there is no exogenous variation in their treatment status that we can use.
Is LATE informative about effects on always-takers and never-takers?
no–by definition, treatment status for these two groups is unchanged by the instrument (random assignment).
Is LATE usually the same as TOT?
No–The effect of treatment on the treated (TOT) would be a weighted average of the effect on compliers and always takers, the latter effect we never observe.
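A simulation with heterogeneous effects can make the difference between LATE, TOT, and ATE concrete. This is a sketch under assumed values: the type shares, the type-specific effects, and the randomized binary instrument are all hypothetical, and there are no defiers by construction.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical population: shares and effects are illustrative assumptions.
types = rng.choice(["complier", "always", "never"], size=n, p=[0.5, 0.2, 0.3])
effect = np.select(
    [types == "complier", types == "always"], [2.0, 5.0], default=-1.0
)

z = rng.integers(0, 2, size=n)   # randomly assigned binary instrument
# Always-takers take treatment regardless, never-takers never do,
# and compliers follow the instrument.
d = np.where(types == "always", 1, np.where(types == "complier", z, 0))
y = rng.normal(size=n) + effect * d

# Ratio of the reduced-form difference in means to the first-stage difference
late = (y[z == 1].mean() - y[z == 0].mean()) / (
    d[z == 1].mean() - d[z == 0].mean()
)
ate = effect.mean()            # average effect over everyone
tot = effect[d == 1].mean()    # average effect over the treated

print(f"LATE {late:.2f}  TOT {tot:.2f}  ATE {ate:.2f}")
```

The IV ratio recovers the complier effect (2 in this design). TOT is higher because it mixes in the always-takers, whose effect the instrument never reveals.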
what is an instrumental variable?
An instrumental variable is a variable that can be used to isolate exogenous variation in an explanatory variable that would otherwise be endogenous.
How can we test the assumption that Cov(Zi,ui) = 0?
We can’t!
How can we test the assumption that Cov(Zi,Xi) ≠ 0?
Regress x on z and see whether the coefficient is statistically significant: is z a good predictor of x?
IV estimator expressed in terms of covariance of z, y and x?
β1,IV = Cov(Zi,Yi)/Cov(Zi,Xi)
reduced form regression?
Yi =α+ρZi +wi
ρ = Cov(Zi,Yi)/Var(Zi)
first stage regression?
Di =γ+φZi +vi
φ = Cov(Zi,Di)/Var(Zi)
relationship between reduced form, first stage, and IV estimator?
estimator=reduced form/first stage
IV tells us the ATE for which group?
those induced into treatment by the instrument (the lottery offer): the compliers
which group drive the first stage regression?
compliers
True or false: In IV, we don’t learn anything about always-takers or never-takers
True
what assumption do we use to justify ignoring defiers?
monotonicity
When would the LATE and TOT be the same?
When there are no always-takers in the treated population
What identifying assumptions are needed for consistency of IV estimator?
Cov(Zi,Xi) ≠ 0
Cov(Zi,ui) = 0
Which is more efficient: OLS or IV estimator?
OLS
Why is the variance of the IV estimator greater than of the OLS estimator?
IV uses only the portion of the variation in x that is induced by the instrument
If the reduced form effect is statistically insignificant, can you accurately estimate the LATE?
no
What happens to standard errors if first stage is weak?
larger standard errors
how can you counteract a weak first stage?
larger sample size
If there is a violation of the exclusion restriction and there is a correlation between zi and ui, can a large sample address this?
no
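The reason can be sketched from the probability limit of the IV estimator, assuming the linear model Yi = β0 + β1 Xi + ui (an assumption made here to fix notation):

```latex
\operatorname{plim}\ \hat{\beta}_{1,IV}
  = \frac{\operatorname{Cov}(Z_i, Y_i)}{\operatorname{Cov}(Z_i, X_i)}
  = \beta_1 + \frac{\operatorname{Cov}(Z_i, u_i)}{\operatorname{Cov}(Z_i, X_i)}
```

The second term does not depend on the sample size, so it does not shrink as the sample grows. It is also inflated when Cov(Zi,Xi) is small, which is why a weak first stage makes a small exclusion-restriction violation more damaging.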
what are the steps for implementing 2SLS using one instrument?
- Regress x on zi (first stage) and save the predicted values of x
- Regress the outcome y on the predicted values of x (second stage)
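The two steps can be sketched by hand with least squares. The simulated data and coefficient values are illustrative assumptions. This manual version reproduces the 2SLS point estimate, but the standard errors from the second-step regression are not valid, so a dedicated IV routine should be used for inference.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Hypothetical data-generating process (true beta_1 = 2).
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * u + rng.normal(size=n)

const = np.ones(n)

# Step 1: first stage, regress x on z and keep the fitted values.
Z = np.column_stack([const, z])
delta, *_ = np.linalg.lstsq(Z, x, rcond=None)
x_hat = Z @ delta

# Step 2: second stage, regress y on the fitted values of x.
X_hat = np.column_stack([const, x_hat])
beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
beta_2sls = beta[1]

# With one instrument, this equals the covariance ratio.
ratio = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]
print(f"2SLS slope: {beta_2sls:.3f}  Cov ratio: {ratio:.3f}")
```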
True or false: you only need to include covariates in the second stage (not the first)
false–need to include in both
True or false: if you have multiple instruments, you need to include them in both stages
false: include them only in the first stage
what is the wald estimator?
the name given to estimation with one endogenous variable and one excluded instrument, in which the reduced form slope coefficient is divided by the first stage slope coefficient.
definition of just identified IV model?
just identified if the number of excluded instruments in the vector Zi equals the number of endogenous explanatory variables.
definition of over identified IV model?
over-identified if the number of excluded instruments in the vector Zi is more than the number of endogenous explanatory variables.
definition of under identified IV model?
under-identified if the number of excluded instruments in the vector Zi is less than the number of endogenous explanatory variables.
If the model is under-identified (the number of excluded instruments in the vector Zi is less than the number of endogenous explanatory variables), is there a consistent IV estimator?
No–you need at least as many excluded instruments as endogenous explanatory variables.
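The three definitions reduce to a count comparison, which can be sketched as a small helper. The function name and argument names are hypothetical.

```python
def identification_status(n_excluded_instruments, n_endogenous):
    """Classify an IV model by comparing instrument and regressor counts."""
    if n_excluded_instruments < n_endogenous:
        return "under-identified"
    if n_excluded_instruments == n_endogenous:
        return "just-identified"
    return "over-identified"


print(identification_status(1, 1))
print(identification_status(3, 1))
print(identification_status(1, 2))
```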
how to test for validity of instruments, E (ui |Zi ) = 0, if overidentified?
over-identification test
what is the null hypothesis of an over-identification test?
H0 : all instruments are valid
Rejection can be interpreted as one or more instruments are not valid (or, model was mis-specified to begin with). Note a lack of rejection should not be interpreted as confirmation of validity.
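One common over-identification statistic is Sargan's n·R², obtained by regressing the 2SLS residuals on all instruments. The sketch below assumes homoskedastic errors, one endogenous regressor, and two excluded instruments; the function names and the simulated design are hypothetical. With one over-identifying restriction, the statistic is compared with the 5% chi-squared(1) critical value of 3.84.

```python
import numpy as np


def sargan_statistic(y, X, Z):
    """n * R^2 from regressing 2SLS residuals on all instruments.

    X: regressors including a constant; Z: instruments including a constant.
    """
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fits
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]    # 2SLS coefficients
    resid = y - X @ beta                               # uses actual X
    fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum(
        (resid - resid.mean()) ** 2
    )
    return len(y) * r2


def simulate(direct_effect_of_z2, n=5000, seed=4):
    """Sargan statistic when z2 may affect y directly (invalid if nonzero)."""
    rng = np.random.default_rng(seed)
    z1, z2, u = rng.normal(size=(3, n))
    x = 0.7 * z1 + 0.7 * z2 + u + rng.normal(size=n)
    y = 1.0 + 2.0 * x + direct_effect_of_z2 * z2 + 3.0 * u + rng.normal(size=n)
    const = np.ones(n)
    X = np.column_stack([const, x])
    Z = np.column_stack([const, z1, z2])
    return sargan_statistic(y, X, Z)


stat_valid = simulate(0.0)     # both instruments satisfy exclusion
stat_invalid = simulate(1.0)   # z2 violates the exclusion restriction

print(f"valid: {stat_valid:.2f}  invalid: {stat_invalid:.2f}  (critical 3.84)")
```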
We would prefer to use OLS–how do we test whether one of the explanatory variables actually is endogenous?
Durbin-Wu-Hausman test statistic
H0 : explanatory variable(s) is exogenous
What does rejection of Wu-Hausman test imply?
Rejection suggests explanatory variable(s) is endogenous and IV is appropriate.
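A regression-based variant of this test adds the first-stage residual to the OLS regression and checks its t-statistic. The sketch below is illustrative only: the helper names, the simulated data, and the use of homoskedastic standard errors are assumptions.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients, homoskedastic standard errors, and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se, resid


def endogeneity_t_stat(y, x, z):
    """t-statistic on the first-stage residual in the augmented regression."""
    const = np.ones(len(y))
    _, _, v_hat = ols(np.column_stack([const, z]), x)        # first stage
    beta, se, _ = ols(np.column_stack([const, x, v_hat]), y)  # augmented OLS
    return beta[2] / se[2]


rng = np.random.default_rng(5)
n = 5000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + u + rng.normal(size=n)
y_endog = 1.0 + 2.0 * x + 3.0 * u + rng.normal(size=n)   # u enters y: x endogenous
y_exog = 1.0 + 2.0 * x + rng.normal(size=n)              # u absent: x exogenous

t_endog = endogeneity_t_stat(y_endog, x, z)
t_exog = endogeneity_t_stat(y_exog, x, z)
print(f"t (endogenous x): {t_endog:.1f}   t (exogenous x): {t_exog:.1f}")
```

A large t-statistic on the residual term is evidence against exogeneity of x; a small one gives no reason to abandon OLS.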
how to test for a weak instrument?
Look at the first stage. When there is one endogenous explanatory variable, the relevant statistic is the first-stage F statistic for joint significance of the excluded instruments. F > 10 is the usual rule of thumb for rejecting a weak instrument.
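The first-stage F statistic can be computed by comparing restricted and unrestricted residual sums of squares. This is a sketch for a single instrument under homoskedastic errors; the function name `first_stage_f` and the simulated first-stage strengths are hypothetical.

```python
import numpy as np


def first_stage_f(x, z):
    """F-statistic for H0: the coefficient on z is zero in a regression of x on z."""
    n = len(x)
    Z = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    rss_unrestricted = np.sum((x - Z @ coef) ** 2)   # constant and z
    rss_restricted = np.sum((x - x.mean()) ** 2)     # constant only
    q = 1                                            # number of restrictions
    return ((rss_restricted - rss_unrestricted) / q) / (
        rss_unrestricted / (n - 2)
    )


rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)
noise = rng.normal(size=n)

f_strong = first_stage_f(0.5 * z + noise, z)    # sizeable first stage
f_weak = first_stage_f(0.01 * z + noise, z)     # nearly irrelevant instrument

print(f"F (strong): {f_strong:.1f}   F (weak): {f_weak:.2f}")
```

With several excluded instruments, `q` would be the number of excluded instruments and the restricted model would keep any controls.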
What is the null for the estat firststage command in Stata?
The null is that the instruments are weak.
You can reject the null that the instruments are weak by looking at the F statistic: is it more than 10?
what is the null for the estat overid command in stata?
null is that instruments are valid
If you reject the null, this suggests that at least one instrument is endogenous (or that the model is mis-specified).