Instrumental Variables / 2SLS Flashcards
Why might a regression estimate be biased (in particular, one where we would turn to IV)?
Confounders
Self selection (i.e. not randomly assigned)
Measurement error (attenuation bias)
What is the general intuition of using an instrumental variable?
The general intuition is that the instrument is a “randomizer”
It creates variation in the regressor of interest that is (hopefully) uncorrelated with all sources of bias.
Essentially, the instrument will only create ‘good’ variation in the regressor of interest that we can use to estimate the causal effect of interest. This variation will hopefully be uncorrelated with confounders and measurement error.
What are the two assumptions of IV? (Names and mathematical definitions only, no explanation yet; break the second one down into its 2 parts)
Relevance i.e. Cov(Xi, Zi) ≠ 0
Exogeneity i.e. Cov(Zi, Ui) = 0
Exogeneity is broken down into ‘exclusion’ and ‘as good as randomly assigned’
What is relevance and how can it be tested?
The instrument is correlated with the regressor of interest.
This can be tested by regressing the regressor of interest on the instrument (the first stage) and performing a hypothesis test with the null hypothesis H0: d1 = 0, where d1 is the first stage coefficient on the instrument. Rejecting the null means relevance holds.
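The relevance test above can be sketched on simulated data. This is a minimal illustration, not a recommended workflow: the data-generating numbers (a true first stage slope of 0.5, 2,000 observations), the helper name `ols_slope_t`, and the use of plain homoskedastic standard errors with a 1.96 cutoff are all assumptions made for the example.

```python
import math
import random

def ols_slope_t(x, y):
    """Regress y on x (with intercept); return slope, its standard error, and t-statistic."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, then the usual homoskedastic standard error.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se

rng = random.Random(0)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]            # instrument
x = [0.5 * zi + rng.gauss(0, 1) for zi in z]       # regressor of interest (assumed d1 = 0.5)

# First stage: regressor of interest on the instrument.
d1_hat, se_d1, t_stat = ols_slope_t(z, x)
relevant = abs(t_stat) > 1.96                      # reject H0: d1 = 0 at roughly the 5% level

print("first stage slope:", round(d1_hat, 3))
print("t-statistic:", round(t_stat, 2))
print("reject H0 (relevance holds):", relevant)
```

In applied work the same regression would normally be run with a statistics package and robust standard errors; the hand-rolled version here only keeps the example dependency-free.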
What is exogeneity (describe the two components) and can it be tested?
Exclusion: The instrument only affects the outcome through affecting the regressor of interest and does not appear in the equation directly
As good as randomly assigned: The instrument is uncorrelated with all unobserved factors (confounders/residual) that affect the outcome
Exogeneity cannot be tested directly because Ui is unobserved; it has to be argued for
How do you form the three equations needed for 2SLS?
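Second Stage: The original regression of the outcome on the regressor of interest (the instrument does not appear in it), which suffers from bias: original outcome = B0 + B1(regressor of interest) + ui
Reduced Form: original outcome = a + b1(instrument) + ei (ei is a different error term from the second stage ui)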
First Stage: regressor of interest = d0 + d1(instrument) + ni
Reduced form can also be derived by plugging first stage into second stage
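The substitution can be written out as follows. The shorthand Y_i (outcome), X_i (regressor of interest), Z_i (instrument) and the intercept name B_0 are assumptions introduced for this sketch; the other symbols follow the cards.

```latex
\begin{align*}
Y_i &= B_0 + B_1 X_i + u_i && \text{(second stage)} \\
X_i &= d_0 + d_1 Z_i + n_i && \text{(first stage)} \\
Y_i &= B_0 + B_1 (d_0 + d_1 Z_i + n_i) + u_i \\
    &= \underbrace{(B_0 + B_1 d_0)}_{a}
       + \underbrace{B_1 d_1}_{b_1} Z_i
       + (B_1 n_i + u_i) && \text{(reduced form)}
\end{align*}
```

So the reduced form slope is the product of the coefficient of interest and the first stage slope, and its error term is a mix of the first stage and second stage errors.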
How do you find the coefficient of interest in 2SLS?
Easier to see if you derive reduced form equation by plugging in first stage into second stage
coefficient of interest (B1) = reduced form coefficient (b1) / first stage coefficient (d1), because plugging in gives b1 = B1 × d1
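A small simulation can make the ratio concrete. Everything in the data-generating process below is an assumption chosen for illustration (a true B1 of 2.0, an unobserved confounder `c`, the helper name `slope`); the point is only that the naive regression is pulled away from B1 while reduced form / first stage lands close to it.

```python
import random

def slope(x, y):
    """OLS slope from regressing y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(1)
n = 20000
true_b1 = 2.0                                       # assumed causal effect of x on y

z = [rng.gauss(0, 1) for _ in range(n)]             # instrument, independent of the confounder
c = [rng.gauss(0, 1) for _ in range(n)]             # unobserved confounder
x = [0.8 * zi + ci + rng.gauss(0, 1) for zi, ci in zip(z, c)]
y = [1.0 + true_b1 * xi + 3.0 * ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]

ols_b1 = slope(x, y)            # biased: x is correlated with the confounder
first_stage = slope(z, x)       # d1: regressor of interest on instrument
reduced_form = slope(z, y)      # b1: outcome on instrument
iv_b1 = reduced_form / first_stage

print("naive OLS estimate:", round(ols_b1, 3))
print("first stage d1:", round(first_stage, 3))
print("reduced form b1:", round(reduced_form, 3))
print("IV estimate b1/d1:", round(iv_b1, 3))
```

With one instrument and one regressor, this ratio gives the same point estimate as running the two stages in sequence; the standard errors would still need to come from a proper 2SLS routine.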
How do you interpret the Wald Estimator (numerator, denominator and actual result)?
Numerator: Average change in outcome when an individual is offered treatment
Denominator: Average change in treatment status when an individual is offered the treatment
Actual Result: For compliers (people who take the treatment because it was offered to them), being treated changes the outcome by B1 on average (the result). This is a local average treatment effect, not an effect for everyone.
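What is the purpose of the Wald Estimator and how does it overcome the problem it solves?
The Wald estimator overcomes the problem of imperfect compliance. Not everyone who is offered the treatment takes it, so the average effect of the offer is diluted by the null effect on individuals whose treatment status the offer does not change. Dividing the effect of the offer on the outcome by the effect of the offer on treatment status rescales it to the people who actually complied.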
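The rescaling can be sketched with a simulated randomized offer. The setup is assumed for illustration only: half the sample is offered treatment, 60% of people are compliers, nobody takes the treatment without an offer, and the treatment effect is 5.0. The helper name `mean_by_offer` is also hypothetical.

```python
import random

rng = random.Random(2)
n = 20000
true_effect = 5.0                                   # assumed effect of treatment on the outcome

offered, treated, outcome = [], [], []
for _ in range(n):
    z = 1 if rng.random() < 0.5 else 0              # randomly assigned offer (the instrument)
    complier = rng.random() < 0.6                   # takes the treatment only if offered
    d = 1 if (z == 1 and complier) else 0           # actual treatment status
    y = 10.0 + true_effect * d + rng.gauss(0, 1)
    offered.append(z)
    treated.append(d)
    outcome.append(y)

def mean_by_offer(values, offer_value):
    """Average of `values` among individuals with the given offer status."""
    group = [v for v, z in zip(values, offered) if z == offer_value]
    return sum(group) / len(group)

# Numerator: average change in outcome when offered treatment.
itt = mean_by_offer(outcome, 1) - mean_by_offer(outcome, 0)
# Denominator: average change in treatment status when offered treatment.
take_up = mean_by_offer(treated, 1) - mean_by_offer(treated, 0)
wald = itt / take_up

print("effect of the offer on the outcome:", round(itt, 3))
print("effect of the offer on treatment status:", round(take_up, 3))
print("Wald estimate:", round(wald, 3))
```

The effect of the offer alone comes out well below the assumed treatment effect because it averages in the people who never took the treatment; dividing by the take-up difference undoes that dilution.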
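What does it mean for a regressor's coefficient to be overidentified?
More instruments than endogenous regressors (e.g. two instruments for one regressor)
What does it mean for a regressor's coefficient to be just identified?
Exactly as many instruments as endogenous regressors (e.g. one instrument for one regressor)
What does it mean for a regressor's coefficient to be not identified?
Fewer instruments than endogenous regressors (e.g. 0 instruments for the regressor)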
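The three cases amount to a counting rule, which can be written as a tiny helper. The function name `identification_status` and its arguments are hypothetical, and the rule only counts instruments; it says nothing about whether those instruments are actually relevant or exogenous.

```python
def identification_status(n_instruments: int, n_endogenous: int = 1) -> str:
    """Classify identification by comparing instrument and endogenous-regressor counts."""
    if n_instruments < 0 or n_endogenous < 1:
        raise ValueError("need n_instruments >= 0 and n_endogenous >= 1")
    if n_instruments > n_endogenous:
        return "overidentified"
    if n_instruments == n_endogenous:
        return "just identified"
    return "not identified"

for k in (0, 1, 2):
    print(k, "instrument(s) for 1 regressor:", identification_status(k))
```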