L5: IV regression Flashcards
When might we use instrumental variables? (Give 3 cases and say what these problems have in common.)
1) OVB from a variable that is correlated with X and affects Y but is unobserved (therefore it cannot be included in the regression equation)
2) Simultaneous causality bias (i.e. X causes Y AND Y causes X)
3) Errors-in-variables bias (X is measured with error)
All 3 problems -> E(u|X) ≠ 0, i.e. X is correlated with the error u, so OLS is biased and inconsistent
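A small simulation makes the errors-in-variables case concrete. It is a sketch with made-up parameters (true slope 1, unit-variance measurement error), not an example from the notes. Regressing Y on the mismeasured X gives a slope biased towards zero, because the observed X is correlated with the composite error.

```python
import random

# Hypothetical simulation of errors-in-variables bias (all numbers assumed).
# True model: Y = beta1 * X_true + u, but we only observe X = X_true + w.
random.seed(0)
n = 50_000
beta1 = 1.0

x_true = [random.gauss(0, 1) for _ in range(n)]
y = [beta1 * xt + random.gauss(0, 1) for xt in x_true]
x_obs = [xt + random.gauss(0, 1) for xt in x_true]  # measurement error w ~ N(0, 1)

def ols_slope(y, x):
    """OLS slope of y on x (with an intercept): s_XY / s_X^2."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

slope = ols_slope(y, x_obs)
# var(w) = var(X_true) = 1, so the OLS slope converges to beta1 * 1 / (1 + 1) = 0.5, not 1
print(round(slope, 3))
```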
What does IV regression do?
Obtains a consistent estimator of the causal effect of X on Y when E(u|X) ≠ 0, by using an instrumental variable, Z
What are endogenous and exogenous variables?
Endogenous - a variable correlated with u
Exogenous - a variable not correlated with u
What are the two conditions for a VALID INSTRUMENT?
1) Instrument relevance: corr(Zi, Xi) ≠ 0
2) Instrument exogeneity: corr(Zi, ui) = 0
Explain carefully how to estimate the coefficients when using an IV.
Two stage least squares (TSLS/2SLS):
1) ISOLATE the part of X that is uncorrelated with u by regressing X on Z using OLS:
EQN: Xi= π0+ π1Zi+ vi
Because Zi is uncorrelated with ui, π0 + π1Zi is also uncorrelated with ui (the part of Xi that may be correlated with ui is left in vi). From here, we then compute the predicted values of Xi: Xi(hat) = π0(hat) + π1(hat)Zi
2) Replace Xi by Xi(hat) in the regression of interest, and regress Y on Xi(hat) using OLS:
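i.e. Yi = B0 + B1Xi(hat) + ui
Since Xi(hat) is uncorrelated with ui (approximately, because π0 and π1 are estimated), E(u|X(hat)) = 0, so OLS in this second stage works! The resulting slope is the TSLS estimator, B1(hat)(TSLS).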
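The two stages can be sketched by hand on simulated data. This is an illustration only: the coefficient values, the data-generating process and the helper `ols` are assumptions, not taken from the notes. X is built to depend on u (endogenous) while Z is independent of u (exogenous) and shifts X (relevant).

```python
import random

# Minimal sketch on hypothetical data: TSLS by hand, one endogenous X and one instrument Z.
random.seed(1)
n = 20_000
beta0, beta1 = 2.0, 1.5  # true coefficients, assumed for illustration

z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
# X depends on Z (relevance) and on u (endogeneity); Z is independent of u (exogeneity)
x = [1.0 + 0.8 * zi + 0.5 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

def ols(y, x):
    """Return (intercept, slope) from an OLS regression of y on x."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return ybar - slope * xbar, slope

# Stage 1: regress X on Z and keep the fitted values X_hat = pi0_hat + pi1_hat * Z
pi0_hat, pi1_hat = ols(x, z)
x_hat = [pi0_hat + pi1_hat * zi for zi in z]

# Stage 2: regress Y on X_hat; the slope is the TSLS estimate of beta1
b0_tsls, b1_tsls = ols(y, x_hat)
b0_ols, b1_ols = ols(y, x)  # plain OLS for comparison (biased upwards in this design)

print(f"OLS: {b1_ols:.3f}  TSLS: {b1_tsls:.3f}  true: {beta1}")
```

Only the second-stage point estimate should be taken from this by-hand approach; the default OLS standard errors of the second stage are not valid.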
What does 2SLS require?
n to be large so π0 and π1 are estimated precisely
Show that the 2SLS estimator is equal to the ratio of the covariances: S(YZ)/S(XZ)
see notes bottom page 1 side 1
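One way to see the result (a sketch; the notes may derive it differently), writing S(YZ) as s_{YZ} and so on: since Xi(hat) is a linear function of Zi, its sample covariance with Y is π1(hat)·S(YZ), its sample variance is π1(hat)²·S(ZZ), and the first-stage slope is π1(hat) = S(XZ)/S(ZZ), assumed nonzero.

```latex
\begin{aligned}
\hat\pi_1 &= \frac{s_{XZ}}{s_Z^2}, \qquad \hat X_i = \hat\pi_0 + \hat\pi_1 Z_i \\
s_{Y\hat X} &= \hat\pi_1 s_{YZ}, \qquad s_{\hat X}^2 = \hat\pi_1^2 s_Z^2 \\
\hat\beta_1^{TSLS} &= \frac{s_{Y\hat X}}{s_{\hat X}^2}
  = \frac{\hat\pi_1 s_{YZ}}{\hat\pi_1^2 s_Z^2}
  = \frac{s_{YZ}}{\hat\pi_1 s_Z^2}
  = \frac{s_{YZ}}{s_{XZ}}
\end{aligned}
```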
Is the 2SLS estimator consistent?
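YES (see notes for why). Both sample covariances are consistent: S(YZ) tends in probability to cov(Y,Z) and S(XZ) to cov(X,Z). Since cov(Z,Y) = cov(Z, B0 + B1X + u) = B1cov(Z,X) + cov(Z,u) = B1cov(Z,X) by instrument exogeneity, the estimator S(YZ)/S(XZ) tends in probability to the true value of B1 (provided cov(Z,X) ≠ 0, i.e. the instrument is relevant)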
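A quick, hypothetical check of consistency: the covariance-ratio estimator S(YZ)/S(XZ) is computed on simulated samples of increasing size and should settle near the assumed true value 1.5. The data-generating process and function names are illustrative assumptions.

```python
import random

def sample_cov(a, b):
    """Sample covariance with an n - 1 denominator (cancels in the ratio anyway)."""
    abar, bbar = sum(a) / len(a), sum(b) / len(b)
    return sum((x - abar) * (y - bbar) for x, y in zip(a, b)) / (len(a) - 1)

def tsls_estimate(n, beta1=1.5, seed=0):
    """TSLS estimate S(YZ) / S(XZ) on one hypothetical sample of size n."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.8 * zi + 0.5 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]  # X endogenous
    y = [beta1 * xi + ui for xi, ui in zip(x, u)]
    return sample_cov(y, z) / sample_cov(x, z)

# The estimate should get closer to beta1 = 1.5 as n grows
for n in (100, 10_000, 100_000):
    print(n, round(tsls_estimate(n), 3))
```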
What is inference like using TSLS?
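Same as usual: in large samples the TSLS estimator is approximately normally distributed, so t-statistics, p-values and 95% confidence intervals (B1(hat) ± 1.96 SE) are computed in the usual way, as long as the SEs are computed correctly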
Why are OLS standard errors from the 2nd stage regression wrong?
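They do not take into account that Xi(hat) was itself estimated in the first stage: the second-stage residuals are computed with Xi(hat) instead of Xi, so the estimated error variance (and hence the SE) is wrong. Stata solves this with a command that computes TSLS with correct SEs (ivregress 2sls; use the robust option for heteroskedasticity-robust SEs)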
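The difference can be seen in a sketch on hypothetical data, using the homoskedasticity-only SE formula for simplicity (real software would usually report heteroskedasticity-robust SEs). Both SEs share the same denominator; only the residuals differ: the naive ones use X_hat, the correct TSLS ones use the actual X.

```python
import math
import random

# Sketch on hypothetical data: naive second-stage SE vs correct TSLS SE.
random.seed(2)
n = 5_000
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.5 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 + 1.5 * xi + ui for xi, ui in zip(x, u)]

def ols(y, x):
    """Return (intercept, slope) from an OLS regression of y on x."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return ybar - slope * xbar, slope

pi0, pi1 = ols(x, z)
x_hat = [pi0 + pi1 * zi for zi in z]
b0, b1 = ols(y, x_hat)

xhat_bar = sum(x_hat) / n
sxx_hat = sum((a - xhat_bar) ** 2 for a in x_hat)

def se_slope(residuals):
    """Homoskedasticity-only SE of the TSLS slope, given a set of residuals."""
    s2 = sum(e * e for e in residuals) / (n - 2)
    return math.sqrt(s2 / sxx_hat)

# Naive: residuals from the 2nd-stage regression, built with X_hat (wrong)
naive_se = se_slope([yi - b0 - b1 * xh for yi, xh in zip(y, x_hat)])
# Correct: TSLS residuals built with the actual X
correct_se = se_slope([yi - b0 - b1 * xi for yi, xi in zip(y, x)])
print(f"naive SE: {naive_se:.4f}  correct SE: {correct_se:.4f}")
```

In this particular design the naive SE is too large; in general it can be too large or too small, depending on how the first-stage error relates to u.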
Why would a regression that relates quantity (Y) to price (X) likely suffer from bias? What type of bias would this be?
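Market data only record the equilibrium price and quantity, where the S and D curves cross, so the data do not trace out either the demand or the supply function. Price and quantity are determined simultaneously (a shift in D changes both price and quantity, and so does a shift in S), so price is correlated with the error term: this gives rise to simultaneous causality bias. A variable that shifts S but not D can be used as an instrument to trace out the demand curve.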
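A hypothetical market simulation illustrates the point. Demand is Q = 10 - P + u_d, supply is Q = 2 + P + Z + u_s, and Z is an assumed supply shifter (for example an input cost) that is independent of demand shocks; none of these numbers come from the notes. OLS of Q on P recovers neither curve, while using Z as an instrument recovers the demand slope of -1.

```python
import random

# Hypothetical market: demand Q = 10 - 1.0*P + u_d, supply Q = 2 + 1.0*P + Z + u_s.
random.seed(3)
n = 50_000
rows = []
for _ in range(n):
    z, u_d, u_s = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    p = (10 - 2 - z + u_d - u_s) / (1.0 + 1.0)  # equilibrium price: demand = supply
    q = 10 - 1.0 * p + u_d                      # equilibrium quantity (on the demand curve)
    rows.append((q, p, z))

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    abar, bbar = sum(a) / len(a), sum(b) / len(b)
    return sum((x - abar) * (y - bbar) for x, y in zip(a, b)) / (len(a) - 1)

q, p, z = map(list, zip(*rows))
ols_slope = cov(q, p) / cov(p, p)  # mixes demand and supply shifts: about -1/3 here
iv_slope = cov(q, z) / cov(p, z)   # uses only supply-driven price changes: about -1
print(round(ols_slope, 2), round(iv_slope, 2))
```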
See cigarette demand example in notes
See General IV regression model notes
What is the problem in the generalised IV regression model with adding more IVs?
see notes
Explain the three cases of identification relevant to 2SLS. When can 2SLS be done?
Exactly identified if m = k
Underidentified if m < k
Overidentified if m > k
2SLS can only be done with exact identification or overidentification (m ≥ k), where m is the number of IVs and k is the number of ENDOgenous regressors
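The three cases can be written as a tiny helper, a hypothetical function for revision purposes only. It also reports m - k, the number of overidentifying restrictions, which reappears as the degrees of freedom of the J-test.

```python
def identification(m: int, k: int) -> str:
    """Classify identification given m instruments and k endogenous regressors."""
    if m == k:
        return "exactly identified"
    if m > k:
        return f"overidentified ({m - k} overidentifying restrictions)"
    return "underidentified: TSLS cannot be computed"

print(identification(1, 1))
print(identification(2, 1))
print(identification(1, 2))
```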
See notes: bottom of side 2. Check I understand how to do TSLS with a single endogenous regressor (X) and multiple exogenous regressors (W1…Wr) (go over cig example too!)
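A sketch of TSLS with one endogenous X and one included exogenous regressor W, on made-up data. `ols` is a small hypothetical helper that solves the normal equations by Gaussian elimination, not a library function. The point it illustrates is that the W's appear in BOTH stages: X is regressed on the Z's and the W's, then Y on X_hat and the same W's.

```python
import random

def ols(y, cols):
    """OLS coefficients [intercept, b1, ..., bk] of y on an intercept and the columns in cols.
    Solves the normal equations (X'X) b = X'y by Gaussian elimination."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    for c in range(p):  # forward elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [arj - f * acj for arj, acj in zip(A[r], A[c])]
            b[r] -= f * b[c]
    coef = [0.0] * p
    for c in reversed(range(p)):  # back substitution
        coef[c] = (b[c] - sum(A[c][j] * coef[j] for j in range(c + 1, p))) / A[c][c]
    return coef

random.seed(4)
n = 20_000
w = [random.gauss(0, 1) for _ in range(n)]  # included exogenous regressor W
z = [random.gauss(0, 1) for _ in range(n)]  # instrument Z
u = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.5 * wi + 0.5 * ui + random.gauss(0, 1) for zi, wi, ui in zip(z, w, u)]
y = [1.0 + 1.5 * xi + 2.0 * wi + ui for xi, wi, ui in zip(x, w, u)]

# Stage 1: regress X on Z AND the exogenous W
pi = ols(x, [z, w])
x_hat = [pi[0] + pi[1] * zi + pi[2] * wi for zi, wi in zip(z, w)]
# Stage 2: regress Y on X_hat and the same W; the coefficient on X_hat is beta1_TSLS
b = ols(y, [x_hat, w])
print(round(b[1], 3))
```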
If you have 2 suitable IVs, Z1 and Z2, that are both correlated with the endogenous variable and uncorrelated with the error, which should you use and why?
BOTH!
Regress the endogenous variable on both Z1 and Z2 in the first stage. This is a case of overidentification and will reduce the SEs of the TSLS estimates (so long as the additional IVs are valid): more information -> BETTER (more precise) ESTIMATES!
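A hypothetical comparison on simulated data: TSLS using Z1 only (exactly identified) versus Z1 and Z2 together (overidentified), with homoskedasticity-only SEs. The design, the `ols` helper and the `tsls` helper are assumptions for illustration. With both valid instruments the first stage explains more of X, so the SE should be smaller.

```python
import math
import random

def ols(y, cols):
    """OLS coefficients [intercept, b1, ..., bk] via the normal equations (Gaussian elimination)."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    for c in range(p):  # forward elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [arj - f * acj for arj, acj in zip(A[r], A[c])]
            b[r] -= f * b[c]
    coef = [0.0] * p
    for c in reversed(range(p)):  # back substitution
        coef[c] = (b[c] - sum(A[c][j] * coef[j] for j in range(c + 1, p))) / A[c][c]
    return coef

def tsls(y, x, instruments):
    """One endogenous X, no W's: returns (b0, b1, homoskedasticity-only SE of b1)."""
    n = len(y)
    pi = ols(x, instruments)
    x_hat = [pi[0] + sum(pj * zj[i] for pj, zj in zip(pi[1:], instruments)) for i in range(n)]
    b0, b1 = ols(y, [x_hat])
    resid = [y[i] - b0 - b1 * x[i] for i in range(n)]  # TSLS residuals use actual X
    xbar = sum(x_hat) / n
    s2 = sum(e * e for e in resid) / (n - 2)
    return b0, b1, math.sqrt(s2 / sum((a - xbar) ** 2 for a in x_hat))

random.seed(5)
n = 20_000
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [0.5 * a + 0.5 * b + 0.5 * ui + random.gauss(0, 1) for a, b, ui in zip(z1, z2, u)]
y = [2.0 + 1.5 * xi + ui for xi, ui in zip(x, u)]

_, b_one, se_one = tsls(y, x, [z1])        # exactly identified: Z1 only
_, b_both, se_both = tsls(y, x, [z1, z2])  # overidentified: Z1 and Z2
print(f"Z1 only: {b_one:.3f} (SE {se_one:.4f})   Z1 and Z2: {b_both:.3f} (SE {se_both:.4f})")
```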
Under what assumptions is TSLS valid and its t-statistic approximately normally distributed in large samples?
- E(ui|W1i,…,Wri) = 0, i.e. the exogenous regressors are exogenous
- (Yi,X1i,…,Xki,W1i,…,Wri,Z1i,…,Zmi) are i.i.d.
- The X’s, W’s, Z’s, and Y have nonzero, finite 4th moments (large outliers are unlikely)
- The instruments (Z1i,…,Zmi) are valid, i.e. exogenous (Corr(Zji, ui) = 0 for each j = 1,…,m) and relevant (Corr(Zji, Xi) ≠ 0; more precisely, the Z’s must explain X in the first stage)
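In the generalised IV multiple regression model (MRM), when are instruments said to be relevant? And when are they said to be weak?
In the first stage (X regressed on the Z’s and W’s), if at least one of the coefficients on the Z’s (π1,…,πm) is not equal to zero, then the instruments are relevant
If they are all equal to zero the instruments are irrelevant; if they are all very close to zero the instruments are weak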
What do weak instruments do?
They explain very little of the variation in X BEYOND what is explained by the W’s
What is a consequence of IVs being weak?
The TSLS sampling distribution and t-stat can be far from normal, even when n is large!
(Why? Weak instruments make S(XZ) very close to zero, so small sampling noise in the denominator of S(YZ)/S(XZ) can make beta1(hat)TSLS very large and erratic. Z barely explains X, so it carries almost no information about the effect of X on Y.) (see notes bottom of S2P2 and top of S1P3)
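A hypothetical Monte Carlo sketch: the single-instrument TSLS estimate S(YZ)/S(XZ) is computed over many simulated samples with a strong first stage (π1 = 1) and a weak one (π1 = 0.05). All parameter values are assumptions. With the weak instrument the estimates are far more dispersed, and their centre tends to drift away from the true β1 = 1 towards the OLS probability limit (about 1.8 in this design).

```python
import random

def tsls_slope(y, x, z):
    """Single-instrument TSLS estimate S(YZ) / S(XZ)."""
    zbar = sum(z) / len(z)
    s_yz = sum((zi - zbar) * yi for zi, yi in zip(z, y))
    s_xz = sum((zi - zbar) * xi for zi, xi in zip(z, x))
    return s_yz / s_xz

def simulate(pi1, reps=500, n=100, beta1=1.0, seed=6):
    """Sorted TSLS estimates over repeated hypothetical samples; pi1 is the first-stage coefficient."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        u = [rng.gauss(0, 1) for _ in range(n)]
        x = [pi1 * zi + 0.8 * ui + rng.gauss(0, 0.6) for zi, ui in zip(z, u)]
        y = [beta1 * xi + ui for xi, ui in zip(x, u)]
        estimates.append(tsls_slope(y, x, z))
    return sorted(estimates)

def iqr(sorted_vals):
    """Interquartile range of an already-sorted list."""
    m = len(sorted_vals)
    return sorted_vals[(3 * m) // 4] - sorted_vals[m // 4]

strong = simulate(pi1=1.0)  # first-stage F of roughly 100 on average
weak = simulate(pi1=0.05)   # first-stage F of roughly 1 on average
print("IQR strong:", round(iqr(strong), 3), " IQR weak:", round(iqr(weak), 3))
print("median strong:", round(strong[len(strong) // 2], 3), " median weak:", round(weak[len(weak) // 2], 3))
```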
How do you test instrument strength?
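First-stage F-statistic: an F-test of the null that the coefficients on Z1,…,Zm in the first stage regression are all equal to zero (i.e. the instruments DO NOT ENTER the first stage)
Rule of thumb: if the F-stat is less than 10, the set of instruments is weak! (therefore 2SLS is likely to be biased and its inference unreliable)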
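A sketch of the first-stage F-statistic on hypothetical data, using the homoskedasticity-only formula F = [(SSR_restricted - SSR_unrestricted)/q] / [SSR_unrestricted/(n - k_unrestricted - 1)]. Software would typically report a heteroskedasticity-robust version instead; the helpers here (`ols`, `ssr`, `first_stage_F`) are illustrative assumptions, not library functions.

```python
import random

def ols(y, cols):
    """OLS coefficients [intercept, b1, ..., bk] via the normal equations (Gaussian elimination)."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    for c in range(p):  # forward elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [arj - f * acj for arj, acj in zip(A[r], A[c])]
            b[r] -= f * b[c]
    coef = [0.0] * p
    for c in reversed(range(p)):  # back substitution
        coef[c] = (b[c] - sum(A[c][j] * coef[j] for j in range(c + 1, p))) / A[c][c]
    return coef

def ssr(y, cols):
    """Sum of squared residuals from an OLS regression of y on an intercept and cols."""
    coef = ols(y, cols)
    return sum((y[i] - coef[0] - sum(cj * col[i] for cj, col in zip(coef[1:], cols))) ** 2
               for i in range(len(y)))

def first_stage_F(x, instruments, exog=()):
    """Homoskedasticity-only F-statistic for H0: all first-stage coefficients on the instruments are zero."""
    n, q = len(x), len(instruments)
    ssr_u = ssr(x, list(instruments) + list(exog))  # unrestricted: Z's and W's
    ssr_r = ssr(x, list(exog))                      # restricted: W's only (Z's dropped)
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - q - len(exog) - 1))

random.seed(7)
n = 1_000
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
x_strong = [0.5 * a + 0.5 * b + random.gauss(0, 1) for a, b in zip(z1, z2)]
x_irrelevant = [random.gauss(0, 1) for _ in range(n)]  # the Z's have no effect on X

F_strong = first_stage_F(x_strong, [z1, z2])
F_irrelevant = first_stage_F(x_irrelevant, [z1, z2])
print(round(F_strong, 1), round(F_irrelevant, 2))
```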
What does comparing to F=10 actually allow us to do?
Roughly, to judge whether the bias of TSLS (relative to the bias of OLS) is more or less than about 10%: if F > 10 the relative bias is likely below 10%; if F < 10 it may be larger (a rule of thumb, not an exact cutoff)
2 solutions to weak instruments?
1) Find better instruments/drop ones you think may be weak
2) Use other estimators (can be very complicated though)
What criteria must be fulfilled to test for instrument exogeneity? What is the consequence for TSLS if this assumption does not hold?
Criteria: the model must be overidentified to do this test!
If the assumption of instrument exogeneity fails, then TSLS is INCONSISTENT!
When to use J-test of overidentifying restrictions?
If given, say, 2 IVs, Z1 and Z2, and you compute TSLS with each one separately and the estimates of beta are very different, then at least one of Z1 or Z2 is likely invalid (the J-test formalises this comparison)
See bottom of p2s2 on how to conduct a J-test
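A sketch of the J-statistic on hypothetical data, following the approach J = mF: compute the TSLS residuals (using the actual X), regress them on the instruments, and take m times the homoskedasticity-only F-statistic that all instrument coefficients are zero. It assumes one endogenous regressor and no W's; `make_data` and its parameter `gamma` (which puts Z2 into the error term, making it invalid) are illustrative assumptions.

```python
import random

def ols(y, cols):
    """OLS coefficients [intercept, b1, ..., bk] via the normal equations (Gaussian elimination)."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    for c in range(p):  # forward elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [arj - f * acj for arj, acj in zip(A[r], A[c])]
            b[r] -= f * b[c]
    coef = [0.0] * p
    for c in reversed(range(p)):  # back substitution
        coef[c] = (b[c] - sum(A[c][j] * coef[j] for j in range(c + 1, p))) / A[c][c]
    return coef

def j_statistic(y, x, instruments):
    """J = m * F for one endogenous regressor and no W's (homoskedasticity-only F)."""
    n, m = len(y), len(instruments)
    pi = ols(x, instruments)  # first stage
    x_hat = [pi[0] + sum(pj * zj[i] for pj, zj in zip(pi[1:], instruments)) for i in range(n)]
    b0, b1 = ols(y, [x_hat])  # second stage
    resid = [y[i] - b0 - b1 * x[i] for i in range(n)]  # TSLS residuals use actual X
    g = ols(resid, instruments)  # regress residuals on the Z's
    ssr_u = sum((resid[i] - g[0] - sum(gj * zj[i] for gj, zj in zip(g[1:], instruments))) ** 2
                for i in range(n))
    rbar = sum(resid) / n
    ssr_r = sum((e - rbar) ** 2 for e in resid)  # restricted: intercept only
    F = ((ssr_r - ssr_u) / m) / (ssr_u / (n - m - 1))
    return m * F

def make_data(gamma, n=5_000, seed=8):
    """Hypothetical data; gamma != 0 makes Z2 enter the error term (Z2 invalid)."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) + gamma * b for b in z2]
    x = [0.5 * a + 0.5 * b + 0.5 * ui + rng.gauss(0, 1) for a, b, ui in zip(z1, z2, u)]
    y = [2.0 + 1.5 * xi + ui for xi, ui in zip(x, u)]
    return y, x, [z1, z2]

j_valid = j_statistic(*make_data(gamma=0.0))
j_invalid = j_statistic(*make_data(gamma=0.3))
# Compare with the 5% critical value of chi-squared with m - k = 2 - 1 = 1 DoF: 3.84
print(round(j_valid, 2), round(j_invalid, 2))
```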
What are the hypotheses for a J-test?
H0: All instruments are exogenous
H1: At least one instrument is not exogenous
J-statistic distribution? How many degrees of freedom (DoF) in a J-test?
Chi-squared, with m-k DoF (in large samples, under H0 and with homoskedastic errors)
Why must the model be overidentified to do a J-test?
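Because otherwise (exact identification, m = k) the DoF, m-k, equal 0: the TSLS residuals are then exactly uncorrelated with the Z's in the sample, so J = 0 and there is nothing to test!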
What does it mean if the actual J statistic is in the critical region?
Means that H0 is rejected: the data suggest that at least one IV is endogenous (but the test does not say which one)
Summary?
Slides 38 and 39 if needed
See S3P3 in notes on the cigarette demand example
How can we interpret the J-test rejection?
Use intuition to try to work out which instrument(s) is/are endogenous, then drop them, redo the model and test again
What points need to be considered when assessing the validity of a study?
1) OVB
2) Functional form misspecification
3) Simultaneous causality bias
4) Errors-in-variables bias
5) Selection bias (i.e. have all states been used or just some?)
6) Are the IVs truly relevant and exogenous?
7) Old data: if using old data need to consider if it is externally valid to apply it to today’s problems