L5: IV regression Flashcards
When might we use instrumental variables? (3 cases, and what these issues have in common)
1) Omitted variable bias (OVB) from a variable that is correlated with X and is a determinant of Y, but is unobserved (tf cannot be included in the regression eqn.)
2) Simultaneous causality bias (ie. X causes Y AND Y causes X)
3) Errors-in-variables bias (X is measured with error)
All 3 problems -> E(u|X) not equal to zero
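A minimal simulation sketch of case 3 (errors-in-variables). It assumes classical measurement error (noise w that is independent of the true X and of u), and the coefficients, sample size and variable names are made up for illustration. With these settings the OLS slope on the mismeasured X is pulled towards zero, roughly halving a true slope of 2.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

x_true = rng.normal(size=n)          # true regressor (variance 1)
w = rng.normal(size=n)               # measurement error (variance 1), assumed independent of x_true and u
x_obs = x_true + w                   # what we actually observe
u = rng.normal(size=n)
y = 1.0 + 2.0 * x_true + u           # assumed true model: B0 = 1, B1 = 2

# OLS slope = sample cov(Y, X) / sample var(X)
slope_true = np.cov(y, x_true)[0, 1] / np.var(x_true, ddof=1)
slope_obs = np.cov(y, x_obs)[0, 1] / np.var(x_obs, ddof=1)

print(f"slope using true X:     {slope_true:.3f}")
print(f"slope using observed X: {slope_obs:.3f}")
```

The regression on x_obs has an error term that contains -B1*w, which is correlated with x_obs, so E(u|X) is not zero.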
What does IV regression do?
Eliminates bias when E(u|X) not equal to zero, using an instrumental variable, Z
What are endogenous and exogenous variables?
Endogenous - a variable correlated with u
Exogenous - a variable not correlated with u
What are the two conditions for a VALID INSTRUMENT?
1) Instrument relevance: corr(Zi, Xi) ≠ 0
2) Instrument exogeneity: corr(Zi, ui) = 0
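A small sketch of what the two conditions look like in simulated data, where (unlike in real data) u is known because we generate it. The data-generating process and all names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # regression error (observable only in a simulation)
x = 1.0 * z + 0.8 * u + rng.normal(size=n)   # X depends on Z (relevance) and on u (endogeneity)

corr_zx = np.corrcoef(z, x)[0, 1]   # relevance: should be clearly non-zero
corr_zu = np.corrcoef(z, u)[0, 1]   # exogeneity: should be close to zero
corr_xu = np.corrcoef(x, u)[0, 1]   # X itself is endogenous: clearly non-zero

print(f"corr(Z, X) = {corr_zx:.3f}")
print(f"corr(Z, u) = {corr_zu:.3f}")
print(f"corr(X, u) = {corr_xu:.3f}")
```

With real data only corr(Z, X) can be computed directly, because u is unobserved.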
Explain carefully how to estimate B1 when using an IV.
2 stage least squares:
1) ISOLATE part of X that is uncorrelated with u by regressing X on Z using OLS:
EQN: Xi= π0+ π1Zi+ vi
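Because Zi is uncorrelated with ui, π0+ π1Zi is also uncorrelated with ui (vi is the remaining part of Xi, which may be correlated with ui). π0 and π1 are unknown, so we estimate them by OLS and then compute predicted values of Xi, where: Xi(hat)=π0(hat)+ π1(hat)Zi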
2) Replace Xi by Xi(hat) in the regression of interest, and regress Y on Xi(hat) using OLS:
ie. Yi=B0+B1Xi(hat)+ui
Since Xi(hat) is uncorrelated with ui (in large samples), E(u|X(hat))=0 tf OLS works in this second stage! The resulting estimator is B1(hat)(TSLS)
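The two stages above can be sketched with NumPy on simulated data. This is an illustrative sketch, not the method from the notes: the helper names (ols, tsls), the data-generating process and the true values B0 = 1, B1 = 2 are all assumptions.

```python
import numpy as np

def ols(y, x):
    """OLS of y on a constant and one regressor x; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

def tsls(y, x, z):
    """TSLS with one endogenous regressor x and one instrument z."""
    pi0_hat, pi1_hat = ols(x, z)        # stage 1: regress X on Z
    x_hat = pi0_hat + pi1_hat * z       # predicted values X(hat)
    return ols(y, x_hat)                # stage 2: regress Y on X(hat)

# Simulated data (assumed): Z is relevant and exogenous by construction
rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)
u = rng.normal(size=n)                  # error in the equation of interest
v = 0.8 * u + rng.normal(size=n)        # first-stage error, correlated with u, so X is endogenous
x = 0.5 + 1.0 * z + v
y = 1.0 + 2.0 * x + u

b0_ols, b1_ols = ols(y, x)
b0_tsls, b1_tsls = tsls(y, x, z)
print(f"OLS slope:  {b1_ols:.3f}")      # biased away from 2 because corr(X, u) is not 0
print(f"TSLS slope: {b1_tsls:.3f}")     # should be close to 2
```

The standard errors from the second ols call are not the correct TSLS standard errors; only the coefficient estimates should be taken from this sketch.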
What does 2SLS require?
n to be large so π0 and π1 are estimated precisely
Show that the 2SLS estimator is equal to the ratio of the covariances: S(YZ)/S(XZ)
see notes bottom page 1 side 1
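One possible route to the result is sketched here; the notes may argue differently. Assumed notation: s_{AB} is the sample covariance of A and B, s_A^2 is the sample variance of A, and \beta_1 is B1. It uses the fact that the OLS slope equals the sample covariance over the sample variance of the regressor, and that X(hat) is a linear function of Z.

```latex
\begin{align*}
\text{Stage 1:}\quad \hat{\pi}_1 &= \frac{s_{XZ}}{s_Z^2},
  \qquad \hat{X}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i \\
\text{so}\quad s_{Y\hat{X}} &= \hat{\pi}_1\, s_{YZ},
  \qquad s_{\hat{X}}^2 = \hat{\pi}_1^2\, s_Z^2 \\
\text{Stage 2:}\quad \hat{\beta}_1^{TSLS}
  &= \frac{s_{Y\hat{X}}}{s_{\hat{X}}^2}
   = \frac{\hat{\pi}_1\, s_{YZ}}{\hat{\pi}_1^2\, s_Z^2}
   = \frac{s_{YZ}}{\hat{\pi}_1\, s_Z^2}
   = \frac{s_{YZ}}{s_{XZ}}
\end{align*}
```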
Is the 2SLS estimator consistent?
YES, see notes for why (ie. both the sample covariances are consistent, S(YZ) for cov(Y,Z) and S(XZ) for cov(X,Z), tf the estimator converges in probability to the true value of B1)
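A short sketch of the consistency argument, assuming the model Yi = B0 + B1Xi + ui with a valid instrument Z (written with \beta for B); the notes may present it differently.

```latex
\begin{align*}
\operatorname{cov}(Y_i, Z_i)
  &= \operatorname{cov}(\beta_0 + \beta_1 X_i + u_i,\; Z_i)
   = \beta_1 \operatorname{cov}(X_i, Z_i) + \operatorname{cov}(u_i, Z_i) \\
  &= \beta_1 \operatorname{cov}(X_i, Z_i)
   \qquad \text{(exogeneity: } \operatorname{cov}(u_i, Z_i) = 0\text{)} \\
\Rightarrow\quad \beta_1
  &= \frac{\operatorname{cov}(Y_i, Z_i)}{\operatorname{cov}(X_i, Z_i)}
   \qquad \text{(relevance: } \operatorname{cov}(X_i, Z_i) \neq 0\text{)} \\
\hat{\beta}_1^{TSLS} = \frac{s_{YZ}}{s_{XZ}}
  &\xrightarrow{\;p\;} \frac{\operatorname{cov}(Y_i, Z_i)}{\operatorname{cov}(X_i, Z_i)} = \beta_1
\end{align*}
```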
What is inference like using TSLS?
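Same as usual: in large samples the TSLS estimator is approximately normally distributed, so t-tests and confidence intervals are constructed in the usual way, provided the correct SEs are used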
Why are OLS standard errors from the 2nd stage regression wrong?
They do not take into account that Xi(hat) is itself estimated in the first stage (Stata can solve this with a command that computes TSLS with the correct SEs; use heteroskedasticity-robust SEs)
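A sketch of the difference on simulated data. For brevity it uses the homoskedasticity-only SE formula rather than the heteroskedasticity-robust one, and the helper names and data-generating process are assumptions. The key point: the second-stage OLS residuals are Y - B0(hat) - B1(hat)X(hat), whereas the correct TSLS residuals use the actual X.

```python
import numpy as np

def fit_line(y, x):
    """OLS of y on a constant and x; returns array [intercept, slope]."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def slope_se(resid, regressor):
    """Homoskedasticity-only SE of a slope, given residuals and the regressor used."""
    n = len(resid)
    sigma2 = resid @ resid / (n - 2)
    return np.sqrt(sigma2 / np.sum((regressor - regressor.mean()) ** 2))

rng = np.random.default_rng(1)
n = 5_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + 0.8 * u + rng.normal(size=n)     # endogenous regressor (assumed DGP)
y = 1.0 + 2.0 * x + u

pi = fit_line(x, z)                      # stage 1
x_hat = pi[0] + pi[1] * z
b = fit_line(y, x_hat)                   # stage 2 coefficients (these are fine)

naive_resid = y - b[0] - b[1] * x_hat    # residuals second-stage OLS would use (wrong)
tsls_resid = y - b[0] - b[1] * x         # residuals based on the actual X (correct)

se_naive = slope_se(naive_resid, x_hat)
se_tsls = slope_se(tsls_resid, x_hat)
print(f"naive second-stage SE: {se_naive:.4f}")
print(f"TSLS SE:               {se_tsls:.4f}")
```

In this particular setup the naive SE comes out much larger than the TSLS SE; in other setups the direction of the error can differ.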
Why would a regression that relates quantity (Y) to price (X) likely suffer from bias? What type of bias would this be?
Market data only record price and quantity at equilibrium, ie. the point where the S and D curves cross, tf neither the D function nor the S function is observed on its own. Price and quantity are determined jointly: price affects quantity demanded, but shifts in demand (part of u) also change the equilibrium price, tf X is correlated with u. This gives rise to simultaneous causality (simultaneity) bias
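A simulation sketch of this point, under an assumed linear supply and demand system with a hypothetical supply shifter Z (a variable that moves supply but not demand). All parameter values are made up. OLS of quantity on equilibrium price recovers neither curve, while the ratio S(QZ)/S(PZ) recovers the demand slope.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
u_d = rng.normal(size=n)       # demand shocks
u_s = rng.normal(size=n)       # supply shocks
z = rng.normal(size=n)         # hypothetical supply shifter (the instrument)

a_d, b_d = 20.0, 1.0           # demand: Q = a_d - b_d*P + u_d  (true slope -1)
a_s, b_s, c = 0.0, 1.0, 1.0    # supply: Q = a_s + b_s*P + c*Z + u_s

# Equilibrium: set demand equal to supply and solve for P, then Q
price = (a_d - a_s - c * z + u_d - u_s) / (b_d + b_s)
quantity = a_d - b_d * price + u_d

def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.cov(a, b)[0, 1]

ols_slope = cov(quantity, price) / np.var(price, ddof=1)
iv_slope = cov(quantity, z) / cov(price, z)

print(f"OLS slope of Q on P: {ols_slope:.3f}")   # mixes up supply and demand
print(f"IV slope:            {iv_slope:.3f}")    # close to the demand slope, -1
```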
See cigarette demand example in notes
See General IV regression model notes
What is the problem in the generalised IV regression model with adding more IVs?
see notes
Explain the three cases of identification relevant to 2SLS? When can 2SLS be done?
Exactly identified if m=k
Underidentified if m<k
Overidentified if m>k
2SLS can only be done under exact identification or overidentification (ie. m>=k), where m is the number of IVs and k is the number of ENDOgenous regressors
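The three cases can be written as a small helper. The function names below are hypothetical and only encode the counting rule; they say nothing about whether the instruments are valid.

```python
def identification_status(m: int, k: int) -> str:
    """Classify identification given m instruments and k endogenous regressors."""
    if m < 0 or k < 1:
        raise ValueError("need m >= 0 instruments and k >= 1 endogenous regressors")
    if m < k:
        return "underidentified"
    if m == k:
        return "exactly identified"
    return "overidentified"

def can_run_tsls(m: int, k: int) -> bool:
    """2SLS needs at least as many instruments as endogenous regressors."""
    return m >= k

for m, k in [(1, 1), (0, 1), (3, 2)]:
    print(m, k, identification_status(m, k), can_run_tsls(m, k))
```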