Endogeneity & Instrumental Variables Flashcards
A) In general, when can Xi be endogenous (under what assumptions)?
B) What does this mean for our estimate of β₁ (β^₁)?
A) Usually the zero conditional mean assumption holds:
E(εi|Xi) = 0
This is violated if Xi and εi covary, i.e.
Cov(Xi,εi) ≠ 0
(E.g. high X associated with high ε)
B) Then Xi is endogenous!
So E(β^₁) ≠ β₁
So biased!
Thus omitted variables are a source of endogeneity
E.g. we estimate
Yi = β₀ + β₁Xi + εi
But omit relevant variable Zi (the true model is Yi = β₀ + β₁Xi + β₂Zi + error)
When would endogeneity occur?
When Cov(Xi,Zi) ≠ 0 and β₂ ≠ 0,
(Usual bias for omitted variables)
The omitted β₂Zi ends up inside εi, so Cov(Xi,εi) ≠ 0: endogenous!
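A minimal simulation sketch of the omitted-variable case, assuming made-up coefficient values (β₁ = 2, β₂ = 3) and normally distributed data. The helper names `cov` and `ols_slope` are illustrative, not from the cards. It compares the slope from the short regression with β₁ plus the usual omitted-variable bias term β₂·Cov(X,Z)/Var(X).

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x."""
    return cov(x, y) / cov(x, x)

rng = random.Random(0)
n = 20000
beta0, beta1, beta2 = 1.0, 2.0, 3.0               # assumed values, for illustration only

z = [rng.gauss(0, 1) for _ in range(n)]           # the variable we will omit
x = [0.5 * zi + rng.gauss(0, 1) for zi in z]      # Cov(X, Z) != 0 by construction
y = [beta0 + beta1 * xi + beta2 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

slope_short = ols_slope(x, y)                     # regression of Y on X that omits Z
predicted = beta1 + beta2 * cov(x, z) / cov(x, x) # beta1 plus the omitted-variable bias
```

In this setup `slope_short` lands well above the true β₁ = 2 and close to `predicted`, which is the bias the card describes.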
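Measurement errors are also a source of endogeneity:
Explain (assume the independent variable is measured with error!)
We observe Xi = X*i + ui, where X*i is the true value and ui is the measurement error
Substituting X*i = Xi − ui into the true model Yi = β₀ + β₁X*i + εi, we get
Yi = β₀ + β₁Xi + εi − β₁ui
Where εi − β₁ui = μi (normal error minus β₁ times the measurement error)
Since Cov(Xi,ui) ≠ 0 , Cov(Xi,μi) ≠ 0
So endogeneity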
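A small simulation sketch of the measurement-error case, assuming classical measurement error (noise independent of the true regressor and of ε) and made-up values β₀ = 1, β₁ = 2. The names `x_true`, `x_obs` and `mu` are illustrative. Under these assumptions the observed regressor is correlated with the composite error, and the OLS slope is pulled towards zero.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

rng = random.Random(0)
n = 20000
beta0, beta1 = 1.0, 2.0                              # assumed values, for illustration only

x_true = [rng.gauss(0, 1) for _ in range(n)]         # true regressor (not observed)
u = [rng.gauss(0, 1) for _ in range(n)]              # measurement error
x_obs = [xt + ui for xt, ui in zip(x_true, u)]       # what we actually observe
eps = [rng.gauss(0, 1) for _ in range(n)]            # normal error
y = [beta0 + beta1 * xt + e for xt, e in zip(x_true, eps)]

mu = [e - beta1 * ui for e, ui in zip(eps, u)]       # composite error: eps - beta1 * u
cov_x_mu = cov(x_obs, mu)                            # population value: -beta1 * Var(u) = -2
slope_true = cov(x_true, y) / cov(x_true, x_true)    # regression on the true regressor
slope_obs = cov(x_obs, y) / cov(x_obs, x_obs)        # regression on the mismeasured regressor
```

With equal variances for the true regressor and the noise, `slope_obs` comes out at roughly half of β₁, while `slope_true` stays near 2.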
3rd source of endogeneity - autocorrelation
Consider a time series model with a RHS lagged dependent variable: Yt =β₀ +β₁Yt−1 +εt
With error term
εt = ρεt−1 + νt
How can we establish endogeneity
It must be the case
Cov(Yt-1,εt-1) ≠ 0 (variable in past is linked to past error)
Cov(εt,εt-1) ≠0 (error is linked to past error)
Therefore
Cov(Yt-1, εt) ≠ 0
(Independent variable is correlated with error!)
So endogeneity!
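A simulation sketch of this argument, assuming illustrative values β₁ = 0.5 and ρ = 0.5 with standard normal νt. It checks that the lagged dependent variable is correlated with the current error and that OLS overstates β₁ as a result.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

rng = random.Random(0)
T = 20000
beta0, beta1, rho = 0.0, 0.5, 0.5          # assumed values, for illustration only

y, eps = [0.0], [0.0]                      # start both series at zero
for _ in range(T):
    e = rho * eps[-1] + rng.gauss(0, 1)    # eps_t = rho * eps_{t-1} + nu_t
    y.append(beta0 + beta1 * y[-1] + e)    # Y_t = beta0 + beta1 * Y_{t-1} + eps_t
    eps.append(e)

y_lag, y_now, eps_now = y[:-1], y[1:], eps[1:]
cov_lag_eps = cov(y_lag, eps_now)          # Cov(Y_{t-1}, eps_t): positive here, not zero
slope = cov(y_lag, y_now) / cov(y_lag, y_lag)   # OLS estimate of beta1, biased upwards
```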
4th source of endogeneity - Simultaneity (reverse causality)
What is meant by this? Give an example
Y is a function of X, and X is a function of Y
E.g. supply and demand (price depends on quantity, and quantity depends on price)
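A sketch of simultaneity using a hypothetical linear market: demand q = 10 − p + u_d and supply q = 2 + p + u_s. All numbers are invented for illustration. Price is determined jointly with quantity, so it is correlated with the demand error, and a regression of quantity on price recovers neither the demand slope (−1) nor the supply slope (+1).

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

rng = random.Random(0)
n = 20000
a, b, c, d = 10.0, 1.0, 2.0, 1.0               # hypothetical demand and supply parameters

u_d = [rng.gauss(0, 1) for _ in range(n)]      # demand shock
u_s = [rng.gauss(0, 1) for _ in range(n)]      # supply shock

# Equilibrium: a - b*p + u_d = c + d*p + u_s, solved for p
p = [(a - c + ud - us) / (b + d) for ud, us in zip(u_d, u_s)]
q = [c + d * pi + us for pi, us in zip(p, u_s)]

cov_p_ud = cov(p, u_d)                         # nonzero: price is correlated with the demand error
slope = cov(p, q) / cov(p, p)                  # OLS of q on p: neither -b nor +d
```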
So 4 sources of endogeneity
Omitted variable
Measurement error
Autocorrelation
Simultaneity
Main reason endogeneity is bad
OLS estimates are biased
How to solve endogeneity
Break correlation between Xi and ε - isolate the part of the variation within Xi that is uncorrelated with the error!
using an instrumental variable!
2 assumptions we make for instrumental variables
Instrument relevance - Cov(Zi,Xi) ≠ 0 (instrument is correlated with X, ideally highly, so it is relevant for variation in X)
Instrument exogeneity - Cov(Zi,εi) = 0 (instrument is uncorrelated with εi)
Simple model Yi =β₀ +β₁ Xi +εi
But Cov(Xi,εi) ≠ 0, so Xi is endogenous and the OLS estimate of β₁ is biased
Now add instrumental variable Zi
How can we use instrument to identify β₁?
Take the covariance with Zi on both sides
Yi = β₀ + β₁Xi + εi turns into
Cov(Zi,Yi) = Cov(Zi,β₀) + Cov(Zi,β₁Xi) + Cov(Zi,εi)
Then Cov(Zi,β₀) = 0 since β₀ is a constant, and Cov(Zi,εi) = 0 by instrument exogeneity:
Cov(Zi,Yi) = β₁Cov(Zi,Xi) (taking β₁ out of the bracket)
Finally solve for β₁ (possible since Cov(Zi,Xi) ≠ 0 by instrument relevance)
β₁ = Cov(Zi,Yi)/Cov(Zi,Xi)
What does this suggest
A sample analogue
β^₁IV : instrumental variable estimator of β₁
β^₁IV = Σ(Zi − Z̄)(Yi − Ȳ) / Σ(Zi − Z̄)(Xi − X̄)
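A minimal sketch of the sample-analogue estimator, assuming simulated data in which an unobserved confounder `c` makes X endogenous while Z is a valid instrument. The function name `iv_slope` and all parameter values are illustrative. It contrasts the OLS slope with the IV slope when the true β₁ is 2.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def iv_slope(z, x, y):
    """Sample-analogue IV estimator: Cov(Z, Y) / Cov(Z, X)."""
    return cov(z, y) / cov(z, x)

rng = random.Random(0)
n = 20000
beta0, beta1 = 1.0, 2.0                        # assumed values, for illustration only

z = [rng.gauss(0, 1) for _ in range(n)]        # instrument: moves X, unrelated to the error
c = [rng.gauss(0, 1) for _ in range(n)]        # unobserved confounder, part of the error
x = [zi + ci + rng.gauss(0, 1) for zi, ci in zip(z, c)]
y = [beta0 + beta1 * xi + ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]

b_ols = cov(x, y) / cov(x, x)                  # biased upwards because Cov(X, error) > 0
b_iv = iv_slope(z, x, y)                       # close to the true beta1
```

The demeaned sums in the formula are the same ratio, since the 1/(n − 1) factors in the two sample covariances cancel.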
β^₁IV : properties
Consistent, but may be biased in small samples (the bias shrinks as the sample gets large)
Standard errors of IV and OLS
SE of IV is greater than OLS, i.e. less efficient (however still worth it, since it turns a biased estimate into a consistent one!)
OLS: √(σ^²/TSSx)
IV: √(σ^²/(TSSx × R²zx))
R²zx is < 1! (R² from regressing X on Z)
So instruments are useful in turning endogenous/biased estimates into consistent ones, i.e. β^₁IV
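A quick arithmetic sketch of the two standard-error formulas, using hypothetical values σ^² = 4, TSSx = 100 and R²zx = 0.25. The function names are illustrative. The ratio of the two standard errors is 1/√R²zx, so here the IV standard error is twice the OLS one.

```python
import math

def se_ols(sigma2, tss_x):
    """OLS standard error of the slope: sqrt(sigma^2 / TSS_x)."""
    return math.sqrt(sigma2 / tss_x)

def se_iv(sigma2, tss_x, r2_zx):
    """IV standard error of the slope: sqrt(sigma^2 / (TSS_x * R^2_zx))."""
    return math.sqrt(sigma2 / (tss_x * r2_zx))

sigma2, tss_x, r2_zx = 4.0, 100.0, 0.25        # hypothetical values
ratio = se_iv(sigma2, tss_x, r2_zx) / se_ols(sigma2, tss_x)   # equals 1 / sqrt(r2_zx)
```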
How can we obtain β^₁IV in practice? (2)
Sample analogue - as shown (for 1 instrument and one endogenous variable)
Two stage least squares (2SLS) for multiple variables or instruments
4 equations we use
Basic model Yi =β₀ +β₁ Xi +εi
First stage Xi = γ₀ + γ₁Zi + vi (since first thing we estimate)
Reduced form Yi = λ₀ + λ₁Zi + ui
Second stage Yi = β₀ + β₁X^i + vi (fitted value from first stage)
Steps to 2SLS - first stage
Estimate “first stage” equation by OLS and get fitted values
X^i = γ^₀ + γ^₁Zi
Hypothesis test
H₀: γ₁ = 0 (Z is not relevant)
H₁: γ₁ ≠ 0 (Z relevant!)
Note: critical region is |t| > 3.16, higher than the normal 1.96 (with one instrument this matches a first-stage F above 10; we need strong instruments that clearly reject H₀ and so are relevant!)
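A sketch of the first-stage relevance check, assuming simulated data with a true γ₁ of 0.5 and homoskedastic errors. All variable names are illustrative. It estimates the first stage by OLS, computes the usual t statistic for γ₁, and compares it with the 3.16 threshold from the card.

```python
import math
import random

rng = random.Random(0)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [1.0 + 0.5 * zi + rng.gauss(0, 1) for zi in z]      # assumed first stage: gamma1 = 0.5

z_bar, x_bar = sum(z) / n, sum(x) / n
szz = sum((zi - z_bar) ** 2 for zi in z)                 # total sum of squares of Z
g1 = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x)) / szz
g0 = x_bar - g1 * z_bar

resid = [xi - g0 - g1 * zi for zi, xi in zip(z, x)]      # first-stage residuals
sigma2_hat = sum(r * r for r in resid) / (n - 2)         # residual variance estimate
se_g1 = math.sqrt(sigma2_hat / szz)                      # standard error of gamma1-hat
t_stat = g1 / se_g1
is_strong = abs(t_stat) > 3.16                           # stricter threshold from the card
```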
Second stage of 2SLS
Estimate second stage regression
Yi = β₀ + β₁X^i + vi (uses X^i , the fitted value from first stage)
Then we can estimate β^₁ 2SLS, now a consistent estimate!
And with one endogenous regressor (e.g. X) and one instrument..
β^₁ 2SLS = β^₁IV (same as the sample analogue)
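A sketch checking this equality numerically, assuming simulated data with one endogenous regressor and one instrument (true β₁ = 2). The helper `ols_fit` is illustrative. It runs the two OLS stages by hand and compares the second-stage slope with Cov(Z,Y)/Cov(Z,X).

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def ols_fit(x, y):
    """Intercept and slope from a simple OLS regression of y on x."""
    slope = cov(x, y) / cov(x, x)
    return sum(y) / len(y) - slope * sum(x) / len(x), slope

rng = random.Random(0)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]                  # instrument
c = [rng.gauss(0, 1) for _ in range(n)]                  # unobserved confounder
x = [zi + ci + rng.gauss(0, 1) for zi, ci in zip(z, c)]
y = [1.0 + 2.0 * xi + ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]

g0, g1 = ols_fit(z, x)                                   # first stage: X on Z
x_hat = [g0 + g1 * zi for zi in z]                       # fitted values
b0_2sls, b1_2sls = ols_fit(x_hat, y)                     # second stage: Y on fitted X
b1_iv = cov(z, y) / cov(z, x)                            # sample-analogue IV estimator
```

The two slopes agree up to floating-point rounding, because the fitted values are a linear function of Z alone.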
2SLS with multiple variables and instruments e.g suppose a model
Yi =β₀ +β₁X₁i +β₂ X₂i +…+βkXki +εi
With m instruments
Where X₁ is endogenous, the rest are exogenous.
What would the first step and second step equations be?
X₁ is a problem, as it is endogenous, i.e. correlated with the error, Cov(X₁i,εi) ≠ 0, so focus on that
First stage equation
X₁i = γ₀ + γ₁Z₁i + … + γmZmi + ø₂X₂i + … + økXki + νi
(So ø for the well behaved exogenous variables, and γ for the instruments)
Second stage equation
Yi = β₀ + β₁X^₁i + β₂X₂i + … + βkXki + εi
So same as basic model but with X^₁i (the first-stage fitted value) in place of X₁i
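A dependency-free sketch of the two stages with several regressors, assuming simulated data with two instruments (m = 2), one exogenous regressor X₂ and true β₁ = 2. The helpers `solve` and `ols` are hypothetical stand-ins for a regression routine, written here so the example runs with the standard library only.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]          # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """OLS coefficients of y on the columns of rows (each row starts with 1.0 for the intercept)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

rng = random.Random(1)
n = 5000
z1 = [rng.gauss(0, 1) for _ in range(n)]                  # instrument 1
z2 = [rng.gauss(0, 1) for _ in range(n)]                  # instrument 2
x2 = [rng.gauss(0, 1) for _ in range(n)]                  # exogenous regressor
c = [rng.gauss(0, 1) for _ in range(n)]                   # confounder that makes X1 endogenous
x1 = [0.5 + a + 0.5 * b + 0.3 * w + ci + rng.gauss(0, 1)
      for a, b, w, ci in zip(z1, z2, x2, c)]
y = [1.0 + 2.0 * v - 1.0 * w + ci + rng.gauss(0, 1) for v, w, ci in zip(x1, x2, c)]

# First stage: X1 on a constant, all instruments and the exogenous regressor
first_rows = [[1.0, a, b, w] for a, b, w in zip(z1, z2, x2)]
gamma = ols(first_rows, x1)
x1_hat = [sum(gj * rj for gj, rj in zip(gamma, r)) for r in first_rows]

# Second stage: Y on a constant, the fitted X1 and the exogenous regressor
b_2sls = ols([[1.0, h, w] for h, w in zip(x1_hat, x2)], y)
b_ols = ols([[1.0, v, w] for v, w in zip(x1, x2)], y)    # naive OLS, for comparison
```

Here `b_2sls[1]` is the 2SLS estimate of β₁ and `b_ols[1]` is the biased OLS counterpart.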
What if R²zx is small?
Variance of the IV estimator is much bigger than that of the OLS estimator
Recall equations √(σ²/TSSx) and √(σ²/(TSSx × R²zx))
So a disadvantage, but IV makes up for it as it removes endogeneity and bias. If the INSTRUMENT is poor, it gets worse!
So if the instrument is poor, large issues. Why?
Use probability limit of IV estimator to show:
plim β^₁IV = β₁ + [Corr(Z,ε)/Corr(Z,X)] × (σε/σX)
If the instrument is weak, i.e. Corr(Z,X) is small (1st assumption barely holds), even small violations of the 2nd assumption Cov(Z,ε) = 0 create large inconsistencies!
(So if Z has low instrument relevance (not highly correlated with X) and is even slightly correlated with the error (breaking the instrument exogeneity assumption), we get large inconsistencies)
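A worked arithmetic sketch of the probability-limit formula, using hypothetical numbers: the same small violation Corr(Z,ε) = 0.02, paired with either a strong instrument (Corr(Z,X) = 0.5) or a weak one (Corr(Z,X) = 0.05), and σε = σX = 1. The function name is illustrative.

```python
def iv_inconsistency(corr_z_eps, corr_z_x, sigma_eps, sigma_x):
    """Second term of plim(beta1_IV): [Corr(Z,eps) / Corr(Z,X)] * (sigma_eps / sigma_x)."""
    return (corr_z_eps / corr_z_x) * (sigma_eps / sigma_x)

sigma_eps, sigma_x = 1.0, 1.0                                   # hypothetical values
strong = iv_inconsistency(0.02, 0.50, sigma_eps, sigma_x)       # 0.02 / 0.50
weak = iv_inconsistency(0.02, 0.05, sigma_eps, sigma_x)         # 0.02 / 0.05
blow_up = weak / strong                                         # how much worse the weak case is
```

With these numbers the inconsistency is about 0.04 for the strong instrument and about 0.4 for the weak one, a tenfold increase from the same small violation of exogeneity.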
How to test for endogeneity? (so far we just assumed the variable is endogenous)
Durbin-Wu-Hausman
Durbin Wu Hausman test for endogeneity
Setup same as 2SLS: suppose model
Yi =β₀ +β₁X₁i +β₂ X₂i +…+βkXki +εi
With m instruments
Where X₁ is endogenous, the rest are exogenous.
Obtain residuals v^i from first stage equation.
v^i = X₁i − [γ^₀ + γ^₁Z₁i + … + γ^mZmi + ø^₂X₂i + … + ø^kXki]
Then include into our original model (add πv^i)
Yi = β₀ + β₁X₁i + β₂X₂i + … + βkXki + πv^i + εi
Hypothesis test (t test)
H₀: π = 0 (X₁ is exogenous)
H₁: π ≠ 0 (X₁ is endogenous, X₁i correlated with εi)
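A dependency-free sketch of the regression-based version of this test, assuming simulated data with one instrument, one exogenous regressor X₂, homoskedastic errors and an X₁ that is endogenous by construction. The helpers `solve` and `ols` are hypothetical, and the 1.96 cutoff is the usual two-sided 5% value rather than anything stated on the card.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]          # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Return (coefficients, standard errors, residuals) for OLS of y on the columns of rows."""
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(bj * rj for bj, rj in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    se = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        se.append(math.sqrt(sigma2 * solve(xtx, unit)[j]))  # j-th diagonal of sigma2 * (X'X)^-1
    return beta, se, resid

rng = random.Random(2)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]                   # instrument
x2 = [rng.gauss(0, 1) for _ in range(n)]                  # exogenous regressor
c = [rng.gauss(0, 1) for _ in range(n)]                   # confounder: makes X1 endogenous
x1 = [0.5 + zi + 0.3 * w + ci + rng.gauss(0, 1) for zi, w, ci in zip(z, x2, c)]
y = [1.0 + 2.0 * v - 1.0 * w + ci + rng.gauss(0, 1) for v, w, ci in zip(x1, x2, c)]

# Step 1: first-stage residuals v-hat
_, _, v_hat = ols([[1.0, zi, w] for zi, w in zip(z, x2)], x1)

# Step 2: original model augmented with v-hat, then a t test on its coefficient pi
beta, se, _ = ols([[1.0, v, w, vh] for v, w, vh in zip(x1, x2, v_hat)], y)
pi_hat, t_pi = beta[3], beta[3] / se[3]
reject_exogeneity = abs(t_pi) > 1.96                      # reject H0: pi = 0
```

Because X₁ is built to be endogenous, the test rejects H₀ here.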