Endogeneity & Instrumental Variables Flashcards
A) In general, when is Xi endogenous (which assumption fails)?
B) What does this mean for our estimate of β₁ (β^₁)?
A) OLS usually relies on the zero conditional mean assumption:
E(εi|Xi) = 0
This is violated if Xi and εi covary, i.e.
Cov(Xi,εi) ≠ 0
(E.g. high X is associated with high ε)
B) Then Xi is endogenous,
so E(β^₁) ≠ β₁:
the OLS estimate is biased!
Thus omitted variables are a source of endogeneity
E.g. we estimate
Yi = β₀ + β₁Xi + εi
but omit a relevant variable Zi (with true coefficient β₂)
When would endogeneity occur?
When Cov(Xi,Zi) ≠ 0 and β₂ ≠ 0
(the usual conditions for omitted variable bias)
Zi is then absorbed into the error, so Cov(Xi,εi) ≠ 0: Xi is endogenous!
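The omitted variable card can be checked with a small simulation. This is only a sketch under made-up assumptions: the coefficient values, the 0.5 loading of X on Z, the seed and the helper names `cov` and `ols_slope` are all illustrative and not from the cards.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def ols_slope(x, y):
    """Slope from an OLS regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)

rng = random.Random(0)        # fixed seed, illustrative only
n = 20_000
beta1, beta2 = 1.0, 2.0       # assumed true coefficients on X and Z

z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + rng.gauss(0, 1) for zi in z]          # Cov(X, Z) != 0
y = [beta1 * xi + beta2 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# Regression that omits Z: Z ends up in the error term.
short_slope = ols_slope(x, y)
# Omitted variable bias formula: beta1 + beta2 * Cov(X, Z) / Var(X).
predicted = beta1 + beta2 * cov(x, z) / cov(x, x)

print("slope with Z omitted:", round(short_slope, 3))
print("bias formula value:  ", round(predicted, 3))
```

With these assumed numbers the population value of the short regression slope is 1 + 2 × 0.5 / 1.25 = 1.8, well away from the true β₁ = 1. Setting either `beta2` or the 0.5 loading to zero removes the bias, matching the two conditions on the card.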
Measurement error is also a source of endogeneity:
Explain (assume the independent variable is measured with error!)
Suppose the true model is Yi = β₀ + β₁X*i + εi, but we only observe Xi = X*i + ui
Substituting X*i = Xi − ui gives
Yi = β₀ + β₁Xi + εi − β₁ui
where μi = εi − β₁ui (the normal error combined with the measurement error)
Since Cov(Xi,ui) ≠ 0, Cov(Xi,μi) ≠ 0
So endogeneity
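A quick numerical check of the measurement error argument, as a sketch only. It assumes classical measurement error (observed value = true value + independent noise), and the parameter values, seed and variable names are invented for illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

rng = random.Random(1)        # fixed seed, illustrative only
n = 20_000
beta0, beta1 = 0.5, 2.0       # assumed true parameters

x_true = [rng.gauss(0, 1) for _ in range(n)]       # true regressor
u = [rng.gauss(0, 1) for _ in range(n)]            # measurement error
x_obs = [xt + ui for xt, ui in zip(x_true, u)]     # what we actually observe
eps = [rng.gauss(0, 1) for _ in range(n)]
y = [beta0 + beta1 * xt + e for xt, e in zip(x_true, eps)]

# Composite error when y is regressed on the mismeasured regressor.
mu = [e - beta1 * ui for e, ui in zip(eps, u)]

slope_true = cov(x_true, y) / cov(x_true, x_true)  # infeasible benchmark
slope_obs = cov(x_obs, y) / cov(x_obs, x_obs)      # what OLS delivers
cov_x_mu = cov(x_obs, mu)                          # should be far from 0

print("slope using true X:    ", round(slope_true, 3))
print("slope using observed X:", round(slope_obs, 3))
print("Cov(observed X, mu):   ", round(cov_x_mu, 3))
```

Under these assumptions the observed regressor is negatively correlated with the composite error, and the slope is pulled toward zero (here toward 2 × 1/(1 + 1) = 1 in the population).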
3rd source of endogeneity - autocorrelation
Consider a time series model with a lagged dependent variable on the right-hand side: Yt = β₀ + β₁Yt−1 + εt
With error term
εt = ρεt−1 + νt
How can we establish endogeneity?
It must be the case that
Cov(Yt−1,εt−1) ≠ 0 (the lagged variable contains its own period's error)
Cov(εt,εt−1) ≠ 0 (the error is linked to the past error whenever ρ ≠ 0)
Therefore
Cov(Yt−1,εt) ≠ 0
(Independent variable is correlated with error!)
So endogeneity!
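The chain of covariances on this card can be illustrated by simulating the model. This is a rough sketch: the values β₁ = 0.5 and ρ = 0.5, the seed and the variable names are assumptions chosen for illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

rng = random.Random(2)              # fixed seed, illustrative only
T = 50_000
beta0, beta1, rho = 0.0, 0.5, 0.5   # assumed parameters

y_prev, eps_prev = 0.0, 0.0
y_lag, y_now, eps_now = [], [], []
for _ in range(T):
    eps_t = rho * eps_prev + rng.gauss(0, 1)   # autocorrelated error
    y_t = beta0 + beta1 * y_prev + eps_t       # lagged dependent variable model
    y_lag.append(y_prev)
    y_now.append(y_t)
    eps_now.append(eps_t)
    y_prev, eps_prev = y_t, eps_t

cov_lag_eps = cov(y_lag, eps_now)                   # Cov(Y_{t-1}, eps_t)
ols_slope = cov(y_lag, y_now) / cov(y_lag, y_lag)   # OLS of Y_t on Y_{t-1}

print("Cov(Y_{t-1}, eps_t):", round(cov_lag_eps, 3))
print("OLS slope:", round(ols_slope, 3), "vs true beta1 =", beta1)
```

With these assumed values the regressor is positively correlated with the error, so OLS overstates β₁ (the population limit works out to 0.8 rather than 0.5). Setting `rho = 0.0` removes the correlation.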
4th source of endogeneity - Simultaneity (reverse causality)
What is meant by this? Give an example
Y is a function of X, and X is a function of Y
E.g. supply and demand (price depends on quantity, and quantity depends on price)
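A hedged sketch of simultaneity in a toy market. The linear demand and supply curves, their coefficients, the seed and all names below are hypothetical and chosen only for illustration. Price and quantity are determined jointly, so regressing quantity on price recovers neither curve.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

rng = random.Random(3)     # fixed seed, illustrative only
n = 20_000
a, b = 10.0, 1.0           # hypothetical demand: q = a - b*p + u_d
c, d = 2.0, 1.0            # hypothetical supply: q = c + d*p + u_s

price, quantity, demand_shock = [], [], []
for _ in range(n):
    u_d, u_s = rng.gauss(0, 1), rng.gauss(0, 1)
    p_i = (a - c + u_d - u_s) / (b + d)   # price that clears the market
    q_i = a - b * p_i + u_d               # quantity on the demand curve
    price.append(p_i)
    quantity.append(q_i)
    demand_shock.append(u_d)

ols_slope = cov(price, quantity) / cov(price, price)
cov_p_ud = cov(price, demand_shock)       # price is correlated with the demand error

print("OLS slope of q on p:", round(ols_slope, 3))
print("demand slope:", -b, " supply slope:", d)
print("Cov(p, u_d):", round(cov_p_ud, 3))
```

In this symmetric setup the OLS slope is close to zero, sitting between the demand slope of −1 and the supply slope of +1, because the equilibrium price depends on both shocks.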
So 4 sources of endogeneity
Omitted variable
Measurement error
Autocorrelation
Simultaneity
Main reason endogeneity is bad
OLS estimates are biased (and inconsistent)
How to solve endogeneity?
Break the correlation between Xi and εi: isolate the part of the variation in Xi that is uncorrelated with the error,
using an instrumental variable!
2 assumptions we make for instrumental variables
Instrument relevance: Cov(Zi,Xi) ≠ 0 (the instrument is correlated with X, ideally strongly, so it explains variation in X)
Instrument exogeneity: Cov(Zi,εi) = 0 (the instrument is uncorrelated with εi)
Simple model: Yi = β₀ + β₁Xi + εi
But Cov(Xi,εi) ≠ 0: Xi is endogenous, so the OLS estimate of β₁ is biased
Now add instrumental variable Zi
How can we use instrument to identify β₁?
Take the covariance of both sides with Zi
Yi = β₀ + β₁Xi + εi turns into
Cov(Zi,Yi) = Cov(Zi,β₀) + Cov(Zi,β₁Xi) + Cov(Zi,εi)
Then, since the instrument is uncorrelated with the error (exogeneity) and β₀ is a constant, Cov(Zi,εi) = 0 and Cov(Zi,β₀) = 0:
Cov(Zi,Yi) = β₁Cov(Zi,Xi) (taking the constant β₁ outside the covariance)
Finally solve for β₁ (possible because relevance gives Cov(Zi,Xi) ≠ 0)
β₁ = Cov(Zi,Yi)/Cov(Zi,Xi)
What does this suggest
A sample analogue
β^₁IV: the instrumental variable estimator of β₁
β^₁IV = Σ(Zi − Z̄)(Yi − Ȳ) / Σ(Zi − Z̄)(Xi − X̄)
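The sample analogue can be computed directly from its definition. The sketch below uses simulated data in which an unobserved confounder `w` makes X endogenous. The data-generating process, seed and function names are assumptions for illustration only.

```python
import random

def iv_slope(z, x, y):
    """Sample-analogue IV estimator: sum((z - zbar)(y - ybar)) / sum((z - zbar)(x - xbar))."""
    n = len(z)
    zb, xb, yb = sum(z) / n, sum(x) / n, sum(y) / n
    num = sum((zi - zb) * (yi - yb) for zi, yi in zip(z, y))
    den = sum((zi - zb) * (xi - xb) for zi, xi in zip(z, x))
    return num / den

def ols_slope(x, y):
    """OLS slope, which is the IV formula with X used as its own instrument."""
    return iv_slope(x, x, y)

rng = random.Random(4)     # fixed seed, illustrative only
n = 20_000
beta0, beta1 = 1.0, 1.0    # assumed true parameters

z = [rng.gauss(0, 1) for _ in range(n)]    # instrument: relevant and exogenous
w = [rng.gauss(0, 1) for _ in range(n)]    # unobserved confounder
x = [zi + wi + rng.gauss(0, 1) for zi, wi in zip(z, w)]
eps = [2.0 * wi + rng.gauss(0, 1) for wi in w]   # error contains w, so Cov(X, eps) != 0
y = [beta0 + beta1 * xi + e for xi, e in zip(x, eps)]

beta_ols = ols_slope(x, y)
beta_iv = iv_slope(z, x, y)

print("OLS estimate:", round(beta_ols, 3))
print("IV estimate: ", round(beta_iv, 3), "(true beta1 =", beta1, ")")
```

In this setup OLS is pulled upward by the confounder, while the IV ratio stays close to the true β₁ because `z` moves `x` but is unrelated to `eps`.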
β^₁IV : properties
Consistent, but may be biased in small samples (the bias shrinks as the sample grows)
Standard errors of IV and OLS
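SE of IV is greater than SE of OLS, i.e. IV is less efficient (still worthwhile, since it replaces a biased estimator with a consistent one!)
OLS: SE(β^₁) = √(σ^² / TSSx)
IV: SE(β^₁IV) = √(σ^² / (TSSx × R²zx))
R²zx (the R² from regressing X on Z) is < 1, so the IV standard error is larger!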
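A small arithmetic sketch of the two formulas on this card. The numbers for σ^², TSSx and R²zx are made up, and the function names are hypothetical. Holding σ^² fixed, the ratio of the two standard errors is 1/√R²zx.

```python
import math

def se_ols(sigma2_hat, tss_x):
    """OLS standard error of the slope: sqrt(sigma^2 / TSS_x)."""
    return math.sqrt(sigma2_hat / tss_x)

def se_iv(sigma2_hat, tss_x, r2_zx):
    """IV standard error of the slope: sqrt(sigma^2 / (TSS_x * R^2_zx))."""
    return math.sqrt(sigma2_hat / (tss_x * r2_zx))

sigma2_hat, tss_x = 4.0, 100.0      # made-up illustrative numbers

ols = se_ols(sigma2_hat, tss_x)
strong = se_iv(sigma2_hat, tss_x, r2_zx=0.25)   # fairly strong instrument
weak = se_iv(sigma2_hat, tss_x, r2_zx=0.04)     # weak instrument

print("OLS SE:", ols)
print("IV SE, R2zx = 0.25:", strong, "ratio to OLS:", strong / ols)
print("IV SE, R2zx = 0.04:", weak, "ratio to OLS:", weak / ols)
```

With R²zx = 0.25 the IV standard error is about twice the OLS one, and with R²zx = 0.04 about five times, which is why weak instruments give very imprecise estimates.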
So instruments are useful for turning endogenous/biased estimates into consistent ones, i.e. β^₁IV
How can we obtain β^₁IV in practice? (2)
Sample analogue - as shown (for one instrument and one endogenous variable)
Two stage least squares (2SLS), which also handles multiple endogenous variables or instruments
4 equations we use
Basic model: Yi = β₀ + β₁Xi + εi
First stage: Xi = γ₀ + γ₁Zi + vi (so called because it is the first thing we estimate)
Reduced form: Yi = λ₀ + λ₁Zi + ui
Second stage: Yi = β₀ + β₁X^i + ηi (X^i is the fitted value from the first stage; ηi is a new error, distinct from the first-stage vi)
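The basic model, first stage and reduced form are linked: substituting the first stage into the basic model gives the reduced form. A short derivation using only the symbols on this card:

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_1 X_i + \varepsilon_i \\
    &= \beta_0 + \beta_1(\gamma_0 + \gamma_1 Z_i + v_i) + \varepsilon_i \\
    &= \underbrace{(\beta_0 + \beta_1\gamma_0)}_{\lambda_0}
     + \underbrace{\beta_1\gamma_1}_{\lambda_1}\, Z_i
     + \underbrace{(\beta_1 v_i + \varepsilon_i)}_{u_i}
\end{aligned}
\qquad\Longrightarrow\qquad
\beta_1 = \frac{\lambda_1}{\gamma_1} \quad (\gamma_1 \neq 0)
```

So with one instrument, β₁ is the reduced form slope divided by the first stage slope, which is the same ratio as Cov(Zi,Yi)/Cov(Zi,Xi).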
Steps to 2SLS - first stage
Estimate “first stage” equation by OLS and get fitted values
X^i = γ^₀ + γ^₁Zi
Hypothesis test
H₀: γ₁ = 0 (Z is irrelevant)
H₁: γ₁ ≠ 0 (Z is relevant!)
Note: the critical region is |t| > 3.16, higher than the usual 1.96 (with one instrument this matches the first-stage F > 10 rule of thumb, since √10 ≈ 3.16; we need a strong instrument that clearly rejects H₀)
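The 2SLS steps, including the first-stage relevance check, can be sketched by hand as follows. The simulated data, seed and the helper `ols` are assumptions for illustration. Standard errors taken from a manually run second stage are not valid, so only the point estimate is used here.

```python
import math
import random

def ols(x, y):
    """Return (intercept, slope, slope standard error) from OLS of y on x."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    tss_x = sum((xi - xb) ** 2 for xi in x)
    slope = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / tss_x
    intercept = yb - slope * xb
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / tss_x)
    return intercept, slope, se

rng = random.Random(5)     # fixed seed, illustrative only
n = 5_000
z = [rng.gauss(0, 1) for _ in range(n)]    # instrument
w = [rng.gauss(0, 1) for _ in range(n)]    # unobserved confounder
x = [0.8 * zi + wi + rng.gauss(0, 1) for zi, wi in zip(z, w)]
y = [1.0 + 1.0 * xi + 2.0 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

# First stage: regress X on Z, then test H0: gamma1 = 0.
g0, g1, se_g1 = ols(z, x)
t_first = g1 / se_g1
instrument_is_strong = abs(t_first) > 3.16   # threshold from the card

# Fitted values from the first stage.
x_hat = [g0 + g1 * zi for zi in z]

# Second stage: regress Y on the fitted values (point estimate only).
_, beta_2sls, _ = ols(x_hat, y)

# Reduced form: regress Y on Z; with one instrument beta = lambda1 / gamma1.
_, lam1, _ = ols(z, y)

print("first-stage t statistic:", round(t_first, 2), "strong:", instrument_is_strong)
print("2SLS estimate:", round(beta_2sls, 3))
print("reduced form / first stage:", round(lam1 / g1, 3))
```

With a single instrument the second-stage slope coincides with the reduced form slope divided by the first stage slope, so the two printed estimates agree up to rounding.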