HANDOUT 9 Flashcards
If Y2i is NOT non-stochastic do we have an issue?
YES - this means Y2i is endogenous
Variables that determine attendance
Attend = f(quality, time, location, past-performance) + €2i
Where €2i = motivation, interest, ability = unobservables
What variables determine €1i
€1i = random luck + same unobservables
COV(€1i, €2i)
> 0
higher ability = higher performance
higher ability = higher attendance
COV(attend, €2i)
> 0 by definition
Therefore, COV(attend, €1i)
≠ 0 –> VIOLATES CLRM
Why can’t we use OLS when we have an endogenous variable?
As E(€1i | endogenous variable) ≠ 0, OLS is BIASED
Implication for OLS estimate of coefficient on attend
As COV(€1i, €2i) > 0 –> UPWARD BIAS
- OLS will overestimate the coefficient on attend
- We get a large, positive, significant coefficient on attend
- attend could actually have no effect on performance, but appears to have one, driven by unobservables
IF COV(€1i, €2i) < 0, OLS –>
DOWNWARD BIAS
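A minimal simulation sketch of the upward bias, under made-up assumptions: attend has NO true effect on perf, but a shared unobservable (ability) enters both €1i and €2i, so COV(€1i, €2i) > 0. The function names and all numbers are illustrative, not from the handout.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_ols_slope(true_effect=0.0, n=5000, seed=0):
    """OLS slope of perf on attend when ability sits in both error terms."""
    rng = random.Random(seed)
    attend, perf = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)        # unobservable shared by both equations
        e2 = ability + rng.gauss(0, 1)   # error in the attendance equation
        e1 = ability + rng.gauss(0, 1)   # error in the performance equation (luck + ability)
        a = e2                           # assumption: attendance driven only by its error
        attend.append(a)
        perf.append(true_effect * a + e1)
    return ols_slope(attend, perf)

b_ols = simulate_ols_slope()             # true effect is 0 by construction
print("OLS slope with a true effect of 0:", round(b_ols, 3))
```

In this setup COV(€1i, €2i) = 1 and V(attend) = 2, so the OLS slope should settle around 0.5 even though the true coefficient is 0.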
Solution to endogeneity
= IV estimation
Two stage least squares
What does IV estimation try to do?
We want a variable that looks like attendance, but is unrelated to €1i.
Replace attend by variables that determine attend but are unrelated to €1i.
2 components of attendance
1. systematic = could be instruments
2. random = €2i = get rid of
two stages of IV
- regress attend on instruments that are relevant and exogenous & save fitted values
- Replace attend by attend^ in original regression equation
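The two stages can be sketched by hand on simulated data. This assumes one instrument (a randomly allocated 9am dummy), no other exogenous variables, and a made-up true effect of 2; every name and number here is hypothetical.

```python
import random

def ols(x, y):
    """Intercept and slope from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

rng = random.Random(1)
true_effect = 2.0
nine, attend, perf = [], [], []
for _ in range(20000):
    ability = rng.gauss(0, 1)              # unobservable in both equations
    z = rng.choice([0, 1])                 # instrument: randomly allocated 9am slot
    a = 5.0 - 1.5 * z + ability + rng.gauss(0, 1)
    p = 50.0 + true_effect * a + 3.0 * ability + rng.gauss(0, 1)
    nine.append(z)
    attend.append(a)
    perf.append(p)

# Stage 1: regress attend on the instrument and save the fitted values.
d0, d1 = ols(nine, attend)
attend_hat = [d0 + d1 * z for z in nine]

# Stage 2: replace attend by attend^ in the original regression.
_, b_2sls = ols(attend_hat, perf)

# Plain OLS for comparison (picks up ability through the error term).
_, b_ols = ols(attend, perf)

print("OLS:", round(b_ols, 2), " 2SLS:", round(b_2sls, 2))
```

The second-stage slope should land close to the assumed true effect, while OLS should sit well above it. Standard errors from a hand-run second stage are not the correct IV standard errors, so this sketch reports coefficients only.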
Is IV unbiased?
NO - but it is CONSISTENT
As n–>infinity, b3–>B3 (in probability)
How can TIME of seminars be a valid instrument for attendance?
If the time of seminars is randomly allocated by tabula, it is unrelated to motivation, interest and ability.
So include 1. Mon/Fri dummy and 2. 9am dummy
Why do we include the original exogenous variables female and a-level when we regress attend on its instruments?
We want the coefficient on attend hat to explain variation in perf over and above female and a-level, “holding all else constant”. If we didn’t include them, the instruments might be related to female and a-level.
2 conditions for IV instruments
1. Instrument relevance
2. Instrument exogeneity
What’s the issue if our instruments are only weakly correlated with the endogenous regressor?
Weak correlation –> IV coefficients inconsistent–> t-stats unreliable
Test statistic for weak instruments
B^IV = B + (B^OLS - B)/F
What is F in the test statistic for weak instruments?
F = f-stat from test of joint significance of the coefficients on the instruments when we regress the endogenous regressor on those instruments (+ exogenous variables)
H0: d3 = d4 =0
H1: d3 ≠ 0 and/or d4 ≠ 0
Rule of thumb for F in the test statistic for weak instruments
F < 10 –> weak instruments
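A sketch of the first-stage F for H0: d3 = d4 = 0, computed from restricted and unrestricted RSS. For brevity it assumes the first stage contains only a constant and the two instrument dummies (the exogenous variables female and a-level are left out), and the data are simulated with made-up coefficients.

```python
import random

def ols_fit(X, y):
    """OLS coefficients via the normal equations (X already has a constant column)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):                       # Gauss-Jordan with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def rss(X, y, b):
    """Residual sum of squares for coefficients b."""
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(X, y))

def first_stage_f(d3, d4, n=2000, seed=0):
    """F-stat for H0: d3 = d4 = 0 in attend = d0 + d3*monfri + d4*nine + e2."""
    rng = random.Random(seed)
    X, attend = [], []
    for _ in range(n):
        monfri, nine = rng.choice([0, 1]), rng.choice([0, 1])
        X.append([1.0, monfri, nine])
        attend.append(5.0 + d3 * monfri + d4 * nine + rng.gauss(0, 1.4))
    const = [[1.0] for _ in range(n)]
    rss_u = rss(X, attend, ols_fit(X, attend))          # unrestricted: with instruments
    rss_r = rss(const, attend, ols_fit(const, attend))  # restricted: constant only
    q = 2                                               # number of restrictions
    return ((rss_r - rss_u) / q) / (rss_u / (n - len(X[0])))

f_strong = first_stage_f(d3=-1.0, d4=-1.5)   # instruments shift attendance a lot
f_weak = first_stage_f(d3=0.0, d4=0.0)       # instruments do not move attendance
print("strong:", round(f_strong, 1), " weak:", round(f_weak, 2))
```

Under these assumptions the strong case should clear the F > 10 rule of thumb comfortably and the weak case should fall below it.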
If F=1 in the test statistic for weak instruments
F=1 –> bias of order 100%
We are back to the biased OLS estimate
So IV estimation does NOT help - coefficients are inconsistent.
As F–>infinity in the test statistic for weak instruments
F –> infinity: instruments very significant
Bias –> 0
B^IV –> B (true coefficient)
So IV is CONSISTENT.
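The approximation B^IV = B + (B^OLS - B)/F can be checked with simple arithmetic. The values B = 1 and B^OLS = 2 are hypothetical, chosen only to make the pattern visible.

```python
def approx_iv_estimate(b_true, b_ols, f_stat):
    """Approximate IV estimate from the handout formula B + (B_OLS - B) / F."""
    return b_true + (b_ols - b_true) / f_stat

b_true, b_ols = 1.0, 2.0   # hypothetical true coefficient and biased OLS estimate

for f_stat in (1, 10, 100, 10000):
    b_iv = approx_iv_estimate(b_true, b_ols, f_stat)
    share_of_ols_bias = (b_iv - b_true) / (b_ols - b_true)
    print(f"F = {f_stat:>5}: B^IV = {b_iv:.4f}, share of OLS bias left = {share_of_ols_bias:.2%}")
```

At F = 1 the formula returns the OLS estimate itself, and the remaining share of the OLS bias is 1/F, so it shrinks toward 0 as F grows.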
Instrument exogeneity means
E(€1i | instruments) = 0
Therefore E(€1i | attend^) = 0
Test for instrument exogeneity
J = mF
m = number of instruments
F = F-test of joint significance of instruments in the regression of IV residuals on instruments and exogenous variables
F for instrument exogeneity
IV residuals = perf - (b0 + b1 attend + b2 female + b3 a-level)
IV resid = d0 + d1 female + d2 a-level + gamma1 mon/fri + gamma2 9am + Vi
H0: gammas = 0
- We want the instruments to be unrelated to the error term = exogenous
What distribution do we get CVs from for test of instrument exogeneity?
J follows Chi-squared with dof=m-k
m = number of instruments; k = number of endogenous regressors.
We can only carry out the test for instrument exogeneity IF…
m > k
Cannot have a chi-squared with fewer than 1 degree of freedom
So we need number of instruments > number of endogenous regressors.
H0 in J test
H0: instrument exogeneity
H1: instruments are INVALID
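A sketch of J = mF on simulated data with m = 2 instruments (Mon/Fri and 9am dummies) and k = 1 endogenous regressor, so dof = 1. It assumes no other exogenous variables, and `direct_effect` is a hypothetical knob that makes the 9am dummy invalid by letting it affect perf directly. The 5% critical value of a chi-squared with 1 dof is 3.84.

```python
import random

def ols_fit(X, y):
    """OLS coefficients via the normal equations (X already has a constant column)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):                       # Gauss-Jordan with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def rss(X, y, b):
    """Residual sum of squares for coefficients b."""
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(X, y))

def j_statistic(direct_effect, n=2000, seed=0):
    """Return (J, dof) for the instrument exogeneity test on simulated data."""
    rng = random.Random(seed)
    Z, attend, perf = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        monfri, nine = rng.choice([0, 1]), rng.choice([0, 1])
        a = 5.0 - 1.0 * monfri - 1.5 * nine + ability + rng.gauss(0, 1)
        p = 50.0 + 2.0 * a + 3.0 * ability + direct_effect * nine + rng.gauss(0, 1)
        Z.append([1.0, monfri, nine])
        attend.append(a)
        perf.append(p)
    # 2SLS: first stage fitted values, then perf on attend^.
    d = ols_fit(Z, attend)
    attend_hat = [sum(dj * zj for dj, zj in zip(d, r)) for r in Z]
    b0, b1 = ols_fit([[1.0, ah] for ah in attend_hat], perf)
    # IV residuals use ACTUAL attend with the IV coefficients.
    resid = [p - (b0 + b1 * a) for p, a in zip(perf, attend)]
    # Regress IV residuals on the instruments; F-test that the gammas are 0.
    const = [[1.0] for _ in range(n)]
    rss_u = rss(Z, resid, ols_fit(Z, resid))
    rss_r = rss(const, resid, ols_fit(const, resid))
    m, k = 2, 1
    f_stat = ((rss_r - rss_u) / m) / (rss_u / (n - len(Z[0])))
    return m * f_stat, m - k

j_valid, dof = j_statistic(direct_effect=0.0)
j_invalid, _ = j_statistic(direct_effect=5.0)
print("dof:", dof, " J (valid):", round(j_valid, 2), " J (invalid):", round(j_invalid, 2))
```

With the invalid instrument, J should come out far above 3.84, so H0 of instrument exogeneity is rejected. With valid instruments J is a draw from roughly a chi-squared with 1 dof, so it falls below 3.84 most of the time but not always.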
2 Problems with instrument exogeneity test
- Need m>k, but it is hard to find 1 instrument, never mind 2 = most of the time we CANNOT test, can only use reasoning.
- Test is LOW-POWERED = often tells us instruments are exogenous when they’re actually INVALID.
V(b1) OLS
sigma^2 / sum Y2i tilde^2
Y2i tilde = residuals from regression of Y2i on other exogenous variables X1i…
V(b1) IV
sigma^2 / sum Y2i^ tilde^2
Y2i^ tilde = residuals from regression of Y2i^ on other exogenous variables X1i…, where Y2i^ = fitted values from regression of Y2i on other exogenous variables AND instruments
How do V(b1 OLS) and V(b1 IV) compare?
Variation you predict < the variation that actually happens.
So the sum of squares in the denominator is smaller for IV than for OLS
So V(b1 IV) > V(b1 OLS)
SE IV > SE OLS
IV produces bigger se and so smaller t-ratios
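A small sketch of why the IV denominator is smaller. It assumes one instrument and no other exogenous variables, so the "tilde" residuals are just deviations from the mean. The error variance sigma^2 = 1 and all other numbers are hypothetical.

```python
import random

def mean(v):
    return sum(v) / len(v)

rng = random.Random(3)
n = 1000
nine = [rng.choice([0, 1]) for _ in range(n)]              # instrument (9am dummy)
attend = [5.0 - 1.5 * z + rng.gauss(0, 1.4) for z in nine]

# First stage: simple regression of attend on the instrument.
mz, ma = mean(nine), mean(attend)
d1 = (sum((z - mz) * (a - ma) for z, a in zip(nine, attend))
      / sum((z - mz) ** 2 for z in nine))
d0 = ma - d1 * mz
attend_hat = [d0 + d1 * z for z in nine]

# Denominators: variation that actually happens vs variation you predict.
ss_actual = sum((a - ma) ** 2 for a in attend)
mh = mean(attend_hat)
ss_fitted = sum((h - mh) ** 2 for h in attend_hat)

sigma2 = 1.0                      # hypothetical error variance, same in both formulas
var_ols = sigma2 / ss_actual
var_iv = sigma2 / ss_fitted

print("sum of squares, actual:", round(ss_actual, 1), " fitted:", round(ss_fitted, 1))
print("SE ratio IV / OLS:", round((var_iv / var_ols) ** 0.5, 2))
```

Fitted values can never vary more than the variable they are fitted to, so the IV denominator is the smaller one and the IV variance the larger one.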
Test for detecting endogeneity
Run the original OLS regression of Y1i on the exogenous variables, the endogenous variable and e2i, where e2i = residuals from the regression of the endogenous regressor (attend) on the exogenous variables + instruments. Then test whether the coefficient on e2i is zero.
equation for e2i for our performance example. What is e2i a measure of?
e2i = attend - (d0 + d1 a-levels + d2 female + d3 mon/fri + d4 9am)
We strip out the systematic variation in attend = left with variation due to unobservables so e2i = best guess of motivation, interest, ability.
H0 in detecting endogeneity test
H0: delta = 0
- Coefficient on e2i = 0
- So perf is unrelated to unobservables = no endogeneity problem.
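A sketch of the mechanics of this test on simulated data where attend IS endogenous (ability enters both equations). It assumes one instrument and no other exogenous variables, and it reports only the point estimate of delta. A full test would compare delta with its standard error. All names and numbers are hypothetical.

```python
import random

def ols_fit(X, y):
    """OLS coefficients via the normal equations (X already has a constant column)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):                       # Gauss-Jordan with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def endogeneity_regression(n=5000, seed=0):
    """Return (coefficient on attend, delta) from the augmented regression."""
    rng = random.Random(seed)
    Z, attend, perf = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)            # unobservable in both equations
        nine = rng.choice([0, 1])            # instrument
        a = 5.0 - 1.5 * nine + ability + rng.gauss(0, 1)
        p = 50.0 + 2.0 * a + 3.0 * ability + rng.gauss(0, 1)
        Z.append([1.0, nine])
        attend.append(a)
        perf.append(p)
    # e2i: what is left of attend after stripping out the systematic part.
    d = ols_fit(Z, attend)
    e2 = [a - (d[0] + d[1] * z[1]) for a, z in zip(attend, Z)]
    # Augmented regression: perf on constant, attend and e2i.
    X = [[1.0, a, e] for a, e in zip(attend, e2)]
    _, b_attend, delta = ols_fit(X, perf)
    return b_attend, delta

b_attend, delta = endogeneity_regression()
print("coefficient on attend:", round(b_attend, 2), " delta:", round(delta, 2))
```

Because ability drives both attend and perf here, delta should come out clearly away from 0 (around 1.5 under these assumptions), pointing to an endogeneity problem.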