Lecture 12 Flashcards by Learning Learner

When does endogeneity occur?

When the x term is correlated with the error term

When cov(x,u) does not = 0

When could cov(x,u) not = 0 arise?

What could it mean?

Omitted variables: a variable that affects y and is correlated with x is left in u
Simultaneity: the dependent and independent variables are determined simultaneously, so there is a feedback loop, like price and quantity in supply and demand diagrams
Measurement error: observed x deviates from true x, which can cause correlation with u

In each case the standard OLS b0^ and b1^ would not be consistent
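
A minimal simulation sketch of the omitted-variable case. The data-generating process, the coefficient values and the helper name `ols_slope` are assumptions made up for illustration; they are not from the lecture. Because the omitted variable w sits in both x and u, the OLS slope settles near b1 + cov(x,u)/var(x) (about 2.49 here) instead of the true b1 = 2, however large the sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

w = rng.normal(size=n)              # omitted variable (hypothetical)
x = 0.8 * w + rng.normal(size=n)    # x depends on w
u = w + rng.normal(size=n)          # w is left in the error, so cov(x, u) != 0
y = 1.0 + 2.0 * x + u               # true b1 = 2 (assumed for the sketch)

def ols_slope(x, y):
    """OLS slope in a simple regression: sample cov(x, y) / sample var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

b1_ols = ols_slope(x, y)
# Probability limit implied by this design: 2 + 0.8 / (0.64 + 1)
plim_ols = 2.0 + 0.8 / 1.64
print(b1_ols, plim_ols)
```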

Example of simultaneity

Say we want to see if gov spending reduces unemployment
- but gov often spends more in areas with higher unemployment
- therefore unemployment itself influences gov spending
- thus, if we ignore reverse causality, we could misinterpret the correlation: seeing higher spending alongside higher unemployment suggests a positive relationship, which is misleading about the causal effect of spending

Unbiasedness does not mean consistency

Unbiasedness is a finite-sample property: an estimator is unbiased if, on average across repeated samples of any given size, it hits the true value of the parameter. For OLS this requires E[u|x] = 0 (SLR.4)

Consistency is a large-sample property: an estimator is consistent if, as the sample size grows infinitely large, the estimates converge to the true value. For OLS this requires only cov(x,u) = 0

SLR.4, E[u|x] = 0, implies cov(x,u) = 0, but not vice versa

What's the basic idea of instrumental variables?

Introduce a third variable z that affects x but is unrelated to u. It helps isolate the variation in x that is exogenous (uncorrelated with u)

2 key IV assumptions
- consider yi = B0 + B1Xi + ui, where cov(u,x) is not 0

Exogeneity: cov(zi,ui) = 0. This condition is theoretical and can't be tested directly, as it depends on ui, which is unobservable
Relevance: cov(zi,xi) is not 0

Basically z is unrelated to u, and z affects yi only through xi

How to test for instrument relevance, i.e. that zi affects xi

First stage: xi = pi0 + pi1zi + vi
Since pi1 = cov(zi,xi)/var(zi), pi1 = 0 exactly when cov(zi,xi) = 0, so relevance can (and MUST) be tested

Perform t tests
- H0: pi1 = 0
- H1: pi1 is not 0
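
The first-stage t test above can be sketched as follows. The simulated data and the value pi1 = 0.3 are assumptions for illustration only; the standard error uses the usual homoskedastic simple-regression formula.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

z = rng.normal(size=n)
x = 0.5 + 0.3 * z + rng.normal(size=n)   # assumed first stage, pi1 = 0.3

# First stage: regress x on an intercept and z
Z = np.column_stack([np.ones(n), z])
pi_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)

# Homoskedastic standard error of pi1^
v_hat = x - Z @ pi_hat
sigma2_v = v_hat @ v_hat / (n - 2)
se_pi1 = np.sqrt(sigma2_v / np.sum((z - z.mean()) ** 2))

t_stat = pi_hat[1] / se_pi1               # test of H0: pi1 = 0
print(pi_hat[1], se_pi1, t_stat)
```

A large |t| rejects H0: pi1 = 0, which supports relevance. It says nothing about exogeneity.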

IV estimator, B1

First take the covariance of both sides of the model with zi: cov(zi,yi) = B1.cov(zi,xi) + cov(zi,ui), where the second term is = 0 as we've assumed
- rearrange:
B1 = cov(zi,yi)/cov(zi,xi), then divide top and bottom by var(zi)
- gives you the slope coefficient from the reduced form divided by the slope coefficient from the first stage
- same as (slope from regressing y on z) / (slope from regressing x on z)
B1^ = SUM((zi-z_)(yi-y_)) / SUM((zi-z_)(xi-x_)), i.e. the OLS slope formula with z replacing x in the first deviation of both sums
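
A small numerical sketch of this card: the covariance ratio and the reduced-form/first-stage ratio are the same number, and both land near the true B1 while OLS does not. The data-generating process (an unobserved confounder w, true B1 = 2) is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

z = rng.normal(size=n)                 # instrument
w = rng.normal(size=n)                 # unobserved confounder (hypothetical)
x = z + w + rng.normal(size=n)         # x is endogenous because of w
u = w + rng.normal(size=n)
y = 1.0 + 2.0 * x + u                  # true B1 = 2 (assumed)

def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.cov(a, b)[0, 1]

b1_iv = cov(z, y) / cov(z, x)                  # cov(z,y) / cov(z,x)

reduced_form = cov(z, y) / np.var(z, ddof=1)   # slope from regressing y on z
first_stage = cov(z, x) / np.var(z, ddof=1)    # slope from regressing x on z
b1_ratio = reduced_form / first_stage

b1_ols = cov(x, y) / np.var(x, ddof=1)         # inconsistent here
print(b1_iv, b1_ratio, b1_ols)
```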

Special case of IV: Wald Estimator

When the instrument z is binary:
- E[yi|zi=1] = B0 + B1E[xi|zi=1]
- E[yi|zi=0] = B0 + B1E[xi|zi=0]

E[yi|zi=1] - E[yi|zi=0] = B1(E[xi|zi=1] - E[xi|zi=0])

Rearrange for B1, then in sample you replace the expectations with the sample averages to have the Wald Estimator

Basically, the difference in mean y between instrument groups divided by the difference in mean x between instrument groups
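
A sketch of the Wald estimator on simulated data, assuming a randomly assigned binary instrument and a true B1 of 2 (both made up for illustration). With a binary z, the ratio of mean differences coincides with the covariance-ratio IV estimator.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

z = rng.integers(0, 2, size=n)            # binary instrument (0 or 1)
w = rng.normal(size=n)                    # unobserved confounder (hypothetical)
x = 0.5 * z + w + rng.normal(size=n)
u = w + rng.normal(size=n)
y = 1.0 + 2.0 * x + u                     # true B1 = 2 (assumed)

# Wald estimator: difference in mean y over difference in mean x
diff_y = y[z == 1].mean() - y[z == 0].mean()
diff_x = x[z == 1].mean() - x[z == 0].mean()
wald = diff_y / diff_x

# Generic IV estimator for comparison
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
print(wald, iv)
```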

Variance of IV estimator:

Assuming homoskedasticity:

Var^(Biv^) = (sigma^)^2 / (SSTx * R^2x,z)

SSTx = SUM((xi - x_)^2)
R^2x,z = the R^2 from a regression of xi on zi and an intercept, tells you how well z predicts x
(sigma^)^2 = (1/(n-2)) * SUM((ui^)^2)
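
The variance formula can be computed directly, as in this sketch on simulated data (the design is an assumption for illustration). The residuals are the IV residuals, built from the actual x and the IV coefficients. The last line shows the same error variance divided by SSTx alone, i.e. the OLS-form variance, for comparison.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000

z = rng.normal(size=n)
w = rng.normal(size=n)                          # unobserved confounder (hypothetical)
x = z + w + rng.normal(size=n)
y = 1.0 + 2.0 * x + (w + rng.normal(size=n))

# IV point estimates
b1 = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
b0 = y.mean() - b1 * x.mean()

# Ingredients of the homoskedastic variance formula
u_hat = y - b0 - b1 * x                         # IV residuals
sigma2 = u_hat @ u_hat / (n - 2)
sst_x = np.sum((x - x.mean()) ** 2)
r2_xz = np.corrcoef(x, z)[0, 1] ** 2            # R^2 from regressing x on z

var_iv = sigma2 / (sst_x * r2_xz)
se_iv = np.sqrt(var_iv)
var_without_r2 = sigma2 / sst_x                 # same numerator, no R^2 penalty
print(se_iv, var_iv / var_without_r2)
```

The ratio printed at the end equals 1/R^2x,z, so a weaker first stage inflates the IV variance proportionally.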

IV vs OLS

Advantage of IV: consistent even if u and x are correlated, in which case the OLS estimator is biased and inconsistent

Disadvantage of IV: less efficient than OLS if u and x are uncorrelated

Under homoskedasticity, the variance of the IV estimator, sigma^2/(SSTx * R^2x,z), is larger than the variance of the OLS estimator, sigma^2/SSTx, whenever R^2x,z < 1 (R^2x,z is the R^2 from regressing x on z), so it depends crucially on the correlation between z and x

Tradeoff: gain consistency at the cost of precision

Weak instruments and Bias:

A weak instrument means that z and x are only weakly correlated. This leads to imprecise IV estimates, and it can also give large bias
Mathematically, plim Biv^ = B1 + cov(z,u)/cov(z,x). If the denominator is small (a weak instrument), even a slight correlation between z and u makes the second term very large, representing a lot of bias

Denominator = how strongly z predicts x; if this tends to 0, the estimator becomes unreliable and sensitive to small changes in the data
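
A worked arithmetic sketch of the bias term, using the standard large-sample expression B1 + cov(z,u)/cov(z,x). The function name `iv_plim` and all numbers are hypothetical, chosen only to show how the same small exogeneity violation is amplified by a weak instrument.

```python
def iv_plim(b1, cov_zu, cov_zx):
    """Large-sample value of the IV estimator: B1 + cov(z,u) / cov(z,x)."""
    return b1 + cov_zu / cov_zx

b1_true = 2.0
small_violation = 0.01          # slight correlation between z and u (assumed)

strong = iv_plim(b1_true, small_violation, cov_zx=0.5)    # bias 0.01 / 0.5 = 0.02
weak = iv_plim(b1_true, small_violation, cov_zx=0.02)     # bias 0.01 / 0.02 = 0.5

print(strong - b1_true, weak - b1_true)
```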

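What is the rule of thumb with weak instruments?

If instruments are weak, the sampling distribution of the IV estimator is not well approximated by a normal distribution, even in large samples

Rule of thumb: a first-stage F statistic above 10 means the instrument is roughly strong enough. With a single instrument F = t^2, so this is the same as |t| above root 10 (about 3.16)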
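
A sketch checking that, with one instrument, the first-stage F statistic is exactly the square of the t statistic. The simulated first stage (pi1 = 0.1) is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2_000

z = rng.normal(size=n)
x = 0.1 * z + rng.normal(size=n)             # assumed first stage

Z = np.column_stack([np.ones(n), z])
pi_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)

ssr_u = np.sum((x - Z @ pi_hat) ** 2)        # unrestricted SSR
ssr_r = np.sum((x - x.mean()) ** 2)          # restricted SSR under H0: pi1 = 0

# F test with one restriction and n - 2 residual degrees of freedom
f_stat = (ssr_r - ssr_u) / (ssr_u / (n - 2))

se_pi1 = np.sqrt(ssr_u / (n - 2) / np.sum((z - z.mean()) ** 2))
t_stat = pi_hat[1] / se_pi1

print(f_stat, t_stat ** 2, f_stat > 10)
```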

IV in the MLR model

Model: yi = B0 + B1xi1 + B2xi2 + ui, where xi1 is endogenous and zi is its instrument
To consistently estimate all of the Bs, we use the sample analogs of the moment conditions:
- E[ui] = 0
- cov(ui,zi) = 0
- cov(ui,xi2) = 0
Where xi2 is the exogenous explanatory variable, unlike xi1

Solve 3 equations, 3 unknowns (B0, B1, B2).
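
The three sample moment conditions can be stacked and solved as one linear system, as in this sketch. Writing X = [1, x1, x2] and Zm = [1, z, x2], the conditions say Zm'(y - X B) = 0, so B^ solves (Zm'X) B = Zm'y. The simulated data and true coefficients (1, 2, -1) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000

x2 = rng.normal(size=n)                        # exogenous regressor
z = rng.normal(size=n)                         # instrument for x1
w = rng.normal(size=n)                         # unobserved confounder (hypothetical)
x1 = z + 0.5 * x2 + w + rng.normal(size=n)     # endogenous regressor
u = w + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u              # true (B0, B1, B2) = (1, 2, -1)

X = np.column_stack([np.ones(n), x1, x2])      # regressors
Zm = np.column_stack([np.ones(n), z, x2])      # instruments: 1, z, and x2 for itself

# Sample moments: Zm'(y - X B) = 0  ->  3 equations, 3 unknowns
beta_iv = np.linalg.solve(Zm.T @ X, Zm.T @ y)
print(beta_iv)
```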

IV in the MLR, what happens between the exogenous and the endogenous explanatory variables?

z must be correlated with x1, and the correlation must hold even after controlling for x2
- to verify the relevance of z as an instrument, regress xi1 on zi and xi2 and perform a t test of H0: pi1 = 0 on the coefficient of zi

Exogeneity condition is now: cov(zi,ui|xi2) = 0, meaning after controlling for xi2, zi should have no correlation with ui

What about the 2SLS model?
- how is it different to what we have so far?
- why use it?
- what is the test for instrument relevance?

Difference: multiple instruments z1,…,zn, so the first stage regression of the endogenous variable x1 includes all of them
Why: multiple instruments can improve the precision of the estimates and allow for overidentification tests
Relevance test: H0: pi1 = pi2 = … = pin = 0, a joint F test across the instruments, aim for F > 10

2SLS, step-by-step model

1. Estimate the first stage regression: regress the endogenous explanatory variable on the instruments and all the other exogenous explanatory variables, and do the relevance F test too
2. Compute the predicted value of x1, xi1^
3. Estimate yi = B0 + B1xi1^ + B2xi2 + ei: regress the outcome variable on xi1^ and all the other exogenous explanatory variables

The coefficient on xi1^ is the 2SLS estimate of B1
Report the SE and the 1st stage F stat. Take the SE from a 2SLS routine, because the SEs from a manual second-stage OLS are not valid
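
The steps above can be sketched by hand with two OLS fits. The data-generating process (two instruments z1 and z2, one exogenous control x2, true B1 = 2) and the helper `ols` are assumptions for illustration. The sketch only produces point estimates; it does not compute valid 2SLS standard errors.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(7)
n = 100_000

x2 = rng.normal(size=n)                               # exogenous control
z1 = rng.normal(size=n)                               # instrument 1
z2 = rng.normal(size=n)                               # instrument 2
w = rng.normal(size=n)                                # unobserved confounder (hypothetical)
x1 = 0.7 * z1 + 0.5 * z2 + 0.5 * x2 + w + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + w + rng.normal(size=n)

const = np.ones(n)

# Step 1: first stage, x1 on instruments and exogenous regressors
first = np.column_stack([const, z1, z2, x2])
pi_hat = ols(first, x1)

# Step 2: fitted values of x1
x1_hat = first @ pi_hat

# Step 3: second stage, y on fitted x1 and exogenous regressors
second = np.column_stack([const, x1_hat, x2])
beta_2sls = ols(second, y)

print(beta_2sls)    # coefficient on x1_hat is the 2SLS estimate of B1
```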

Potential issues with adding instruments

Adding instruments with low predictive power in the 1st stage lowers the F statistic and exacerbates the bias in the 2SLS estimator, pulling the 2SLS estimates toward the OLS estimates

Testing for endogeneity: Hausman test

H0: cov(xi1,ui) = 0, and H1 is the opposite
- under the null, both OLS and IV are consistent (and OLS is more efficient); under the alternative, only IV is consistent

1. Run the 2SLS 1st stage, where vi is the error term capturing the part of xi1 not explained by zi
2. Calculate the 1st stage residual, vi^ = xi1 - xi1^
3. Add vi^ to the regression model with coefficient theta, and estimate by OLS
4. If xi1 is exogenous, vi should not be correlated with ui, so theta should be 0
5. Test H0: theta = 0 using a t test
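
A sketch of the regression version of the test on simulated data where x1 is endogenous by construction, so the t statistic on the first-stage residual should be large. The design, the helper `ols_with_se` and its homoskedastic standard errors are assumptions for illustration.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and homoskedastic standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

rng = np.random.default_rng(8)
n = 20_000

z = rng.normal(size=n)
w = rng.normal(size=n)                         # unobserved confounder (hypothetical)
x1 = z + w + rng.normal(size=n)                # endogenous by construction
y = 1.0 + 2.0 * x1 + w + rng.normal(size=n)

const = np.ones(n)

# Steps 1-2: first stage and its residual
Z = np.column_stack([const, z])
v_hat = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# Step 3: add v_hat to the structural equation and estimate by OLS
beta, se = ols_with_se(np.column_stack([const, x1, v_hat]), y)

# Steps 4-5: t test on theta, the coefficient on v_hat
t_theta = beta[2] / se[2]
print(beta[2], t_theta)
```

In this design the coefficient on x1 in the augmented regression lands near the true value of 2, while a large |t_theta| signals that plain OLS would not.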

Difference between over-identification and just identified.
- why does this matter

if we have exactly as many instruments as endogenous variables, model is just identified - exogeneity not testable
if we have more instruments than endogenous variables, the model is over-identified.
overidentification allows for validity testing - we can check whether instruments satisfy the exogeneity condition

Multiple endogenous variables: what to do?
E.g., 3 regressors, but 2 endogenous

x1 and x2 are potentially correlated with u
- need at least 2 instruments that:
1. Don’t appear in the main equation for y
2. Satisfy the relevance condition
3. Satisfy the exogeneity condition

Rank condition = instruments must be correlated ENOUGH with the endogenous variables so you can actually estimate the coefficients

Testing overidentification, Sargan Test

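1. Estimate the 2SLS regression and obtain the residuals ui^
2. Regress the residuals on all excluded instruments and any other exogenous variables in the model, and record the R^2 from this regression
3. Compute nR^2 and compare it with a chi squared distribution with M-1 degrees of freedom (M instruments for one endogenous regressor; in general, the number of instruments minus the number of endogenous regressors). The null is that all IVs are exogenous

If IVs are valid, 2SLS residuals should be uncorrelated with the instruments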
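
A sketch of the nR^2 statistic with two instruments and one endogenous regressor (so 1 degree of freedom). It is computed twice on simulated data: once with both instruments valid, and once where z2 is made invalid by letting it enter the error term. The design and the helper names are assumptions for illustration; 3.841 is the 5% critical value of a chi squared with 1 degree of freedom.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def sargan_stat(y, x1, instruments):
    """n * R^2 from regressing 2SLS residuals on the instruments."""
    n = len(y)
    const = np.ones(n)
    Z = np.column_stack([const] + instruments)
    x1_hat = Z @ ols(Z, x1)                               # first-stage fit
    beta = ols(np.column_stack([const, x1_hat]), y)       # second stage
    u_hat = y - beta[0] - beta[1] * x1                    # residuals use actual x1
    fitted = Z @ ols(Z, u_hat)
    ss_res = np.sum((u_hat - fitted) ** 2)
    ss_tot = np.sum((u_hat - u_hat.mean()) ** 2)
    return n * (1.0 - ss_res / ss_tot)

rng = np.random.default_rng(9)
n = 20_000
z1, z2, w = rng.normal(size=(3, n))
x1 = z1 + z2 + w + rng.normal(size=n)
u = w + rng.normal(size=n)

j_valid = sargan_stat(1.0 + 2.0 * x1 + u, x1, [z1, z2])

u_bad = u + 0.5 * z2                                      # z2 now violates exogeneity
j_invalid = sargan_stat(1.0 + 2.0 * x1 + u_bad, x1, [z1, z2])

CRIT_5PCT_DF1 = 3.841
print(j_valid, j_invalid, j_invalid > CRIT_5PCT_DF1)
```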

Difference between LATE and ATE:

LATE is the effect of treatment on outcome for subgroup of individuals whose treatment status is affected by the instrument
ATE is the effect of treatment on outcome averaged across entire population

When can LATE = ATE

Biv = E[B1i.pi1i]/E[pi1i] = E[B1i]

- When causal effects are homogeneous: everyone's TE is the same, so weighting doesn't matter, so B1i = B1
- When the 1st stage is homogeneous: the instrument affects all individuals equally, so pi1i = pi1 and there is no subgroup variation in how z influences x
- When the heterogeneity in the TE and in the effect of the instrument are uncorrelated: E[B1i.pi1i] = E[B1i].E[pi1i]
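
A tiny numerical sketch of the weighting formula E[B1i.pi1i]/E[pi1i] with four hypothetical individuals. The numbers and the function name are made up for illustration. With a homogeneous first stage the weighted effect equals the ATE; when individuals with larger treatment effects also respond more to the instrument, it does not.

```python
import numpy as np

def iv_weighted_effect(b1i, pi1i):
    """E[B1i * pi1i] / E[pi1i]: treatment effects weighted by first-stage response."""
    return np.mean(b1i * pi1i) / np.mean(pi1i)

b1i = np.array([1.0, 2.0, 3.0, 4.0])           # individual treatment effects
ate = b1i.mean()                               # 2.5

pi_equal = np.array([0.5, 0.5, 0.5, 0.5])      # homogeneous first stage
pi_corr = np.array([0.1, 0.2, 0.3, 0.4])       # response rises with the effect

late_equal = iv_weighted_effect(b1i, pi_equal)
late_corr = iv_weighted_effect(b1i, pi_corr)
print(ate, late_equal, late_corr)
```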

If IVs are valid, 2SLS estimates using different instruments should agree. If they are statistically different, what inference can we draw?

You should get B1 no matter which instrument is used in 2SLS. If the estimates differ, one or more of the instruments are invalid, but you can't tell which one it is. If the estimates are similar, that doesn't guarantee validity

2SLS intuition and method

• Purpose: Fix endogeneity by using instruments Z.
• Step 1: Regress endogenous X on Z → fitted values of X (clean part of X uncorrelated with u).
• Step 2: Regress y on fitted X (and controls).
• Result: Consistent estimate if instruments are valid (relevant + exogenous).
• Why needed: When OLS fails due to endogeneity.

Sargan test intuition + method

• When used: You have more instruments than endogenous regressors.
• Idea: If instruments are valid, 2SLS residuals should be orthogonal to them.
• Steps:
  1. Get the 2SLS residuals (computed with actual X, not fitted X).
  2. Regress them on instruments (and controls).
  3. Test whether instruments explain residuals (J = nR^2).
• Interpretation: Fail to reject = instruments look valid. Reject = at least one instrument is invalid.

Hausman Test intuition + method

• When used: You suspect a regressor is endogenous.
• Idea: Compare OLS vs IV (2SLS):
  • If regressor is exogenous, both are consistent → estimates should be similar.
  • If endogenous, OLS is biased but IV is consistent → estimates differ systematically.
• Regression version: Add first-stage residuals to the model and test if their coefficient = 0.
• Interpretation: Reject null → regressor endogenous → must use IV/2SLS.

Lecture 12 Flashcards

(28 cards)