Lecture 12 Flashcards

(28 cards)

1
Q

When does endogeneity occur?

A

When the regressor x is correlated with the error term u

That is, when cov(x,u) is not 0

2
Q

When could cov(x,u) not equal 0?

What could it mean?

A
  • omitted variables
  • simultaneity: the dependent and independent variables are determined simultaneously, so there is a feedback loop, like price and quantity in supply and demand diagrams
  • measurement error: observed x deviates from true x, which can cause correlation with u

In each case the standard OLS estimators b0^ and b1^ are not consistent
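
A minimal simulation sketch of the omitted-variable case, using only the Python standard library. The data-generating process, the parameter values, and every name (w, beta0, beta1, gap) are assumptions made up for illustration. It shows that the OLS slope settles near B1 + cov(x,u)/var(x) rather than near B1.

```python
import random

random.seed(0)
n = 20000
beta0, beta1 = 1.0, 2.0  # true parameters, chosen for illustration

# w is an omitted variable that enters both x and the error term u,
# so cov(x,u) = var(w) = 1 and var(x) = 2 in the population.
w = [random.gauss(0, 1) for _ in range(n)]
x = [wi + random.gauss(0, 1) for wi in w]
u = [wi + random.gauss(0, 1) for wi in w]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


ols_slope = cov(x, y) / cov(x, x)
# In the sample, the OLS slope minus beta1 equals cov(x,u)/var(x).
gap = cov(x, u) / cov(x, x)
print(f"true B1 = {beta1}, OLS slope = {ols_slope:.3f}, cov(x,u)/var(x) = {gap:.3f}")
```

With this design the population value of cov(x,u)/var(x) is 1/2, so a larger sample does not move the OLS slope towards B1.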

3
Q

Example of simultaneity

A

Let’s say we want to see if government spending reduces unemployment
- but governments often spend more in areas with higher unemployment
- therefore unemployment itself influences government spending
- thus, if we ignore reverse causality, we could misinterpret the correlation: seeing higher spending alongside higher unemployment suggests a positive relationship, which is misleading

4
Q

Unbiasedness does not mean consistency

A

Unbiasedness is a finite-sample property: an estimator is unbiased if, on average over repeated samples of any given size, it equals the true value of the parameter. For OLS this requires E[u|x] = 0 (SLR.4)

Consistency is a large-sample property: an estimator is consistent if its estimates converge in probability to the true value as the sample size grows without bound. For OLS this requires only cov(x,u) = 0 (together with E[u] = 0)

SLR.4 implies cov(x,u) = 0, but not vice versa

5
Q

What is the basic idea of instrumental variables?

A

Introduce a third variable z that is correlated with x but not with u; it helps isolate the part of the variation in x that is unrelated to u

6
Q

2 key IV assumptions
- consider yi = B0 + B1Xi + ui, where cov(u,x) is not 0

A
  • cov(zi,ui) = 0 (exogeneity): this condition is theoretical and can’t be tested, as it depends on ui, which is unobservable
  • cov(zi,xi) is not 0 (relevance)

Basically, z is unrelated to u, and z affects yi only through xi

7
Q

How to test for instrument relevance, i.e. that Zi affects xi

A

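First-stage regression: xi = pi0 + pi1zi + vi
Since pi1 = cov(zi,xi)/var(zi), the relevance condition cov(zi,xi) is not 0 is equivalent to pi1 is not 0. Unlike exogeneity, this involves only observables, so we can and MUST test it

Perform a t test on pi1:
- H0: pi1 = 0
- H1: pi1 is not 0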

8
Q

IV estimator, B1

A

First take the covariance of the model with zi: cov(zi,yi) = B1.cov(zi,xi) + cov(zi,ui), where the second term is 0 by the exogeneity assumption
- rearrange:
B1 = cov(zi,yi)/cov(zi,xi), then divide top and bottom by var(zi)
- this gives the slope coefficient from the reduced form divided by the slope coefficient from the first stage
- same as (slope from regressing y on z) / (slope from regressing x on z)
B1^ replaces the population covariances with sample ones: B1^ = SUM((zi-z_)(yi-y_)) / SUM((zi-z_)(xi-x_)). This is the OLS slope formula with (zi-z_) in place of (xi-x_)
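
The ratio form can be checked on simulated data. This is a minimal sketch using only the standard library; the data-generating process, the coefficient 0.8 on z, and all variable names are assumptions for illustration. It computes the IV slope as cov(z,y)/cov(z,x), confirms it equals the reduced-form slope divided by the first-stage slope, and compares it with OLS.

```python
import random

random.seed(1)
n = 20000
beta0, beta1 = 1.0, 2.0  # true parameters, chosen for illustration

z = [random.gauss(0, 1) for _ in range(n)]          # instrument
w = [random.gauss(0, 1) for _ in range(n)]          # unobserved confounder
x = [0.8 * zi + wi + random.gauss(0, 1) for zi, wi in zip(z, w)]
u = [wi + random.gauss(0, 1) for wi in w]           # correlated with x through w
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


beta1_iv = cov(z, y) / cov(z, x)          # IV estimator
reduced_form = cov(z, y) / cov(z, z)      # slope from regressing y on z
first_stage = cov(z, x) / cov(z, z)       # slope from regressing x on z
ratio = reduced_form / first_stage        # same number as beta1_iv
beta1_ols = cov(x, y) / cov(x, x)         # inconsistent here because cov(x,u) is not 0

print(f"IV = {beta1_iv:.3f}, reduced form / first stage = {ratio:.3f}, OLS = {beta1_ols:.3f}")
```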

9
Q

Special case of IV: Wald Estimator

A

When the instrument z is binary:
- E[yi|zi=1] = B0 + B1E[xi|zi=1]
- E[yi|zi=0] = B0 + B1E[xi|zi=0]

E[yi|zi=1] - E[yi|zi=0] = B1(E[xi|zi=1] - E[xi|zi=0])

Rearrange for B1, then in sample you replace the expectations with the sample averages to have the Wald Estimator

Basically, the difference in mean y between instrument groups divided by the difference in mean x between instrument groups
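
A small sketch of the Wald estimator on simulated data with a binary instrument, standard library only. The design (a 50/50 instrument that shifts x by 1) and all names are hypothetical. It also checks that the Wald ratio coincides with the covariance-ratio IV estimate cov(z,y)/cov(z,x).

```python
import random

random.seed(2)
n = 20000
beta0, beta1 = 1.0, 2.0  # true parameters, chosen for illustration

z = [1 if random.random() < 0.5 else 0 for _ in range(n)]   # binary instrument
w = [random.gauss(0, 1) for _ in range(n)]                  # unobserved confounder
x = [1.0 * zi + wi + random.gauss(0, 1) for zi, wi in zip(z, w)]
u = [wi + random.gauss(0, 1) for wi in w]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]


def mean(v):
    return sum(v) / len(v)


def group_mean(values, group):
    """Sample mean of values among observations with z equal to group."""
    return mean([v for v, zi in zip(values, z) if zi == group])


# Difference in mean y divided by difference in mean x across instrument groups.
wald = (group_mean(y, 1) - group_mean(y, 0)) / (group_mean(x, 1) - group_mean(x, 0))


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


iv_ratio = cov(z, y) / cov(z, x)
print(f"Wald = {wald:.3f}, cov(z,y)/cov(z,x) = {iv_ratio:.3f}")
```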

10
Q

Variance of IV estimator:
Var^(Biv^) =

A

Assuming homoskedasticity:

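Var^(Biv^) = (sigma^)^2 / (SSTx . R^2x,z)

  • SSTx = SUM((xi-x_)^2), the total sum of squares of x
  • R^2x,z = the R^2 from a regression of xi on zi and an intercept, tells you how well z predicts x
  • (sigma^)^2 = (1/(n-2)) . SUM((ui^)^2), where ui^ are the IV residuals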
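
The pieces of this formula can be computed by hand. This is a sketch on simulated data with the standard library only, assuming homoskedasticity as the card does; the data-generating process and all names are made up for illustration. The residuals use the actual x, not fitted values.

```python
import math
import random

random.seed(3)
n = 20000
z = [random.gauss(0, 1) for _ in range(n)]
w = [random.gauss(0, 1) for _ in range(n)]                  # unobserved confounder
x = [0.8 * zi + wi + random.gauss(0, 1) for zi, wi in zip(z, w)]
y = [1.0 + 2.0 * xi + wi + random.gauss(0, 1) for xi, wi in zip(x, w)]


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


b1 = cov(z, y) / cov(z, x)                 # IV slope
b0 = mean(y) - b1 * mean(x)                # IV intercept
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]   # IV residuals, using actual x

sigma2_hat = sum(r * r for r in resid) / (n - 2)
mx = mean(x)
sst_x = sum((xi - mx) ** 2 for xi in x)
r2_xz = cov(z, x) ** 2 / (cov(z, z) * cov(x, x))   # R^2 of x on z = squared correlation

var_iv = sigma2_hat / (sst_x * r2_xz)
se_iv = math.sqrt(var_iv)
var_without_r2 = sigma2_hat / sst_x        # same expression without the R^2 term
print(f"R^2(x,z) = {r2_xz:.3f}, se(B1 IV) = {se_iv:.4f}, inflation factor = {1 / r2_xz:.2f}")
```

Because R^2x,z is below 1, dividing by it inflates the variance; the weaker the instrument, the larger the inflation.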
11
Q

IV vs OLS

A

Advantage of IV: consistent even if u and x are correlated, in which case the OLS estimator is biased and inconsistent

Disadvantage of IV estimator: less efficient than OLS if u and x are uncorrelated (when OLS is itself consistent)

Under homoskedasticity, the variance of the IV estimator is (sigma^)^2/(SSTx . R^2x,z), against (sigma^)^2/SSTx for OLS. Since R^2x,z < 1, the IV variance is always larger, and it depends crucially on the correlation between z and x

Tradeoff: gain consistency at the cost of precision

12
Q

Weak instruments and Bias:

A
  • a weak instrument means that z and x are only weakly correlated; this leads to imprecise IV estimates and can also give large bias
  • mathematically, plim(Biv^) = B1 + [corr(z,u)/corr(z,x)] . (sigma_u/sigma_x); if the denominator corr(z,x) is small (a weak instrument), even a slight correlation between z and u makes the second term very large, representing a lot of asymptotic bias

Denominator = how strongly z is correlated with x; as it tends to 0, the estimator becomes unreliable and sensitive to small changes in the data
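
A small arithmetic sketch of the weak-instrument problem. It uses the standard large-sample results plim(Biv^) - B1 = [corr(z,u)/corr(z,x)] . (sigma_u/sigma_x) and plim(Bols^) - B1 = corr(x,u) . (sigma_u/sigma_x). The correlation values below are invented purely for illustration, and the function names are hypothetical.

```python
def iv_asymptotic_bias(corr_zu, corr_zx, sd_ratio):
    """Asymptotic bias of IV: [corr(z,u) / corr(z,x)] * (sigma_u / sigma_x)."""
    return corr_zu / corr_zx * sd_ratio


def ols_asymptotic_bias(corr_xu, sd_ratio):
    """Asymptotic bias of OLS: corr(x,u) * (sigma_u / sigma_x)."""
    return corr_xu * sd_ratio


sd_ratio = 1.0                      # sigma_u / sigma_x, assumed equal to 1
ols_bias = ols_asymptotic_bias(0.2, sd_ratio)

# Same slight violation of exogeneity, corr(z,u) = 0.02, in both cases.
strong_iv_bias = iv_asymptotic_bias(0.02, 0.5, sd_ratio)    # strong instrument
weak_iv_bias = iv_asymptotic_bias(0.02, 0.05, sd_ratio)     # weak instrument

print(f"OLS: {ols_bias:.2f}, strong IV: {strong_iv_bias:.2f}, weak IV: {weak_iv_bias:.2f}")
```

With these numbers the weak instrument leaves IV with more asymptotic bias than OLS, even though z is far less correlated with u than x is.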

13
Q

What is the rule of thumb with weak instruments

A
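  • if instruments are weak, the sampling distribution of the IV estimator is not well approximated by a normal distribution, even in large samples

Rule of thumb: a first-stage F statistic above 10 means the instrument is roughly strong enough; with a single instrument F = t^2, so this is the same as a t statistic above root 10 (about 3.16) in absolute value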
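
A sketch of the first-stage check with a single instrument, standard library only. The two simulated designs (first-stage coefficients 0.8 and 0.02) and the helper name first_stage are assumptions for illustration. The function returns the slope, its t statistic, and the F statistic computed from R^2, so F = t^2 can be seen directly.

```python
import math
import random


def first_stage(x, z):
    """Regress x on z and an intercept; return (pi1, t statistic, F statistic)."""
    n = len(x)
    mz, mx = sum(z) / n, sum(x) / n
    sst_z = sum((zi - mz) ** 2 for zi in z)
    pi1 = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x)) / sst_z
    pi0 = mx - pi1 * mz
    ssr = sum((xi - pi0 - pi1 * zi) ** 2 for zi, xi in zip(z, x))
    se = math.sqrt(ssr / (n - 2) / sst_z)      # homoskedastic standard error
    t = pi1 / se
    sst_x = sum((xi - mx) ** 2 for xi in x)
    r2 = 1 - ssr / sst_x
    f = r2 / (1 - r2) * (n - 2)                # F test of H0: pi1 = 0
    return pi1, t, f


random.seed(4)
n = 1000
z = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) + random.gauss(0, 1) for _ in range(n)]
x_strong = [0.8 * zi + e for zi, e in zip(z, noise)]    # z predicts x well
x_weak = [0.02 * zi + e for zi, e in zip(z, noise)]     # z barely predicts x

_, t_strong, f_strong = first_stage(x_strong, z)
_, t_weak, f_weak = first_stage(x_weak, z)
print(f"strong: t = {t_strong:.2f}, F = {f_strong:.1f}")
print(f"weak:   t = {t_weak:.2f}, F = {f_weak:.1f}")
```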

14
Q

IV in the MLR model

A

To consistently estimate all of the Bs, we use the sample analogs of the moment conditions:
- E[ui] = 0
- cov(ui,zi) = 0
- cov (ui,xi2) = 0
Where xi2 is the exogenous explanatory variable, unlike xi1

Solve 3 equations, 3 unknowns.

15
Q

IV in the MLR, what happens between the exogenous and the endogenous explanatory variables?

A

z must be correlated with x1, and the correlation must hold even after controlling for x2
- to verify the relevance of z as an instrument, regress xi1 on zi and xi2 and t-test the coefficient on zi (H0: pi1 = 0 against H1: pi1 is not 0)

Exogeneity condition is now: cov(zi,ui|xi2) = 0, meaning after controlling for xi2, zi should have no correlation with ui

16
Q

What about the 2SLS model?
- how is it different to what we have so far?
- why use it?
- what is the test for instrument relevance?

A
  • multiple instruments Z1,…,ZM, so the first-stage regression of the endogenous variable x1 on them is longer
  • multiple instruments can improve the precision of estimates and allow for overidentification tests
  • H0: pi1 = pi2 = … = piM = 0, F test across the multiple instruments, aim for F>10
17
Q

2SLS, step-by-step model

A
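  1. Estimate the first-stage regression: regress the endogenous explanatory variable xi1 on the instruments and all the other exogenous explanatory variables, and do the relevance F test too
  2. Compute the predicted value of xi1, xi1^
  3. Estimate yi = B0 + B1xi1^ + B2xi2 + ei: regress the outcome variable on xi1^ and all the other exogenous explanatory variables

The coefficient on xi1^ is the 2SLS estimate of B1
Report the standard error and the first-stage F statistic. Take the standard error from a 2SLS routine: the one printed by a manual second-stage OLS is not valid, because it is based on residuals computed with xi1^ instead of xi1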
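
The steps above can be sketched by hand on simulated data. This is an illustrative sketch with the standard library only: the data-generating process, the coefficient values, and the small ols helper (normal equations solved by Gauss-Jordan elimination) are all assumptions. It reports coefficients only, not standard errors.

```python
import random


def ols(columns, y):
    """Coefficients from OLS of y on an intercept plus the given columns."""
    rows = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


random.seed(5)
n = 10000
z = [random.gauss(0, 1) for _ in range(n)]      # instrument
x2 = [random.gauss(0, 1) for _ in range(n)]     # exogenous regressor
w = [random.gauss(0, 1) for _ in range(n)]      # unobserved confounder
x1 = [0.8 * zi + 0.5 * c + wi + random.gauss(0, 1) for zi, c, wi in zip(z, x2, w)]
u = [wi + random.gauss(0, 1) for wi in w]
y = [1.0 + 2.0 * e + (-1.0) * c + ui for e, c, ui in zip(x1, x2, u)]   # B1 = 2, B2 = -1

# Step 1: first stage, x1 on the instrument and the exogenous regressor.
p = ols([z, x2], x1)
# Step 2: predicted values of x1.
x1_hat = [p[0] + p[1] * zi + p[2] * c for zi, c in zip(z, x2)]
# Step 3: second stage, y on x1_hat and the exogenous regressor.
b_2sls = ols([x1_hat, x2], y)
b_ols = ols([x1, x2], y)        # for comparison; inconsistent in this design

print(f"2SLS B1 = {b_2sls[1]:.3f}, OLS B1 = {b_ols[1]:.3f} (true value 2.0)")
```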

18
Q

Potential issues with adding instruments

A
  • adding instruments with low predictive power in the first stage lowers the first-stage F statistic and exacerbates the bias in the 2SLS estimator, which is pulled towards the OLS estimate
19
Q

Testing for endogeneity: Hausman test

A

H0: cov(xi1,ui) = 0; H1: cov(xi1,ui) is not 0
- under the null, both OLS and IV are consistent; under the alternative, only IV is consistent

  1. Run the 2SLS first stage (xi1 on the instruments and the exogenous regressors)
  2. Calculate the first-stage residual, vi^ = xi1 - xi1^, which captures the part of xi1 not explained by zi
  3. Add vi^ to the original regression model with coefficient theta, and estimate by OLS
  4. If xi1 is exogenous, vi^ should not be correlated with ui, so theta should be 0
  5. Test H0: theta = 0 using a t test; rejecting it indicates that xi1 is endogenous
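
A sketch of the regression-based version of the test on simulated data, standard library only. The data-generating process, the parameter rho (how strongly the confounder enters u; 0 makes x exogenous), and the helper names ols_fit and hausman_t are assumptions for illustration. Standard errors are the usual homoskedastic OLS ones.

```python
import math
import random


def ols_fit(columns, y):
    """OLS of y on an intercept plus columns; returns (coefs, std errors, residuals)."""
    n = len(y)
    rows = [[1.0] + [c[i] for c in columns] for i in range(n)]
    k = len(rows[0])
    # Gauss-Jordan on [X'X | I] turns it into [I | inverse of X'X].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    inv = [row[k:] for row in a]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return coef, se, resid


def hausman_t(rho, seed, n=5000):
    """t statistic on the first-stage residual added to the structural equation."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    w = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.8 * zi + wi + rng.gauss(0, 1) for zi, wi in zip(z, w)]
    u = [rho * wi + rng.gauss(0, 1) for wi in w]
    y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
    _, _, v_hat = ols_fit([z], x)              # steps 1-2: first-stage residuals
    coef, se, _ = ols_fit([x, v_hat], y)       # step 3: add v_hat to the model
    return coef[2] / se[2]                     # step 5: t statistic for theta = 0


t_endog = hausman_t(rho=1.0, seed=6)   # x endogenous: theta is not 0
t_exog = hausman_t(rho=0.0, seed=7)    # x exogenous: theta = 0
print(f"t (endogenous x) = {t_endog:.2f}, t (exogenous x) = {t_exog:.2f}")
```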
20
Q

Difference between over-identification and just identified.
- why does this matter

A
  • if we have exactly as many instruments as endogenous variables, model is just identified - exogeneity not testable
  • if we have more instruments than endogenous variables, the model is over-identified.
  • overidentification allows for validity testing - we can check whether instruments satisfy the exogeneity condition
21
Q

Multiple endogenous variables: what to do?
E.g., 3 regressors, but 2 endogenous

A

X1 and x2 potentially correlated with u
- need at least 2 instruments that:
1. Don’t appear in the main equation for y
2. Satisfy the relevance condition
3. Satisfy the exogeneity condition

Rank condition = instruments must be correlated ENOUGH with the endogenous variables so you can actually estimate the coefficients

22
Q

Testing overidentification, Sargan Test

A
  1. Estimate the 2SLS regression and obtain the residuals ui^
  2. Regress the residuals on all excluded instruments and any other exogenous variables in the model, and record the R^2 from this regression
  3. Compute nR^2; under the null that all IVs are exogenous, it is asymptotically chi-squared with degrees of freedom equal to the number of instruments minus the number of endogenous regressors (M-1 with M instruments and one endogenous regressor)

If the IVs are valid, the 2SLS residuals should be uncorrelated with the instruments
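
A sketch of the nR^2 statistic with two instruments and one endogenous regressor (so 1 degree of freedom; the 5% chi-squared critical value is 3.84). Standard library only. The simulated design, the parameter delta (a direct effect of z2 on u, which makes z2 invalid when it is not 0), and the helper names are assumptions for illustration.

```python
import random


def ols(columns, y):
    """Coefficients from OLS of y on an intercept plus the given columns."""
    rows = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


def sargan_j(delta, seed, n=5000):
    """n * R^2 from regressing the 2SLS residuals on the instruments."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    w = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.8 * v1 + 0.8 * v2 + wi + rng.gauss(0, 1) for v1, v2, wi in zip(z1, z2, w)]
    u = [wi + delta * v2 + rng.gauss(0, 1) for v2, wi in zip(z2, w)]
    y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
    p = ols([z1, z2], x)                                      # first stage
    x_hat = [p[0] + p[1] * v1 + p[2] * v2 for v1, v2 in zip(z1, z2)]
    b = ols([x_hat], y)                                       # second stage
    resid = [yi - b[0] - b[1] * xi for xi, yi in zip(x, y)]   # residuals use actual x
    g = ols([z1, z2], resid)                                  # residuals on instruments
    fitted = [g[0] + g[1] * v1 + g[2] * v2 for v1, v2 in zip(z1, z2)]
    m = sum(resid) / n
    ssr = sum((r - f) ** 2 for r, f in zip(resid, fitted))
    sst = sum((r - m) ** 2 for r in resid)
    return n * (1 - ssr / sst)


CRITICAL_5PCT_DF1 = 3.84
j_valid = sargan_j(delta=0.0, seed=8)     # both instruments exogenous
j_invalid = sargan_j(delta=0.5, seed=9)   # z2 affects y directly through u
print(f"J (valid) = {j_valid:.2f}, J (invalid z2) = {j_invalid:.2f}, critical = {CRITICAL_5PCT_DF1}")
```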

23
Q

Difference between LATE and ATE:

A
  • LATE is the effect of treatment on outcome for subgroup of individuals whose treatment status is affected by the instrument
  • ATE is the effect of treatment on outcome averaged across entire population
24
Q

When can LATE = ATE

Biv = E[B1i.pi1i]/E[pi1i] = E[B1i]

A
  1. when causal effects are homogeneous
    - everyone’s TE is the same, so the weighting doesn’t matter, so B1i = B1
  2. when the first stage is homogeneous: the instrument affects all individuals equally, so pi1i = pi1, and LATE equals ATE as there is no subgroup variation in how Z influences X
  3. when the heterogeneity in the TE and in the effect of the instrument are uncorrelated, E[B1i.pi1i] = E[B1i].E[pi1i]
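
A tiny arithmetic sketch of Biv = E[B1i.pi1i]/E[pi1i] with a made-up population of two equally sized types. The treatment effects and first-stage effects are invented numbers, and the names (types, iv_estimand) are hypothetical.

```python
def ate(types):
    """Population average of the individual treatment effects B1i."""
    return sum(t["share"] * t["beta1"] for t in types)


def iv_estimand(types):
    """E[B1i * pi1i] / E[pi1i]: treatment effects weighted by first-stage response."""
    num = sum(t["share"] * t["beta1"] * t["pi1"] for t in types)
    den = sum(t["share"] * t["pi1"] for t in types)
    return num / den


# Heterogeneous first stage: only the second type responds to the instrument.
hetero = [
    {"share": 0.5, "beta1": 1.0, "pi1": 0.0},
    {"share": 0.5, "beta1": 3.0, "pi1": 0.5},
]
# Homogeneous first stage: both types respond equally.
homog = [
    {"share": 0.5, "beta1": 1.0, "pi1": 0.5},
    {"share": 0.5, "beta1": 3.0, "pi1": 0.5},
]

print(f"heterogeneous first stage: ATE = {ate(hetero)}, IV estimand = {iv_estimand(hetero)}")
print(f"homogeneous first stage:   ATE = {ate(homog)}, IV estimand = {iv_estimand(homog)}")
```

In the first population IV recovers the effect for the type that responds to the instrument (3.0), not the population average (2.0); in the second the two coincide.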
25
Q

2SLS estimates obtained from different instruments: if they are statistically different, what inference can we draw?

A

If the IVs are all valid, you should get a consistent estimate of B1 no matter which one is used in 2SLS. If the estimates differ, one or more of the instruments is invalid, but you can’t tell which one it is. If the estimates are similar, that doesn’t guarantee validity
26
Q

2SLS intuition and method

A
  • Purpose: fix endogeneity by using instruments Z
  • Step 1: regress endogenous X on Z → fitted values of X (the clean part of X, uncorrelated with u)
  • Step 2: regress y on fitted X (and controls)
  • Result: consistent estimate if instruments are valid (relevant + exogenous)
  • Why needed: when OLS fails due to endogeneity
27
Q

Sargan test intuition + method

A
  • When used: you have more instruments than endogenous regressors
  • Idea: if instruments are valid, 2SLS residuals should be orthogonal to them

Steps:
  1. Get 2SLS 2nd stage residuals
  2. Regress them on instruments (and controls)
  3. Test whether instruments explain residuals (J = nR^2)

  • Interpretation: fail to reject = instruments look valid; reject = at least some instrument is invalid
28
Q

Hausman Test intuition + method

A
  • When used: you suspect a regressor is endogenous
  • Idea: compare OLS vs IV (2SLS)
    - if the regressor is exogenous, both are consistent → estimates should be similar
    - if endogenous, OLS is biased but IV is consistent → estimates differ systematically
  • Regression version: add first-stage residuals to the model and test if their coefficient = 0
  • Interpretation: reject null → regressor endogenous → must use IV/2SLS