Lecture 12 Flashcards

1
Q

When does endogeneity occur?

A

When an explanatory variable x is correlated with the error term u, i.e. cov(x,u) ≠ 0

2
Q

When could cov(x,u) ≠ 0 arise?

A
  • omitted variables
  • simultaneity: the dependent and independent variables are determined jointly, so there is a feedback loop, like price and quantity in a supply and demand model
  • measurement error: the observed x deviates from the true x, and the deviation ends up in the error term, which can make the observed x correlated with u
3
Q

Example of simultaneity

A

Suppose we want to see whether government spending reduces unemployment
- but governments often spend more in areas with higher unemployment
- therefore unemployment itself influences government spending (reverse causality)
- thus, if we ignore reverse causality, we could observe higher spending alongside higher unemployment, a positive correlation, and wrongly conclude that spending raises unemployment

4
Q

Unbiasedness does not mean consistency

A

Unbiasedness is a finite-sample property: an estimator is unbiased if, on average across repeated samples, it equals the true value of the parameter, whatever the sample size. For OLS this requires E[u|x] = 0 (SLR.4).

Consistency is a large-sample property: an estimator is consistent if its estimates converge (in probability) to the true value as the sample size grows without bound. For OLS this requires only cov(x,u) = 0.

SLR.4 implies cov(x,u) = 0, but not vice versa

5
Q

What's the basic idea of instrumental variables?

A

Introduce a third variable z, which is correlated with x but not with u; it helps isolate the part of the variation in x that is exogenous (unrelated to u)

6
Q

IV assumptions
- consider yi = B0 + B1xi + ui, where cov(xi,ui) ≠ 0

A
  • exogeneity: cov(zi,ui) = 0. This condition has to be argued from theory and can't be tested, as it depends on ui, which is unobservable
  • relevance: cov(zi,xi) ≠ 0
    Basically, z is unrelated to u, so z affects yi only through xi
7
Q

How to test for instrument relevance

A

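First stage: xi = pi0 + pi1zi + vi
Since pi1 = cov(zi,xi)/var(zi), relevance holds if and only if pi1 ≠ 0, so unlike exogeneity it can, and should, be tested

Perform a t test on the first-stage slope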
- H0: pi1 = 0
- H1: pi1 is not 0
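
The relevance test above can be sketched in a few lines. This is only an illustration under assumed conditions: the data are simulated (true pi1 = 0.5 is an arbitrary choice), the function name `first_stage_t` is hypothetical, and the standard error assumes homoskedastic errors.

```python
import numpy as np

def first_stage_t(x, z):
    """Slope and t statistic for H0: pi1 = 0 in x_i = pi0 + pi1*z_i + v_i.

    Uses the usual homoskedastic OLS standard error.
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    n = x.size
    z_dev = z - z.mean()
    sst_z = np.sum(z_dev ** 2)
    pi1_hat = np.sum(z_dev * (x - x.mean())) / sst_z  # cov(z,x)/var(z)
    pi0_hat = x.mean() - pi1_hat * z.mean()
    resid = x - pi0_hat - pi1_hat * z
    sigma2_v = np.sum(resid ** 2) / (n - 2)
    se_pi1 = np.sqrt(sigma2_v / sst_z)
    return pi1_hat, pi1_hat / se_pi1

# Simulated data (assumption for illustration): true pi1 = 0.5
rng = np.random.default_rng(0)
z = rng.normal(size=1_000)
x = 1.0 + 0.5 * z + rng.normal(size=1_000)

pi1_hat, t_stat = first_stage_t(x, z)
print(f"pi1_hat = {pi1_hat:.3f}, t = {t_stat:.2f}, F = t^2 = {t_stat ** 2:.1f}")
```

With a single instrument the first-stage F statistic is just the square of this t statistic.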

8
Q

IV estimator, B1

A

B1 = cov(zi,yi)/cov(zi,xi). Dividing top and bottom by var(zi):
- gives the slope coefficient from the reduced form (yi on zi) divided by the slope coefficient from the first stage (xi on zi)
B1^ = SUM((zi - zbar)(yi - ybar)) / SUM((zi - zbar)(xi - xbar)), i.e. the OLS slope formula with z replacing x in the deviations; when z = x it reduces to OLS
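
A minimal sketch of the ratio-of-covariances estimator, compared with OLS on simulated data. The data-generating process (true B1 = 2, with x built to be correlated with u) and the function names are assumptions made for illustration only.

```python
import numpy as np

def ols_slope(x, y):
    """OLS slope: cov(x,y)/var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def iv_slope(z, x, y):
    """IV slope: cov(z,y)/cov(z,x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

# Simulated data (assumption): z is exogenous and relevant, x is endogenous
rng = np.random.default_rng(1)
n = 100_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + 0.8 * u + rng.normal(size=n)  # cov(x,u) = 0.8, so OLS is inconsistent
y = 1.0 + 2.0 * x + u                 # true B1 = 2

b_ols = ols_slope(x, y)
b_iv = iv_slope(z, x, y)
print(f"OLS: {b_ols:.3f}, IV: {b_iv:.3f}")
```

Calling `iv_slope(x, x, y)` returns the OLS slope, which is the sense in which IV is "OLS with z instead of x".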

9
Q

Special case of IV: Wald Estimator

A

When the instrument z is binary:
- E[yi|zi=1] = B0 + B1E[xi|zi=1]
- E[yi|zi=0] = B0 + B1E[xi|zi=0]

E[yi|zi=1] - E[yi|zi=0] = B1(E[xi|zi=1] - E[xi|zi=0])

Rearrange for B1 to get the Wald estimator: B1 = (E[yi|zi=1] - E[yi|zi=0]) / (E[xi|zi=1] - E[xi|zi=0])
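
A small sketch of the Wald estimator using sample means in place of the conditional expectations. The four data points are made up and noise-free (y = 1 + 2x exactly), so the answer can be checked by hand; `wald_estimator` is a hypothetical helper name.

```python
import numpy as np

def wald_estimator(y, x, z):
    """Difference in mean y across z groups over difference in mean x."""
    y, x, z = (np.asarray(a, dtype=float) for a in (y, x, z))
    treated = z == 1
    control = z == 0
    numerator = y[treated].mean() - y[control].mean()
    denominator = x[treated].mean() - x[control].mean()
    return numerator / denominator

# Made-up data: binary instrument, y = 1 + 2x with no noise
z = np.array([0, 0, 1, 1])
x = np.array([0.0, 1.0, 1.0, 2.0])
y = 1.0 + 2.0 * x

b_wald = wald_estimator(y, x, z)
b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]  # general IV formula
print(b_wald, b_iv)
```

With a binary instrument the Wald estimate and the general cov(z,y)/cov(z,x) formula give the same number.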

10
Q

Example of where Wald Estimate has been used

A

ln(wagei) = B0 + B1educi + ui
- quarter of birth affects years of education (relevance) and is assumed to be uncorrelated with ui (exogeneity), so it can serve as an instrument for educ

11
Q

Variance of IV estimator:
Var^(Biv^) =

A

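(sigma^)^2 / (SSTx * R^2)

  • R^2 is the R^2 from a regression of xi on zi and an intercept
  • SSTx = SUM((xi - xbar)^2) is the total sum of squares of xi
  • (sigma^)^2 = (1/(n-2)) * SUM((ui^)^2), where ui^ = yi - B0^ - B1^xi are the IV residuals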
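
The variance formula can be computed directly, as in this sketch. The simulated data, the true B1 = 2 and the function name `iv_variance` are assumptions for illustration; the formula itself assumes homoskedastic errors.

```python
import numpy as np

def iv_variance(y, x, z):
    """Return (B1_IV, estimated Var(B1_IV), R^2 of x on z)."""
    y, x, z = (np.asarray(a, dtype=float) for a in (y, x, z))
    n = y.size
    b1 = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
    b0 = y.mean() - b1 * x.mean()
    u_hat = y - b0 - b1 * x                  # IV residuals use the actual x
    sigma2_hat = np.sum(u_hat ** 2) / (n - 2)
    sst_x = np.sum((x - x.mean()) ** 2)
    r2_xz = np.corrcoef(x, z)[0, 1] ** 2     # R^2 from regressing x on z and an intercept
    return b1, sigma2_hat / (sst_x * r2_xz), r2_xz

# Simulated data (assumption): endogenous x, valid instrument z
rng = np.random.default_rng(2)
n = 5_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

b1_iv, var_iv, r2_xz = iv_variance(y, x, z)
print(f"B1_IV = {b1_iv:.3f}, se = {np.sqrt(var_iv):.4f}, R^2(x,z) = {r2_xz:.3f}")
```

Because R^2 sits in the denominator, a smaller R^2 between x and z inflates the variance.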
12
Q

IV vs OLS

A

Advantage of IV: it is consistent even if u and x are correlated, in which case the OLS estimator is biased and inconsistent

Disadvantage of IV: it is less efficient than OLS if u and x are uncorrelated

The (asymptotic) variance of the IV estimator is larger than that of the OLS estimator whenever the R^2 between x and z is below 1, and it depends crucially on the correlation between z and x: the weaker the correlation, the larger the variance

13
Q

Weak instruments and Bias:

A
  • a weak instrument means that z and x are only weakly correlated; this leads to imprecise IV estimates and can also give large bias
  • mathematically, plim B1^ = B1 + [corr(z,u)/corr(z,x)] * (sd(u)/sd(x)); if the denominator corr(z,x) is small, i.e. a weak instrument, even a slight correlation between z and u makes the second term very large, representing a lot of bias
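
A rough simulation of the bias point above: the same small violation of exogeneity is paired with a strong and a weak first stage. Everything here is an assumption for illustration (cov(z,u) = 0.05, first-stage slopes 1.0 and 0.05, true B1 = 2, and the helper name `iv_bias`).

```python
import numpy as np

def iv_slope(z, x, y):
    """IV slope: cov(z,y)/cov(z,x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

def iv_bias(pi1, cov_zu=0.05, n=200_000, seed=0):
    """IV estimate minus the true B1 = 2 in one large simulated sample."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    e = rng.normal(size=n)
    u = cov_zu * z + e                           # slight exogeneity violation
    x = pi1 * z + 0.5 * e + rng.normal(size=n)   # cov(z,x) = pi1
    y = 1.0 + 2.0 * x + u
    return iv_slope(z, x, y) - 2.0

bias_strong = iv_bias(pi1=1.0)    # population bias is cov(z,u)/cov(z,x) = 0.05
bias_weak = iv_bias(pi1=0.05)     # population bias is 0.05/0.05 = 1.0
print(f"strong instrument bias: {bias_strong:.3f}")
print(f"weak instrument bias:   {bias_weak:.3f}")
```

The violation is identical in both runs; only the denominator cov(z,x) changes, which is what blows up the second term.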
14
Q

What is the rule of thumb with weak instruments

A
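  • if instruments are weak, the sampling distribution of the IV estimator is not well approximated by a normal distribution, even in large samples

Rule of thumb: a first-stage F statistic above 10 (with a single instrument F = t^2, so |t| above root 10, about 3.16) means the instrument is roughly strong enough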

15
Q

IV in the MLR model

A

Model: yi = B0 + B1xi1 + B2xi2 + ui, where xi1 is endogenous and zi is its instrument. To consistently estimate all of the Bs, we use the sample analogs of the moment conditions:
- E[ui] = 0
- cov(ui,zi) = 0
- cov(ui,xi2) = 0
where xi2 is the exogenous explanatory variable, unlike xi1

Solve 3 equations in 3 unknowns (B0, B1, B2).
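
The three sample moment conditions form a linear system, which this sketch solves directly. The simulated data (true coefficients 1, 2, -1) and the function name `iv_mlr` are assumptions for illustration.

```python
import numpy as np

def iv_mlr(y, x1, x2, z):
    """Solve the sample moment conditions Z'(y - Xb) = 0 for b = (B0, B1, B2)."""
    n = len(y)
    ones = np.ones(n)
    X = np.column_stack([ones, x1, x2])  # regressors (x1 endogenous)
    Z = np.column_stack([ones, z, x2])   # instruments: constant, z, and x2 for itself
    return np.linalg.solve(Z.T @ X, Z.T @ y)

# Simulated data (assumption): x1 endogenous, x2 exogenous, z a valid instrument
rng = np.random.default_rng(5)
n = 50_000
x2 = rng.normal(size=n)
z = 0.5 * x2 + rng.normal(size=n)
u = rng.normal(size=n)
x1 = z + 0.3 * x2 + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u

b_hat = iv_mlr(y, x1, x2, z)
print(np.round(b_hat, 3))
```

Each row of `Z.T @ (y - X @ b)` is one of the three moment conditions, so the solve step is literally "3 equations, 3 unknowns".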

16
Q

IV in the MLR, what happens between the exogenous and the endogenous explanatory variables?

A

z must be correlated with x1, and the correlation must hold even after controlling for x2
- to verify the relevance of z as an instrument, regress xi1 on zi and xi2, then t test whether the coefficient on zi is zero (H0: pi1 = 0)

Exogeneity condition is now: cov(zi,ui|xi2) = 0, meaning that after controlling for xi2, zi should have no correlation with ui

17
Q

What about the 2SLS model?
- how is it different to what we have so far?
- why use it?
- what is the test for instrument relevance?

A
  • multiple instruments Z1,…,Zn, so the first-stage regression of the endogenous variable x1 includes all of them (plus the exogenous regressors)
  • multiple instruments can improve the precision of estimates and allow for overidentification tests
  • H0: pi1 = pi2 = … = pin = 0, tested with an F test across the instruments in the first stage
18
Q

2SLS, step-by-step model

A
  1. Estimate the first stage regression, regressing the endogenous explanatory variable on the instruments and all the other exogenous explanatory variables
  2. Compute the predicted value of x1, xi1^
  3. Estimate yi = B0 + B1xi1^ + B2xi2 + ei by OLS, regressing the outcome variable on xi1^ and all the other exogenous explanatory variables
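
The three steps above can be sketched with plain least squares. This is an illustration on simulated data with two instruments (true coefficients 1, 2, -1 are arbitrary); `two_sls` and `ols` are hypothetical helper names, not a library API.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_sls(y, x1, X_exog, Z_excl):
    """2SLS point estimates (B0, B1, B2, ...) done by hand in two stages."""
    n = len(y)
    const = np.ones((n, 1))
    # Step 1: regress x1 on the instruments and the exogenous regressors
    W = np.column_stack([const, Z_excl, X_exog])
    pi_hat = ols(W, x1)
    # Step 2: predicted values of x1
    x1_hat = W @ pi_hat
    # Step 3: regress y on x1_hat and the exogenous regressors
    X_second = np.column_stack([const, x1_hat, X_exog])
    return ols(X_second, y)

# Simulated data (assumption): two instruments z1, z2 for the endogenous x1
rng = np.random.default_rng(6)
n = 50_000
z1, z2, x2, u = (rng.normal(size=n) for _ in range(4))
x1 = 0.6 * z1 + 0.4 * z2 + 0.3 * x2 + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u

b_2sls = two_sls(y, x1, x2, np.column_stack([z1, z2]))
print(np.round(b_2sls, 3))
```

The manual route gives the right point estimates, but the standard errors printed by an ordinary second-stage OLS would be wrong, because its residuals are built from xi1^ rather than xi1.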
19
Q

Potential issues with adding instruments

A
  • adding instruments with low predictive power in the 1st stage lowers the F statistic and exacerbates the bias in the 2SLS estimator
20
Q

Testing for endogeneity: Hausman test

A

H0: cov(xi1,ui) = 0, H1: cov(xi1,ui) ≠ 0
- under H0 both OLS and IV are consistent (and OLS is more efficient); under H1 only IV is consistent
1. Estimate the 1st stage regression of xi1 on Zi and the exogenous regressors; its error vi captures the part of xi1 not explained by them
2. Calculate the 1st stage residual, vi^ = xi1 - xi1^
3. Add vi^ to the structural regression model, with coefficient theta, and estimate by OLS
4. If xi1 is exogenous, vi should not be correlated with ui, so theta should be 0
5. Test H0: theta = 0 using a t test; rejecting it indicates that xi1 is endogenous
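
A sketch of the regression-based version of the test described above, run on one simulated sample where x1 is endogenous and one where it is not. The data-generating process and the function names are assumptions, and the t statistic uses homoskedastic standard errors.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and homoskedastic standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    return beta, np.sqrt(np.diag(sigma2 * xtx_inv))

def endogeneity_t_stat(y, x1, x2, z):
    """t statistic on the first-stage residual added to the structural equation."""
    const = np.ones(len(y))
    W = np.column_stack([const, z, x2])
    pi_hat, _ = ols_with_se(W, x1)
    v_hat = x1 - W @ pi_hat                       # first-stage residual
    X_aug = np.column_stack([const, x1, x2, v_hat])
    beta, se = ols_with_se(X_aug, y)
    return beta[3] / se[3]                        # theta_hat / se(theta_hat)

def simulate(endogenous, n=5_000, seed=3):
    """Hypothetical data: x1 depends on u only when endogenous=True."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x2 = rng.normal(size=n)
    u = rng.normal(size=n)
    v = (0.8 * u if endogenous else 0.0) + rng.normal(size=n)
    x1 = z + 0.3 * x2 + v
    y = 1.0 + 2.0 * x1 - 1.0 * x2 + u
    return y, x1, x2, z

t_endog = endogeneity_t_stat(*simulate(endogenous=True))
t_exog = endogeneity_t_stat(*simulate(endogenous=False))
print(f"endogenous x1: t = {t_endog:.2f}; exogenous x1: t = {t_exog:.2f}")
```

A large |t| on the residual term points towards endogeneity, in which case IV is preferred over OLS.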

21
Q

Difference between over-identification and just identified.
- why does this matter

A
  • if we have exactly as many instruments as endogenous variables, model is just identified
  • if we have more instruments than endogenous variables, the model is over-identified.
  • overidentification allows for validity testing - we can check whether instruments satisfy the exogeneity condition
23
Q

Testing overidentification

A
  1. Estimate the 2SLS regression and obtain residuals
  2. Regress residuals on all excluded instruments, and any other exogenous variables in the model - record the R^2 from this regression
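  3. The null is that all IVs are exogenous. Under the null, n*R^2 follows a chi squared distribution with M-1 degrees of freedom, where M is the number of instruments (with one endogenous variable); reject if n*R^2 exceeds the critical value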
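
The steps above can be sketched as follows, using n*R^2 as the test statistic. The simulated data, the size of the violation (0.5) and the function names are assumptions; the hard-coded 3.841 is the 5% critical value of a chi squared distribution with 1 degree of freedom (two instruments, one endogenous variable).

```python
import numpy as np

CHI2_CRIT_5PCT_DF1 = 3.841  # 5% critical value, chi squared with 1 df

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def overid_stat(y, x1, x2, Z_excl):
    """n * R^2 from regressing 2SLS residuals on all exogenous variables."""
    n = len(y)
    const = np.ones(n)
    W = np.column_stack([const, Z_excl, x2])   # instruments + exogenous regressors
    x1_hat = W @ ols(W, x1)
    b = ols(np.column_stack([const, x1_hat, x2]), y)
    resid = y - np.column_stack([const, x1, x2]) @ b  # 2SLS residuals use actual x1
    fitted = W @ ols(W, resid)
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    return n * r2

def simulate(violation, n=5_000, seed=4):
    """Hypothetical data: z2 enters the error term when violation != 0."""
    rng = np.random.default_rng(seed)
    z1, z2, x2, e = (rng.normal(size=n) for _ in range(4))
    u = violation * z2 + e
    x1 = z1 + z2 + 0.3 * x2 + 0.8 * e + rng.normal(size=n)
    y = 1.0 + 2.0 * x1 - 1.0 * x2 + u
    return y, x1, x2, np.column_stack([z1, z2])

stat_valid = overid_stat(*simulate(violation=0.0))
stat_invalid = overid_stat(*simulate(violation=0.5))
print(f"valid instruments: {stat_valid:.2f}; one invalid instrument: {stat_invalid:.2f}")
print("reject exogeneity when the statistic exceeds", CHI2_CRIT_5PCT_DF1)
```

The test can only flag that at least one instrument looks invalid; it cannot say which one.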
24
Q

Difference between LATE and ATE:

A
  • LATE is the effect of treatment on outcome for subgroup of individuals whose treatment status is affected by the instrument
  • ATE is the effect of treatment on outcome averaged across entire population
25
Q

When can LATE = ATE?

Biv = E[B1i*pi1i]/E[pi1i] = E[B1i]

A
  • when causal effects are homogeneous, so all individuals have the same treatment effect, B1i = B1
  • when the instrument affects all individuals equally, so there is no subgroup variation in how Z influences X, pi1i = pi1
  • when the heterogeneity in the treatment effect and in the effect of the instrument are uncorrelated, E[B1i*pi1i] = E[B1i]*E[pi1i]
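
A toy calculation of the formula Biv = E[B1i*pi1i]/E[pi1i] for the three cases above, plus one case where they all fail. The individual effects are made-up numbers and `iv_estimand` is a hypothetical helper name.

```python
import numpy as np

def iv_estimand(b1_i, pi1_i):
    """E[B1i * pi1i] / E[pi1i] over equally weighted individuals."""
    b1_i = np.asarray(b1_i, dtype=float)
    pi1_i = np.asarray(pi1_i, dtype=float)
    return np.mean(b1_i * pi1_i) / np.mean(pi1_i)

# Case 1: homogeneous treatment effects (B1i = 2 for everyone), ATE = 2
homogeneous_effect = iv_estimand([2.0, 2.0], [0.2, 0.8])

# Case 2: heterogeneous effects, but the instrument moves everyone equally, ATE = 2
equal_first_stage = iv_estimand([1.0, 3.0], [0.5, 0.5])

# Case 3: both vary but are uncorrelated across individuals, ATE = 2
uncorrelated = iv_estimand([1.0, 1.0, 3.0, 3.0], [0.2, 0.8, 0.2, 0.8])

# Otherwise: high-effect individuals also respond more to the instrument, ATE = 2
correlated = iv_estimand([1.0, 3.0], [0.2, 0.8])

print(homogeneous_effect, equal_first_stage, uncorrelated, correlated)
```

In the last case IV puts more weight on the individual who responds more strongly to the instrument, so the estimand moves away from the simple average of 2.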