Tutorial 4 - Frisch-Waugh Theorem, Omitted Variables, Instrumental Variables Estimation Flashcards

1
Q

Say we are interested in β₂: the effect of x on y, after controlling for w (=holding w constant).

How can we obtain the estimator ^β₂ according to Frisch-Waugh?

A
  1. Regress y on a constant and w. Take the residuals from this regression -> yres,w (the variation in y that cannot be explained by w)
  2. Regress x on a constant and w. Take the residuals from this regression -> xres,w (the variation in x that cannot be explained by w)
  3. Regress yres,w on xres,w. The coefficient of xres,w is the same as the coefficient of x in the multivariate regression

This procedure illustrates the idea of filtering out the effect of w when estimating the effect of x on y.
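
The three steps can be checked numerically. The following is a minimal sketch on simulated data, assuming NumPy is available; the data-generating process, the coefficient values and the helper `ols` are illustrative assumptions, not part of the tutorial.

```python
import numpy as np

def ols(X, y):
    """Return the OLS coefficients of a regression of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Simulated data (hypothetical): x is correlated with the control w
rng = np.random.default_rng(0)
n = 500
w = rng.normal(size=n)
x = 0.5 * w + rng.normal(size=n)
y = 1.0 + 2.0 * w + 3.0 * x + rng.normal(size=n)

const = np.ones(n)
Z = np.column_stack([const, w])            # constant and w

# Multivariate regression of y on a constant, w and x
beta_full = ols(np.column_stack([const, w, x]), y)

# Steps 1 and 2: residuals of y and of x after regressing each on a constant and w
y_res = y - Z @ ols(Z, y)
x_res = x - Z @ ols(Z, x)

# Step 3: regress the y residuals on the x residuals
# (no constant needed, both residual series have mean zero)
beta_fw = ols(x_res.reshape(-1, 1), y_res)[0]

print(beta_full[2], beta_fw)
```

The two printed numbers agree up to floating-point error, which is the content of the theorem.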

2
Q

Which conditions are met when the OLS estimator ^β is unbiased?

A

Unbiased: E(^β) = β

Note: Under the condition that makes the OLS estimator unbiased (mean independence of the error term), it is also consistent: plim ^β = β.

3
Q

What does this mean?

A

The OLS estimator is consistent: as the sample size grows, ^β converges in probability to the true β (plim ^β = β).

4
Q

How can you apply the Frisch-Waugh theorem here?

A
  1. Regress log(invest) on a constant and trend and take residuals (“de-trended housing investment”)
  2. Regress log(price) on a constant and trend and take residuals (“de-trended house prices”)
  3. Regress the de-trended investment on de-trended prices.
5
Q

Show that under the assumption of mean independence of the error term given the regressors, the OLS estimator is unbiased!

A
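
The answer on the original card is missing. A standard textbook argument, sketched here in matrix notation (assuming the linear model y = Xβ + ϵ and an invertible X'X, neither of which is spelled out on the card), runs as follows:

```latex
\begin{align*}
\hat\beta &= (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \epsilon)
           = \beta + (X'X)^{-1}X'\epsilon \\
E(\hat\beta \mid X) &= \beta + (X'X)^{-1}X'\,E(\epsilon \mid X) = \beta
           \qquad \text{since } E(\epsilon \mid X) = 0 \\
E(\hat\beta) &= E\big[\,E(\hat\beta \mid X)\,\big] = \beta
           \qquad \text{(law of iterated expectations)}
\end{align*}
```
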
6
Q

Show that the OLS estimator is consistent (under the assumption that the regressors and the error term are uncorrelated)!

A
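
The answer on the original card is missing. A common sketch of the argument, assuming i.i.d. sampling, finite second moments and an invertible matrix Q = E(xᵢxᵢ') (assumptions added here, not stated on the card), is:

```latex
\begin{align*}
\hat\beta &= \beta + \Big(\tfrac{1}{N}\sum_{i=1}^{N} x_i x_i'\Big)^{-1}
                     \Big(\tfrac{1}{N}\sum_{i=1}^{N} x_i \epsilon_i\Big) \\
\operatorname{plim}\ \tfrac{1}{N}\sum_{i=1}^{N} x_i x_i' &= E(x_i x_i') = Q,
\qquad
\operatorname{plim}\ \tfrac{1}{N}\sum_{i=1}^{N} x_i \epsilon_i = E(x_i \epsilon_i) = 0 \\
\operatorname{plim}\ \hat\beta &= \beta + Q^{-1}\cdot 0 = \beta
\end{align*}
```

The second line uses the law of large numbers; E(xᵢϵᵢ) = 0 follows from Cov(x, ϵ) = 0 together with E(ϵ) = 0, which holds when the model includes a constant. The last line uses Slutsky's theorem.
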
7
Q

When is the OLS estimator unbiased?

A

if E(ϵ|X) = 0

i.e. if the error term and the regressors are mean independent.

8
Q

When is the OLS estimator consistent?

A

if Cov(x, ϵ) = 0,

i.e. if the error term and the regressors are uncorrelated.

9
Q

How are unbiasedness and consistency related with each other?

A

Note that mean independence implies uncorrelatedness, but not vice versa.

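Thus: the condition for unbiasedness (mean independence) is stronger than the condition for consistency (uncorrelatedness). If the OLS estimator is unbiased because mean independence holds, it is also consistent. But if the OLS estimator is consistent, it need not be unbiased.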

10
Q

What do you call regressors that are correlated with the error term, Cov(x, ϵ) ≠ 0, i.e. where the OLS estimator will be inconsistent and biased?

A

the regressors x are endogenous

11
Q

Why can regressors be endogenous?

A
  • Omitted variables: x and y are both driven by a third, unobserved, variable
  • Simultaneity: y also causes x (reverse causality), so causation does not run only from x to y
  • Measurement Error: x is measured with error
  • Dynamic Models with Serially Correlated Errors
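
The first bullet (omitted variables) can be illustrated with a small simulation. This is a sketch under assumed values: the confounder `w`, the coefficients and the sample size are hypothetical and chosen only to make the bias visible.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
w = rng.normal(size=n)                     # third variable driving both x and y
x = 0.8 * w + rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * w + rng.normal(size=n)   # true effect of x on y is 2.0

const = np.ones(n)

# "Short" regression: w is omitted, so it ends up in the error term and x is endogenous
b_short = np.linalg.lstsq(np.column_stack([const, x]), y, rcond=None)[0]

# "Long" regression: w is controlled for
b_long = np.linalg.lstsq(np.column_stack([const, x, w]), y, rcond=None)[0]

# Population value of the short-regression slope in this setup:
# 2.0 + 1.5 * Cov(x, w) / Var(x) = 2.0 + 1.5 * 0.8 / 1.64, roughly 2.73
print("slope without w:", b_short[1])
print("slope with w:   ", b_long[1])
```

The slope from the short regression stays away from 2.0 no matter how large the sample is, which is what inconsistency means here.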
12
Q

Which conditions does an instrumental variable z have to fulfill?

A
  1. Instrument relevance
  2. Instrument exogeneity
13
Q

What is “instrument relevance”?

A

The instrument has to be correlated with the endogenous variable: Cov(zₖ, xₖ) ≠ 0.

In the present case with multiple covariates, a stronger condition applies: in the first-stage regression of xₖ on a constant, the exogenous covariates x₂, …, xₖ₋₁ and the instrument zₖ, the coefficient 𝛿ₖ on zₖ must be nonzero (𝛿ₖ ≠ 0), i.e. the instrument zₖ has to be correlated with xₖ even when controlling for all exogenous covariates!

14
Q

What is instrument exogeneity?

A

The instrument has to be uncorrelated with the error term, Cov(zₖ, ϵ) = 0. It has to affect the outcome only via the endogenous variable, i.e. it should not have a direct effect on the outcome.

15
Q

How can you test the relevance of the instrument in IV estimation?

A

It can be tested by estimating the first-stage regression (xₖ on all exogenous covariates and the instrument zₖ) and using the usual t-test for H₀: 𝛿ₖ = 0, where 𝛿ₖ is the coefficient on zₖ. -> The more significant, the better.

For multiple instruments z₁, …, zₗ: use a test of joint significance of the instruments in the first stage. Rule of thumb: an F-statistic > 10 indicates a “strong” instrument.
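
A minimal sketch of the first-stage check with a single instrument, on simulated data. The variable names, the coefficient values and the use of homoskedastic standard errors are assumptions made for illustration; with one instrument the first-stage F-statistic equals the squared t-statistic.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000
x2 = rng.normal(size=n)                    # exogenous covariate
z = rng.normal(size=n)                     # instrument
xk = 0.5 + 0.3 * x2 + 0.4 * z + rng.normal(size=n)   # endogenous regressor

# First stage: regress xk on a constant, the exogenous covariate and the instrument
W = np.column_stack([np.ones(n), x2, z])
coef = np.linalg.lstsq(W, xk, rcond=None)[0]
resid = xk - W @ coef

# Homoskedastic OLS covariance matrix: sigma^2 * (W'W)^{-1}
sigma2 = resid @ resid / (n - W.shape[1])
cov = sigma2 * np.linalg.inv(W.T @ W)

delta_hat = coef[2]                        # coefficient on the instrument
t_delta = delta_hat / np.sqrt(cov[2, 2])   # t-statistic for H0: delta = 0
f_stat = t_delta ** 2                      # one instrument: F = t^2

print("delta_hat:", delta_hat, "t:", t_delta, "F:", f_stat)
print("passes rule of thumb (F > 10):", f_stat > 10)
```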

16
Q

How can you test exogeneity of the instrument with instrumental variables?

A

It is not directly testable; it has to be justified based on economic theory.

17
Q

What is a Two-Stage Least Squares (2SLS) Estimator? What is the first stage?

A

It is a way to carry out IV estimation:

  1. First Stage: Regress endogenous variable xₖ on all exogenous covariates 1, x₂, …, xₖ₋₁ and instrument zₖ, and form fitted values ^xₖ:
  • ^xₖ contains only exogenous variation!
  • It is important to include the exogenous covariates as well because the instrument has to affect the endogenous variable, conditional on the exogenous covariates
18
Q

With a Two-Stage Least Squares (2SLS) Estimator: What is the second stage?

A

Second Stage: Regress the outcome variable y on all exogenous covariates and the fitted values ^xₖ from the first stage -> the coefficient on ^xₖ is the IV estimator
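
Both stages can be sketched by hand on simulated data. Everything in this example (the unobserved variable `u` that makes xₖ endogenous, the coefficients, the helper `ols`) is a hypothetical setup for illustration, not the tutorial's data.

```python
import numpy as np

def ols(X, y):
    """Return the OLS coefficients of a regression of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(3)
n = 20_000
x2 = rng.normal(size=n)                    # exogenous covariate
z = rng.normal(size=n)                     # instrument: affects y only through xk
u = rng.normal(size=n)                     # unobserved, enters both xk and y
xk = 0.5 * x2 + 1.0 * z + u + rng.normal(size=n)
y = 1.0 + 0.5 * x2 + 2.0 * xk + 2.0 * u + rng.normal(size=n)   # true effect of xk is 2.0

const = np.ones(n)

# First stage: regress xk on all exogenous covariates and the instrument, keep fitted values
W = np.column_stack([const, x2, z])
xk_hat = W @ ols(W, xk)

# Second stage: regress y on the exogenous covariates and the fitted values
b_2sls = ols(np.column_stack([const, x2, xk_hat]), y)

# Plain OLS for comparison (inconsistent here because xk is correlated with u)
b_ols = ols(np.column_stack([const, x2, xk]), y)

print("OLS coefficient on xk: ", b_ols[2])
print("2SLS coefficient on xk:", b_2sls[2])
```

Running the second stage by hand reproduces the 2SLS point estimate, but the standard errors reported by that second-stage regression are not valid, because its residuals are computed with ^xₖ instead of xₖ. Dedicated IV routines correct for this.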

19
Q

What are practical issues with IV?

A
  • Often standard errors for IV are much higher than for OLS (bias-variance tradeoff), in particular if there is a weak instrument problem, i.e. only a weak correlation between the instrument and the endogenous variable
    • Always check the first stage regression!
  • If the instrument is correlated with the error term, the IV estimator is not consistent. The inconsistency of the IV estimator might be even larger than for the OLS estimator…
20
Q

Which test can you use to test for endogeneity of regressors?

A

Hausman test for endogeneity of regressors

21
Q

What are the hypotheses with the Hausman test for endogeneity of regressors?

A

H₀: xₖ is exogenous, Cov(xₖ, ϵ) = 0

H₁: xₖ is endogenous, Cov(xₖ, ϵ) ≠ 0

22
Q

What is the idea of the Hausman test (null-hypothesis)?

A

Under the null hypothesis (xₖ is exogenous), the OLS and 2SLS estimators are both consistent, and should differ only by sampling error. A statistically significant difference between the two estimators would be seen as evidence against exogeneity of xₖ.

23
Q

What is the test statistic and distribution for the Hausman test?

A

Under the H₀ of exogeneity, the Hausman statistic is asymptotically χ²ₖ-distributed.
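
The formula for the statistic is missing from the card. A common textbook form for a single endogenous regressor is sketched below; it is an assumption that the tutorial uses this version. In this scalar form the statistic has one degree of freedom, and with several coefficients the squared difference is replaced by a quadratic form in the vector of differences.

```latex
\[
H \;=\; \frac{\big(\hat\beta_k^{\,IV} - \hat\beta_k^{\,OLS}\big)^2}
             {\widehat{\operatorname{Var}}\big(\hat\beta_k^{\,IV}\big)
              - \widehat{\operatorname{Var}}\big(\hat\beta_k^{\,OLS}\big)}
\;\overset{a}{\sim}\; \chi^2_1
\qquad \text{under } H_0
\]
```
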

24
Q

What are the limitations of the Hausman test?

A
  • It assumes that the instrument is valid. Otherwise, a significant difference between OLS and IV could be driven by the fact that IV is inconsistent, which will not necessarily imply that xₖ is endogenous!
  • Even in the case of a valid instrument, the test might have low power if the IV estimates are very imprecise (standard errors are high). In this case, the test might fail to reject the H₀ of exogeneity even if xₖ is endogenous.
25
Q

What does the “Sargan test” test?

A

Overidentification test for validity of instruments

26
Q

What can you do with an overidentified model?

A

Run an overidentification test for the validity of the instruments (possible because there are more instruments than endogenous variables).

27
Q

What are the hypotheses and idea behind the Sargan test?

A

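H₀: all instruments are exogenous (uncorrelated with the error term). H₁: at least one instrument is not exogenous.

Idea: under H₀, the instruments should be uncorrelated with the 2SLS residuals.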

28
Q

What is the procedure of the Sargan test?

A
  1. Estimate the model using 2SLS and take residuals.
  2. Regress 2SLS residuals on all exogenous covariates plus instruments. Compute the test statistic, N * R²
  3. Under H₀, the test statistic is asymptotically χ² with q degrees of freedom (q = # instruments - # endogenous variables)
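
The procedure can be sketched for one endogenous regressor and two instruments (q = 1). The simulated data and the function name `sargan_statistic` are hypothetical; 3.841 is the 5% critical value of the χ² distribution with one degree of freedom.

```python
import numpy as np

def ols(X, y):
    """Return the OLS coefficients of a regression of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def sargan_statistic(y, x, instruments):
    """N * R^2 from regressing the 2SLS residuals on a constant and the instruments."""
    n = len(y)
    const = np.ones(n)
    Z = np.column_stack([const, instruments])

    # Step 1: 2SLS (first stage, second stage), residuals use the original x
    x_hat = Z @ ols(Z, x)
    b = ols(np.column_stack([const, x_hat]), y)
    res = y - np.column_stack([const, x]) @ b

    # Step 2: regress the residuals on exogenous covariates plus instruments
    fitted = Z @ ols(Z, res)
    r2 = 1.0 - np.sum((res - fitted) ** 2) / np.sum((res - res.mean()) ** 2)
    return n * r2

rng = np.random.default_rng(4)
n = 5_000
z1, z2, u = rng.normal(size=(3, n))
x = z1 + z2 + u + rng.normal(size=n)       # endogenous through u
e = rng.normal(size=n)

y_valid = 1.0 + 2.0 * x + u + e            # both instruments are exogenous
y_invalid = y_valid + 1.0 * z2             # z2 has a direct effect on y

instruments = np.column_stack([z1, z2])
crit = 3.841                               # chi2(1) critical value at the 5% level

s_valid = sargan_statistic(y_valid, x, instruments)
s_invalid = sargan_statistic(y_invalid, x, instruments)
print("valid instruments:  ", s_valid, "reject H0:", s_valid > crit)
print("one invalid instrument:", s_invalid, "reject H0:", s_invalid > crit)
```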
29
Q

What could be problems with the Sargan test?

A

It is based on the assumption that at least one of the instruments is exogenous! (E.g., if all instruments are biased in the same direction, the test might not reject H₀!)

30
Q

What is a spurious correlation?

A

A statistical relationship in which two variables appear to be causally related, but only by coincidence or because of a third (confounding) variable.