Lecture 15 Flashcards
What is truncation?
Occurs when some observations are completely missing from your data because of how the sample was selected
- not just a missing y value - the entire data point (x and y) is missing
How to write a model that may include truncation using a selection variable:
si = 1 if unit i is observed and si = 0 otherwise
- si.yi = Bt.(si.xi) + si.ui
- when si = 1 we recover the original model; when si = 0 the equation is just 0 = 0
Running OLS on this selected equation is equivalent to running OLS only on those observations selected out of the n initial draws (prove this).
- the big concern is selection bias: if the selection process is related to the unobserved error term ui, then E(ui | si = 1, xi) ≠ 0 and OLS is no longer unbiased
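A quick numpy sketch can make the "prove this" claim concrete: with si in {0,1}, the OLS slope from regressing si.yi on si.xi over all n draws matches the slope from the selected subsample exactly. The selection rule and coefficient values below are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
u = rng.normal(size=n)
beta = 2.0
y = beta * x + u

s = (x > -0.5).astype(float)   # an arbitrary selection rule, for illustration only

# OLS slope from regressing s*y on s*x over all n draws
b_all = np.sum((s * x) * (s * y)) / np.sum((s * x) ** 2)

# OLS slope using only the selected observations
keep = s == 1
b_sel = np.sum(x[keep] * y[keep]) / np.sum(x[keep] ** 2)

print(b_all, b_sel)   # identical, because s_i^2 = s_i and unselected rows contribute 0
```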
What does selection bias in the truncated model set us up for?
- Model the selection mechanism
- Correct for the selection bias, using more advanced methods
Conditions to maintain consistency in OLS
- E[su] = 0, on average, the selection mechanism s must not be correlated with the error u
- also need E[(s.xj).(s.u)] = 0 for each regressor xj, which equals E[s.xj.u] since s^2 = s - a stronger sufficient condition is E[su|sx] = 0: conditional on the selected regressors, the selection-adjusted error has mean 0
- note Cov(si.ui, si.xi) = E[si.xi.ui] - E[si.ui].E[si.xi] (again using si^2 = si)
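One way to see why E[s.xj.u] = 0 is the key condition: write out the OLS estimator on the selected sample. A sketch for the single-regressor, no-intercept case:

```latex
\hat{\beta}
  = \frac{\sum_i s_i x_i y_i}{\sum_i s_i x_i^2}
  = \beta + \frac{\tfrac{1}{n}\sum_i s_i x_i u_i}{\tfrac{1}{n}\sum_i s_i x_i^2}
  \;\overset{p}{\longrightarrow}\;
  \beta + \frac{E[s\,x\,u]}{E[s\,x^{2}]}
```

so OLS on the selected sample is consistent exactly when E[s.x.u] = 0 (assuming E[s.x^2] is finite and nonzero).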
So when does truncation not hurt OLS?
Only when the selection mechanism s is independent of the error u, even after controlling for x. That's a strong condition and usually it doesn't hold, so we need corrections like Heckman's
If selection is completely independent of both x and u
Then E(s.xj.u) = E(s).E(xj.u) = 0
- so OLS is consistent
If selection only depended on the covariates x, not on unobservables
s = s(x)
- then all variation in s is explained by x, so:
E(u|s,x) = E(u|x) = 0 (can prove this), and thus E(s.u|s,x) = s.E(u|s,x) = 0
- this makes OLS valid and consistent on the selected sample, even though we don't observe the whole population
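The missing step, as a sketch: when s is a deterministic function of x, conditioning on (x, s) is the same as conditioning on x alone, and iterated expectations does the rest.

```latex
s = s(x)
\;\Rightarrow\;
E[u \mid x, s] = E[u \mid x] = 0
\;\Rightarrow\;
E[s\,x_j\,u] = E\big[\,s\,x_j\,E[u \mid x, s]\,\big] = 0
```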
Identity for independence:
- W is independent of Z iff?
P(W|Z) = P(W), or P(W,Z) = P(W).P(Z)
- then, E(WZ) = E(W).E(Z)
Simpson’s paradox:
- y = Bt.x + u, E(u|x) = 0
- assuming a linear conditional expectation of y given x, which is fundamental to OLS being valid
- each group can have a positive relationship between x and y, but when you pool the data across all groups, the overall regression line can be negative
- happens when a latent variable is correlated with both x and y and is not accounted for in the model
- assumption E(y|x) = xt.B can fail, which breaks the mean independence assumption and the nice properties of OLS.
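A minimal simulated illustration (groups, slopes, and shifts are made up): each group has a within-group slope of about +1, but the group means line up so the pooled slope comes out negative.

```python
import numpy as np

rng = np.random.default_rng(1)

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    x_c = x - x.mean()
    return np.sum(x_c * (y - y.mean())) / np.sum(x_c ** 2)

# Three groups: within-group slope is +1, but groups with higher x have much lower y
group_x_means = [0.0, 2.0, 4.0]
group_y_shifts = [6.0, 3.0, 0.0]     # the omitted "latent variable"

xs, ys, slopes = [], [], []
for mx, shift in zip(group_x_means, group_y_shifts):
    x = mx + rng.normal(scale=0.3, size=200)
    y = shift + 1.0 * (x - mx) + rng.normal(scale=0.3, size=200)
    slopes.append(ols_slope(x, y))
    xs.append(x); ys.append(y)

pooled_slope = ols_slope(np.concatenate(xs), np.concatenate(ys))
print("within-group slopes:", np.round(slopes, 2))   # all near +1
print("pooled slope:", round(pooled_slope, 2))        # negative
```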
The source of truncation matters, two cases:
- If truncation depends only on x, e.g. s = 1 if x1 > 2, then E(u|x,s) = E(u|x) = 0, OLS still consistent
- If truncation depends on y, so e.g s = 1 if y < c, that’s truncation based on outcome, but selection rule now depends on the unobservable u, so E(s.xj.u) is no longer 0, sample is biased with respect to u
Truncation based on x vs y for OLS
Truncation based on x doesn’t break OLS, truncation based on the dependent variable makes OLS inconsistent as it correlates with the error term.
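A small Monte Carlo contrast of the two cases, with an arbitrary threshold in each (a sketch, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta = 50_000, 1.0
x = rng.normal(size=n)
u = rng.normal(size=n)
y = beta * x + u

def slope(x, y):
    """OLS slope of y on x with an intercept."""
    x_c = x - x.mean()
    return np.sum(x_c * (y - y.mean())) / np.sum(x_c ** 2)

keep_x = x > 0.5        # truncation based on x only
keep_y = y < 1.0        # truncation based on the outcome y

print("full sample:   ", round(slope(x, y), 3))                    # ~1.0
print("truncated on x:", round(slope(x[keep_x], y[keep_x]), 3))    # still ~1.0
print("truncated on y:", round(slope(x[keep_y], y[keep_y]), 3))    # biased toward 0
```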
So how do we tackle the model below when truncation is based on the dependent variable:
- y = Bt.x + u, u|x ~ N(0, o^2)
- we observe (xi, yi) only if yi < ci
This is truncation from above (right truncation): only values of y below the cutoff ci are observed
- we want the density of y conditional on being observed, i.e. on yi < ci
- f(yi|xi,B,o) / F(ci|xi,B,o)
- where f is the normal PDF of y with mean Bt.xi and variance o^2
- F is the normal CDF evaluated at the cutoff ci
This corrects for the selection bias introduced by the truncation and lets us construct a likelihood function from the observed data alone, estimating B and o via MLE
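A minimal sketch of that truncated-normal MLE using scipy; the simulated data, true parameter values, and the common cutoff c are invented for illustration (a real analysis would plug in the observed xi, yi, ci).

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Simulated example: true B = [1.0, 2.0], o = 1.0, cutoff c = 2.0
n, c = 5_000, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
keep = y < c                       # we only observe (x, y) with y below the cutoff
X_obs, y_obs = X[keep], y[keep]

def neg_loglik(params):
    """Negative log-likelihood of the truncated regression:
    sum over observed units of log f(y|x) - log F(c|x)."""
    b, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)      # keep sigma positive
    mu = X_obs @ b
    ll = stats.norm.logpdf(y_obs, loc=mu, scale=sigma) \
         - stats.norm.logcdf(c, loc=mu, scale=sigma)
    return -np.sum(ll)

start = np.array([0.0, 0.0, 0.0])  # [b0, b1, log sigma]
res = minimize(neg_loglik, start, method="BFGS")
b_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])
print(b_hat, sigma_hat)            # recovers roughly (1.0, 2.0) and 1.0
```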
Incidental truncation - what is it
We only observe y for some of the population, and whether we observe it depends on some other decision process, which may correlate with y
- e.g. if y is wages, we only observe data on wages for people who work, and that participation decision can depend on multiple factors, so our sample is no longer random
Setup for a sample selection problem
Outcome equation:
- y = Bt.x + u, E(u|x,z) = 0
Selection equation:
- s = 1[γt.z + v >= 0], a latent index model: y is observed only if a linear function of z plus the noise v is non-negative.
E(y|z,v) = Bt.x + E[u|z,v]
- we assume u,v are jointly normal and independent of z
E(y|z,v) = Bt.x + pv
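Where the p.v term comes from, as a sketch: joint normality makes the conditional mean of u given v linear in v, and with Var(v) = 1 the coefficient is just Cov(u, v), which we call p (rho).

```latex
(u, v)\ \text{jointly normal and independent of } z,\ \operatorname{Var}(v) = 1
\;\Rightarrow\;
E[u \mid z, v] = E[u \mid v]
  = \frac{\operatorname{Cov}(u, v)}{\operatorname{Var}(v)}\, v
  = \rho\, v,
\qquad
E[y \mid z, v] = \beta' x + \rho\, v
```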
For the sample selection model, what happens when we only observe y if s = 1?
s = 1 means v >= -γt.z
- since v is standard normally distributed,
E(v|s=1) = k(γt.z), where k is the inverse Mills ratio
E(y|z,s=1) = Bt.x + p.k(γt.z); we recover consistent estimates of B only if we control for the selection bias term k(γt.z)
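The inverse Mills ratio is just the mean of a truncated standard normal; a sketch of the step from s = 1 to k(γt.z), writing φ and Φ for the standard normal PDF and CDF:

```latex
E[v \mid s = 1]
  = E[v \mid v \ge -\gamma' z]
  = \frac{\phi(-\gamma' z)}{1 - \Phi(-\gamma' z)}
  = \frac{\phi(\gamma' z)}{\Phi(\gamma' z)}
  \equiv k(\gamma' z)
```

using the symmetry of the standard normal (φ(-a) = φ(a) and 1 - Φ(-a) = Φ(a)).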
Heckman correction process
- Estimate the selection equation with a probit to get γ^
- Compute k^ = k(γ^t.z), the estimated inverse Mills ratio
- Include this as a regressor in the outcome equation only for the observed sample:
yi = Bt.xi + p.k(γ^t.zi) + error - run OLS on this new equation using only the selected observations
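A minimal two-step sketch with statsmodels and scipy. The simulated data, the correlation value, and the extra selection variable z2 are all invented for illustration, and a real application would also correct the second-step standard errors (see the caveats below).

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(4)
n = 20_000

# Simulated data: x appears in both equations, z2 only in the selection equation
x = rng.normal(size=n)
z2 = rng.normal(size=n)
u, v = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n).T  # rho = 0.6
y_star = 1.0 + 2.0 * x + u                       # outcome equation
s = (0.5 + 1.0 * x + 1.0 * z2 + v >= 0)          # selection equation

# Step 1: probit of s on z = (1, x, z2), giving gamma-hat
Z = sm.add_constant(np.column_stack([x, z2]))
probit_res = sm.Probit(s.astype(int), Z).fit(disp=0)
index = Z @ probit_res.params                    # gamma-hat' z
mills = norm.pdf(index) / norm.cdf(index)        # k(gamma-hat' z), inverse Mills ratio

# Step 2: OLS of y on x and the Mills ratio, using the selected observations only
X2 = sm.add_constant(np.column_stack([x, mills]))[s]
ols_res = sm.OLS(y_star[s], X2).fit()
print(ols_res.params)   # roughly [1.0, 2.0, rho*sigma_u]; naive OLS without mills is biased
```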
What’s going on conceptually in Heckman correction
Correcting for the non-random selection into the sample
- by including k, you’re adjusting for the fact that the sample you’re estimating on is not representative of the full population, due to the selection mechanism.
So when is OLS consistent in the presence of selection
- we only observe y when s = 1, so whether OLS will be consistent on this selected sample depends on relation between u and v
Case 1: p = cov(u,v) = 0 - selection is unrelated to the outcome error, so OLS on the selected sample is consistent and the inverse Mills ratio term is irrelevant
Case 2: p ≠ 0 - the selection mechanism is informative about the outcome error, so use the probit model to estimate γ and apply the correction
Why does the Heckman correction work
- when selection is endogenous, the conditional mean of u given selection is non-zero and depends on z
- the inverse Mills ratio captures this dependence
- by including k(.), we control for that selection bias, turning a biased regression into a consistent one.
Caveats in the two-step correction: standard errors
- SEs need adjustment
- after the first-step estimation of the selection equation (estimating γ^ from the probit), we use k(γ^t.zi) in the second step
BUT this two-step procedure introduces generated regressor bias: k(.) is estimated, not observed, so OLS SEs understate uncertainty unless you correct them (e.g. adjusted SEs or bootstrapping both steps)
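One common fix is to bootstrap the entire two-step procedure, redoing both the probit and the corrected OLS on every resample. A rough sketch, where heckman_two_step is a hypothetical helper wrapping the two steps above (not a library routine):

```python
import numpy as np

def bootstrap_se(arrays, heckman_two_step, n_boot=500, seed=0):
    """Bootstrap SEs for the two-step estimator: resample observations with
    replacement and redo BOTH the probit and the corrected OLS each time.
    `arrays` is a tuple of equal-length numpy arrays (e.g. y, x, z, s) and
    `heckman_two_step` is a user-supplied helper returning the coefficient vector."""
    rng = np.random.default_rng(seed)
    n = len(arrays[0])
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                     # resample rows
        draws.append(heckman_two_step(*(a[idx] for a in arrays)))
    return np.array(draws).std(axis=0)
```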
Caveats in the two-step correction: identification
- overlap between x and z, identification concerns
Ideally, the set of variables in the selection equation z includes at least one variable not in x, called an exclusion restriction
- multicollinearity can occur if x = z, so include some variables in z that are excluded from x, improving identification.