Selection on (Un)Observables: Selection Correction Flashcards

1
Q

Cluster-robust standard errors

A

Classic OLS makes two assumptions concerning this matrix: 1. that Var[εi|X,D] = σ² (equal variances, or homoscedasticity), 2. and that Cov[εi,εj|X,D] = 0 for all i ≠ j.
The latter assumption means that the error terms (the deviations of Y from its expected values) of any two observations i and j, with i, j = 1…6, are uncorrelated.

The problem arises because the standard formulas (e.g. those used by default in Stata) for the standard errors of the coefficients βˆ, δˆ assume that all η = 0. If the errors are in fact correlated within clusters, these standard errors are wrong, typically too small. The coefficients βˆ, δˆ themselves are not affected (no bias!). Fortunately, one can usually specify a ‘robust’ or ‘cluster robust’ option, which corrects the standard errors.
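
To make the difference between the two standard error formulas concrete, here is a minimal sketch for the simplest possible regression, a constant-only model (so the coefficient is just the sample mean). The six data points and the three clusters are made up for illustration, and the cluster-robust formula is shown without the small-sample corrections that statistical packages usually apply.

```python
import math
from collections import defaultdict

def naive_se_of_mean(y):
    """Standard error of the mean assuming uncorrelated, equal-variance errors."""
    n = len(y)
    mean = sum(y) / n
    s2 = sum((v - mean) ** 2 for v in y) / (n - 1)
    return math.sqrt(s2 / n)

def cluster_se_of_mean(y, clusters):
    """Cluster-robust standard error of the mean (no small-sample correction).

    Residuals are summed within each cluster before squaring, so any
    correlation between errors of the same cluster enters the variance.
    """
    n = len(y)
    mean = sum(y) / n
    cluster_sums = defaultdict(float)
    for value, group in zip(y, clusters):
        cluster_sums[group] += value - mean
    return math.sqrt(sum(s ** 2 for s in cluster_sums.values())) / n

# Hypothetical data: 6 observations in 3 clusters, similar values within a cluster.
y = [1.0, 1.2, 3.0, 3.1, 5.0, 5.3]
clusters = [0, 0, 1, 1, 2, 2]

naive_se = naive_se_of_mean(y)
cluster_se = cluster_se_of_mean(y, clusters)
print(f"naive SE: {naive_se:.3f}, cluster-robust SE: {cluster_se:.3f}")
```

The point estimate (the mean) is the same in both cases; only the standard error changes, and here it becomes larger once within-cluster correlation is allowed for.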

2
Q

Binary outcome models

A

When the outcome Y is binary (either 0 or 1), we fit models that predict the probability that individual i has Y = 1.
The curve on which these probabilities lie can be assumed to take different functional forms:
- linear probability (OLS) assumes a straight line
- probit assumes the CDF of a normal distribution
- logit assumes the CDF of a logistic distribution (very similar to the normal)
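
The three functional forms can be compared directly. The sketch below is illustrative only: the linear probability coefficients (0.5 and 0.25) are hypothetical, and the probit and logit are evaluated at the raw value of x rather than at estimated coefficients.

```python
import math

def linear_probability(x, const=0.5, slope=0.25):
    """Straight line; nothing keeps the prediction inside [0, 1]."""
    return const + slope * x

def probit_probability(index):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))

def logit_probability(index):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-index))

for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  linear={linear_probability(x):.3f}  "
          f"probit={probit_probability(x):.3f}  logit={logit_probability(x):.3f}")
```

Probit and logit always stay between 0 and 1, whereas the straight line leaves that range for extreme values of x.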

3
Q

Tobit Model

A

Suitable for censored data. Essentially assumes that X affects two things:
- the likelihood of Y > 0
- the value of Y provided that (or ‘conditional on’) Y > 0.
The expected value of y in a tobit is therefore
E[y|x] = Pr(y > 0|x) E[y|y > 0, x],

> so the coefficients are not straightforward to interpret.
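
The decomposition can be evaluated numerically. This sketch assumes the standard tobit with censoring at zero, i.e. a latent y* = xβ + e with e ~ N(0, σ²) and y = max(0, y*); the function name and the input values are hypothetical.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_parts(xb, sigma):
    """Return (Pr(y > 0 | x), E[y | y > 0, x], E[y | x]) for a tobit censored at 0."""
    z = xb / sigma
    prob_positive = normal_cdf(z)
    # Conditional mean: linear index plus sigma times the inverse Mills ratio.
    mean_if_positive = xb + sigma * normal_pdf(z) / normal_cdf(z)
    return prob_positive, mean_if_positive, prob_positive * mean_if_positive

for xb in (-1.0, 0.0, 1.0):
    p, m, e = tobit_parts(xb, sigma=1.0)
    print(f"xb={xb:+.1f}  Pr(y>0)={p:.3f}  E[y|y>0]={m:.3f}  E[y]={e:.3f}")
```

Because a change in x moves both factors at once, the effect of x on E[y|x] is not simply the coefficient β.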

4
Q

Problems with Sample selection

A

Estimating effects based on a sample that is not randomly drawn from the population can produce bias. Systematic selection into the sample for which data are available is such a case (‘sample selection’).
Example: we only observe the wages of people who actually work. Further examples:

  • effect of migration on earnings (decision to migrate likely to be driven by unobserved factors that also determine pay)
  • family holiday expenditure (number of kids affects the decision to go on holiday, and how much is spent once on holiday)
  • institutions (decision to adopt certain institutions depends on factors that also matter for their effect once they are adopted)
Important: The difference from selection into treatment is that here, selection determines whether we observe Y for certain subjects at all.
5
Q

Sample selection bias

A

The bias arises through the error term (i.e. unobserved factors).
Take the education-wage example:
- less-educated people are most likely to have a job if they have other valuable skills
- such skills are usually unobserved, i.e. part of the error of our model
- and they affect the wage, which is the outcome Y
→ among the less educated, the sample contains systematically more people with high unobserved skills, so education and the error term are correlated within the sample
→ OLS (or any other uncorrected model) ends up producing biased coefficients
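
The mechanism can be reproduced in a small simulation. Everything here is made up for illustration: the true return to education is set to 2.0, unobserved skill is standard normal, and a person is employed (wage observed) only if 3 × education + skill exceeds 1.5.

```python
import random

def ols_slope(x, y):
    """Slope of a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(42)
TRUE_RETURN = 2.0  # hypothetical true effect of education on the wage

education, wage, employed = [], [], []
for _ in range(20000):
    e = rng.random()              # observed education, between 0 and 1
    skill = rng.gauss(0.0, 1.0)   # unobserved skill = error term of the wage equation
    education.append(e)
    wage.append(1.0 + TRUE_RETURN * e + skill)
    # Less-educated people need high unobserved skill to end up in the sample.
    employed.append(3.0 * e + skill > 1.5)

slope_full = ols_slope(education, wage)
slope_selected = ols_slope(
    [e for e, d in zip(education, employed) if d],
    [w for w, d in zip(wage, employed) if d],
)
print(f"full population: {slope_full:.2f}, employed only: {slope_selected:.2f}")
```

In the full population OLS recovers a slope close to 2.0; in the employed-only sample the estimate is much smaller, because the less educated who are observed have unusually high skill.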

6
Q

Sample Selection in causal graph

A
  • sample selection problem different from ‘causal situations’ looked at so far
  • here we actually want to know the effect of some X (education) on Y (wage), not the causal effect of D
  • problem is Y being unavailable if D = 0
7
Q

Selection correction (basic idea)

A
  • explicitly model the selection process (‘selection stage’ or ‘selection equation’)
  • yields an estimate of the likelihood of every observation to be in the sample
  • this information is used to calculate the so called inverse Mills ratio (IMR)
  • in a separate equation we model the outcome of interest (‘outcome equation’)
  • including the IMR as a variable corrects for selection bias
  • think of the coefficient on the IMR as capturing the correlation between the error in the selection equation and the error of the outcome equation without selection correction
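
As a small illustration of the second and third bullet, the sketch below computes the selection probability and the inverse Mills ratio from a probit index. The index values are hypothetical; in practice they would come from the estimated selection equation.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills_ratio(index):
    """IMR for an observation that is in the sample: pdf over cdf of the index."""
    return normal_pdf(index) / normal_cdf(index)

for index in (-2.0, 0.0, 2.0):
    print(f"index={index:+.1f}  Pr(in sample)={normal_cdf(index):.3f}  "
          f"IMR={inverse_mills_ratio(index):.3f}")
```

The IMR is large for observations that were unlikely to be in the sample and close to zero for observations that were almost certain to be in it.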
8
Q

Selection correction (mathematical) 1

A

Consider first the selection equation:
Pr(d_i = 1 | Z_i) = Φ(Z_i β), based on the latent index d_i* = Z_i β + ε_i with d_i = 1 if d_i* > 0,
which determines whether we observe a wage for individual i or not (d_i = 1 or 0).
- Z is a set of independent variables, here including education level
- β is a coefficient vector
- ε is the error term of the selection equation
Consider now the wage equation:
w_i = α + X_i γ + u_i,
where α is a constant, X contains all or a subset of the variables in Z, γ is a coefficient vector, and u is the error term of the wage equation.

9
Q

Selection correction (mathematical) 2

A
  • if we estimate the wage equation for all observations with a wage (all of whom have d_i = 1), then our γˆ is biased.
    The reason is that Cov(ε_i, u_i) ≠ 0, implying that also Cov(ε_i, Y_i) ≠ 0, which violates an assumption essential for unbiased estimation. However, if we now
    1. estimate the selection equation to obtain a βˆ
    2. calculate the so-called inverse Mills ratio: IMR_i = φ(Z_i βˆ) / Φ(Z_i βˆ)
    3. and estimate the wage equation with this as a variable, that is w_i = α + X_i′γ + ρ IMR_i + u_i, where ρ is the coefficient on the IMR,
    the resulting γˆ is consistent (i.e. unbiased in large samples).
    ⇒ Intuition: in a non-randomly selected sample the IMR is an omitted variable, and including it takes out the omitted variable bias.
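
The omitted-variable intuition can be checked in a simulation. This is a sketch under strong simplifying assumptions, not a full Heckman estimator: all numbers are made up, Z contains education plus one extra variable z that is excluded from the wage equation, and step 1 is skipped by using the true selection index in place of the Z βˆ from an estimated probit.

```python
import math
import random

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ols(regressors, y):
    """OLS with intercept via the normal equations; returns [const, b1, b2, ...]."""
    rows = [[1.0] + list(values) for values in zip(*regressors)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(7)
educ, imr, wage = [], [], []
for _ in range(40000):
    e = rng.random()                  # education, part of both X and Z
    z = rng.gauss(0.0, 1.0)           # hypothetical variable in Z but not in X
    eps = rng.gauss(0.0, 1.0)         # selection error
    u = 0.8 * eps + 0.6 * rng.gauss(0.0, 1.0)  # wage error, correlated with eps
    index = -1.5 + 3.0 * e + z        # true selection index (assumed known here)
    if index + eps > 0:               # wage observed only if selected
        educ.append(e)
        imr.append(normal_pdf(index) / normal_cdf(index))
        wage.append(1.0 + 2.0 * e + u)

naive = ols([educ], wage)             # wage equation without correction
corrected = ols([educ, imr], wage)    # wage equation including the IMR
print(f"naive: {naive[1]:.2f}, corrected: {corrected[1]:.2f}, "
      f"IMR coefficient: {corrected[2]:.2f}")
```

With the true return set to 2.0, the uncorrected estimate is clearly too small, while the regression that includes the IMR comes close to 2.0 and its IMR coefficient comes close to 0.8, the covariance between the two errors in this setup.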
10
Q

Confounders of causal sector effect estimation

A
  • wage determination processes may differ between the sectors
    > differences in returns to skill
    > differences in regulation
    > trade-off between high pay and job security
    > symbolic rewards and intrinsic motivation may be substitutes for pecuniary rewards
  • most importantly, employees self-select into sectors according to
    > preferences over high pay versus job security, symbolic versus monetary rewards, etc.
    > their anticipated net utility in either sector (i.e. expected returns minus expected effort)
    ⇒ Many of these factors are generally unobserved and affect wages.
11
Q

Roy model (aka ‘endogenous switching regression model’)

A

Consider a sector selection equation (public sector D = 1, business sector D = 0),
D = 1 if (log w1 − log w0) + Z′βS + εS > 0

D = 0 if (log w1 − log w0) + Z′βS + εS ≤ 0

Rewrite this as a binary outcome model:

Pr(D = 1) = F[Z, βS, (log w1 − log w0), εS].

Consider further sector wages to be determined as follows:
log w1 = X′γ1 + u1,

log w0 = X′γ0 + u0.

12
Q

Features of Roy setup

A
  • the wage and selection equations are mutually dependent, i.e.
  • Di indicates which wage equation determines the wage of i
  • at the same time, the sector choice of i, Di , depends on log wi1 − log wi0
13
Q

Sector wages: why not run 2 OLS?

A
  • problem: if we were to estimate by OLS one wage equation for all public sector employees and one for all private sector employees
  • we’d have bias for the same reasons as in the Heckman model
  • suppose public sector jobs are sought-after, and mostly highly educated people work there
  • some less-educated people may also manage to get a public sector job; the ‘causes’ for this are most likely unobserved and thus in the error εS
  • since the same ‘causes’ usually affect wages, Cov(u0, εS) ≠ 0 and Cov(u1, εS) ≠ 0
  • however, separate OLS estimation of the wage equations implicitly assumes that
    Cov(u0, εS) = Cov(u1, εS) = 0
14
Q

How do you correct for sample selection in sector wages?

A

Include the inverse Mills ratios obtained from the selection equation (IMR1 for the public sector, IMR0 for the private sector) as an additional variable, with coefficient ρ1 or ρ0, in the wage equations:
log w1 = X′γ1 + ρ1 IMR1 + u1,

log w0 = X′γ0 + ρ0 IMR0 + u0.

15
Q

Interpret Roy results

A

Having unbiased coefficient estimates γˆ and ρˆ …
- … we can predict the public and private sector wages for particular value combinations of X
- for example, if in X we have age, schooling, and gender, we can predict the wages log wˆ1, log wˆ0 in both sectors for a 30-year-old woman with Abitur
⇒ the difference log(wˆ1|X) − log(wˆ0|X) is the effect (approximately in percent, because of the logs) on the wage of a switch from the private to the public sector for a person with characteristics X
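
A minimal sketch of this prediction step follows. The coefficient values are entirely hypothetical (they are not estimates from any study), the prediction uses only X′γˆ, and 13 years of schooling stands in for Abitur.

```python
import math

def predict_log_wage(coefficients, person):
    """Predicted log wage: constant plus coefficient times characteristic."""
    return coefficients["const"] + sum(
        coefficients[name] * value for name, value in person.items()
    )

# Hypothetical selection-corrected coefficient estimates for each sector.
gamma_public = {"const": 1.0, "age": 0.010, "schooling": 0.06, "female": -0.05}
gamma_private = {"const": 0.8, "age": 0.012, "schooling": 0.08, "female": -0.15}

person = {"age": 30, "schooling": 13, "female": 1}

log_difference = (predict_log_wage(gamma_public, person)
                  - predict_log_wage(gamma_private, person))
exact_change = math.exp(log_difference) - 1.0  # exact relative wage change

print(f"log difference: {log_difference:.4f}, exact change: {exact_change:.2%}")
```

For small differences the log difference and the exact relative change are nearly identical, which is why the difference in logs is read as a percentage effect.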

16
Q

2 possibilities of estimation using Heckman or Roy

A
  1. sequentially
     (1) a reduced-form selection equation
     (2) the selection-corrected wage equations
     (3) finally, the selection equation with endogenous wages
    - pro: more transparent / intuitive
    - pro: can get a coefficient on log wˆ1 − log wˆ0
    - con: standard errors too small (→ bootstrap)
  2. maximum likelihood (simultaneously; ‘Bayesian approach’)
    - pro: more efficient and accurate SEs
    - con: less transparent; need to formulate the likelihood function (can be tough)
    - con: harder to back out coefficient on endogenous wage differential
    - pro: commonly accepted as superior → method used by van der Gaag et al. (1988)
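
The bootstrap mentioned under the sequential approach can be sketched generically. In the sketch below the statistic is just the sample mean, used as a stand-in; for the sequential procedure the `statistic` function would have to re-run all stages (selection equation, IMR, wage equations) on every resample. The function name, the data, and the number of replications are assumptions for illustration.

```python
import math
import random
import statistics

def bootstrap_se(data, statistic, replications=500, seed=0):
    """Standard error of `statistic` from resampling observations with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(replications):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)

# Hypothetical data; the mean has a known analytic standard error to compare with.
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]

boot_se = bootstrap_se(sample, statistics.mean)
analytic_se = statistics.stdev(sample) / math.sqrt(len(sample))
print(f"bootstrap SE: {boot_se:.4f}, analytic SE: {analytic_se:.4f}")
```

For the mean the two numbers agree closely; the bootstrap becomes useful exactly where no simple analytic formula accounts for the estimation error carried over from earlier stages.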
17
Q

Remarks on selection correction

A

- widely used in policy evaluation and labour economics
- ‘experts’ are never quite sure what it does. In particular, there is an unresolved debate (between Nobel laureates and other leading economists!) over whether Heckman/Roy can
→ correct for selection on observables (X) only
→ or also for selection on unobservables
- consensus is that it works better with an ‘exclusion restriction’, i.e. Z contains more variables than X
- if these variables excluded from X do not affect Y directly, the method becomes similar to instrumental variable estimation
- if X = Z, ‘identification’ relies on strong assumptions about functional form
→ linearity and additivity of the selection equation
→ normal distribution of the errors of the selection equation