Selection on Unobservables: Solutions with Panel Data Flashcards

Learn Advanced Quantitative Methods

1
Q

What are the main features of panel data?

A
  • observe same subjects at different points in time (at least 2)
  • panel called balanced if we have the same time points for all subjects
  • balanced panels are easiest to handle
  • unbalanced panels can suffer from attrition bias (e.g. non-random drop-out over time)
  • important: panel data different from ‘repeated cross-section’
  • in RCS different sample at each point in time
  • the same individuals may be observed twice, but only by chance
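
The balance condition above can be checked mechanically. The following is a minimal sketch, assuming the panel is available as (subject, time) pairs; the function name `is_balanced` and the toy data are hypothetical and only serve as illustration.

```python
from collections import defaultdict

def is_balanced(observations):
    """Return True if every subject is observed at the same set of time points.

    `observations` is an iterable of (subject_id, time) pairs.
    """
    periods_by_subject = defaultdict(set)
    for subject, period in observations:
        periods_by_subject[subject].add(period)
    # All time points that occur anywhere in the panel
    all_periods = set().union(*periods_by_subject.values())
    return all(periods == all_periods for periods in periods_by_subject.values())

balanced = [("a", 1), ("a", 2), ("b", 1), ("b", 2)]
unbalanced = [("a", 1), ("a", 2), ("b", 1)]  # subject "b" drops out after t = 1

print(is_balanced(balanced), is_balanced(unbalanced))
```

Whether an unbalanced panel actually suffers from attrition bias depends on why subjects drop out; the check only flags that the question needs to be asked.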
2
Q

When is a panel called balanced? Problems?

A

A panel is balanced if all subjects are observed at the same time points. Problem with unbalanced panels: potential attrition bias (non-random drop-out over time).

3
Q

How is panel data different from ‘repeated cross-section’?

A
  • in a repeated cross-section (RCS) a different sample is drawn at each point in time; the same individuals may be observed twice, but only by chance
4
Q

Advantages of panel data

A
  • generally, more data is better
  • specifically, can not only exploit between-subject variation as in cross-section, but also within-subject variation (e.g. change of treatment status)
  • this makes it possible to deal with some forms of bias from unobservables
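
To make the distinction between between-subject and within-subject variation concrete, here is a small sketch that splits the total variation of a treatment indicator into the two parts. The toy panel is hypothetical; only subject "a" changes treatment status, so all within-subject variation comes from that subject.

```python
from statistics import mean

# Hypothetical toy panel: treatment status d for 3 subjects over 2 periods
d = {"a": [0, 1], "b": [0, 0], "c": [1, 1]}

grand_mean = mean(x for values in d.values() for x in values)

# Between-subject variation: subject means around the grand mean
between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in d.values())
# Within-subject variation: observations around their own subject mean
within = sum((x - mean(v)) ** 2 for v in d.values() for x in v)
# Total variation: observations around the grand mean
total = sum((x - grand_mean) ** 2 for v in d.values() for x in v)

print(between, within, total)
```

A cross-section can only use the between part; a panel additionally offers the within part.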
5
Q

Econometric challenges arising with panel data

A
  • serial correlation of errors
  • this may create a need to include lagged dependent variables
  • if so, a range of new ‘problems’ opens up (not covered in the course)
6
Q

Bias from unobservables and panel data

A

Cause of interest (treatment) not always an event with a given timing

  • often subjects decide whether to take treatment, and when to take it
  • decision often depends on observed factors X and unobserved factors U
    → if these also affect Y, we are back in a familiar world: non-random selection on unobservables
  • similarly, if unobservables affect level of Y and X
    → again a familiar problem: omitted variable bias from the unobservables
7
Q

If $c$ is constant across time for each subject (i.e. $c_{it} = c_i$ for $t = 1, 2, \dots, T$), what are the three common situations, and which models produce consistent estimates in them?

A
  1. pooled OLS
  2. the random effects model
  3. the fixed effects model

⇒ Most intuitive to think about these three situations in terms of the intercepts required for unbiased estimation.

8
Q

When is pooled OLS consistent?

A
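  • if the $c_i$ are all the same, that is $c_i = c$
  • $c$ is then captured by the constant $\alpha$ in
    $y_{it} = \alpha + X_{it}'\beta + \delta d_{it} + \varepsilon_{it}$
    ⇒ pooled OLS is consistent
    (for clarification: this means one can re-index the observations to $y_k = \alpha + X_k'\beta + \delta d_k + \varepsilon_k$)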
9
Q

When is random effects estimation appropriate?

A
  • if the $c_i$ differ but are not systematically related to $d_{it}$ and $x_{it}$
  • the effects $c_i$ can be decomposed into a mean component $\alpha = \bar{c}$ and an individual-specific component $v_i = c_i - \bar{c}$
  • a new model with a composite error term, say $u_{it} = v_i + \varepsilon_{it}$, can be formulated:
    $y_{it} = \alpha + X_{it}'\beta + \delta d_{it} + u_{it}$
    ⇒ random effects estimation by means of GLS offers a way to correct for the fact that the $v_i$ part of the error is shared by all observations of subject $i$, so the composite errors are not independent within a subject (the explanation is technical)
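
A standard textbook way to see what RE-GLS does, not stated on the card itself, is the quasi-demeaning form. It assumes a balanced panel with $T$ periods and known variances $\sigma_v^2$ of $v_i$ and $\sigma_\varepsilon^2$ of $\varepsilon_{it}$ (in practice both have to be estimated):

$$y_{it} - \theta\bar{y}_i = \alpha(1 - \theta) + (X_{it} - \theta\bar{X}_i)'\beta + \delta(d_{it} - \theta\bar{d}_i) + (u_{it} - \theta\bar{u}_i), \qquad \theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_v^2}}$$

For $\theta = 0$ (no variation in $v_i$) this is pooled OLS; as $\theta \to 1$ it approaches full demeaning, i.e. the fixed effects estimator.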
10
Q

When is a fixed effects estimator needed?

A
  • if the $c_i$ differ and are related to $d_{it}$ (and potentially also $x_{it}$)
  • the RE estimator is then biased
    → the reason is that the procedure by which RE-GLS estimates the intercept for each $i$ picks up some of the effect of interest
  • need to either model the $c_i$ explicitly or make them ‘disappear’
    ⇒ fixed effects estimators achieve that
11
Q

How does one implement a FE estimator?

A

a) dummy variables: include a dummy variable for each subject (explicit modelling of the intercepts)
b) demeaning: subtract each subject’s mean values of $y$, $x$, $d$ and estimate by pooled OLS (makes $c_i$ disappear)
c) first differencing: take first differences and estimate by pooled OLS (or as a cross-section if $T = 2$) (also makes $c_i$ disappear)

12
Q

Describe the three different FE estimators.

A

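a) dummy variables: include a matrix $I$ with a dummy variable for each subject
   $y_{it} = x_{it}'\beta + \delta d_{it} + I_i'\gamma + \varepsilon_{it}$
b) demeaning
   $(y_{it} - \bar{y}_i) = (x_{it} - \bar{x}_i)'\beta + \delta(d_{it} - \bar{d}_i) + (c_i - \bar{c}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$
   Note here that $c_i = \bar{c}_i$ and $\bar{\varepsilon}_i = 0$.
   Remark: a and b are identical except that a ‘costs’ more degrees of freedom.
c) first differencing
   $(y_{it} - y_{i,t-1}) = (x_{it} - x_{i,t-1})'\beta + \delta(d_{it} - d_{i,t-1}) + (c_i - c_i) + \tilde{\varepsilon}_{it}$
   Remark: c is identical to a and b if $T = 2$, and very similar for $T > 2$. a and b require non-serially correlated errors, whereas c does not.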
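
The claim that demeaning and first differencing coincide for $T = 2$ can be checked numerically. The sketch below simulates a panel with a single regressor $d$ and no further covariates; the true effect `DELTA`, the sample size, and the noise level are assumptions made for the illustration, and the first-difference regression is run without a constant.

```python
import random
from statistics import mean

rng = random.Random(0)
DELTA = 2.0  # true treatment effect (assumed for the simulation)

# Simulate a balanced panel with T = 2: y_it = c_i + DELTA * d_it + eps_it
panel = []
for _ in range(500):
    c_i = rng.gauss(0, 1)                      # time-constant unobserved effect
    d = [rng.randint(0, 1) for _ in range(2)]  # treatment status in t = 1, 2
    y = [c_i + DELTA * d_t + rng.gauss(0, 0.1) for d_t in d]
    panel.append((d, y))

def slope_through_origin(xs, ys):
    """OLS slope of y on x without an intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# b) demeaning: subtract subject means, then pooled OLS
d_dm = [d_t - mean(d) for d, _ in panel for d_t in d]
y_dm = [y_t - mean(y) for _, y in panel for y_t in y]
delta_within = slope_through_origin(d_dm, y_dm)

# c) first differencing: one difference per subject when T = 2
d_fd = [d[1] - d[0] for d, _ in panel]
y_fd = [y[1] - y[0] for _, y in panel]
delta_fd = slope_through_origin(d_fd, y_fd)

print(delta_within, delta_fd)
```

In both transformations $c_i$ drops out, so the two estimates agree up to floating-point error and lie close to `DELTA`.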

13
Q

Disadvantages of FE estimators (vs. RE)

A
  • subjects with time-invariant $y_i$ are ignored
  • cannot obtain coefficient estimates for time-invariant $x_i$
    → intuitive reason: FE estimation identifies the effect of $X$ on $Y$ solely from within-subject variation
    → note also: FE estimators are very sensitive to bias in short panels ($T \sim 10$ or less)
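
Why time-invariant variables drop out can be seen directly from the demeaning step. A minimal sketch with hypothetical values for one subject:

```python
from statistics import mean

# Hypothetical subject observed over T = 4 periods
x_time_invariant = [1, 1, 1, 1]  # a characteristic that never changes
x_time_varying = [0, 0, 1, 1]    # e.g. treatment switched on in period 3

def demean(values):
    """Subtract the subject's own mean from each observation."""
    m = mean(values)
    return [v - m for v in values]

print(demean(x_time_invariant))  # only zeros: no within-subject variation left
print(demean(x_time_varying))
```

After demeaning, the time-invariant variable is zero in every period, so its coefficient cannot be estimated.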
14
Q

When are FE and RE estimators consistent?

A

FE and RE estimators are consistent only if the effect of the unobservables is constant over time for each $i$.

15
Q

How can you test RE assumption?

A

The RE assumption can be tested using the Hausman test (the null is that RE is unbiased; if it is rejected, use FE).
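
For a single coefficient, the Hausman statistic has a simple closed form. The sketch below is an illustration under assumptions: the function name `hausman_scalar` and all input numbers are hypothetical, the estimates and their variances are taken as given, and the formula requires that the FE variance exceeds the RE variance.

```python
from math import erfc, sqrt

def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for a single coefficient.

    H = (b_FE - b_RE)^2 / (Var(b_FE) - Var(b_RE)), which is chi-squared
    with 1 degree of freedom under the null.
    """
    h = (b_fe - b_re) ** 2 / (var_fe - var_re)
    p_value = erfc(sqrt(h / 2))  # upper tail of chi-squared(1)
    return h, p_value

# Hypothetical estimates, for illustration only
h, p = hausman_scalar(b_fe=2.0, b_re=2.5, var_fe=0.04, var_re=0.03)
print(h, p)
```

A small p-value means the FE and RE estimates differ by more than sampling noise would explain, which speaks against the RE assumption.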

16
Q

Advantages of RE vs FE?

A

The RE estimator is superior along these criteria:
- uses both within-subject and between-subject variation
- can produce coefficients for time-invariant covariates $x_i$
But:
- it requires that the unobserved factors $c_i$ are unrelated to $x_{it}$ and $d_{it}$, which is often untenable as an assumption
→ this can be tested using the Hausman test (the null is that RE is unbiased; if it is rejected, use FE)
→ note also: the RE model is not very suitable for data with few subjects

17
Q

Disadvantages of RE vs FE?

A

- requires that the unobserved factors $c_i$ are unrelated to $x_{it}$ and $d_{it}$
- note also: the RE model is not very suitable for data with few subjects

18
Q

Unobserved effects in panel data modelling

A

Suppose outcome $Y$ is given by
$E(Y \mid X, D) = \alpha + X'\beta + \delta D$
and all $X$ and $D$ are exogenous. Then $\delta$ can be estimated consistently by the regression
$y_k = \alpha + X_k'\beta + \delta d_k + \varepsilon_k$
because $\mathrm{Cov}(d, \varepsilon) = 0$. (Note: $k = 1, 2, \dots, K$ indexes observations, not subjects; thus in panel data $K = T \times N$.)
If, however, the true model is such that $Y$ is given by
$E(Y \mid X, D) = \alpha + X'\beta + \delta D + c$,
where $c$ is the effect on $Y$ of unobserved factors, then the model we can estimate is
$y_k = \alpha + X_k'\beta + \delta d_k + c_k + \varepsilon_k$.
The coefficients will be biased because $c_k$ will end up in the residual (which is then $c + \varepsilon$) and $\mathrm{Cov}(d, c + \varepsilon) \neq 0$.
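
The bias described above can be illustrated with a small simulation. This is a sketch under assumed values (true effect `DELTA`, sample sizes, standard normal errors, no covariates $X$): treatment is more likely for subjects with a high time-constant $c_i$, so pooled OLS overstates the effect, while demeaning removes $c_i$.

```python
import random
from statistics import mean

rng = random.Random(1)
DELTA = 2.0     # true treatment effect (assumed)
N, T = 2000, 3  # subjects and periods (assumed)

d_all, y_all, d_dm, y_dm = [], [], [], []
for _ in range(N):
    c_i = rng.gauss(0, 1)  # unobserved, time-constant effect on Y
    # Selection on unobservables: treatment is more likely when c_i is high
    d = [1.0 if c_i + rng.gauss(0, 1) > 0 else 0.0 for _ in range(T)]
    y = [1.0 + DELTA * d_t + c_i + rng.gauss(0, 1) for d_t in d]
    d_all += d
    y_all += y
    d_dm += [d_t - mean(d) for d_t in d]  # within-subject deviations
    y_dm += [y_t - mean(y) for y_t in y]

def ols_slope(xs, ys):
    """Slope from a simple OLS regression of y on x with an intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return num / sum((x - x_bar) ** 2 for x in xs)

delta_pooled = ols_slope(d_all, y_all)  # c ends up in the residual
delta_fe = ols_slope(d_dm, y_dm)        # c is removed by demeaning
print(delta_pooled, delta_fe)
```

The pooled estimate lies clearly above `DELTA` because $\mathrm{Cov}(d, c) > 0$ in this setup, whereas the demeaned estimate is close to `DELTA`.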