11. Panel Data, I Flashcards
What is panel data, and why is it useful for causal inference?
Panel data are longitudinal datasets in which the same units are observed repeatedly over multiple time periods. Panel data estimators are valuable for causal inference as they can address a specific type of omitted variable bias by controlling for unit-specific, time-invariant unobserved factors using fixed effects estimation.
What are the main components of a panel data regression model?
The panel data regression model is:
$Y_{it} = \delta D_{it} + u_i + \varepsilon_{it}, \quad t = 1, 2, \dots, T$, where:
* $Y_{it}$: Outcome variable for unit $i$ at time $t$
* $D_{it}$: Treatment or independent variable for unit $i$ at time $t$
* $u_i$: Time-invariant, unit-specific unobserved heterogeneity
* $\varepsilon_{it}$: Time-varying error term
* $\delta$: Coefficient capturing the causal effect of $D_{it}$ on $Y_{it}$
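To make the notation concrete, here is a minimal sketch that generates a small balanced panel from this model in long format (one row per unit-period). The function name `simulate_panel`, the sample sizes, the value of `delta`, and the normal distributions are illustrative assumptions, not part of the flashcards. For simplicity the treatment is drawn independently of the unit effect here.

```python
import numpy as np

def simulate_panel(n_units=4, n_periods=3, delta=2.0, seed=0):
    """Simulate Y_it = delta * D_it + u_i + eps_it (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    unit = np.repeat(np.arange(n_units), n_periods)        # unit index i
    time = np.tile(np.arange(1, n_periods + 1), n_units)   # period index t
    u = rng.normal(size=n_units)[unit]    # u_i: one draw per unit, repeated over t
    D = rng.normal(size=unit.size)        # D_it: assumed independent of u_i here
    eps = rng.normal(size=unit.size)      # eps_it: time-varying error
    Y = delta * D + u + eps
    return unit, time, D, u, eps, Y

unit, time, D, u, eps, Y = simulate_panel()

# One row per (i, t): columns are unit, time, D_it, u_i, Y_it
print(np.column_stack([unit, time, D.round(2), u.round(2), Y.round(2)]))
```

In the printed table the `u` column repeats the same value within each unit, which is what "time-invariant" means in the model.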
What assumptions about relationships between variables are key in panel data fixed effects models? (!)
- The treatment variable $D_{it}$ at time $t$ directly affects the outcome $Y_{it}$ at the same time. Additionally, the treatment in one period ($D_{i1}$) can influence the treatment in the subsequent period ($D_{i2}$).
- There exists a time-invariant unobserved confounder $u_i$ that affects both the treatment $D_{it}$ and the outcome $Y_{it}$. As a result, $D_{it}$ is endogenous whenever $u_i$ is left in the error term.
- There are no unobserved confounders that vary over time and are correlated with the treatment $D_{it}$. The only confounder is the time-invariant $u_i$, referred to as unobserved heterogeneity (heterogeneous across units but constant over time).
- Past outcomes $Y_{i,t-1}$ do not directly influence current outcomes $Y_{it}$.
- Past outcomes $Y_{i,t-1}$ do not directly affect current treatments $D_{it}$.
- Past treatments $D_{i,t-1}$ do not directly influence current outcomes $Y_{it}$.
These assumptions ensure that panel fixed effects models can isolate the causal effect of $D$ (treatment) on $Y$ (outcome) by controlling for the time-invariant unobserved heterogeneity $u_i$.
What is the POLS estimator in panel data?
Pooled ordinary least squares (POLS):
The POLS estimator treats panel data as if it were a single large cross-sectional dataset, ignoring the panel structure (repeated observations of the same units over time). It estimates the relationship between an outcome variable $Y_{it}$ and one or more explanatory variables $D_{it}$ using:
$Y_{it} = \delta D_{it} + \eta_{it}, \quad t = 1, 2, \dots, T$, where $\eta_{it} = u_i + \varepsilon_{it}$
The main assumption for the POLS estimator to give consistent estimates of $\delta$ is:
* $E[\eta_{it} \mid D_{i1}, D_{i2}, \dots, D_{iT}] = E[\eta_{it} \mid D_{it}] = 0$ for all $t$
This means that the composite error term $\eta_{it}$ (and specifically $u_i$) must be uncorrelated with the treatment $D_{it}$ in all time periods.
Problems:
* In practice, the assumption that $u_i$ is uncorrelated with $D_{it}$ often fails. This leads to omitted variable bias, so POLS no longer estimates $\delta$ consistently.
* Additionally, because $u_i$ appears in $\eta_{it}$ in every period, the error term is serially correlated across time periods for the same unit. Heteroskedasticity-robust standard errors ignore this within-unit correlation, so they are typically too small and understate uncertainty.
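Both problems can be seen in a small simulation. The sketch below is illustrative only: the data-generating process (normal draws, $D_{it} = u_i + \text{noise}$, true $\delta = 2$) and all variable names are assumptions. Under this particular design, $\mathrm{Cov}(D_{it}, \eta_{it}) / \mathrm{Var}(D_{it}) = 1/2$, so POLS should settle near 2.5 rather than 2. The standard errors are plain sandwich formulas (HC0 and a cluster-by-unit version) without small-sample corrections.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods, delta = 2000, 5, 2.0   # illustrative sizes and true effect

# Balanced panel, rows sorted by unit (the reshape below relies on this ordering)
unit = np.repeat(np.arange(n_units), n_periods)
u = rng.normal(size=n_units)[unit]              # u_i, constant within unit
D = u + rng.normal(size=unit.size)              # treatment correlated with u_i
Y = delta * D + u + rng.normal(size=unit.size)  # eta_it = u_i + eps_it

# POLS: regress Y on a constant and D, ignoring the panel structure
X = np.column_stack([np.ones_like(D), D])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
pols_delta = beta[1]
resid = Y - X @ beta

# Sandwich variance pieces
bread = np.linalg.inv(X.T @ X)
scores = X * resid[:, None]                     # x_it * residual_it, one row per obs

# Heteroskedasticity-robust (HC0): treats every row as independent
meat_hc = scores.T @ scores
se_hc = np.sqrt(np.diag(bread @ meat_hc @ bread))[1]

# Cluster-robust by unit: sum the scores within each unit first
cluster_scores = scores.reshape(n_units, n_periods, 2).sum(axis=1)
meat_cl = cluster_scores.T @ cluster_scores
se_cluster = np.sqrt(np.diag(bread @ meat_cl @ bread))[1]

print(f"POLS delta: {pols_delta:.3f} (true value {delta})")
print(f"HC0 SE: {se_hc:.4f}, cluster-by-unit SE: {se_cluster:.4f}")
```

The gap between the two standard errors comes from the within-unit correlation that $u_i$ induces; clustering by unit accounts for it, while HC0 does not.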
What is the fixed effects (within) estimator in panel data? (!)
Fixed Effects (FE):
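The fixed effects estimator controls for time-invariant unobserved heterogeneity by using only within-unit variation over time. It eliminates the unit-specific effects $u_i$ by subtracting each unit's time average from every observation (time-demeaning): because $u_i$ is constant over time, its time average equals $u_i$ itself, so it drops out of the demeaned equation.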
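Time-demeaning can be sketched in a few lines of NumPy. As before, the simulated design (normal draws, $D_{it} = u_i + \text{noise}$, true $\delta = 2$) and the variable names are assumptions made for illustration. The panel is stored in wide form, one row per unit and one column per period, so each unit's time average is a row mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods, delta = 1000, 5, 2.0   # illustrative sizes and true effect

# Wide layout: rows are units i, columns are periods t
u = rng.normal(size=(n_units, 1))                          # u_i, broadcast over t
D = u + rng.normal(size=(n_units, n_periods))              # treatment correlated with u_i
Y = delta * D + u + rng.normal(size=(n_units, n_periods))

# Time-demeaning: subtract each unit's own time average
D_dm = D - D.mean(axis=1, keepdims=True)
Y_dm = Y - Y.mean(axis=1, keepdims=True)

# Within estimator: OLS slope of demeaned Y on demeaned D (no intercept needed)
fe_delta = (D_dm * Y_dm).sum() / (D_dm ** 2).sum()

# POLS slope on the raw data, for comparison
Dc, Yc = D - D.mean(), Y - Y.mean()
pols_delta = (Dc * Yc).sum() / (Dc ** 2).sum()

print(f"FE (within) delta: {fe_delta:.3f}")
print(f"POLS delta:        {pols_delta:.3f}")
print(f"true delta:        {delta}")
```

In this design the within estimate should land close to the true value while POLS is pulled upward, since demeaning removes $u_i$ and leaves only the independent part of the treatment variation.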