11. Panel Data, I Flashcards
What is a panel data estimator and what kind of omitted variable bias can it overcome?
The panel data estimator is a method for analyzing data that follows the same units over time (panel data/longitudinal data). It addresses omitted variable bias from unobserved time-invariant factors by controlling for unit-specific fixed effects.
* If there are unobserved omitted variables that are constant over time, then even if they are heterogeneous across units, we can use panel data estimators to consistently estimate the effect of the independent variable on the outcome.
Under which assumptions about the relationship between variables can we use the fixed effects panel method (See DAG)?
- The treatment variable Dit affects the outcome Yit at time t and influences Di,t+1 in the subsequent period.
- There exists a time-invariant unobserved confounder ui that affects both the treatment Dit and the outcome Yit. As a result, Dit is endogenous because ui is absorbed into the error term.
- There are no unobserved confounders that vary over time and are correlated with the treatment Dit.
- Past outcomes Yi,t−1 do not directly influence current outcomes Yit.
- Past outcomes Yi,t−1 do not directly affect current treatments Dit.
- Past treatments Di,t−1 do not directly influence current outcomes Yit.
These assumptions ensure that panel fixed effects models can isolate the causal effect of D (treatment) on Y (outcome) by controlling for time-invariant unobserved heterogeneity (ui).
What is the basic structure of panel data?
Panel data observes N units over T time periods.
* Outcome variable: Yit for unit i at time t.
* Explanatory variables: Dit=(Dit1, Dit2,…,DitK), a vector of K variables for unit i at time t.
* Balanced panel: Each unit has data for all T time periods.
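A minimal sketch of this structure in "long" format, with one record per (unit, period) pair. The unit labels, values, and the helper `is_balanced` are hypothetical and only illustrate the definitions of N, T, and a balanced panel.

```python
from collections import defaultdict

# Hypothetical long-format panel: (unit i, period t, outcome Y_it, treatment D_it)
rows = [
    ("A", 1, 10.0, 0.0),
    ("A", 2, 12.0, 1.0),
    ("B", 1, 20.0, 1.0),
    ("B", 2, 21.0, 1.0),
]

def is_balanced(rows):
    """Return True if every unit is observed in every period present in the data."""
    periods_by_unit = defaultdict(set)
    for unit, period, _y, _d in rows:
        periods_by_unit[unit].add(period)
    all_periods = set().union(*periods_by_unit.values())
    return all(periods == all_periods for periods in periods_by_unit.values())

N = len({row[0] for row in rows})      # number of units
T = len({row[1] for row in rows})      # number of time periods
balanced = is_balanced(rows)           # every unit has all T periods
unbalanced = is_balanced(rows[:-1])    # unit B's second period dropped
```

Dropping a single (unit, period) record is enough to make the panel unbalanced, which is what the last line checks.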
What is the panel data regression model?
The model is: Yit=𝛿Dit+ui+𝜖it, t =1,2,…,T
* Yit: Dependent variable (log earnings for individual i in year t).
* Dit: Independent variable (schooling for individual i in year t).
* 𝛿: Coefficient representing the effect of Dit on Yit
* ui: Time-invariant unobserved characteristics (ability).
* 𝜖it: Time-varying idiosyncratic error (random wage shocks).
We want to know what happens when we regress Yit on Dit. We can do it in two ways: Pooled OLS or FE.
What is the POLS estimator in panel data?
Pooled ordinary least squares (POLS):
The POLS estimator treats panel data as if it were a single large cross-sectional dataset (observation at a single point in time across units), ignoring the panel structure (repeated observations of the same units over time). It estimates the relationship between an outcome variable Yit and one or more explanatory variables Dit using: Yit=δDit+ηit, t=1,2,…,T, where ηit=ui+εit
The main assumption for the POLS estimator to give consistent estimates of δ is:
* 𝐸[𝜂it∣𝐷i1,𝐷i2,…,𝐷iT]=𝐸[𝜂it∣𝐷it]=0 for all t
This means that the composite error term 𝜂it (and specifically ui) must be uncorrelated with the treatment Dit for all time periods.
Problems:
* In practice, the assumption that ui is uncorrelated with Dit often fails. This leads to omitted variable bias, making the estimate of 𝛿 unreliable.
* Additionally, the presence of ui causes serial correlation in the error term across time periods for the same unit. This within-unit correlation makes heteroskedasticity-robust standard errors too small, so they understate uncertainty.
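The omitted variable bias can be illustrated with a small simulation. This is only a sketch under assumed values: the true effect is set to 1, and the treatment loads on ui with a made-up coefficient of 0.8. The helper `ols_slope` is hypothetical, not a library routine.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

random.seed(0)
N, T, delta = 2000, 4, 1.0   # assumed sample size and true effect

D, Y = [], []
for i in range(N):
    u_i = random.gauss(0, 1)                    # time-invariant unobserved effect
    for t in range(T):
        d_it = 0.8 * u_i + random.gauss(0, 1)   # treatment correlated with u_i
        D.append(d_it)
        Y.append(delta * d_it + u_i + random.gauss(0, 1))

# Pooled OLS ignores the panel structure and leaves u_i in the error term.
pols = ols_slope(D, Y)
print(round(pols, 2))
```

In this setup the pooled estimate converges to 1 + 0.8/1.64 ≈ 1.49 rather than the true value of 1, because Cov(Dit, ui) = 0.8 and Var(Dit) = 1.64.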
What is FE (within estimator) and how does it work?
The Fixed Effects (FE) estimator is a method used in panel data analysis to estimate causal effects while controlling for time-invariant unobserved heterogeneity (ui). The generalized model is Yit=𝛿Dit+ui+𝜖it, t =1,2,…,T
The main challenge arises when ui, the unobserved heterogeneity, is correlated with Dit, leading to biased estimates of 𝛿 if ordinary least squares (OLS) is used.
To address this, FE estimation eliminates ui through one of the following approaches:
1. Manually demeaning the variables (i.e., subtracting their time averages) and applying OLS to the transformed data.
2. Including unit-specific dummy variables in the regression to explicitly account for ui.
3. Using built-in fixed effects routines available in statistical software like Stata or R, which automate the process.
FE ensures consistent estimates of 𝛿 by isolating within-unit variation over time, effectively removing the influence of ui.
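The first approach (manual demeaning) can be sketched as follows. The data-generating values are assumptions chosen for illustration (true effect 1, treatment correlated with ui), and `within_demean` and `slope_no_intercept` are hypothetical helpers.

```python
import random

def within_demean(values, units):
    """Subtract each unit's time average from that unit's observations."""
    totals, counts = {}, {}
    for v, i in zip(values, units):
        totals[i] = totals.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - totals[i] / counts[i] for v, i in zip(values, units)]

def slope_no_intercept(x, y):
    """OLS slope through the origin (demeaned data already have mean zero)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

random.seed(1)
N, T, delta = 2000, 4, 1.0   # assumed sample size and true effect

units, D, Y = [], [], []
for i in range(N):
    u_i = random.gauss(0, 1)                    # unobserved, constant within unit
    for t in range(T):
        d_it = 0.8 * u_i + random.gauss(0, 1)   # treatment correlated with u_i
        units.append(i)
        D.append(d_it)
        Y.append(delta * d_it + u_i + random.gauss(0, 1))

# Demeaning removes u_i, so OLS on the transformed data uses only within-unit variation.
fe = slope_no_intercept(within_demean(D, units), within_demean(Y, units))
print(round(fe, 2))
```

Because ui is constant within a unit, it cancels when the unit mean is subtracted, and the estimate should land close to the assumed true effect of 1.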
What is the mathematical formulation behind how the FE estimator works?
The goal of the Fixed Effects (FE) estimator is to estimate 𝛿, the treatment effect, while controlling for unobserved individual-specific heterogeneity (ui) that may be correlated with the explanatory variables (Dit). In traditional OLS regression, we estimate the parameters by minimizing the sum of squared residuals, i.e. the sum of squared differences between the observed and predicted values of the outcome variable.
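The FE estimator builds on OLS by adding a unit-specific intercept for each unit to account for the time-invariant unobserved effects (ui). OLS estimation with fixed effects chooses 𝛿 and u1,…,uN to minimize the sum of squared residuals (written here for a scalar Dit):
* Formula through which we obtain our parameters (objective function): (δ̂, û1,…,ûN) = argmin ∑i ∑t (Yit − δDit − ui)², where i runs over 1,…,N and t over 1,…,T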
To find the values of 𝛿 and ui that minimize this objective function, we take the partial derivatives of the objective function with respect to 𝛿 and ui, and set them equal to zero, resulting in the first-order conditions (FOC).
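* FOC for 𝛿: ∑i ∑t Dit(Yit − δ̂Dit − ûi) = 0
* FOC for ui: ∑t (Yit − δ̂Dit − ûi) = 0 for each i=1,…,N, which gives ûi = Ȳi − δ̂D̄i, where Ȳi = (1/T)∑t Yit and D̄i = (1/T)∑t Dit are the time averages for unit i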
From the FOC for ui, we obtain an expression for ui as the average difference for unit i. We substitute this expression for ui into the FOC for 𝛿. This simplifies to the final equation for 𝛿, and solving it gives us the estimate of the treatment effect, while accounting for the unit-specific fixed effects ui.
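* Mathematical solution for the optimal δ (scalar Dit): δ̂ = [∑i ∑t D̈it²]^(−1) ∑i ∑t D̈itŸit, where D̈it = Dit − D̄i and Ÿit = Yit − Ȳi are deviations from unit i's time averages D̄i = (1/T)∑t Dit and Ȳi = (1/T)∑t Yit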
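As a numerical check of this derivation, the sketch below assumes the objective is the sum of squared residuals ∑i ∑t (Yit − δDit − ui)² with a scalar Dit. It computes δ from demeaned data, recovers each ui as the unit's average of Yit − δDit, and evaluates both first-order conditions. The toy panel is made up for illustration.

```python
# Hypothetical toy panel: unit -> (D_i1..D_iT, Y_i1..Y_iT)
panel = {
    "A": ([1.0, 2.0, 3.0], [2.0, 4.0, 7.0]),
    "B": ([2.0, 4.0, 3.0], [10.0, 13.0, 11.0]),
}

def mean(xs):
    return sum(xs) / len(xs)

# delta_hat = sum of demeaned D times demeaned Y, over sum of squared demeaned D
num = den = 0.0
for d, y in panel.values():
    d_bar, y_bar = mean(d), mean(y)
    num += sum((dt - d_bar) * (yt - y_bar) for dt, yt in zip(d, y))
    den += sum((dt - d_bar) ** 2 for dt in d)
delta_hat = num / den

# From the FOC for u_i: u_hat_i = Ybar_i - delta_hat * Dbar_i
u_hat = {i: mean(y) - delta_hat * mean(d) for i, (d, y) in panel.items()}

# Both first-order conditions should hold (up to rounding) at the solution.
foc_delta = sum(
    dt * (yt - delta_hat * dt - u_hat[i])
    for i, (d, y) in panel.items()
    for dt, yt in zip(d, y)
)
foc_u = {
    i: sum(yt - delta_hat * dt - u_hat[i] for dt, yt in zip(d, y))
    for i, (d, y) in panel.items()
}
print(delta_hat, u_hat, foc_delta, foc_u)
```

For this toy panel the demeaned cross-products sum to 8 and the demeaned squares sum to 4, so the estimate is 2, with unit effects 1/3 for A and 16/3 for B.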
What are the identifying assumptions for the fixed effects (FE) estimator?
The fixed effects (FE) estimator relies on the following key assumptions for consistent and unbiased estimation of 𝛿:
* Strict exogeneity: E[𝜖it∣Di1,Di2,…,DiT, ui]=0, t=1,2,…,T. This means that the explanatory variables in every period (Di1,…,DiT) must be uncorrelated with the idiosyncratic error (𝜖it), given ui. Once ui is accounted for (and no longer part of the error term), Dit and 𝜖it must not be correlated. While Dit can be arbitrarily related to ui, it must neither influence nor be influenced by the idiosyncratic error 𝜖it. If this assumption is violated, the FE estimator will still suffer from bias due to unaddressed endogeneity.
* Rank condition: rank(∑t E[D̈it′D̈it])=K. The demeaned explanatory variables (D̈it) must vary over time within at least some units (i) and must not be perfectly collinear. If there is no within-unit variation, or if the demeaned variables are perfectly collinear, the coefficient 𝛿 cannot be identified.
Under these assumptions, the FE estimator is consistent as N approaches infinity and unbiased conditional on D.
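The rank condition can be illustrated by measuring how much within-unit variation a variable has left after demeaning. The variable names and values below are hypothetical, and `within_variation` is a helper written for this sketch.

```python
def within_variation(values_by_unit):
    """Sum of squared deviations from each unit's own time average."""
    total = 0.0
    for values in values_by_unit.values():
        unit_mean = sum(values) / len(values)
        total += sum((v - unit_mean) ** 2 for v in values)
    return total

# Hypothetical regressors for two units observed over three periods.
birth_year = {"A": [1980, 1980, 1980], "B": [1975, 1975, 1975]}  # constant within unit
schooling = {"A": [12, 13, 14], "B": [16, 16, 18]}               # changes within unit

no_variation = within_variation(birth_year)    # demeaning wipes this variable out
some_variation = within_variation(schooling)   # this variable can identify a coefficient
print(no_variation, some_variation)
```

A variable with zero within-unit variation, such as the time-invariant `birth_year` here, is perfectly absorbed by the fixed effects, so its coefficient cannot be estimated by FE.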
Why must standard errors be clustered in fixed effects models?
In fixed effects models, standard errors must be clustered by panel unit to account for potential serial correlation in the error term (𝜖it) over time for the same unit. Clustering ensures valid inference by correcting for serial correlation and heteroskedasticity within units.
For clustering to yield valid inference, the number of clusters must be sufficiently large.
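The sketch below compares heteroskedasticity-robust and cluster-robust standard errors for the FE estimator with a scalar treatment. It assumes AR(1) idiosyncratic errors with a made-up persistence of 0.8 and a treatment with a unit-specific trend, so that both are serially correlated within units. No finite-sample corrections are applied, so the numbers will differ slightly from what statistical software reports.

```python
import math
import random

random.seed(2)
N, T, delta, rho = 500, 6, 1.0, 0.8   # assumed design values

num = den = 0.0
demeaned_units = []                    # per-unit lists of demeaned D and Y
for i in range(N):
    u_i = random.gauss(0, 1)
    trend_i = random.gauss(0, 1)       # unit-specific trend makes D persistent
    e = random.gauss(0, 1)
    d_i, y_i = [], []
    for t in range(T):
        # AR(1) idiosyncratic error: serially correlated within the unit
        e = rho * e + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)
        d_it = trend_i * t + random.gauss(0, 0.5)
        d_i.append(d_it)
        y_i.append(delta * d_it + u_i + e)
    d_bar, y_bar = sum(d_i) / T, sum(y_i) / T
    dd = [d - d_bar for d in d_i]
    yy = [y - y_bar for y in y_i]
    demeaned_units.append((dd, yy))
    num += sum(a * b for a, b in zip(dd, yy))
    den += sum(a * a for a in dd)

delta_hat = num / den

meat_robust = meat_cluster = 0.0
for dd, yy in demeaned_units:
    scores = [a * (b - delta_hat * a) for a, b in zip(dd, yy)]  # D-ddot times residual
    meat_robust += sum(s * s for s in scores)    # treats observations as independent
    meat_cluster += sum(scores) ** 2             # allows correlation within the unit

robust_se = math.sqrt(meat_robust) / den
cluster_se = math.sqrt(meat_cluster) / den
print(round(delta_hat, 3), round(robust_se, 4), round(cluster_se, 4))
```

The clustered variance sums the scores within each unit before squaring, which is what lets it pick up the correlation across time periods. In this design the clustered standard error should come out larger than the heteroskedasticity-robust one.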
What are limitations of the fixed effects (FE) estimator in panel data?
- Reverse causality: Fixed effects cannot address reverse causality (the outcome Y affects the treatment D, the opposite of the direction assumed in the model) or simultaneity bias (Y and D are determined jointly, so each influences the other at the same time).
- Time-variant unobserved heterogeneity: Fixed effects only account for time-invariant unobserved heterogeneity (ui). If unobserved variables change over time and are correlated with the independent variable (Dit), fixed effects cannot address this, and it leads to omitted variable bias.
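The second limitation can be seen in a small simulation. It is a sketch under assumed values: the true effect is 1, and a made-up time-varying confounder `w_it` enters both the treatment and the outcome.

```python
import random

random.seed(3)
N, T, delta = 2000, 4, 1.0   # assumed sample size and true effect

num = den = 0.0
for i in range(N):
    u_i = random.gauss(0, 1)                  # time-invariant: removed by demeaning
    d_i, y_i = [], []
    for t in range(T):
        w_it = random.gauss(0, 1)             # time-varying and unobserved
        d_it = 0.8 * u_i + 0.8 * w_it + random.gauss(0, 1)
        d_i.append(d_it)
        y_i.append(delta * d_it + u_i + w_it + random.gauss(0, 1))
    d_bar, y_bar = sum(d_i) / T, sum(y_i) / T
    num += sum((d - d_bar) * (y - y_bar) for d, y in zip(d_i, y_i))
    den += sum((d - d_bar) ** 2 for d in d_i)

# Within estimator: u_i drops out, but w_it stays in the demeaned error term.
fe_biased = num / den
print(round(fe_biased, 2))
```

Demeaning removes `u_i` but not `w_it`, so the within estimate converges to 1 + 0.8/1.64 ≈ 1.49 instead of the true value of 1.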