Lecture 10 (Panel data) Flashcards
What is the benefit of using panel data?
We can address omitted variable bias (OVB) from unobservables that either differ across units but are constant over time, or are constant across units but change over time.
What do we need for consistent estimates in panel data when we assume that the error term is $v_{it} = a_i + u_{it}$?
Using OLS, for a consistent estimate of $\beta$ we need that OLS.1, $E[x_{it}' v_{it}] = 0$, holds. This can be divided into two parts.
$$
E[x_{it}'u_{it}] = 0 \\
E[x_{it}'a_{i}] = 0
$$
where the second part is our main concern.
What is the random effect estimator?
See notion.
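As a quick reminder (a hedged sketch in standard notation, not taken from notion): the RE estimator can be computed as pooled OLS on quasi-demeaned data,

$$
y_{it} - \theta \bar y_i = (x_{it} - \theta \bar x_i)'\beta + (v_{it} - \theta \bar v_i),
\qquad
\theta = 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T\sigma_a^2}},
$$

so pooled OLS ($\theta = 0$) and FE ($\theta = 1$) are the limiting cases.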
Explain the assumptions of RE and when one should use it instead of FE/FD.
The two key assumptions for the RE model are:
$$
E[u_{it}|x_i,a_i] = 0 \\
E[a_i|x_i] = 0
$$
That is, we need the independence of $x$ in all periods (strict exogeneity) and that $E[a_i|x_i]= E[a_i] = 0$.
If we do believe these assumptions, we will have an efficiency gain by using this RE-estimator compared to pooled OLS or FE/FD.
If we don’t believe the second assumption (that $a_i$ is not a problem or that we can condition away the problem), we will have a biased estimate and should go for a FE or FD estimator.
This is thus an efficiency/consistency trade-off. In fact, the RE estimator can be expressed as an “efficiently” weighted average of within (FE) and between estimators.
The full rank condition also applies.
Explain what a FE model is and what the assumptions are.
This refers to running a “demeaned” regression. FE is a within estimator as it exploits within-unit variation for identification.
We should then think of $a_i$ as an individual-specific intercept. Fixed effects (FE) is a way of controlling for this confounder by eliminating it: we average over time and then estimate a transformed model where we subtract the average.
We can of course do this demeaning for both unit and time fixed effects. Doing both is referred to as a two-way fixed effects (TWFE) regression.
Assumptions:
- Linearity (? see the beginning of slides)
- Strict exogeneity
- $E[u_{it}|x_{it},a_{i}]=E[u_{it}|x_{i1},\dots,x_{iT},a_{i}]=0$
- Implying that $E[\tilde x_{it}'\tilde u_{it}]=0$
- Full rank - $k$
- $\text{rank} \ E[\tilde X_i’ \tilde X_i]=k$
Under these assumptions, $\hat \beta_{FE}$ is an unbiased estimator of $\beta$.
Derive the FE estimator. Also show how the estimator looks in matrix notation.
See notion.
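For reference, the within estimator in matrix notation (using the demeaned $\tilde X_i$, $\tilde y_i$ from the assumptions above) is

$$
\hat\beta_{FE} = \left(\sum_{i=1}^N \tilde X_i' \tilde X_i\right)^{-1} \sum_{i=1}^N \tilde X_i' \tilde y_i,
\qquad \tilde x_{it} = x_{it} - \bar x_i, \quad \tilde y_{it} = y_{it} - \bar y_i.
$$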
What are the caveats with FE and FD?
- We get rid of all time-invariant observables (mechanically by the transformation)
- We drop observations without within-variation in $x_{it}$
- We will have power and measurement error issues
- Demeaning an interaction term is not the same as interacting demeaned variables
- We will have age-time-cohort collinearity problems
What do we mean with strict exogeneity?
Strictly exogenous means the error term $u$ is unrelated to any instance of the variable $x$: past, present, and future. $x$ is completely unaffected by $y$.
$$
E[u_{it}|x_{i1},\dots,x_{iT},a_{i}]=0
$$
What is the difference between the FE approach and the dummy approach for $a_i$?
The dummy approach is in fact equivalent to the FE-estimator! However, it is rarely computationally feasible since we might end up with a lot of dummy variables to estimate.
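A minimal numerical sketch of this equivalence, using simulated data (all names and numbers below are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 5                       # units and time periods
unit = np.repeat(np.arange(N), T)  # panel-unit identifier

a = rng.normal(size=N)[unit]              # unit fixed effect a_i
x = 0.5 * a + rng.normal(size=N * T)      # regressor correlated with a_i
y = 1.0 + 2.0 * x + a + rng.normal(size=N * T)

# (1) Within/FE estimator: demean y and x within each unit, then run OLS
def demean(z):
    means = np.bincount(unit, z) / np.bincount(unit)
    return z - means[unit]

x_w, y_w = demean(x), demean(y)
beta_fe = (x_w @ y_w) / (x_w @ x_w)

# (2) Dummy (LSDV) approach: OLS of y on x plus a full set of unit dummies
D = np.zeros((N * T, N))
D[np.arange(N * T), unit] = 1.0
X = np.column_stack([x, D])
beta_lsdv = np.linalg.lstsq(X, y, rcond=None)[0][0]

print(beta_fe, beta_lsdv)  # identical point estimates (up to floating point)
```

Both give the same $\hat\beta$; the LSDV design matrix just carries $N$ extra dummy columns, which is what makes it impractical for large panels.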
Explain and derive the FD estimator. What are the assumptions?
See notion
Assumptions:
- Linearity (? see the beginning of slides)
- Strict exogeneity is sufficient but not necessary! Instead:
- $E[\Delta u_{it}|\Delta x_{i,t-1}, \Delta x_{i,t}, \Delta x_{i,t+1}] = 0$
- Full rank - $k$
- $\text{rank} \ E[\Delta X_i’ \Delta X_i]=k$
Under these assumptions, $\hat \beta_{FD}$ is an unbiased estimator of $\beta$.
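A brief sketch of the transformation behind the estimator (standard notation, not taken from notion): differencing adjacent periods removes $a_i$,

$$
\Delta y_{it} = y_{it} - y_{i,t-1} = \Delta x_{it}'\beta + \Delta u_{it},
$$

and $\hat\beta_{FD}$ is simply pooled OLS on the differenced observations.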
What are the differences between FE and FD?
When $T = 2$, FE = FD. This is not true when $T > 2$.
Differences
- Identifying assumptions are less strict for FD
- We mechanically lose more observations with FD than FE.
How should we think about standard errors if we use FE?
We need to cluster SEs on the panel unit (e.g. firm level)!
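A hedged sketch of how this can be done with statsmodels (the data are simulated and the variable names illustrative; a dedicated panel package would also handle the within transformation for you):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
N, T = 50, 5
unit = np.repeat(np.arange(N), T)        # panel-unit id (e.g. firm)
x = rng.normal(size=N * T)
y = 2.0 * x + rng.normal(size=N)[unit] + rng.normal(size=N * T)

# Pooled OLS just to illustrate the clustering option; the same cov_type
# argument applies to a regression on within-demeaned data.
fit = sm.OLS(y, sm.add_constant(x)).fit(
    cov_type="cluster", cov_kwds={"groups": unit}
)
print(fit.bse)  # standard errors clustered on the panel unit
```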
What can we do if we have violations of strict exogeneity with panel data?
If we have a violation of strict exogeneity in the form of an Ashenfelter’s dip, one solution can be to include a lagged dependent variable as a control
$$
y_{it} = \lambda_t + \rho y_{i,t-h} + \beta D_{it} + u_{it}
$$
Hence, we compare workers with similar earning histories.
The LDV estimator relies on qualitatively different identifying assumptions:
$$
E[u_{it}|y_{i,t-h}, \lambda_t, D_{it}]=0
$$
We can however NOT use FE and LDV together. Using a lagged dependent variable in an FE model, or a unit fixed effect in an LDV model, will mechanically create a violation of the strict exogeneity assumption.
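A minimal sketch of setting up the LDV specification with pandas/statsmodels (the toy panel, column names, and numbers are hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format panel: one row per worker-year
df = pd.DataFrame({
    "worker":  [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "year":    [2000, 2001, 2002] * 3,
    "earn":    [20.0, 21.0, 25.0, 30.0, 28.0, 33.0, 15.0, 16.0, 16.5],
    "treated": [0, 0, 1, 0, 1, 1, 0, 0, 0],
})

df = df.sort_values(["worker", "year"])
df["earn_lag"] = df.groupby("worker")["earn"].shift(1)   # y_{i,t-h} with h = 1

# LDV specification: year effects (lambda_t), the lagged outcome, and treatment
ldv = smf.ols("earn ~ C(year) + earn_lag + treated", data=df.dropna()).fit()
print(ldv.params)
```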
What are the panel data validity checks?
Individual-specific time trend
A general threat to identification is if the unobserved effects evolve over time. A useful check is thus to estimate the random trend model
$$
y_{it} = a_i + g_i t + x_{it}'\beta + v_{it}
$$
where $g_i t$ denotes the individual-specific time trend. Including this trend should not change our estimate of $\beta$.
Lead of treatment
The effect of $x_{it}$ on a one-year lead $w_{i,t+1}$ should not be significant. If it is, strict exogeneity is violated since $E[u_{it}|w_{i,t+1}]\neq0$.
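A hedged sketch of the lead check (simulated data; variable names illustrative, and in practice the regression would include the usual controls and fixed effects):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
N, T = 100, 6
df = pd.DataFrame({
    "unit": np.repeat(np.arange(N), T),
    "t":    np.tile(np.arange(T), N),
    "x":    rng.normal(size=N * T),
    "w":    rng.normal(size=N * T),
})

df = df.sort_values(["unit", "t"])
df["w_lead"] = df.groupby("unit")["w"].shift(-1)   # w_{i,t+1}

# Regress the one-period lead on x_it; a significant coefficient on x would
# point to a violation of strict exogeneity.
check = smf.ols("w_lead ~ x + C(t)", data=df.dropna()).fit()
print(check.params["x"], check.pvalues["x"])
```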
Qualitatively, how should we think about measurement errors in FE/FD models?
With measurement error we will get a version of the classical attenuation bias. Here the bias depends on the persistence of the measurement error and on a corresponding term for the independent variable.
If we have a very persistent measurement error, that is, individuals report with the same error (underestimate or overestimate) over time, then the correlation will be close to one; the noise component then cancels in the transformation and we get consistent estimates. However, if the persistence in the independent variable is large, differencing removes signal and the attenuation bias increases.
The difference here from cross-sectional data is that we have two counteracting effects.
In microdata people often argue that measurement errors are persistent.
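A hedged sketch of the formula behind this intuition, for a single regressor with classical measurement error in the FD case (notation mine, not from the slides): with $x_{it} = x^*_{it} + e_{it}$,

$$
\text{plim}\ \hat\beta_{FD} = \beta \cdot \frac{\sigma_{x^*}^2(1-\rho_{x^*})}{\sigma_{x^*}^2(1-\rho_{x^*}) + \sigma_{e}^2(1-\rho_{e})},
$$

where $\rho_{x^*}$ and $\rho_e$ are the first-order autocorrelations of the true regressor and of the measurement error. A persistent error ($\rho_e \to 1$) makes the bias vanish; a persistent true regressor ($\rho_{x^*} \to 1$) makes it worse: the two counteracting effects mentioned above.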