Lecture 11 (DiD basics) Flashcards
Use the potential outcome framework to derive the DiD estimator. Extend it and show what we are really estimating.
See notion (also exam 2022)
Rewrite this basic DD model in terms of expectations
Y_{it} = a_i + \lambda_t+D_{it}\beta_i+u_{it}
Y_{it} = E[Y_{it}(0)|i,t]+[Y_{it}(1)-Y_{it}(0)]D_{it}+(Y_{it}(0)-E[Y_{it}(0)|i,t])
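As a sketch, the two lines can be matched term by term. The additive form of the conditional mean is the modelling restriction, and writing $\beta_i$ without a time index assumes the individual effect does not vary over time:

$$
a_i+\lambda_t = E[Y_{it}(0)\mid i,t],\qquad \beta_i = Y_{it}(1)-Y_{it}(0),\qquad u_{it}=Y_{it}(0)-E[Y_{it}(0)\mid i,t]
$$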
State the identifying assumptions for DiD
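- Parallel trends: $E[u_{it}|D_{it},i,t]=E[u_{it}|i,t]$
- No anticipation effect: treatment does not affect outcomes before it starts, so the observed pre-period outcome of the treated equals their untreated potential outcome, $Y_{i,0}=Y_{i,0}(0)$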
What is “parallel trends” in the potential outcome framework?
E[Y_{i1}(0)-Y_{i,0}(0)|D_{i,1}=1]- E[Y_{i1}(0)-Y_{i,0}(0)|D_{i,1}=0]=0
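A short derivation sketch of what the DiD comparison estimates in general. The second line assumes no anticipation (for the treated, $Y_{i0}=Y_{i0}(0)$); the third adds and subtracts $E[Y_{i1}(0)\mid D_{i,1}=1]$:

$$
\begin{aligned}
&E[Y_{i1}-Y_{i0}\mid D_{i,1}=1]-E[Y_{i1}-Y_{i0}\mid D_{i,1}=0] \\
&= E[Y_{i1}(1)-Y_{i0}(0)\mid D_{i,1}=1]-E[Y_{i1}(0)-Y_{i0}(0)\mid D_{i,1}=0] \\
&= \underbrace{E[Y_{i1}(1)-Y_{i1}(0)\mid D_{i,1}=1]}_{\text{ATT}}
+\underbrace{E[Y_{i1}(0)-Y_{i0}(0)\mid D_{i,1}=1]-E[Y_{i1}(0)-Y_{i0}(0)\mid D_{i,1}=0]}_{\text{parallel trends bias}}
\end{aligned}
$$

The bias term is exactly the parallel trends expression above, so the DiD comparison equals the ATT when that expression is zero.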
Show the estimator for the TWFE-DD in the potential outcome framework.
Beta_ATT =…….
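\hat \beta_{ATT}=\frac 1 {N_1} \sum_{i=1}^N D_{i,t=1}(Y_{i,t=1}-Y_{i,t=0})-\frac 1 {N_0}\sum_{i=1}^N(1-D_{i,t=1})(Y_{i,t=1}-Y_{i,t=0})
where $N_1=\sum_i D_{i,t=1}$ and $N_0=N-N_1$ are the numbers of treated and control units, so that each term is a group mean of the outcome change.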
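A minimal simulation sketch of the two-period estimator. It computes the difference in mean outcome changes between treated and control units and checks it against OLS of the change on the treatment dummy. The sample size, the effect of 2.0, the common trend of 0.5 and the seed are all hypothetical choices, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n = 2000                        # hypothetical number of units
true_att = 2.0                  # hypothetical treatment effect
d = rng.integers(0, 2, n)       # D_{i,t=1}: 1 if unit i is treated in period 1

# Unit effects may differ by group; DiD removes them by differencing
alpha = rng.normal(0, 1, n) + 1.5 * d
y0 = alpha + rng.normal(0, 1, n)                        # period 0 outcome
y1 = alpha + 0.5 + true_att * d + rng.normal(0, 1, n)   # period 1, common trend 0.5

dy = y1 - y0                    # first difference Y_{i,t=1} - Y_{i,t=0}

# Difference in mean changes: treated minus control
beta_dd = dy[d == 1].mean() - dy[d == 0].mean()

# Same number from OLS of the change on a constant and the treatment dummy
X = np.column_stack([np.ones(n), d])
coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
beta_ols = coef[1]

print(beta_dd, beta_ols)
```

The two numbers agree up to floating-point error, and both should be close to the simulated effect because the simulated data satisfy parallel trends by construction.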
State the TWFE model with multiple time periods and explain its benefits!
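y_{it}=\alpha_i+\lambda_t+\sum_{s=1, s\neq t_0}^T\beta_s D_{i}\mathbf 1[t=s]+e_{it}
where $D_i$ indicates the treatment group, $\mathbf 1[t=s]$ is a period dummy, and $t_0$ is the omitted reference period.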
The benefits of this dynamic approach are:
- Multiple pre-treatment periods: the $\beta_t$ for $t<t_0$ can partly help us validate the parallel trends assumption, since they should be close to zero.
- Multiple post-treatment periods: the $\beta_t$ for $t>t_0$ give us information on the dynamics of the treatment effect.
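A sketch of how the dynamic specification could be estimated on a simulated balanced panel, using plain dummy-variable OLS. The panel size, the reference period `t0 = 2`, the effect path in `effects` and the seed are hypothetical; the treated group is defined by a time-invariant indicator interacted with period dummies.

```python
import numpy as np

rng = np.random.default_rng(1)

n_units, n_periods, t0 = 200, 6, 2      # hypothetical sizes; t0 is the omitted reference period
treated = np.arange(n_units) < n_units // 2           # first half of the units is treated
effects = np.array([0.0, 0.0, 0.0, 1.0, 2.0, 3.0])    # hypothetical true effects, zero up to t0

alpha = rng.normal(0, 1, n_units) + treated           # unit fixed effects
lam = 0.3 * np.arange(n_periods)                      # common time effects
y = (alpha[:, None] + lam[None, :]
     + treated[:, None] * effects[None, :]
     + rng.normal(0, 0.5, (n_units, n_periods)))      # y[i, t]

# Long format: one row per (unit, period)
unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
y_long = y.ravel()

# Regressors: unit dummies, period dummies (period 0 dropped), and
# treated-group x period interactions for every period except t0
unit_d = (unit[:, None] == np.arange(n_units)[None, :]).astype(float)
period_d = (period[:, None] == np.arange(1, n_periods)[None, :]).astype(float)
event_periods = [s for s in range(n_periods) if s != t0]
inter_d = np.column_stack(
    [(period == s) * treated[unit] for s in event_periods]
).astype(float)

X = np.column_stack([unit_d, period_d, inter_d])
coef, *_ = np.linalg.lstsq(X, y_long, rcond=None)
beta = dict(zip(event_periods, coef[-len(event_periods):]))

for s in event_periods:
    label = "pre " if s < t0 else "post"
    print(label, s, round(beta[s], 3))
```

In this simulation the pre-period coefficients should be near zero and the post-period coefficients should trace out the simulated effect path. With real data, a dedicated panel estimator with clustered standard errors would be the more usual choice.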
Explain how we can conduct pre-testing for PT and its caveats.
With multiple periods pre-treatment, we can test parallel trends pre-treatment by interacting the treatment with our leads.
Important regarding pre-trend testing:
- Testing for pre-trends is pre-testing! Hence, it might introduce statistical problems if we select on pre-trends…
- Pre-trends in what? Logs or levels? We should decide on the functional form beforehand.
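A small numeric sketch of why the functional form matters: trends that are parallel in levels are generally not parallel in logs. The group means below are made-up numbers.

```python
import numpy as np

# Hypothetical untreated group means in two pre-treatment periods
control = np.array([10.0, 20.0])
treated = np.array([20.0, 30.0])

# Difference in trends between the groups, in levels and in logs
trend_gap_levels = (treated[1] - treated[0]) - (control[1] - control[0])
trend_gap_logs = (np.log(treated[1]) - np.log(treated[0])) - (
    np.log(control[1]) - np.log(control[0])
)

print(trend_gap_levels)  # zero: both groups grow by 10 units
print(trend_gap_logs)    # negative: the treated group grows by 50%, the control group by 100%
```

Here the pre-trend test would pass in levels and fail in logs, which is why the choice should be made before looking at the test.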
Group-specific time trend
Another test is to include a group-specific time trend $\gamma_{gt} = \gamma_g\times t$, where “group” is a level above the individual. Doing this and showing that the DD estimates do not change much supports the parallel trends assumption.
State the TWFE-DD with multiple treatment cohorts and its benefits and caveats.
In some settings, different units are treated at different points in time. That is, we have a treatment roll-out or a “staggered DiD”. For this, we formulate the model
$$
y_{it}=\alpha_i+\lambda_t+\sum_{\tau=-q, \tau\neq 0}^m\beta_\tau D_{it}^{\tau}+e_{it}
$$
where $\tau$ is the relative event timing, $D_{it}^{\tau}$ equals one if unit $i$ in period $t$ is $\tau$ periods away from its treatment date, and we include $q$ leads and $m$ lags. In this setting, we still need a never treated group (according to the slides).
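A sketch of how the relative-time dummies could be built for a staggered roll-out. The treatment dates, the window (`q = 2` leads, `m = 3` lags) and the use of `np.nan` to mark never-treated units are illustrative assumptions. Following the formula above, $\tau=0$ is the omitted category.

```python
import numpy as np

q, m = 2, 3                          # hypothetical number of leads and lags
periods = np.arange(8)               # calendar periods t = 0..7
# Hypothetical first treatment period per unit; np.nan marks a never-treated unit
first_treated = np.array([3.0, 5.0, np.nan])

# Relative event time tau = t - (first treatment period), shape (units, periods);
# it is nan for never-treated units, so all their dummies are zero
tau = periods[None, :] - first_treated[:, None]

taus = [k for k in range(-q, m + 1) if k != 0]   # tau = 0 is the omitted reference
# dummies[i, t, j] = 1 if unit i in period t is taus[j] periods from treatment.
# Simplification: periods outside [-q, m] get all zeros here; in practice the
# end points are often binned instead.
dummies = np.stack([(tau == k) for k in taus], axis=-1).astype(int)

print(taus)
print(dummies[0])   # rows are periods, columns are relative times, for the unit treated in period 3
```

Each column of `dummies` would enter the regression with its own coefficient, alongside unit and period fixed effects.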
Benefits:
- If we see effects across different timings we have a strong case for internal and external validity.
- Effects across different timings suggest that the effect is not driven by confounding macro shocks.
Caveats
- Weighting and heterogeneity issues (see last lecture)
Explain the concept of DD placebo tests
Even though we have parallel trends in the pre-period, this is no guarantee that they also hold post-treatment.
One way to check for this is to study the effects on groups for which we do not expect to find an effect. E.g., if we have a policy targeted toward low-wage workers, high-wage workers should not be affected by the policy. We therefore test whether the policy has an effect on high-wage workers. If it does, we have a bias (what kind?). That is, we would like $\hat \beta_{DiD}^{placebo} =0$.
This placebo test is implicitly incorporated into the triple DiD (DDD). In the DDD we have: $\hat\beta_{DDD}=\hat \beta_{DiD}-\hat \beta_{DiD}^{placebo}$.
What is the main idea of the DDD?
The main idea in the DDD is to include an additional control group for which we do not expect to see an effect. This will incorporate a placebo test into the DiD estimate and what we have is
$$
\hat\beta_{DDD}=\hat \beta_{DiD}-\hat \beta_{DiD}^{placebo}
$$
Write the triple-difference estimator using the potential outcome framework. That is, use
$\hat\beta_{DDD}=\hat \beta_{DiD}-\hat \beta_{DiD}^{placebo}$ and translate it.
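$$
\Big(E[Y_{i,t=1}-Y_{i,t=0}|D_{i,t=1}=1, G_i=1]-E[Y_{i,t=1}-Y_{i,t=0}|D_{i,t=1}=0, G_i=1]\Big)-\Big(E[Y_{i,t=1}-Y_{i,t=0}|D_{i,t=1}=1, G_i=0]-E[Y_{i,t=1}-Y_{i,t=0}|D_{i,t=1}=0, G_i=0]\Big)
$$
Where $G=1$ for the main treatment and control groups and $G=0$ for the placebo treatment and control groups. The first bracket is exactly the original DD estimator and the second bracket is the placebo DD estimator. Rearranging the terms, the same expression is a DD estimator that uses the placebo group as the control among the treated ($D=1$), minus the corresponding comparison among the untreated ($D=0$).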
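The identity $\hat\beta_{DDD}=\hat \beta_{DiD}-\hat \beta_{DiD}^{placebo}$ can be illustrated with made-up group means. In this sketch the treated units have an extra trend of 1.0 that violates parallel trends for the original DD, and the DDD removes it. All numbers and the helper `mean_change` are hypothetical.

```python
# Hypothetical components of the mean outcome change E[Y_{t=1} - Y_{t=0} | D, G]
common, region_trend, group_trend, effect = 0.5, 1.0, 0.3, 2.0

def mean_change(d, g):
    """Mean outcome change for treatment status d and group g (illustrative numbers)."""
    return common + region_trend * d + group_trend * g + effect * d * g

dd_main = mean_change(1, 1) - mean_change(0, 1)      # original DD, biased by region_trend
dd_placebo = mean_change(1, 0) - mean_change(0, 0)   # placebo DD, picks up only region_trend
ddd = dd_main - dd_placebo                           # the shared bias cancels

print(round(dd_main, 6), round(dd_placebo, 6), round(ddd, 6))
```

The original DD returns the effect plus the differential trend, the placebo DD returns the differential trend alone, and their difference recovers the simulated effect. This only works because the trend bias is assumed to be the same for both groups.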
Specify the DDD in OLS
See notion
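For reference, a common textbook way to write the DDD regression in the two-period case, given as a sketch and not necessarily in the notation of the course notes. $Post_t$ is an assumed post-period dummy, and $D_i$ and $G_i$ are taken to be time-invariant treatment and group indicators:

$$
Y_{it}=\beta_0+\beta_1 D_i+\beta_2 G_i+\beta_3 Post_t+\beta_4 D_iG_i+\beta_5 D_iPost_t+\beta_6 G_iPost_t+\beta_7 D_iG_iPost_t+e_{it}
$$

In this saturated form $\beta_7$ corresponds to $\hat\beta_{DDD}$, and $\beta_5$ is the DD among the placebo group ($G=0$).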
What are the identification assumptions for DDD?
- We do not need the parallel trends assumption to hold for the original treatment group. Instead, we assume that the parallel trends bias is the same for all groups.
Peter Nilsson puts it like this: One key feature of the DDD estimator is that it allows for violations of the parallel trends assumption.
- e.g., different trends across regions are absorbed by including region-by-year fixed effects in the regression
- this would not be possible in the DD model since it lacks the additional comparison group.