F10 Difference-in-Difference Flashcards
What are the elements of the most basic DiD design?
Two groups (treatment and control) observed in two time periods (pre-treatment and post-treatment), i.e. a 2x2 design.
What is the mathematical specification of a simple 2x2 design?
δ = (ȳ_T^post − ȳ_T^pre) − (ȳ_C^post − ȳ_C^pre), where ȳ_T and ȳ_C are the mean outcomes in the treatment and control groups.
In other words, a difference between two differences: the change over time in the treatment group minus the change over time in the control group.
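The formula above can be sketched in a few lines of Python. The numbers are hypothetical and chosen only for illustration, and the helper name `did_2x2` is an assumption, not something from the course material.

```python
from statistics import mean

# Hypothetical outcomes for illustration only (not real data).
treat_pre = [10.0, 12.0, 11.0]
treat_post = [15.0, 17.0, 16.0]
control_pre = [8.0, 9.0, 10.0]
control_post = [10.0, 11.0, 12.0]


def did_2x2(t_pre, t_post, c_pre, c_post):
    """Return (change in treatment group) minus (change in control group)."""
    treated_change = mean(t_post) - mean(t_pre)
    control_change = mean(c_post) - mean(c_pre)
    return treated_change - control_change


delta = did_2x2(treat_pre, treat_post, control_pre, control_post)
print(delta)
```

Here the treatment group changes by 5 and the control group by 2, so the difference between the two differences is 3.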
What is the central assumption?
Parallel trends assumption: Up until the intervention, the groups follow the same trajectory on the outcome of interest.
What does the parallel trends assumption mean?
The control group's development is used as the counterfactual for the treatment group, because we assume the two groups would have had parallel trends in the absence of treatment.
What type of effect are we estimating?
ATT: Average treatment effect on the treated.
What is the regression for a simple 2x2 DiD? And what do the different parts express?
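Y = α + λT + γD + δTD + ε, where T = 1 in the post-treatment period and D = 1 for the treatment group.
α: Control group, pre-treatment
α + λ: Control group, post-treatment
α + γ: Treatment group, pre-treatment
α + λ + γ + δ: Treatment group, post-treatment
δ: The DiD estimate (the coefficient on the interaction TD)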
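A minimal sketch of how the coefficients map onto the four cell means, using ordinary least squares from NumPy. The cell means and the noise pattern are hypothetical; no standard errors are computed here.

```python
import numpy as np

# Hypothetical cell means, chosen only for illustration.
# Keys are (D, T): D = 1 for the treatment group, T = 1 for the post period.
cell_means = {(0, 0): 9.0, (0, 1): 11.0, (1, 0): 11.0, (1, 1): 16.0}

rows, outcomes = [], []
for (d, t), mu in cell_means.items():
    for noise in (-1.0, 0.0, 1.0):  # three observations per cell, mean-zero noise
        rows.append([1.0, t, d, t * d])  # columns: intercept, T, D, T*D
        outcomes.append(mu + noise)

X = np.array(rows, dtype=float)
y = np.array(outcomes, dtype=float)

# OLS fit; the coefficients come back in the order (alpha, lambda, gamma, delta).
alpha, lam, gamma, delta = np.linalg.lstsq(X, y, rcond=None)[0]
print(alpha, lam, gamma, delta)
```

Because the model has one parameter per cell, the fitted values reproduce the cell means exactly: α is the control pre-treatment mean, and δ is the same difference between two differences as in the 2x2 formula.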
What is a triple DiD?
Three dimensions: the comparison is made not only across time and between units, but also within units (e.g. subpopulations, locations, or policies).
Eight groups are now relevant, e.g. control group, pre-treatment, high income.
How does a regression for a triple DiD look?
Y = α + β1T + β2D + β3Z + β4TD + β5TZ + β6DZ + δTDZ + ε
So each dummy enters on its own, then in every pairwise interaction term, and finally in the three-way interaction.
δ is still the effect estimate (its term is switched on only when all three dummies equal 1).
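As a sketch, the three-way interaction coefficient can be checked against the difference between two ordinary DiDs, one for each value of the third dummy Z. The eight cell means below are hypothetical, with one observation per cell so the fit is exact.

```python
import numpy as np

# Hypothetical cell means keyed by (T, D, Z), for illustration only.
means = {
    (0, 0, 0): 10.0, (1, 0, 0): 12.0, (0, 1, 0): 11.0, (1, 1, 0): 14.0,
    (0, 0, 1): 20.0, (1, 0, 1): 23.0, (0, 1, 1): 22.0, (1, 1, 1): 30.0,
}


def did(z):
    """Ordinary 2x2 DiD computed within the subgroup Z = z."""
    treated_change = means[(1, 1, z)] - means[(0, 1, z)]
    control_change = means[(1, 0, z)] - means[(0, 0, z)]
    return treated_change - control_change


# Triple difference: the DiD for Z = 1 minus the DiD for Z = 0.
triple_diff = did(1) - did(0)

# Same quantity from the regression with all main effects and interactions.
X = np.array(
    [[1, t, d, z, t * d, t * z, d * z, t * d * z] for (t, d, z) in means],
    dtype=float,
)
y = np.array(list(means.values()), dtype=float)
coefs = np.linalg.lstsq(X, y, rcond=None)[0]
delta = coefs[-1]  # coefficient on the three-way interaction T*D*Z
print(triple_diff, delta)
```

With these numbers the DiD is 1 in the Z = 0 subgroup and 5 in the Z = 1 subgroup, so both routes give a triple difference of 4.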
What kind of estimate is the triple DiD?
ATT: Average treatment effect on the treated for a subgroup of units (we can’t rule out heterogeneous treatment effects).
What is a staggered DiD?
A design where different units receive treatment at different points in time (multiple time periods).
E.g. US states implementing Medicaid at different points in time.
What is special about the control group in a staggered DiD?
The control group consists of units that have not yet received the treatment (future-treated units) and possibly those that never receive the treatment. A main challenge
When is a staggered DiD especially relevant?
When you want to study dynamic effects, e.g. the effects of a policy that vary over time.
What is important for coefficients for time periods leading up to the intervention?
They must be insignificant; otherwise the estimated effect may be driven by something other than the intervention.
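A rough sketch of the idea behind checking the pre-treatment coefficients: compute the treatment-control gap in each period relative to the last pre-treatment period. The group means are hypothetical, and this gives point estimates only; a real check would also need standard errors to judge significance.

```python
# Hypothetical group means by period relative to treatment
# (period 0 is the first treated period). Illustration only.
treated = {-3: 10.0, -2: 11.0, -1: 12.0, 0: 16.0, 1: 18.0}
control = {-3: 8.0, -2: 9.0, -1: 10.0, 0: 11.0, 1: 12.0}
reference = -1  # last pre-treatment period, used as the baseline

base_gap = treated[reference] - control[reference]

# Gap between the groups in each period, relative to the baseline gap.
event_study = {p: (treated[p] - control[p]) - base_gap for p in sorted(treated)}

leads = {p: v for p, v in event_study.items() if p < reference}  # before treatment
lags = {p: v for p, v in event_study.items() if p >= 0}          # after treatment
print("leads:", leads)
print("lags:", lags)
```

In this made-up example the leads are all zero, which is what parallel pre-trends look like, while the lags pick up an effect that grows from 3 to 4.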
What is serial correlation and why is it a problem?
Serial correlation (also known as autocorrelation) occurs when the residuals (errors) in a regression model are correlated across time.
It often occurs in time series data. With repeated observations of the same unit, the number of observations, and thereby the power, is artificially inflated (the effective sample size is smaller than it appears).
This tends to result in much lower p-values than can be justified (type I errors, i.e. false positives).
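The false-positive problem can be illustrated with a small placebo simulation. This is a sketch under stated assumptions: AR(1) errors with correlation 0.8, no true treatment effect, and a fixed critical value of 1.96. The comparison method, collapsing each unit to a single post-minus-pre change, is one common remedy introduced here for illustration and is not taken from the flashcards; all parameter values and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_UNITS, N_PERIODS, RHO, N_SIMS = 40, 20, 0.8, 300
treated = np.arange(N_UNITS) < N_UNITS // 2    # first half is "treated" (placebo)
post = np.arange(N_PERIODS) >= N_PERIODS // 2  # second half of periods is "post"


def ar1_panel():
    """Simulate unit-by-period AR(1) errors with no true treatment effect."""
    e = np.empty((N_UNITS, N_PERIODS))
    e[:, 0] = rng.normal(size=N_UNITS)
    for t in range(1, N_PERIODS):
        shock = rng.normal(size=N_UNITS)
        e[:, t] = RHO * e[:, t - 1] + np.sqrt(1 - RHO**2) * shock
    return e


def naive_t(y):
    """DiD t-statistic treating every unit-period observation as independent."""
    # Cell order: treated-post, treated-pre, control-post, control-pre.
    cells = [y[g][:, p].ravel() for g in (treated, ~treated) for p in (post, ~post)]
    did = (cells[0].mean() - cells[1].mean()) - (cells[2].mean() - cells[3].mean())
    ssr = sum(((c - c.mean()) ** 2).sum() for c in cells)
    s2 = ssr / (y.size - 4)  # residual variance of the saturated 2x2 model
    se = np.sqrt(s2 * sum(1.0 / c.size for c in cells))
    return did / se


def collapsed_t(y):
    """DiD t-statistic after collapsing each unit to one post-minus-pre change."""
    change = y[:, post].mean(axis=1) - y[:, ~post].mean(axis=1)
    t_chg, c_chg = change[treated], change[~treated]
    did = t_chg.mean() - c_chg.mean()
    se = np.sqrt(t_chg.var(ddof=1) / t_chg.size + c_chg.var(ddof=1) / c_chg.size)
    return did / se


naive_rejections = 0
collapsed_rejections = 0
for _ in range(N_SIMS):
    y = ar1_panel()
    naive_rejections += int(abs(naive_t(y)) > 1.96)
    collapsed_rejections += int(abs(collapsed_t(y)) > 1.96)

naive_rate = naive_rejections / N_SIMS
collapsed_rate = collapsed_rejections / N_SIMS
print(f"naive rejection rate: {naive_rate:.2f}")
print(f"collapsed rejection rate: {collapsed_rate:.2f}")
```

Since there is no true effect, every rejection is a false positive. The naive test, which counts each unit-period as an independent observation, should reject far more often than the nominal 5 percent, while the collapsed test should stay much closer to it.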