Lecture 9 Flashcards
What are repeated cross sections?
Data collected at multiple points in time, without tracking the same individuals
What is panel data?
- what are the benefits?
Where the same individuals are tracked across multiple time periods
- more observations so higher precision
- informative about dynamics, so reveals time-based changes
- controls for individual-specific effects, reducing omitted variable bias (OVB); this benefit is unique to panel data (PD)
How is random sampling over time carried out?
- pooling/ combining these different samples
Each dataset represents a random sample from a population at a specific point in time, like census data
- by pooling these multiple cross-sectional data sets, you increase the sample size and can also analyse changes over time
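A minimal sketch of pooling, assuming two made-up wage surveys (the variable names, years and numbers are hypothetical and not from the lecture). Each row is tagged with a period dummy d2 so the pooled sample still records which cross section it came from.

```python
# Two hypothetical cross sections: different people are sampled in each year.
survey_2010 = [{"wage": 10.0, "educ": 12}, {"wage": 14.0, "educ": 16}]
survey_2020 = [
    {"wage": 13.0, "educ": 12},
    {"wage": 19.0, "educ": 16},
    {"wage": 16.0, "educ": 14},
]


def pool(samples_by_year, base_year):
    """Stack the samples and tag each row with a period dummy d2
    (0 for the base year, 1 for any other year)."""
    pooled = []
    for year, rows in samples_by_year.items():
        for row in rows:
            pooled.append({**row, "year": year, "d2": int(year != base_year)})
    return pooled


pooled = pool({2010: survey_2010, 2020: survey_2020}, base_year=2010)

# Larger sample: the pooled data has every row from both surveys.
n_total = len(pooled)

# Change over time: compare mean wages across the two periods.
mean_wage_by_period = {}
for d in (0, 1):
    wages = [r["wage"] for r in pooled if r["d2"] == d]
    mean_wage_by_period[d] = sum(wages) / len(wages)

print(n_total, mean_wage_by_period)
```

The dummy d2 is what later lets a regression on the pooled sample separate period effects from everything else.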
How to pool the data to get a structural break regression?
- what are the benefits?
Combine the two data sets and add time dummy variables, which indicate which data set (time period) each individual belongs to
- increase the sample size - improving statistical power
- comparison across time
- simplifies interpretation
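A rough sketch of a structural break regression, assuming the break is allowed to shift both the intercept and the slope, i.e. y = B0 + K0*d2 + B1*x + K1*(d2*x) + u. In that fully interacted case the pooled OLS coefficients coincide with what two separate period-by-period regressions give, so the sketch uses the simple one-regressor formulas. The data are made up.

```python
from statistics import mean


def simple_ols(x, y):
    """Intercept and slope from a one-regressor OLS fit."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


# Hypothetical pooled sample: (x, y, d2), where d2 = 1 marks the second data set.
pooled = [(1.0, 3.0, 0), (2.0, 5.0, 0), (3.0, 7.0, 0),
          (1.0, 5.0, 1), (2.0, 8.0, 1), (3.0, 11.0, 1)]

x0 = [x for x, y, d2 in pooled if d2 == 0]
y0 = [y for x, y, d2 in pooled if d2 == 0]
x1 = [x for x, y, d2 in pooled if d2 == 1]
y1 = [y for x, y, d2 in pooled if d2 == 1]

b0_hat, b1_hat = simple_ols(x0, y0)       # base-period intercept and slope
int_2, slope_2 = simple_ols(x1, y1)       # second-period intercept and slope

k0_hat = int_2 - b0_hat                   # coefficient on d2 (intercept shift)
k1_hat = slope_2 - b1_hat                 # coefficient on d2*x (slope shift)

print(b0_hat, b1_hat, k0_hat, k1_hat)
```

If only the time dummy is added (no interaction), the slope is restricted to be the same in both periods and only the intercept shift K0 is estimated.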
What can pooled regressions sometimes allow us to estimate?
Effects of events, like policy reforms
- can compare outcomes of individuals before and after the reform
- by using data over time, you can control for state specific factors, reducing the likelihood of OVB
What is required for the OLS estimator of the pooled treatment regression to be unbiased?
- what is B1^?
Expected value of the error term conditional on treatment status must be equal across groups
- violated in many applications
- B1^ is the difference in average outcomes between the treated and control groups
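A small check of the last point, assuming the pooled treatment regression is y = B0 + B1*dT + u with a single treatment dummy (the outcome values are made up). The OLS slope on the dummy and the difference in group means come out identical.

```python
from statistics import mean

# Hypothetical outcomes paired with a treatment dummy dT (1 = treated).
data = [(10.0, 0), (12.0, 0), (14.0, 0), (15.0, 1), (17.0, 1), (19.0, 1)]

y = [yi for yi, dT in data]
d = [dT for yi, dT in data]

# Difference in average outcomes between the two groups.
mean_treated = mean([yi for yi, dT in data if dT == 1])
mean_control = mean([yi for yi, dT in data if dT == 0])
diff_in_means = mean_treated - mean_control

# OLS slope from the usual one-regressor formula: cov(d, y) / var(d).
d_bar, y_bar = mean(d), mean(y)
b1_hat = (sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
          / sum((di - d_bar) ** 2 for di in d))
b0_hat = y_bar - b1_hat * d_bar   # equals the control-group mean

print(b0_hat, b1_hat, diff_in_means)
```

This is only a causal effect if the two groups would have had the same average outcome without treatment, which is the condition on the error term stated above.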
Difference in difference approach
Outcomes over time and between group types
- for the DiD estimate to capture the causal effect, must assume that in the absence of treatment, T and C groups would have experienced parallel trends in their outcomes over time
=> (Yt,2 - Yc,2) - (Yt,1 - Yc,1), where each Y is the mean outcome of the treated (t) or control (c) group in period 1 or 2
If we run OLS on this regression:
Yit = B0 + K0d2t + B1dTi + K1d2t.dTi + uit
When d2t = 0, dTi = 0 -> B0^
When d2t = 1, dTi = 0 -> B0^ + K0^
When d2t = 0, dTi = 1 -> B0^ + B1^
When d2t = 1, dTi = 1 -> B0^ + B1^ + K0^ + K1^
- predicted values are just sample means for different groups, e.g. B0^ is the mean of the control group before reform
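A sketch of how the four predicted values pin down the coefficients, using made-up observations. Because the regression has one parameter per group-period cell, the OLS fitted values equal the cell means, so each coefficient can be read off from them directly.

```python
from statistics import mean

# Hypothetical observations: (y, d2, dT)
obs = [
    (10.0, 0, 0), (12.0, 0, 0),   # control, before reform
    (13.0, 1, 0), (15.0, 1, 0),   # control, after reform
    (14.0, 0, 1), (16.0, 0, 1),   # treated, before reform
    (20.0, 1, 1), (22.0, 1, 1),   # treated, after reform
]


def cell_mean(d2, dT):
    """Sample mean of y for one period/group combination."""
    return mean([y for y, p, g in obs if p == d2 and g == dT])


# Back out the coefficients from the four cell means.
b0_hat = cell_mean(0, 0)                              # control, before
k0_hat = cell_mean(1, 0) - b0_hat                     # time change for controls
b1_hat = cell_mean(0, 1) - b0_hat                     # pre-reform group gap
k1_hat = cell_mean(1, 1) - b0_hat - k0_hat - b1_hat   # what is left for treated, after

print(b0_hat, k0_hat, b1_hat, k1_hat)
```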
What's the key parameter in the DiD estimator?
Yit = B0 + K0d2t + B1dTi + K1d2t.dTi + uit
K1^ - measures the effect of reform, controlling for inherent differences between groups
K1^ = (Yt,2 - Yc,2) - (Yt,1 - Yc,1)
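A short derivation of why K1^ equals the difference in differences, written in LaTeX with bars for sample means (the bar notation is added here for clarity). It uses only the fact that the four predicted values of the regression are the four group-period means.

```latex
\begin{align*}
\bar{Y}_{T,2} - \bar{Y}_{C,2}
  &= (\hat{B}_0 + \hat{B}_1 + \hat{K}_0 + \hat{K}_1) - (\hat{B}_0 + \hat{K}_0)
   = \hat{B}_1 + \hat{K}_1 \\
\bar{Y}_{T,1} - \bar{Y}_{C,1}
  &= (\hat{B}_0 + \hat{B}_1) - \hat{B}_0
   = \hat{B}_1 \\
(\bar{Y}_{T,2} - \bar{Y}_{C,2}) - (\bar{Y}_{T,1} - \bar{Y}_{C,1})
  &= \hat{K}_1 \\
  &= (\bar{Y}_{T,2} - \bar{Y}_{T,1}) - (\bar{Y}_{C,2} - \bar{Y}_{C,1})
\end{align*}
```

The last line is just a rearrangement: the gap between groups after the reform minus the gap before is the same number as the change for the treated minus the change for the controls.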
Generalised version of DiD - add controls
- advantages
- the DiD estimator is the OLS estimator of K1 in the regression
- convenient way to obtain standard errors (SEs)
- straightforward to add time-varying controls
- including relevant controls makes it more likely that DiD works.
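A sketch of the generalised DiD regression with one added control, assuming a hypothetical control variable age and noise-free made-up data generated with a true effect of 3. OLS is solved from the normal equations in plain Python. Standard errors are left out to keep it short.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical rows: (d2, dT, age). The age mix shifts differently in each group.
rows = [(0, 0, 30), (0, 0, 40), (1, 0, 35), (1, 0, 50),
        (0, 1, 25), (0, 1, 45), (1, 1, 30), (1, 1, 60)]

# Outcome built from assumed coefficients: K1 = 3 is the "true" reform effect.
y = [1.0 + 0.5 * d2 + 2.0 * dT + 3.0 * d2 * dT + 0.1 * age for d2, dT, age in rows]

# Regressors: constant, d2, dT, d2*dT, age.
X = [[1.0, d2, dT, d2 * dT, age] for d2, dT, age in rows]
coef = ols(X, y)
k1_with_control = coef[3]


def cell_mean(p, g):
    vals = [yi for (d2, dT, _), yi in zip(rows, y) if d2 == p and dT == g]
    return sum(vals) / len(vals)


# Raw DiD of means, ignoring age.
raw_did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(k1_with_control, raw_did)
```

In this made-up example the raw DiD (3.25) picks up the different change in age composition, while the regression with the control recovers 3.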
How to define trend in treated and control group:
Change in Ut = Ut,2 - Ut,1
Change in Uc = Uc,2 - Uc,1
If these two are not equal - then there are group specific trends in yt, and the parallel trends assumption is violated
plim(K1^) = K1 + (change in Ut) - (change in Uc)
- common trends assumption
Common trends/ parallel trend assumption is that the trend in the error term for both the treated and control group is the same
- if this holds, then OLS provides unbiased estimates of the treatment effect
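A numeric illustration of what happens when the trends in the error term differ, using assumed population values (all numbers are hypothetical). The DiD of the population cell means comes out as K1 plus the gap between the two trends.

```python
# Assumed population coefficients; K1 is the true treatment effect.
B0, K0, B1, K1 = 10.0, 2.0, 4.0, 3.0

# Assumed mean of the error term by (group, period).
# The treated group's error mean drifts up by 1, the control group's does not.
U = {("T", 1): 0.0, ("T", 2): 1.0, ("C", 1): 0.0, ("C", 2): 0.0}


def mean_outcome(group, period):
    """Population mean of Y for one group and period under the DiD model."""
    d2 = int(period == 2)
    dT = int(group == "T")
    return B0 + K0 * d2 + B1 * dT + K1 * d2 * dT + U[(group, period)]


change_u_t = U[("T", 2)] - U[("T", 1)]
change_u_c = U[("C", 2)] - U[("C", 1)]

did = ((mean_outcome("T", 2) - mean_outcome("C", 2))
       - (mean_outcome("T", 1) - mean_outcome("C", 1)))
bias = did - K1

print(did, bias, change_u_t - change_u_c)
```

Setting the two changes equal (for example both 1.0) makes the bias zero, which is the common trends case.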
What if the PT assumption is violated and there are group-specific trends, e.g. high and low earners could be affected differently by economic or demographic trends?
- split the error term into a fixed effect ai, which is constant over time, and a term uit, which varies over time
- suppose E[uit | unempit] = 0, so the OVB is due to E[ai | unempit] ≠ 0
First differencing to eliminate the fixed effects:
-> crimei1 = B0 + B1unempi1 + ai + ui1
-> crimei2 = B0 + k1 + B1unempi2 + ai + ui2
Change(crimei) = k1 + B1Change(unempi) + change(ui)
- OVB is removed if E[change(ui) | change(unempi)] = 0
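A sketch of first differencing on a made-up two-period panel, assuming a true B1 of 2 and fixed effects ai that are correlated with the level of unemployment (all numbers are hypothetical, and the idiosyncratic error is set to zero to keep the arithmetic exact). The cross-section slope is biased, while the first-difference slope recovers B1.

```python
from statistics import mean


def simple_ols(x, y):
    """Intercept and slope from a one-regressor OLS fit."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


# Assumed parameters of the two crime equations.
B0, k1, B1 = 5.0, 1.0, 2.0

# Fixed effects rise with period-1 unemployment, which creates OVB in levels.
a = [0.0, 2.0, 4.0, 6.0]
unemp1 = [3.0, 5.0, 7.0, 9.0]
unemp2 = [4.0, 5.0, 9.0, 12.0]

crime1 = [B0 + B1 * u + ai for u, ai in zip(unemp1, a)]
crime2 = [B0 + k1 + B1 * u + ai for u, ai in zip(unemp2, a)]

# Levels regression in period 1: ai sits in the error term and biases the slope.
_, slope_levels = simple_ols(unemp1, crime1)

# First differences: ai cancels because it is the same in both periods.
d_crime = [c2 - c1 for c1, c2 in zip(crime1, crime2)]
d_unemp = [u2 - u1 for u1, u2 in zip(unemp1, unemp2)]
intercept_fd, slope_fd = simple_ols(d_unemp, d_crime)

print(slope_levels, intercept_fd, slope_fd)
```

The intercept of the differenced regression estimates k1, the period effect, and the slope estimates B1.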