3.DD Assignment Flashcards
Why do we use clustered standard errors?
We may worry about:
- Within-group correlation: At any point in time, observations are correlated within some group, e.g. within states or among disabled/non-disabled individuals within states.
- Serial correlation: Observations are correlated across time. In our setting, this seems likely because employment status tends to be quite persistent over time.
⇒ In both cases, observations are no longer independent!
- Clustered standard errors: Allow arbitrary within-group and serial correlation within clusters/groups (e.g. states), but assume that observations are uncorrelated across clusters/groups.
- Key trade-off: Consistency of the clustered standard error estimator is achieved when the number of groups/clusters G becomes large (G → ∞).
How do clustered standard errors compare to robust standard errors if we have positive correlation within clusters?
Clustered standard errors are then typically larger than robust standard errors.
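What is meant by the Moulton problem?
When errors are correlated within groups and the regressor of interest varies only at the group level, incorrectly using robust standard errors leads to inflated test statistics and over-rejection of null hypotheses!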
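A rough way to see the size of the problem is the textbook Moulton factor. The formula below is a sketch for the special case of equal group sizes n, a regressor that is constant within groups, and an intra-class correlation ρ of the errors; these symbols are assumptions for illustration and do not come from the assignment.

```latex
\[
\frac{V_{\text{clustered}}(\hat{\beta})}{V_{\text{conventional}}(\hat{\beta})}
  = 1 + (n - 1)\,\rho
\]
```

For example, with n = 40 and ρ = 0.05 the factor is 1 + 39 × 0.05 = 2.95, so the conventional standard error would be too small by a factor of about √2.95 ≈ 1.7. With ρ < 0 the factor falls below 1.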
Name some problems that could affect the DD estimate in the seminar article.
Although these are specific examples, they apply to DD in general.
Other contemporaneous changes:
There may have been other relevant reforms/changes that took place at the same time as the introduction of the ADA and that affected disabled and non-disabled individuals differently, such as changes in the minimum wage, firm payroll taxes, firm regulation, disability insurance benefits etc. The authors do not really discuss whether there are any such contemporaneous changes that may confound their estimates, so we might infer that this is probably not the case.
Pre-trends:
Do the employment and wages of disabled and non-disabled individuals evolve differently already before the ADA is implemented?
Anticipation effects:
Even though the ADA came into effect in 1992, it was signed into law already in 1990. We may therefore worry that firms could have reacted to the law in anticipation of its introduction, causing diverging trends in employment and wages before 1992. Looking at the interaction terms for 1990–1991 in Table 2 of the paper, there do not seem to be clear anticipation effects.
Composition effects:
Since the CPS data are not panel data (the CPS uses rotating samples of households), we cannot track the same individuals over time. We may therefore worry that more people identify themselves as disabled after the ADA was implemented. If we look at Figure 1 and Table 1 of the paper, there seem to have been increases in disability rates (especially among women) after 1991. To address this concern, the authors use a matched sample of individuals from the 1993 and 1994 CPS and compare individuals who report either being or not being disabled in both years. They conclude that the results remain broadly similar for these subgroups, so increases in reported disability rates cannot entirely account for the estimated employment losses.
Group-specific trends:
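The authors note (pp. 935–939 of the paper) that during the time period of interest, the number of disabled individuals receiving Supplemental Security Income (SSI) or disability insurance (DI) benefits increased. If rising benefit receipt lowered employment among the disabled independently of the ADA, the disabled group would follow its own trend, and the DD estimate would wrongly attribute this to the treatment.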
What is the difference between collapsing the data to the group level and running a regression weighted by cell size, and running an ordinary regression at the individual level?
The point estimates are exactly the same. This is because the “treatment” (the ADA) only varies at the group level, not at the individual level. Therefore, we are not changing anything about the regressors when aggregating up the data. However, the standard errors are of course not the same, because the group-level standard errors do not use the within-group variation in residuals.
Note also that to obtain exactly identical point estimates, we need to weight group-level observations by the number of observations within the group.
Our treatment of interest only actually varies at the disability status × year level (with no variation in treatment status within states), so we could also have instead aggregated the data at this level, estimated Equation (2) (again, weighting observations by group size), and obtained the same point estimates.
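The equivalence can be checked on a toy example. The sketch below uses made-up numbers (not the CPS data), a simple pooled pre/post DD specification rather than the assignment's Equation (2), and a small hand-written least-squares helper, so every name in it is an assumption for illustration. It compares individual-level OLS with a regression on disability × year cell means, with and without cell-size weights.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def wls(X, y, w):
    """Weighted least squares via the normal equations (X'WX) b = X'Wy."""
    k = len(X[0])
    XtX = [[sum(wi * xi[a] * xi[b] for xi, wi in zip(X, w)) for b in range(k)]
           for a in range(k)]
    Xty = [sum(wi * xi[a] * yi for xi, yi, wi in zip(X, y, w)) for a in range(k)]
    return solve(XtX, Xty)


def regressors(disabled, year):
    """Constant, disabled dummy, post dummy (year 2), and their interaction."""
    post = 1.0 if year == 2 else 0.0
    return [1.0, float(disabled), post, disabled * post]


# Hypothetical individual data: (disabled, year, weeks worked); years 0-1 are pre-treatment.
people = [
    (0, 0, 48), (0, 0, 50),
    (0, 1, 50), (0, 1, 52), (0, 1, 51), (0, 1, 51),
    (0, 2, 52), (0, 2, 54),
    (1, 0, 30), (1, 0, 34), (1, 0, 32),
    (1, 1, 28),
    (1, 2, 27), (1, 2, 25), (1, 2, 29), (1, 2, 23),
]

# Individual-level OLS (all weights equal to one).
X_ind = [regressors(d, t) for d, t, _ in people]
y_ind = [float(y) for _, _, y in people]
beta_ind = wls(X_ind, y_ind, [1.0] * len(people))

# Collapse to disability x year cells: cell mean outcome and cell size.
cells = {}
for d, t, y in people:
    cells.setdefault((d, t), []).append(y)
X_grp = [regressors(d, t) for (d, t) in cells]
y_grp = [sum(v) / len(v) for v in cells.values()]
n_grp = [float(len(v)) for v in cells.values()]

beta_weighted = wls(X_grp, y_grp, n_grp)                  # weighted by cell size
beta_unweighted = wls(X_grp, y_grp, [1.0] * len(y_grp))   # ignores cell size

print("individual:      ", [round(b, 4) for b in beta_ind])
print("cells, weighted: ", [round(b, 4) for b in beta_weighted])
print("cells, unweighted:", [round(b, 4) for b in beta_unweighted])
```

In this toy data the weighted cell-level regression reproduces the individual-level coefficients, while the unweighted one gives a different DD coefficient because the cells differ in size.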
In a dynamic DD, does it matter which interaction term we exclude from the regression?
The choice of which interaction term to exclude from Equation (3) does affect the estimated coefficients on the remaining interaction terms.
Intuitively, this is because the interaction terms measure the differences in weeks worked relative to the year of reference. The reference year, in turn, is determined by the year for which you exclude the interaction term.
In sum, different choices of the excluded interaction term lead to different normalizations of the remaining interaction terms.
If we have perfectly parallel trends before treatment, all pre-treatment interactions will be insignificant, and it then does not matter which year we exclude, since we are essentially comparing everything with zero.
We can then also choose to exclude all the pre-treatment years and compare against their mean. But this should not make a very big difference.
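The normalization point can be illustrated with a few lines of arithmetic. The sketch below assumes a saturated model (group dummy, year dummies, group × year interactions, no other controls), in which each interaction coefficient equals the disabled/non-disabled gap in that year minus the gap in the omitted year. The cell means are invented for illustration and are not taken from the paper.

```python
# Hypothetical mean weeks worked by group and year (made-up numbers).
years = [1988, 1989, 1990, 1991, 1992, 1993]
mean_disabled = {1988: 30.0, 1989: 30.5, 1990: 29.5, 1991: 30.0, 1992: 27.0, 1993: 26.0}
mean_non_disabled = {1988: 48.0, 1989: 48.0, 1990: 48.5, 1991: 48.0, 1992: 48.0, 1993: 48.5}


def interactions(ref_year):
    """Interaction coefficients of a saturated group x year model:
    the gap in year t minus the gap in the omitted reference year."""
    gap = {t: mean_disabled[t] - mean_non_disabled[t] for t in years}
    return {t: gap[t] - gap[ref_year] for t in years if t != ref_year}


theta_ref_1991 = interactions(1991)
theta_ref_1990 = interactions(1990)

for t in years:
    print(t, theta_ref_1991.get(t, "omitted"), theta_ref_1990.get(t, "omitted"))

# Years present under both normalizations differ by the same constant.
shifts = {theta_ref_1990[t] - theta_ref_1991[t] for t in years if t not in (1990, 1991)}
print("shift between normalizations:", shifts)
```

Changing the omitted year shifts every remaining coefficient by the same constant, so differences between coefficients (for example 1993 versus 1992) are unchanged. If the pre-treatment gaps were all equal, the shift would be zero and the choice would not matter.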
Which test should one use to check for parallel trends in a DD?
The idea is to use an F-test (or, alternatively, a chi-squared test) for the following joint null hypothesis: H0: θ_1988 = ... = θ_1991 = 0.
When running a joint F-test, we test whether at least one of the coefficients is significantly different from zero. We get an F-statistic of 4.75 and a p-value of 0.0008, meaning that at least one is most likely different from zero.
We reject the null hypothesis that all coefficients equal zero at the 0.001 level (Miika does it at the 10% level??)
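For reference, the joint test can be written as a Wald statistic. This is a generic sketch rather than the assignment's own derivation: θ̂ stacks the four pre-treatment interaction coefficients, V̂ is their estimated covariance matrix (for example the clustered one), and q = 4 is the number of restrictions.

```latex
\[
W = \hat{\theta}' \, \hat{V}^{-1} \, \hat{\theta}
  \;\overset{a}{\sim}\; \chi^2_q \quad \text{under } H_0,
\qquad
F = \frac{W}{q},
\qquad
\hat{\theta} = (\hat{\theta}_{1988}, \dots, \hat{\theta}_{1991})', \quad q = 4 .
\]
```

The chi-squared and F versions are therefore two forms of the same test; they differ only in the reference distribution used for the p-value.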
Since our treatment of interest only varies at the disability × year level, we should be worried that individual-level observations (or rather, error terms) are correlated in at least two ways. Which ways, and how do we solve this?
(a) Within-group correlation: At any point in time, observations are correlated within some group, e.g. within states or among disabled/non-disabled individuals within states.
(b) Serial correlation: Observations are correlated across time. In our setting, this seems likely because employment status (and hence our outcome, i.e. weeks worked) tends to be quite persistent over time.
Both within-group and serial correlation violate the assumption underlying the usual (heteroskedasticity) robust standard error estimator, namely that the observations are independent.
To allow for these two types of correlations, we can use clustered standard errors. For example, when we use standard errors clustered at the state level, we allow arbitrary within-group and serial correlation within states, but assume that observations are uncorrelated across states.
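The difference between the two estimators can be seen in a small simulation. The sketch below is an illustration under stated assumptions, not the assignment's code: simulated data with a common shock per cluster, a regressor that varies only across clusters, a simple regression with an intercept, and no finite-sample corrections (plain HC0 and the uncorrected cluster sandwich for the slope).

```python
import random
from math import sqrt

random.seed(0)

G, n = 50, 40  # hypothetical: 50 clusters ("states") with 40 individuals each
rows = []      # (cluster id, x, y)
for g in range(G):
    x_g = 1.0 if g < G // 2 else 0.0   # regressor varies only at the cluster level
    shock_g = random.gauss(0, 1)       # common shock: positive within-cluster correlation
    for _ in range(n):
        y = 1.0 + 0.0 * x_g + shock_g + random.gauss(0, 1)  # true slope is zero
        rows.append((g, x_g, y))

# Simple OLS of y on a constant and x.
N = len(rows)
x_bar = sum(x for _, x, _ in rows) / N
y_bar = sum(y for _, _, y in rows) / N
sxx = sum((x - x_bar) ** 2 for _, x, _ in rows)
beta = sum((x - x_bar) * (y - y_bar) for _, x, y in rows) / sxx
alpha = y_bar - beta * x_bar

# Score contribution of each observation: (x_i - x_bar) * residual_i.
scores = [(g, (x - x_bar) * (y - alpha - beta * x)) for g, x, y in rows]

# Robust (HC0): square each individual score, then sum.
se_robust = sqrt(sum(s ** 2 for _, s in scores)) / sxx

# Clustered: sum the scores within each cluster first, then square and sum.
cluster_sums = [0.0] * G
for g, s in scores:
    cluster_sums[g] += s
se_cluster = sqrt(sum(c ** 2 for c in cluster_sums)) / sxx

print("slope estimate:", round(beta, 3))
print("robust SE:     ", round(se_robust, 3))
print("clustered SE:  ", round(se_cluster, 3))
```

The only difference between the two formulas is whether the scores are summed within clusters before squaring. With a positive common shock those within-cluster sums do not average out, which is why the clustered standard error comes out larger in this setup.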
What trade-off arises with clustered standard errors?
There is an important trade-off you need to keep in mind when using clustered standard errors.
While the consistency of the conventional “robust” standard error estimator is achieved when the number of observations becomes large (N → ∞), the consistency of the clustered standard error estimator is achieved when the number of groups/clusters, G, becomes large (G → ∞).
Hence, clustering at a higher level allows taking within-group and serial correlation into account more flexibly, but leaves you with fewer clusters and possibly finite-sample bias (since the asymptotic approximation may not be valid).
What can happen if there are negative within-cluster correlations?
In that case, clustered standard errors can actually be smaller than robust standard errors.
What, then, is the "right" level to use in a diff-in-diff? If treatment varies at the state level, running the regression at the individual level gives us more observations and thus decreases our standard errors, but should we consider this an artificial boost in precision that increases the chance of Type I errors? Or are more observations always better?
QUESTION TO MIIKA: ANSWER = HIS ANSWER
If you want to run a regression where the regressor of interest is at the group level, then you generally still want to run it using individual-level data.
Some reasons for this are:
(i) If you are confident that there is no need to use clustered standard errors, then using e.g. robust standard errors leads to more precise estimates since you are using within-group variation in residuals, which you wouldn’t be able to use if you instead used grouped data, and
(ii) if you have individual-level control variables, then adding them to the regression improves the precision of the estimate on the regressor of interest.
There is some discussion on grouped-data regressions in Section 3.4 of Mostly Harmless Econometrics. The main advantage of this approach that they mention is that if you DON’T have access to individual-level data, then you can still use group-level data to back out the point estimates that you would get from individual-level data, as long as you know the group weights.
Which pre-treatment interaction should one omit?
QUESTION TO MIIKA: ANSWER = HIS ANSWER
If all the pre-treatment effects in the dynamic model are estimated to be close to zero, then the coefficients for the post-treatment effects are very similar regardless of which pre-treatment effect you omit from the model. In that case, pooling the pre-treatment effects can lead to more precise estimates for the post-treatment effects (since you have fewer parameters to estimate in the model).
Regarding individual fixed effects: if we run a DiD and see that we have perfectly parallel trends in the pre-treatment period, what is the benefit of adding fixed effects, since we could more or less argue that our treatment and control groups seem to be completely equivalent? If we add fixed effects, is there any reason to add other controls for things we do observe, or have we more or less controlled for everything with the fixed effects already?
QUESTION TO MIIKA: ANSWER = HIS ANSWER
In general, you need the group fixed effects to take out any time-invariant differences in the LEVELS of the outcome between groups, e.g. in Assignment 1 we use the disability dummy to account for the fact that disabled workers work a lot less than non-disabled workers. And in order to add time fixed effects to a DD model, you NEED the control group. This is because under parallel trends, we can use the time trends observed for the control group to capture the time trends for the treated group in the counterfactual case where treatment was not introduced.
And if you have other control variables, then adding them to the regression can increase the precision of your coefficients of interest, but they shouldn’t affect the estimated coefficients themselves.