DiD Flashcards

1
Q

When is it possible to estimate a fully saturated model?

A

It is always possible when the treatment is binary and we have a never-treated comparison group. (Is this only for staggered designs?)

2
Q

What do we mean by a fully saturated model?

A

A model where we include all the leads and lags available in the data, minus one that is dropped as the normalization period, usually t-1.
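As a sketch (notation assumed here, not from the course material): with $E_i$ denoting unit $i$'s treatment date, a fully saturated event study with unit and time fixed effects is

$$
y_{it} = \alpha_i + \gamma_t + \sum_{j=-K,\; j \neq -1}^{L} \beta_j \,\mathbf{1}[t - E_i = j] + \varepsilon_{it},
$$

where $K$ and $L$ are all the leads and lags the data allow and $\beta_{-1}$ is normalized to zero.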

3
Q

What do we mean by binning?

A

Choosing a point beyond which we assume the effect to be constant, so that all leads/lags past that point are collapsed into a single endpoint dummy.

4
Q

What effect do we get in a distributed lag model?

A

The dynamic multiplier effect: the change in the effect from one period to the next.
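A sketch of the model in Stock & Watson's notation (the notation is assumed here):

$$
y_t = \beta_0 + \beta_1 x_t + \beta_2 x_{t-1} + \dots + \beta_{r+1} x_{t-r} + u_t,
$$

where $\beta_1$ is the impact effect and $\beta_j$ is the dynamic multiplier $j-1$ periods after a change in $x$.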

5
Q

What do we have to do to go from a distributed lag model (DLM) to a long-run cumulative effect model?

A

Take the first difference of our X variable, so that the coefficients become sums of our beta coefficients. The last lag still needs to be in levels (this is binning). This is thus a way of respecifying the DLM to directly estimate the cumulative effect. (This is from S&W.)
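Concretely, following Stock & Watson's cumulative-multiplier rewriting of the DLM above:

$$
y_t = \delta_0 + \delta_1 \Delta x_t + \delta_2 \Delta x_{t-1} + \dots + \delta_r \Delta x_{t-r+1} + \delta_{r+1} x_{t-r} + u_t,
\qquad \delta_j = \beta_1 + \beta_2 + \dots + \beta_j,
$$

so each $\delta_j$ is a cumulative dynamic multiplier, and $\delta_{r+1}$, the coefficient on the remaining levels term, is the long-run cumulative effect.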

6
Q

What effect is $\beta_1$ in a DLM?

A

The impact effect.

7
Q

How do we go from an event study to a distributed lag model?

A

We take the first difference of our event study model. The resulting model is then one of “changes on changes”. See the article.
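A sketch of the idea (notation assumed): first-differencing the event study in levels,

$$
\Delta y_{it} = \Delta\gamma_t + \sum_{j} \beta_j \,\Delta D_{it}^{\,j} + \Delta\varepsilon_{it},
$$

relates changes in the outcome to changes in the event-time dummies $D_{it}^{\,j}$, i.e. a distributed lag model in treatment changes.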

8
Q

For variation in which dimension do we need to add controls?

A

When the confounder varies both over time and across units. Otherwise we can use unit or time fixed effects to absorb it.

9
Q

What are the seven suggestions by @Freyaldenhoven et al (2021) for a good event study graph?

A
  1. The endpoint bins (−6 and 6+) show that it is a binned event design.
  2. Normalized at t-1 (given no anticipation effects).
  3. The reference level is where the effect is zero.
  4. Sup-t bands are based on standard errors for the whole path and are thus wider and more conservative. This addresses multiple testing (checking whether the whole pre-period is consistent with parallel trends), which requires a wider confidence band.
  5. Pre-trends p-value: tests whether we have parallel trends pre-treatment (whether the coefficients on the leads are jointly zero). It should be sufficiently high.
  6. Leveling-off test: whether any dynamic effects level off, i.e. whether the last coefficient equals the coefficient one period before. This leveling-off p-value should also be large.
  7. They plot the least wiggly path: the smoothest confounder that could fully account for the estimated effect. A straight line would be bad, since a simple trend could then drive the result. In their example, it is highly unlikely that a real confounder would follow such a wiggly path.
10
Q

How should the effect look in an event study to be persuasive?

A

It should come more or less immediately.

11
Q

When is it not possible to use eventdd?

A

When we have a continuous treatment, since the command assumes a binary treatment.

12
Q

What is the command presented in @Freyaldenhoven et al (2021) and when should we use it?

A

xtevent and xteventplot. These are for restricted models. If we have never-treated units we can do better, since we can then relax our assumptions and run a fully dynamic model, given that the treatment is binary.
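A minimal sketch of the calls (variable names are hypothetical, and the options follow the package's help file as I recall it, so double-check with help xtevent):

```stata
* Restricted event study à la Freyaldenhoven et al. (2021)
ssc install xtevent
xtevent y, panelvar(id) timevar(year) policyvar(policy) window(5)
xteventplot
```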

13
Q

What is the @de Chaisemartin and D’Haultfoeuille (2020a) Stata command, and what assumptions are being made?

A

did_multiplegt. This estimator allows for heterogeneous treatment effects and is valid for any type of dynamics; that is, it does not restrict the dynamics at all. It is therefore Per's preferred estimator. Since no assumptions are made about the dynamics, we can freely choose the number of leads and lags to include (just setting the event window), so the estimates do not change depending on the number of leads and lags included. This is not the case in the restricted model using xtevent, where the estimates would change.
Linear time trends are problematic to include with these new estimators!
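A minimal sketch of the call (variable names hypothetical; option names follow the 2020-era package as I recall it, so check help did_multiplegt):

```stata
* Heterogeneity-robust DiD with dynamic effects and placebo estimates
ssc install did_multiplegt
did_multiplegt y group year treatment, ///
    robust_dynamic dynamic(5) placebo(5) breps(50) cluster(group)
```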

14
Q

What are the drawbacks of using did_multiplegt and csdid?

A

The drawback of these new estimators is that they throw away observations, so there is less data in the analysis and therefore less precise estimates. It is therefore better to use TWFE when we can.
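For reference, a minimal sketch of a csdid call (variable names hypothetical; check help csdid):

```stata
* Callaway and Sant'Anna estimator, aggregated to event-study form
ssc install csdid
csdid y, ivar(id) time(year) gvar(first_treat) notyet
estat event
```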

15
Q

What is the purpose of using a binscatter?

A

It is a way of assessing whether the functional form in our regression is valid: is it really linear, etc.? We do not need to do this with a binary variable, but with a continuous variable we need to check that linearity holds.

16
Q

What can we do (except to argue for parallel trends) when parallel pre-trends are shown in our data?

A

We can normalize the whole pre-period to get more precision in our estimate!

17
Q

What should we do instead of using the Frisch-Waugh decomposition (the binscatter command) when constructing our binscatter?

A

The Frisch-Waugh decomposition is only valid if the function is perfectly linear. We should instead use the binsreg or binscatter2 command to produce a correct binscatter.
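A minimal sketch (variable names hypothetical; see help binsreg):

```stata
* Covariate-adjusted binscatter with a linear fit overlaid, to eyeball linearity
ssc install binsreg
binsreg y x controls, polyreg(1)
```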

18
Q

When can and can’t we run a model where the treatment yields heterogeneous effects in the population?

A

Binary treatment → heterogeneous treatment effects: OK with did_multiplegt or csdid.

Continuous treatment → heterogeneous treatment effects: no way to fix.

19
Q

What is SUTVA and what happens when this is violated?

A

Stable unit treatment value assumption. It says that there are no spillovers between the treatment and comparison groups. If there are, the comparison group is contaminated and the estimated treatment effect may differ from the true one.

20
Q

Discuss the pros and cons of using a “unit-specific linear time trend” in our model.

A

If we have a nice DiD design we do not need to include unit-specific trends.

But they can be good to include if the groups are on diverging trends in the pre-treatment period.

If the effect comes immediately and jumps up, there is no problem with including these trends.

If the effect instead comes with a lag, the trend variable will soak up some of the variation; we are then killing off some of the effect.

We can use them, but we should be aware of this problem and be transparent about it. The credibility of the results is lower.

21
Q

How do we test if we have heterogeneity?

A

To test whether we have heterogeneous treatment effects, we compare the estimates from a TWFE regression (e.g. xtreg) with those from one of the new estimators (e.g. did_multiplegt). If the results are similar, the treatment effects are homogeneous.
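A minimal sketch of the comparison (variable names hypothetical):

```stata
* If the two point estimates are close, heterogeneity is likely not a concern
xtreg y treatment i.year, fe vce(cluster id)
did_multiplegt y id year treatment, breps(50) cluster(id)
```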

22
Q

How does WLS change the estimates?

A

If the treatment effect is homogeneous, it should not matter whether we use weighted least squares (WLS) or OLS.

23
Q

What about having two different treatments in our DiD model?

A

Having two different treatments in a model is not considered a “standard DiD” and will introduce an endogeneity bias. However, de Chaisemartin and D’Haultfoeuille show in their paper “Two-Way Fixed Effects Regressions with Several Treatments” that it is possible to solve this. We can use did_multiplegt.

24
Q

What can be problematic when we are studying the effects of changes in laws?

A

It can be problematic since many types of behavior can be affected by the law. It is therefore hard to estimate a ceteris paribus effect, since we cannot isolate or know the exact mechanism driving the result. What we get is then rather a reduced-form estimate.

25
Q

What about restricting our sample to only one type of people?

A

Per thinks we should always use the whole sample first and then discuss whether to focus and zoom in on specific subsamples. Restricting the sample is the same thing as using a control, according to Per.

If we were to look at only one subgroup (e.g. women), we could either include only those in the study or use an interaction to study the specific effect on this subgroup.

26
Q

What are the problems with studying answers to survey questions?

A

People do not always answer surveys truthfully, especially if they are commenting on a law or similar. In one example from our course, regarding the legalization of cannabis, people might answer more truthfully about whether they have consumed cannabis once it becomes legal; we will therefore have both measurement error and a positive bias in our estimate.

27
Q

Why is it problematic to use small samples?

A

If there are few observations in the treatment group it will be hard to detect a treatment effect. If an effect is detected, it will probably be too large just by chance, so we likely get an upward bias.

Since it is hard to find an effect in a small sample, only really large effects will be significant, and those are most likely driven by chance.

28
Q

How should we address binning?

A

If we have an event study with some dynamic restrictions, we have to discuss that: how we bin, etc. We should in that case specify in the model how we are binning, that is, show that we have a dummy equal to 1 for all leads or lags beyond certain values.
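As a sketch (notation assumed), binned endpoint dummies for an effect window from $-K$ to $L$ look like:

$$
D_{it}^{-K} = \mathbf{1}[t - E_i \le -K], \qquad
D_{it}^{\,j} = \mathbf{1}[t - E_i = j] \;\; (-K < j < L), \qquad
D_{it}^{\,L} = \mathbf{1}[t - E_i \ge L].
$$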

29
Q

What about having many outcome variables in our design?

A

Using many outcome variables is always problematic, since we will likely find significant results mechanically. With 100 outcome variables we will find about 5 that are significant at the 5 percent level by chance alone. This is the multiple comparison problem, and it is related to p-hacking.

30
Q

How many leads do we need to make a good estimation of parallel pre-trends?

A

It is not clear, but Per thinks the choice should at least be symmetric: if we estimate a dynamic effect for 10 years, we should show the leads for 10 years as well.

31
Q

A distributed lag model is a ……..

A

binned event study.

32
Q

What are the identifying assumptions in DiD and the synthetic control approach?

A

In a DiD we need parallel trends. With a synthetic control group we need conditional mean independence (conditional on the lagged outcomes, the error term is unrelated to the treatment). We should not mix these designs!

33
Q

What problem are we facing when we only have one treatment group?

A

With only one treatment group and many control groups, the standard way of constructing the SEs will not be correct; we cannot use the OLS standard errors. Some authors instead use permutation tests, and other authors have suggested better methods. The take-home message is that we cannot use OLS to construct our SEs when we only have one treatment group.

34
Q

What effect is an event study estimating?

A

The cumulative effect of the treatment. If we see an effect that goes down, it is not an event study. (I don’t think the last part is true…)

35
Q

What are we assuming in a dose-response DiD and what do we need to show?

A

If we have a continuous treatment, the effect is assumed to be additive and linear! We therefore need to show a binscatter to check whether the effect is linear in the data.

36
Q

How should we think about clustering?

A

We should always cluster at the level of the intervention. However, we might have to double-cluster if something is happening at a higher level, for example at the state level and the time level. Clustering on time takes into account that something is happening at the federal level for everyone.
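A minimal sketch (variable names hypothetical):

```stata
* Cluster at the level of the intervention; add time clustering if needed
reghdfe y treatment, absorb(state year) vce(cluster state)       // one-way
reghdfe y treatment, absorb(state year) vce(cluster state year)  // two-way
```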

37
Q

What effect does weighting have on the bias in our estimation?

A

Weights have nothing to do with bias. But if the weights matter for the estimates, the model is misspecified.

38
Q

How should we specify our model if we are using the new estimators by @de Chaisemartin and D’Haultfoeuille (2020b) or @Callaway and Sant’Anna (2020)?

A

We should specify a correct model where we allow beta to vary over time and across individuals, i.e. put the right subscripts on it ($\beta_{it}$).

39
Q

What is Per’s thought on estimating just a static model?

A

Including only one lag or so in the model is bad. Why not construct a real dynamic model with more lags and let the data speak? We should specify a correct model and look at the data!

40
Q

What do we need to do in order to make a sub-population analysis?

A

We should clearly state why our betas might vary across different subgroups, etc. Otherwise we might be data-mining.

41
Q

What effect are we estimating in a DiD?

A

ATT = Average treatment effect on the treated.

42
Q

What is the purpose of binning?

A

Binning identifies dynamic treatment effects in the absence of never-treated units and is particularly suitable in the case of multiple events.
It imposes implicit assumptions that make it possible to identify dynamic policy effects without never-treated units. Without binning, it would be impossible to separate the dynamic policy effect from secular time effects.

43
Q

How should we determine the length of our effect window?

A

This should come from economic theory if possible, and from our intuition about how the effect might look. The choice of the effect window is also restricted by data availability: increasing the leads or lags reduces the estimation sample if treatment is not observed for these additional periods. Researchers should experiment with different effect-window lengths (bearing in mind that the estimation sample might change). See section 2.2.2 in @Schmidheiny and Siegloch (2020) for an extended discussion.

44
Q

What feature will cause a bad control group?

A

In a study on the effects of giving birth to a child, men constitute a natural never-treated group. But if a unit cannot possibly be treated because of its special characteristics, those very same characteristics are potential confounders.

45
Q

What should we think about when estimating increases and decreases of, e.g., a tax change?

A

We need symmetry between an increase and a decrease. It can therefore be better to estimate increases and decreases of a policy separately.

46
Q

In what settings can did_multiplegt be used?

A

The estimator can be used if the treatment is binary and the design is staggered, but it can also be used if the treatment is not binary and/or the design is not staggered (@de Chaisemartin and D’Haultfoeuille (2020b)). It is also robust to heterogeneous treatment effects.

47
Q

What can we do if we have no never-treated units?

A

We can drop the last period and estimate the unbiased model if we assume homogeneity.

48
Q

Mention as many things as possible that a DiD article should include. That is, a DiD checklist.

A
  • [ ] The authors try to explain the mechanisms that drive their result.
  • [ ] Is the paper addressing/discussing potential heterogeneity? Per says any paper that does not address this heterogeneity issue is not reliable. It can always be tested by comparing estimates from TWFE and the new estimators.
  • [ ] Only pre-treatment controls?
  • [ ] Showing estimates with and without control variables?
  • [ ] Are hypotheses created before or after the authors look at the data? Hypotheses should always rely on theory.
  • [ ] Many, many different outcome variables? This raises a red flag; relates to p-hacking.
  • [ ] Showing that weighting does not affect the estimates? They should always show both.
  • [ ] Do the authors build their story on many credible papers?
  • [ ] If binning, do they address this issue in the discussion?
  • [ ] Is the binning done correctly?
  • [ ] Are they using both a lagged dependent variable ($y_{t-1}$) and fixed effects? (They should not.)
  • [ ] Is clustering done at the right level?
  • [ ] Not mixing the parallel trends assumption and conditional mean independence (conditioning on the lagged variable)?
  • [ ] The DiD is specified in levels, not changes (otherwise it is a distributed lag)?
  • [ ] If they do not find an effect, there is no need to explore the mechanism.
  • [ ] Including only one lag or so in the model is bad. Why not construct a real dynamic model with more lags and let the data speak? We should specify a correct model and look at the data!
  • [ ] Are figures presented without controls?
  • [ ] Do they make an argument for how the time window might change their results?
49
Q

Explain the ratio problem(s).

A

Using a ratio on the LHS and/or RHS might produce spurious correlation, as shown by Kronmal (1993).

One might be tempted to hold a variable constant by using it as the denominator of the outcome variable of interest. However, the correct way to control for something (e.g. population size) is to add it on the RHS. Dividing by a term on the LHS is actually an interaction.

Bartlett and Partnoy (2020) show that we need to be extra cautious when dealing with measurement error in the dependent variable when it is a ratio. Error in the numerator can be addressed by standard techniques; error in the denominator cannot.

50
Q

How can we solve the ratio problem?

A

Say that we want to regress income on migration, controlling for population size.
The correct way is just to add population size on the RHS. However, if we want a ratio interpretation, we should put the ratio on the LHS while controlling for 1/population on the RHS.

Alternatively, we could take log(income/population) and then add log(population) to the RHS.

We should always ask whether the control that ends up on the LHS is changing over time. It could be a bad control! Then we are stuck between a ratio problem and a bad control.
Population size, for example, does not change very fast, so in some settings it might not be so problematic to include it as a control.
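A minimal sketch of the log-log version (variable names hypothetical):

```stata
* Ratio interpretation on the LHS, denominator controlled for on the RHS
gen ln_inc_pc = ln(income / population)
gen ln_pop    = ln(population)
reghdfe ln_inc_pc migration ln_pop, absorb(id year) vce(cluster id)
```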

51
Q

What is the “bias-precision trade-off” in the DiD framework?

A

When setting up a model, it is always best to impose as few restrictions as possible; this way we will have less bias. We can then start to exclude leads and lags to gain more precision, which however introduces more bias. So there is always a trade-off between a correct model and good precision. On the other hand, using too big a time window will likely create a violation of SUTVA, since there will eventually be spillovers, so the treatment and control groups will not be comparable.

52
Q

Describe the problem with continuous treatment in a DiD setting.

A

With a binary treatment we know who is in the control and treatment groups, while in the continuous case everybody is in the treatment group, some less than others. It is then harder to understand what the covariance is.

53
Q

Should we always use binscatter in a DiD setting?

A

No, only when we have a continuous independent variable.

We need to use binsreg or binscatter2 when we also include other covariates or fixed effects.

54
Q

Describe the problem with, and the take-home message about, lagged dependent variables in DiD designs.

A

If we use a DiD, don’t include a lagged dependent variable. If we use the lagged dependent variable, don’t include fixed effects. The identifying assumption for the lagged-dependent-variable design is conditional mean independence.

A lagged dependent variable ($y_{t-1}$) can be used as a control variable. That is, if the initial outcome levels of the groups are very different, we can control for the initial level by including $y_{t-1}$ to make the groups comparable in their initial levels. Then, conditional on $y_{t-1}$, the treatment is as good as randomly assigned. However, we can never include a lagged dependent variable as a control when we have fixed effects; that is only possible with a large number of time periods. Using $y_{t-1}$ as a control is thus a different thing from assuming parallel trends, and we cannot combine a lagged dependent variable with parallel trends. If we assume strict exogeneity we cannot include lagged dependent variables.

This is discussed further in Mostly Harmless Econometrics, chapter 5.3.
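As a sketch (notation assumed), the two mutually exclusive designs are

$$
\text{FE / parallel trends:} \quad y_{it} = \alpha_i + \gamma_t + \delta D_{it} + \varepsilon_{it},
$$

$$
\text{lagged DV / CMI:} \quad y_{it} = \alpha + \rho\, y_{i,t-1} + \gamma_t + \delta D_{it} + \varepsilon_{it}.
$$

Combining $\alpha_i$ and $y_{i,t-1}$ in one equation gives biased estimates in short panels; the bias shrinks only as the number of time periods grows.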

55
Q

What do we need to think about regarding clustering if we only have one treatment and one control group in a DiD setting?

A

We need many observations before and after treatment to solve the clustering problem.

56
Q

Do we need to show that we have a first stage, etc., in a fuzzy RDD?

A

Yes, according to Per it is good practice.

“Strictly speaking, only smoothness of the CEF is needed for a (sharp) RD. For a fuzzy RD it can also depend on whether you have ‘a one-sided compliance problem’ (the control group cannot take the treatment) or a ‘two-sided compliance problem’ (the control group can also take the treatment). See my RD paper from 2014 with Björn Tyrefors for this discussion.

Angrist argues that a fuzzy RD design equals IV, but that need not be the case; that is, it can be important to argue that there is a strong first stage in a fuzzy RD design. In any case, one can always run the reduced form without scaling the effect in a fuzzy RD design; one then estimates an intention-to-treat effect instead of a treatment-on-the-treated effect.”

57
Q

How many clusters do we need in unit or time to cluster?

58
Q
  1. What happens to our SEs if we do not cluster when we need to?
  2. What happens to our SEs if we cluster when we need to but have too few clusters?
A
  1. We will have too small SEs (artificially good precision).

2. We will get artificially too big SEs (bad precision)!

59
Q

How should we think about heterogeneity in a DiD?

A

It should always be discussed! Remember that it can always be tested with did_multiplegt.