Week 2 - Causality and Difference in Differences Flashcards
What is causality?
X causes Y if:
• We intervene and change X and nothing else,
• Then Y changes as a result.
Why is causality important? When do we use it?
• Many questions we want answers to are causal.
• When we talk about marketing, we often want to know why something happens.
o Did demand/revenue/... change because of a given intervention?
o And by how much?
• We also care about non-causal questions (prediction, descriptive relationships/patterns in the data).
o But our comparative advantage should be causality.
Why is correlation not causation?
- Reverse causality: the opposite is true, B causes A.
- Common cause: A and B are correlated, but both are actually caused by C.
- Another variable is involved: A does cause B, but only as long as D happens.
- Chain reaction: A causes E, which in turn causes B.
- Chance: you find patterns between two variables that shouldn’t be related. This is statistical chance.
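The common-cause case can be illustrated with a small simulation. This is only a sketch under assumed numbers: the variable names, the sample size and the noise levels are chosen for illustration and are not from the course material. A and B never influence each other, yet they are clearly correlated because both depend on C. Once C's contribution is removed, the correlation disappears.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000                # assumed sample size

# C is the hidden common cause. A and B each depend on C but not on each other.
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

corr_ab = pearson(a, b)  # roughly 0.5 in expectation for this setup

# Subtracting C (possible here only because the simulation knows C exactly)
# leaves independent noise, so the remaining correlation is close to zero.
a_resid = [ai - ci for ai, ci in zip(a, c)]
b_resid = [bi - ci for bi, ci in zip(b, c)]
corr_resid = pearson(a_resid, b_resid)

print(round(corr_ab, 2), round(corr_resid, 2))
```

In real data C is often unobserved, which is why this adjustment is usually not available and extra assumptions are needed.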
How can you tell when a correlation is causation?
It’s hard but possible. We need assumptions to estimate an average causal effect:
• “What would have been” – (approximate) counterfactual outcomes.
• “As good as random”: no selection on unobservables.
o Known as “conditional independence”.
o No unobserved factors driving variation in variable of interest.
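The two assumptions above are often written in potential-outcomes notation. The symbols here are an assumption of this sketch and not taken from the flashcards: Y_i(1) and Y_i(0) are unit i's outcomes with and without treatment, D_i is the treatment indicator and X_i are observed controls. Only one of Y_i(1) and Y_i(0) is ever observed for a given unit; the other is the counterfactual.

```latex
\begin{aligned}
\text{Average causal effect:}\quad & E\left[\,Y_i(1) - Y_i(0)\,\right] \\
\text{Conditional independence:}\quad & \left(Y_i(0),\, Y_i(1)\right) \perp D_i \mid X_i
\end{aligned}
```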
Are regression assumptions causal?
Regression assumptions on their own do not give the coefficient β a causal interpretation.
• Regression assumptions: concern the unbiasedness and variance of the estimates.
• Causal inference assumptions: concern whether an unbiased estimate can be interpreted causally.
o Valid counterfactual outcomes.
o Conditional independence.
Why can you use experiments for causality?
Experiments provide clear counterfactual outcomes (the control group), and random assignment makes it reasonable to assume conditional independence.
What two types of experiments are there?
- Randomised controlled trials (RCTs), also called A/B tests: the researcher randomly assigns observational units to a treatment group or a control group.
- Natural Experiments / quasi-experiments: “Nature” divides a population into treatment and control in a way that is as good as random.
- Both approaches compare changes over time between groups.
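A minimal simulation of why random assignment helps, assuming made-up numbers (the true effect of 2.0, the baseline distribution and the sample size are illustrative only). Each unit has an unobserved baseline, but because assignment is a coin flip that ignores the baseline, a simple difference in group means lands close to the true effect.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible
true_effect = 2.0       # assumed treatment effect, chosen for illustration
n = 10000               # assumed number of units

treated, control = [], []
for _ in range(n):
    baseline = rng.gauss(10, 3)        # unobserved unit-level heterogeneity
    in_treatment = rng.random() < 0.5  # coin-flip assignment, independent of baseline
    if in_treatment:
        treated.append(baseline + true_effect)
    else:
        control.append(baseline)

# With random assignment the control group stands in for the counterfactual,
# so the difference in means estimates the average causal effect.
estimate = sum(treated) / len(treated) - sum(control) / len(control)
print(round(estimate, 2))
```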
When would you use DiD?
We can use DiD when we want to answer the following question:
What is the effect of some marketing intervention on those who were affected by it?
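In its simplest two-group, two-period form, the DiD estimate is the change over time in the treated group minus the change over time in the control group. The sketch below uses hypothetical group averages (for example, weekly sales); the numbers and the function name `did` are invented for illustration.

```python
# Hypothetical average outcomes for each group and period.
means = {
    ("treated", "pre"): 100.0,
    ("treated", "post"): 130.0,
    ("control", "pre"): 90.0,
    ("control", "post"): 105.0,
}

def did(means):
    """Difference in differences from a 2x2 table of group/period means."""
    change_treated = means[("treated", "post")] - means[("treated", "pre")]
    change_control = means[("control", "post")] - means[("control", "pre")]
    # The control group's change stands in for what would have happened
    # to the treated group without the intervention.
    return change_treated - change_control

effect = did(means)
print(effect)  # 15.0
```

Here the treated group rose by 30 and the control group by 15, so 15 is attributed to the intervention, provided the two groups would otherwise have moved in parallel.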
What are the advantages of using the regression approach to DiD?
Get standard error of estimate.
• Assess whether effect is statistically significant.
• Should cluster standard errors.
Can add extra control variables into the regression.
• Either as usual controls and/or as fixed effects.
• Particularly useful for natural / quasi-experiments.
Can use log(y) as the dependent variable.
• Delta is then approximately the percentage change in Y due to treatment (100 × delta percent, for small delta).
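For reference, a common way to write the DiD regression is sketched below. The flashcards only name the coefficient Delta, so the specification and the other symbols (alpha, beta, gamma, the Treat and Post dummies) are assumptions based on the standard two-group, two-period setup.

```latex
\begin{aligned}
Y_{it} &= \alpha + \beta\,\mathrm{Treat}_i + \gamma\,\mathrm{Post}_t
          + \delta\,(\mathrm{Treat}_i \times \mathrm{Post}_t) + \varepsilon_{it} \\
\hat{\delta} &= \left(\bar{Y}_{T,\mathrm{post}} - \bar{Y}_{T,\mathrm{pre}}\right)
              - \left(\bar{Y}_{C,\mathrm{post}} - \bar{Y}_{C,\mathrm{pre}}\right) \\
\text{With } \log(Y_{it}) \text{ as the outcome:}\quad
  & \%\,\Delta Y = 100\,\left(e^{\delta} - 1\right) \approx 100\,\delta
    \quad \text{for small } \delta
\end{aligned}
```

Without extra controls, the OLS estimate of delta equals the difference of the two group changes in the second line.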
What are parallel trends?
We must assume that time affects the treatment and control groups equally.
It’s untestable; however, we can check whether patterns in the data suggest it’s OK:
• Check whether prior trends are the same for treated and control groups.
• Compute average of outcome by group over time.
• Was the gap between the groups changing a lot during the pre-treatment period? If not, that suggests we’re OK.
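The check above can be sketched as follows: average the outcome by group and period before treatment, then look at how the treated-minus-control gap moves. The rows are hypothetical data invented for illustration, and the helper name `mean_by_group_period` is not from the course.

```python
from collections import defaultdict

# Hypothetical pre-treatment observations: (group, period, outcome).
rows = [
    ("treated", 1, 100.0), ("treated", 1, 104.0),
    ("treated", 2, 105.0), ("treated", 2, 109.0),
    ("treated", 3, 111.0), ("treated", 3, 113.0),
    ("control", 1, 90.0), ("control", 1, 94.0),
    ("control", 2, 96.0), ("control", 2, 98.0),
    ("control", 3, 101.0), ("control", 3, 103.0),
]

def mean_by_group_period(rows):
    """Average outcome for every (group, period) cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, period, outcome in rows:
        sums[(group, period)] += outcome
        counts[(group, period)] += 1
    return {key: sums[key] / counts[key] for key in sums}

means = mean_by_group_period(rows)
periods = sorted({period for _, period, _ in rows})

# Gap between treated and control in each pre-treatment period.
gaps = [means[("treated", p)] - means[("control", p)] for p in periods]
gap_range = max(gaps) - min(gaps)

print(gaps)       # [10.0, 10.0, 10.0]
print(gap_range)  # 0.0
```

A gap that stays roughly constant before treatment is suggestive evidence for parallel trends, not proof; in practice the averages are usually plotted as well.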
What are threats to internal validity?
Internal validity means that statistical inferences made about causal effects are valid for the considered population.
Threats:
• Failure to randomize.
• Failure to follow treatment protocol.
• Attrition.
• Experimenter demand effects.
• Small sample sizes.