Empirical Tools Flashcards
Correlation
when two variables move together
- useful for passive prediction
Causation
when one variable causes (or affects) another variable
Identification problem
the question of whether (and by how much) one variable causes another, rather than just being correlated with it
Treatment variable
Di : the variable that generates the causal effect we’re interested in
Outcome Variable
Yi : the var. that is affected by the treatment variable
the concept of potential outcomes
The outcomes that a person would have under different values of the treatment
the individual treatment effect
The difference between i’s potential outcome if i receives the treatment and i’s potential outcome if i doesn’t
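A minimal Python sketch of the individual treatment effect, assuming (unrealistically) that both potential outcomes are known for every person. The people p1, p2, p3, the labels y1 (outcome with the treatment) and y0 (outcome without it), and all numbers are hypothetical.

```python
# Hypothetical potential outcomes (made-up numbers): y1 is the outcome with
# the treatment (Di = 1), y0 the outcome without it (Di = 0).
potential_outcomes = {
    "p1": {"y1": 5.0, "y0": 3.0},
    "p2": {"y1": 4.0, "y0": 4.0},
    "p3": {"y1": 7.0, "y0": 2.0},
}


def individual_treatment_effect(y1: float, y0: float) -> float:
    """Treated potential outcome minus untreated potential outcome."""
    return y1 - y0


ite = {
    person: individual_treatment_effect(po["y1"], po["y0"])
    for person, po in potential_outcomes.items()
}
print(ite)  # {'p1': 2.0, 'p2': 0.0, 'p3': 5.0}
```

Note that the effect can differ across people (here it ranges from 0 to 5).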
the fundamental problem of causal inference
we can only ever observe one potential outcome per person. We can see i’s potential outcome in the observed state
- It is equal to the person’s realized outcome, Yi
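To make the fundamental problem concrete, here is a small sketch, assuming hypothetical potential outcomes y1 and y0: the realized outcome Yi picks out the potential outcome in the observed state, and the row an analyst actually sees has the counterfactual outcome missing (None), so y1 - y0 can never be computed for that person.

```python
from typing import Optional, Tuple


def realized_outcome(d: int, y1: float, y0: float) -> float:
    """Yi = y0 + (y1 - y0) * Di: the potential outcome in the observed state."""
    return y0 + (y1 - y0) * d


def observed_row(d: int, y1: float, y0: float) -> Tuple[Optional[float], Optional[float]]:
    """(y1, y0) as an analyst sees them: the counterfactual outcome is missing."""
    return (y1, None) if d == 1 else (None, y0)


# A treated person (Di = 1) with hypothetical potential outcomes y1 = 5, y0 = 3.
print(realized_outcome(1, 5.0, 3.0))  # 5.0
print(observed_row(1, 5.0, 3.0))      # (5.0, None): y0 is the counterfactual
```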
Average treatment effect
the average of the individual treatment effects in a population
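A one-line sketch of the definition above, assuming a small hypothetical population where (again unrealistically) both potential outcomes are known: the ATE is just the mean of the individual y1 - y0 differences.

```python
from statistics import mean

# Hypothetical (y1, y0) pairs for a small population. In real data, only one
# entry of each pair would ever be observed.
population = [(5.0, 3.0), (4.0, 4.0), (7.0, 2.0), (6.0, 5.0)]

# ATE = average of the individual treatment effects y1 - y0.
ate = mean(y1 - y0 for y1, y0 in population)
print(ate)  # 2.0
```

The individual effects are 2, 0, 5 and 1, so their average is 2.0.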
selection bias
when the selection of subjects into a study (or their likelihood of remaining in the study) leads to a result that is systematically different from the target population
- the difference between the treatment and control groups’ average outcomes absent the treatment
- prevents us from identifying the average treatment effect
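A sketch of the selection-bias term, assuming hypothetical data in which each person's untreated outcome y0 is known. It compares the average y0 of the people who got the treatment with the average y0 of those who didn't. In real data the treated group's y0 is counterfactual, which is exactly why this term cannot be computed and blocks identification of the ATE.

```python
from statistics import mean

# Hypothetical (Di, y0) pairs: y0 is each person's outcome WITHOUT treatment.
# People who end up treated would have done better even untreated.
people = [(1, 6.0), (1, 8.0), (0, 3.0), (0, 5.0)]


def selection_bias(rows):
    """Avg. untreated outcome of the treatment group minus that of the control group."""
    y0_treatment_group = mean(y0 for d, y0 in rows if d == 1)
    y0_control_group = mean(y0 for d, y0 in rows if d == 0)
    return y0_treatment_group - y0_control_group


print(selection_bias(people))  # 3.0
```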
observed state
state we do observe
the counterfactual state
the state we do not observe
treatment group
those with Di = 1
control group
those with Di = 0
naive method
Compare outcomes for people who in the real world happened to get a treatment v. those for people who didn’t
- fails due to selection bias
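A sketch of why the naive method fails, assuming a hypothetical constant treatment effect of 2.0 (so every individual effect, and the ATE, equals 2.0) and a treatment group whose untreated outcomes are higher. The naive difference in observed averages comes out as ATE + selection bias rather than the ATE.

```python
from statistics import mean

# Assumption: a constant treatment effect, so every individual effect (and the ATE) is 2.0.
ATE = 2.0

# Hypothetical (Di, y0) pairs; the treated select in because their y0 is high.
people = [(1, 6.0), (1, 8.0), (0, 3.0), (0, 5.0)]


def realized(d: int, y0: float) -> float:
    # Observed outcome: y0 plus the effect if treated.
    return y0 + ATE * d


def group_mean(rows, group, value):
    # Average of value(d, y0) over the people with Di == group.
    return mean(value(d, y0) for d, y0 in rows if d == group)


naive = group_mean(people, 1, realized) - group_mean(people, 0, realized)
bias = group_mean(people, 1, lambda d, y0: y0) - group_mean(people, 0, lambda d, y0: y0)
print(naive, bias)  # 5.0 3.0  (naive = ATE + bias)
```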
randomized trial
the scientific term for an experiment
- In a rand. trial, the treatment, Di, is assigned via random chance
Independence
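when treatment is assigned independently of potential outcomes (as in a randomized trial), there is no selection bias, so the difference in avg. outcomes is the ATE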
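A small simulation sketch of a randomized trial, assuming hypothetical normally distributed potential outcomes with individual effects that vary around 2. Because a coin flip decides Di regardless of y0 and y1, the difference in average observed outcomes lands close to the true ATE in the sample; any gap is sampling noise, not selection bias.

```python
import random
from statistics import fmean


def simulate_trial(n: int, seed: int = 0):
    """Return (true ATE in the sample, difference in avg. outcomes under random assignment)."""
    rng = random.Random(seed)
    effects, treated, control = [], [], []
    for _ in range(n):
        y0 = rng.gauss(10.0, 2.0)        # hypothetical untreated outcome
        y1 = y0 + rng.gauss(2.0, 1.0)    # individual effects vary around 2
        effects.append(y1 - y0)
        if rng.random() < 0.5:           # coin flip, independent of (y0, y1)
            treated.append(y1)           # observed state for Di = 1
        else:
            control.append(y0)           # observed state for Di = 0
    return fmean(effects), fmean(treated) - fmean(control)


true_ate, estimate = simulate_trial(100_000)
print(round(true_ate, 2), round(estimate, 2))
```

With a large sample the two printed numbers should be close; with a small n they can differ noticeably by chance.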
Indirect Random Assignment
Randomly assign a variable that is related to, but not exactly the same as, our variable of interest
External Validity
whether results generalize beyond the experiment’s participants, or are valid only for them
– The same experiment in U.S. and Sweden may generate different results
– But this suggests running lots of experiments and comparing them
Attrition
When participants leave an experiment before it is complete
– If attrition is non-random, it can generate bias
-> we end up comparing all of the treatment group with only some of the control group
– Deal with attrition by finding a dataset with universal coverage
– apply for access to a database, instead of just conducting a survey
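A sketch of how non-random attrition creates bias, using hypothetical outcomes and assuming the full-sample difference (2.0) reflects the true effect. If the control participants with the worst outcomes drop out, the remaining control mean rises and the estimated effect shrinks, because all of the treatment group is compared with only some of the control group.

```python
from statistics import mean

# Hypothetical outcomes; assume the full-sample difference is the true effect.
treatment_outcomes = [6.0, 7.0, 8.0, 9.0]
control_outcomes = [4.0, 5.0, 6.0, 7.0]

full_estimate = mean(treatment_outcomes) - mean(control_outcomes)  # 7.5 - 5.5

# Non-random attrition (hypothetical rule): control participants with outcomes below 5 leave.
control_stayers = [y for y in control_outcomes if y >= 5.0]
attrited_estimate = mean(treatment_outcomes) - mean(control_stayers)  # 7.5 - 6.0

print(full_estimate, attrited_estimate)  # 2.0 1.5
```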
Observational Data
Data that does not come from a deliberately designed experiment
Time series
data on a particular variable over time
Cross-sectional data
data on many people at one point in time
Time-Series Analysis
the study of how series co-vary over time
Cross-sectional regression
– A statistical method for studying how variables co-vary across people
– Both a model and an estimation strategy: a representation of how variables are related and a means of fitting that representation to the data
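A minimal sketch of the "model plus estimation strategy" idea, assuming the simplest case: one regressor, the model y = a + b·x, fitted by ordinary least squares. The data (x, y) are hypothetical, not from any real study.

```python
from statistics import mean


def ols_fit(x, y):
    """Fit y = a + b*x by ordinary least squares; returns (a, b)."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = sample covariance of x and y divided by sample variance of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    # The fitted line passes through the point of means.
    a = y_bar - b * x_bar
    return a, b


# Hypothetical cross-section: one observation per person.
x = [1, 2, 3, 4, 5]
y = [5.2, 7.8, 11.1, 13.9, 17.0]
a, b = ols_fit(x, y)
print(round(a, 2), round(b, 2))  # 2.09 2.97
```

The model is the line; least squares is the rule that picks a and b to fit it to the data.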
With observational data, there is an ever-present threat of
bias
Difference-in-Differences model
A strategy that applies when:
1. Treatment & control groups form an imperfect comparison
2. We have data on these groups over time
Quasi-experiments
A naturally occurring situation that resembles an experiment
– A situation where a change in the economic/political environment creates nearly identical treatment and control groups
– These T & C groups can then be used to measure a causal effect
– Often called “natural experiments” and often arise due to policy changes
Imperfect Comparison
the difference in average outcomes contains bias
Post-Period Difference
a non-experimental comparison of avg. outcomes between T&C after the policy change
– the naive approach for recovering a treatment effect
– From before, δpst = ATE + bias(pst)
Pre-period difference
just a bias term!
– δpre = bias(pre)
– Because no one has received the treatment yet
Parallel trends assumption
assume that the bias is stable over time
– bias(pre) = bias(pst)
- The difference in average outcomes between T&C in the pre-period would have carried over to the post-period if the policy change had not occurred
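A sketch of the DID arithmetic, with hypothetical group outcomes. Subtracting the pre-period difference from the post-period difference gives δpst − δpre = ATE + bias(pst) − bias(pre), which equals the ATE when the parallel trends assumption holds. The data are constructed (as an assumption) so that bias = 3 in both periods and the ATE is 1.

```python
from statistics import mean


def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """DID estimate: post-period T-C difference minus pre-period T-C difference."""
    delta_pst = mean(treat_post) - mean(control_post)  # ATE + bias(pst)
    delta_pre = mean(treat_pre) - mean(control_pre)    # bias(pre)
    return delta_pst - delta_pre                       # = ATE if bias(pre) == bias(pst)


# Hypothetical data: control mean goes 5 -> 7; treated mean goes 8 -> 11
# (8 + 2 common trend + 1 treatment effect).
print(diff_in_diff([7.0, 9.0], [10.0, 12.0], [4.0, 6.0], [6.0, 8.0]))  # 1.0
```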
Check whether the difference between T&C is stable during the pre-period
– If it is, we can assume it would continue to be stable into the post-period
– If it isn’t, DID is invalid; look for a new strategy
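A rough sketch of the pre-period check, assuming several pre-periods of hypothetical data. It computes the T minus C gap in each pre-period and flags whether the gaps stay within a chosen tolerance. The tolerance rule is a hypothetical simplification; a real analysis would also account for sampling noise.

```python
from statistics import mean


def pre_period_gaps(treat_by_period, control_by_period):
    """T minus C difference in average outcomes for each pre-period."""
    return [mean(t) - mean(c) for t, c in zip(treat_by_period, control_by_period)]


def gap_is_stable(gaps, tolerance):
    """Crude check (hypothetical rule): all gaps lie within `tolerance` of each other."""
    return max(gaps) - min(gaps) <= tolerance


treat_pre = [[7.0, 9.0], [8.0, 10.0], [9.0, 11.0]]       # means 8, 9, 10
control_stable = [[4.0, 6.0], [5.0, 7.0], [6.0, 8.0]]    # means 5, 6, 7
control_flat = [[4.0, 6.0], [4.0, 6.0], [4.0, 6.0]]      # means 5, 5, 5

print(pre_period_gaps(treat_pre, control_stable))  # [3.0, 3.0, 3.0]
print(pre_period_gaps(treat_pre, control_flat))    # [3.0, 4.0, 5.0]
print(gap_is_stable(pre_period_gaps(treat_pre, control_stable), 0.5))  # True
print(gap_is_stable(pre_period_gaps(treat_pre, control_flat), 0.5))    # False
```

In the second case the gap widens every period, so DID would not be credible there.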
Regression Discontinuity
a situation where a person gets the treatment if above a cutoff but not if below
Insight of the Regression Discontinuity design
people right around a cutoff are similar
– If people have imperfect control over their scores, then scoring just above / below the cutoff is nearly random
⇒ Treatment assignment is approx. random for people close to the cutoff
⇒ Can recover an ATE for these people, just as in a randomized trial
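A sketch of the RD insight, assuming hypothetical data in which the outcome rises smoothly with the score (slope 0.1) and treatment (score at or above a cutoff of 50) adds a jump of 2.0. It compares mean outcomes in a window just above vs just below the cutoff. A narrow window gets close to the true jump; a wide window also picks up differences driven by the score itself. This simple difference in means is one illustrative choice; RD studies often fit local regressions on each side instead.

```python
from statistics import mean

CUTOFF = 50
# Hypothetical data: outcome rises smoothly with the score (slope 0.1), and the
# treatment (score >= CUTOFF) adds a jump of 2.0.
scores = list(range(40, 61))
outcomes = [0.1 * s + (2.0 if s >= CUTOFF else 0.0) for s in scores]


def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Mean outcome just above the cutoff minus mean outcome just below it."""
    above = [y for s, y in zip(scores, outcomes) if cutoff <= s < cutoff + bandwidth]
    below = [y for s, y in zip(scores, outcomes) if cutoff - bandwidth <= s < cutoff]
    return mean(above) - mean(below)


print(round(rd_estimate(scores, outcomes, CUTOFF, 1), 2))   # 2.1
print(round(rd_estimate(scores, outcomes, CUTOFF, 10), 2))  # 3.0
```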
- However, we can’t see i’s potential outcome in the _____
counterfactual state, & thus we can’t directly calculate the average treatment effect