F6 Directed acyclic graphs and potential outcomes causal model Flashcards
What is a DAG?
Graphical representation of the theorized data generating process. It models the chain of causal effects.
Why use a DAG?
Simplifies theoretical arguments
Model the argument
Communication to the reader
What do nodes, arrows, filled lines and dotted lines mean?
Node: A random variable
Arrow: A causal relationship
Filled line: Observed
Dotted line: Unobserved
What are the principles of a DAG?
Causality runs in one direction forward in time (no cycles and no endogeneity)
Reverse causality and simultaneity are not possible
Causality is understood in terms of counterfactuals
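The "no cycles" principle can be checked mechanically. A minimal sketch, assuming the DAG is stored as a dictionary from each node to its direct causes. The node names and the helper is_acyclic are hypothetical; graphlib is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter, CycleError

def is_acyclic(parents):
    """Return True if the graph (node -> set of direct causes) has no cycles."""
    try:
        # static_order() raises CycleError when no causal ordering exists
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False

# X causes D and Y, and D causes Y
dag = {"X": set(), "D": {"X"}, "Y": {"D", "X"}}

# Reverse causality (Y also causes D) creates a cycle, so this is not a DAG
not_a_dag = {"X": set(), "D": {"X", "Y"}, "Y": {"D", "X"}}

print(is_acyclic(dag))        # True
print(is_acyclic(not_a_dag))  # False
```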
What is a confounder? (draw it)
A variable X that affects both D and Y. It creates an open backdoor path that needs to be controlled for.
D <– X –> Y
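A small simulation can illustrate why the backdoor path matters. This is a sketch under made-up assumptions (a binary confounder, a true effect of 2.0, invented probabilities and coefficients), not a general estimation recipe. The naive comparison mixes the effect of D with the effect of X, while comparing within levels of X recovers something close to the assumed effect.

```python
import random

random.seed(0)
TRUE_EFFECT = 2.0  # assumed causal effect of D on Y

rows = []
for _ in range(20000):
    x = random.random() < 0.5                  # binary confounder X
    d = random.random() < (0.8 if x else 0.2)  # X raises the chance of treatment
    y = TRUE_EFFECT * d + 3.0 * x + random.gauss(0, 1)  # X also raises Y
    rows.append((x, d, y))

def mean(values):
    return sum(values) / len(values)

def diff_in_means(data):
    treated = [y for _, d, y in data if d]
    control = [y for _, d, y in data if not d]
    return mean(treated) - mean(control)

# Naive comparison leaves the backdoor path D <- X -> Y open
naive = diff_in_means(rows)

# Controlling for X: compare within each level of X, weight by the share of that level
adjusted = 0.0
for level in (False, True):
    stratum = [r for r in rows if r[0] == level]
    adjusted += len(stratum) / len(rows) * diff_in_means(stratum)

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```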
What is a collider? (draw it)
Both D and Y affect X. The path through X is closed as long as X is left alone. Controlling for X opens it and results in bias.
D –> X <– Y
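Collider bias can be sketched with simulated data. The setup is an assumption made for illustration: D and Y are drawn independently, X is their sum plus noise, and "controlling" for X is done by keeping only units with a high X. The helper correlation is hypothetical and written out by hand to keep the block self-contained.

```python
import random

random.seed(1)

def correlation(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5

# D and Y are generated independently: no causal effect in either direction
d = [random.gauss(0, 1) for _ in range(20000)]
y = [random.gauss(0, 1) for _ in range(20000)]
# X is a collider: D -> X <- Y
x = [di + yi + random.gauss(0, 0.5) for di, yi in zip(d, y)]

corr_all = correlation(d, y)

# Conditioning on the collider by keeping only units with high X
kept = [(di, yi) for di, yi, xi in zip(d, y, x) if xi > 1.0]
corr_selected = correlation([di for di, _ in kept], [yi for _, yi in kept])

print(f"all units: {corr_all:.2f}, selected on X: {corr_selected:.2f}")
```

In the full sample the correlation is close to zero. In the selected sample it turns clearly negative even though neither variable affects the other.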
What is Jan’s strategy regarding colliders and confounders?
Include everything and hope the direction of causality is the same
What are Y^1 and Y^0? What does Y_i mean?
Y^1: Potential outcome if treated
Y^0: Potential outcome if untreated
Y_i: Observed outcome for a specific unit i
What is ATE, ATT and ATU?
Theoretical quantities.
ATE: Average treatment effect. E[delta_i] = E[Y_i^1]-E[Y_i^0]. How the entire population would respond on average if treated.
ATT: Average treatment effect on the treated. E[delta_i|D_i=1].
ATU: Average treatment effect on the untreated. E[delta_i|D_i=0]. The treatment effect the control group would have had if they had been treated.
With successful randomization, ATE = ATT = ATU.
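The three quantities can be computed directly in a simulation, where both potential outcomes are visible (they never are in real data). A sketch under invented assumptions: effects vary across units, and units with above-average gains select into treatment, so ATT, ATE and ATU differ.

```python
import random

random.seed(2)

units = []
for _ in range(10000):
    y0 = random.gauss(10, 2)    # potential outcome without treatment
    delta = random.gauss(1, 1)  # unit-level treatment effect delta_i
    y1 = y0 + delta             # potential outcome with treatment
    d = delta > 1               # assumed self-selection: high-gain units take treatment
    units.append((y1, y0, d))

def mean(values):
    return sum(values) / len(values)

ate = mean([y1 - y0 for y1, y0, _ in units])
att = mean([y1 - y0 for y1, y0, d in units if d])
atu = mean([y1 - y0 for y1, y0, d in units if not d])
share_treated = mean([1.0 if d else 0.0 for _, _, d in units])

# The ATE is the average of ATT and ATU, weighted by the treated share
weighted = share_treated * att + (1 - share_treated) * atu

print(f"ATE={ate:.2f} ATT={att:.2f} ATU={atu:.2f} weighted={weighted:.2f}")
```

Without randomization the three numbers come apart. The weighted average of ATT and ATU still reproduces the ATE.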
How can ATE be estimated?
A simple comparison of treated and untreated units generally differs from the true (unknown) ATE because of selection bias from non-random treatment assignment.
We need some sort of random shock so that the control and treatment groups are similar on confounders.
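What a random shock buys can be sketched by comparing two assignment rules on the same simulated population. The constant effect of 1.0 and both assignment rules are assumptions chosen for illustration.

```python
import random

random.seed(3)
N = 20000

# Simulated potential outcomes with an assumed constant effect of 1.0
y0 = [random.gauss(10, 2) for _ in range(N)]
y1 = [v + 1.0 for v in y0]

def mean(values):
    return sum(values) / len(values)

def difference_in_means(d):
    # Observed outcome: Y^1 for treated units, Y^0 for untreated units
    treated = [y1[i] for i in range(N) if d[i]]
    control = [y0[i] for i in range(N) if not d[i]]
    return mean(treated) - mean(control)

# Non-random selection: units with high Y^0 are more likely to take treatment
d_selected = [v + random.gauss(0, 1) > 10 for v in y0]
# Random shock: a coin flip decides treatment, independent of Y^0
d_random = [random.random() < 0.5 for _ in range(N)]

sdo_selected = difference_in_means(d_selected)
sdo_random = difference_in_means(d_random)
print(f"selected: {sdo_selected:.2f}, randomized: {sdo_random:.2f}")
```

Under the coin flip the difference in means lands near the assumed effect. Under self-selection it is far too large.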
What is an estimator?
A mathematical rule that we apply to arrive at a specific value of interest (illustrated with a hat).
What are three useful qualities of the beta coefficient?
It is unbiased, efficient and consistent
What is an unbiased estimator?
Its sampling distribution is centered on the true population parameter (a confounder can make it biased).
E(x-bar) = mu
What is an efficient estimator?
The variance around the mean is low (an estimate is likely to be close to the true population parameter)
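Efficiency can be illustrated by comparing two estimators of the same quantity. A sketch, assuming standard normal data where both the sample mean and the sample median are centered on the true value 0; the sample size and number of repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(4)

means, medians = [], []
for _ in range(2000):
    sample = [random.gauss(0, 1) for _ in range(50)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Spread of each estimator across repeated samples
var_mean = statistics.variance(means)
var_median = statistics.variance(medians)
print(f"variance of sample mean:   {var_mean:.4f}")
print(f"variance of sample median: {var_median:.4f}")
```

The sample mean varies less across samples, so for this kind of data it is the more efficient of the two.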
What is a consistent estimator?
As the sample size increases, the estimator converges to the true population parameter.
x-bar - mu –> 0 as n –> ∞ (law of large numbers)
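Consistency can be sketched by tracking how far the sample mean lands from the true mean as n grows. The population (normal with mean 5 and standard deviation 2) and the helper average_abs_error are assumptions made for illustration.

```python
import random

random.seed(5)
MU = 5.0  # assumed true population mean

def average_abs_error(n, repetitions=200):
    """Average distance between x-bar and MU over repeated samples of size n."""
    total = 0.0
    for _ in range(repetitions):
        sample = [random.gauss(MU, 2) for _ in range(n)]
        total += abs(sum(sample) / n - MU)
    return total / repetitions

errors = {n: average_abs_error(n) for n in (10, 100, 1000, 10000)}
for n, err in errors.items():
    print(n, round(err, 3))
```

The average error shrinks toward zero as n increases.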
What is the difference between standard deviation and standard error?
Both measure a standard deviation, but in two different distributions.
Standard error: Sampling distribution of an estimate, such as a coefficient
Standard deviation: Distribution of variable (sigma)
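The two distributions can be put side by side in a simulation. A sketch, assuming the estimate is a sample mean of 100 normal draws with sigma = 3; for a sample mean the standard error is commonly computed as the standard deviation divided by the square root of n.

```python
import random
import statistics

random.seed(6)
SIGMA, N = 3.0, 100  # assumed population sigma and sample size

# Standard deviation: spread of the variable within one sample
sample = [random.gauss(0, SIGMA) for _ in range(N)]
sd = statistics.stdev(sample)
se_formula = sd / N ** 0.5

# Standard error: spread of the estimate (here x-bar) across many samples
sample_means = [
    statistics.fmean([random.gauss(0, SIGMA) for _ in range(N)])
    for _ in range(3000)
]
se_simulated = statistics.stdev(sample_means)

print(f"sd={sd:.2f} se_formula={se_formula:.2f} se_simulated={se_simulated:.2f}")
```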
What is SUTVA?
Stable unit treatment value assumption:
1) Homogeneous doses to all
2) No externalities (no spillover/interference between units)
No spillover into general equilibrium effects (when scaling up)
What is causal inference?
The study of counterfactuals / comparing counterfactuals
What is a synonym for identification strategy?
The assumptions that allow you to claim that you have estimated a causal effect
Can we estimate an individual causal effect?
No. The best we can do is average treatment effects
What is selection bias?
Difference in outcome between treated and untreated if NO ONE was treated.
With randomization, E[Y_i^0|D_i=1]-E[Y_i^0|D_i=0] = 0
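Where the selection bias term enters can be shown with a short derivation. It uses the card's notation and adds and subtracts the unobserved E[Y_i^0|D_i=1]. The first step holds because treated units reveal Y^1 and untreated units reveal Y^0.

```latex
\begin{align*}
E[Y_i \mid D_i=1] - E[Y_i \mid D_i=0]
  &= E[Y_i^1 \mid D_i=1] - E[Y_i^0 \mid D_i=0] \\
  &= \underbrace{E[Y_i^1 \mid D_i=1] - E[Y_i^0 \mid D_i=1]}_{\text{ATT}}
   + \underbrace{E[Y_i^0 \mid D_i=1] - E[Y_i^0 \mid D_i=0]}_{\text{selection bias}}
\end{align*}
```

When the selection bias term is zero, the observed difference in means equals the ATT.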
When should we distinguish between ATT and ATE?
When we don’t have a randomized experiment, the ATE and ATT could be very different numerically.
How do the assumptions for ATT differ from those for ATE?
We observe Y^1 for the treated and to identify the ATT, we “only” have to find a control group that looks like the treatment group had they not been treated (the missing potential outcome Y^0).
We don’t need to assume Y^1 is the same for both groups. We only need common support for the treated units (comparable controls for each of them), not for the whole population.
What is causal inference, identification strategy and identifying assumption?
Causal inference: Thinking of counterfactuals
Strategy: Research design
Assumptions: Key assumptions that need to be met for causal estimation.
What is the fundamental problem of causality? What is the switching equation and how is the individual treatment effect estimated?
You can never observe both potential outcomes.
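Switching equation: Y_i = D_iY_i^1 + (1-D_i)Y_i^0
Individual treatment effect: delta_i = Y_i^1 - Y_i^0. It can never be estimated because only one potential outcome is observed per unit.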
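The switching equation is simple enough to write as a function. A minimal sketch with a hypothetical unit whose potential outcomes are made up; the function name observed_outcome is an assumption.

```python
def observed_outcome(d, y1, y0):
    """Switching equation: Y_i = D_i * Y_i^1 + (1 - D_i) * Y_i^0."""
    return d * y1 + (1 - d) * y0

# Hypothetical unit with Y^1 = 7 and Y^0 = 4, so delta_i = 3
y1, y0 = 7, 4
print(observed_outcome(1, y1, y0))  # 7: treated, so Y^0 stays unobserved
print(observed_outcome(0, y1, y0))  # 4: untreated, so Y^1 stays unobserved
```

Whichever value D_i takes, the function returns only one of the two potential outcomes, so delta_i cannot be computed from observed data.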