F6 Directed acyclic graphs and potential outcomes causal model Flashcards
What is a DAG?
Graphical representation of the theorized data generating process. It models a chain of causal effects.
Why use a DAG?
Simplifies theoretical arguments
Model the argument
Communication to the reader
What do nodes, arrows, and filled and dotted lines mean?
Node: A random variable
Arrow: A causal relationship
Filled line: Observed
Dotted line: Unobserved
What are the principles of a DAG?
Causality runs in one direction forward in time (no cycles and no endogeneity)
Reverse causality and simultaneity are not possible
Causality is understood in terms of counterfactuals
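A minimal sketch of the "no cycles" principle, assuming a DAG is written as a list of (cause, effect) pairs. The function name `is_acyclic` and the two example graphs are illustrative assumptions, not part of the course material. It uses Kahn's algorithm to test whether the arrows can be ordered without looping back.

```python
from collections import deque

def is_acyclic(edges):
    """Return True if the directed graph given as (cause, effect) pairs has no cycle."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1
    # Kahn's algorithm: repeatedly remove nodes that have no incoming arrows.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # If every node could be removed, no arrow loops back on itself.
    return removed == len(nodes)

confounded = [("X", "D"), ("X", "Y"), ("D", "Y")]   # a valid DAG
reverse_causality = [("D", "Y"), ("Y", "D")]        # D and Y cause each other

print(is_acyclic(confounded))         # True
print(is_acyclic(reverse_causality))  # False
```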
What is a confounder? (draw it)
A variable X that affects both D and Y = an open backdoor path that needs to be controlled for.
D <– X –> Y
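A small simulation sketch of the confounder card. All numbers (a true effect of 2, a binary X, the assignment probabilities) are assumptions chosen for illustration. Ignoring X leaves the backdoor path open and inflates the comparison; comparing within levels of X closes it.

```python
import random
from statistics import mean

rng = random.Random(0)   # fixed seed so the sketch is reproducible
TRUE_EFFECT = 2.0        # assumed causal effect of D on Y

rows = []
for _ in range(20000):
    x = rng.random() < 0.5                            # confounder X (binary)
    d = rng.random() < (0.8 if x else 0.2)            # X -> D
    y = TRUE_EFFECT * d + 3.0 * x + rng.gauss(0, 1)   # D -> Y and X -> Y
    rows.append((x, d, y))

def diff_in_means(data):
    """Mean of Y among treated minus mean of Y among untreated."""
    treated = [y for _, d, y in data if d]
    control = [y for _, d, y in data if not d]
    return mean(treated) - mean(control)

# Backdoor open: X is ignored, so the comparison mixes the effects of D and X.
naive = diff_in_means(rows)

# Backdoor closed: compare within each level of X, then weight by stratum size.
adjusted = 0.0
for level in (False, True):
    stratum = [r for r in rows if r[0] == level]
    adjusted += diff_in_means(stratum) * len(stratum) / len(rows)

print(f"naive: {naive:.2f}, adjusted for X: {adjusted:.2f}, true: {TRUE_EFFECT}")
```

Under these assumptions the naive difference should land well above 2, while the adjusted one should be close to it.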
What is a collider? (draw it)
D and Y both affect X = a closed backdoor path. Controlling for X opens the path and results in bias.
D –> X <– Y
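A simulation sketch of collider bias, under assumed numbers: D and Y are generated independently (true effect zero), and X is switched on when D plus Y plus noise is large. Looking only at units with X = 1 is one way of "controlling" for the collider.

```python
import random
from statistics import mean

rng = random.Random(0)   # fixed seed so the sketch is reproducible

rows = []
for _ in range(20000):
    d = rng.random() < 0.5                      # treatment, independent of Y
    y = rng.gauss(0, 1)                         # outcome; true effect of D is zero
    x = (d + y + rng.gauss(0, 0.5)) > 1.0       # collider: D -> X <- Y
    rows.append((d, y, x))

def diff_in_means(data):
    """Mean of Y among treated minus mean of Y among untreated."""
    treated = [y for d, y, _ in data if d]
    control = [y for d, y, _ in data if not d]
    return mean(treated) - mean(control)

# Collider left alone: the path is closed and the comparison is near zero.
unconditional = diff_in_means(rows)

# Conditioning on X = 1 opens the path and creates a spurious negative effect.
conditioned = diff_in_means([r for r in rows if r[2]])

print(f"all units: {unconditional:.2f}, only X = 1: {conditioned:.2f}")
```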
What is Jan’s strategy regarding colliders and confounders?
Include everything and hope the direction of causality is the same.
What are Y^1 and Y^0? What does Y_i mean?
Y^1: Potential outcome if treated
Y^0: Potential outcome if untreated
Y_i: Outcome for a specific unit i
What is ATE, ATT and ATU?
Theoretical quantities.
ATE: Average treatment effect. E[delta_i] = E[Y_i^1] - E[Y_i^0]. How the entire population would respond on average if everyone were treated, compared with no one being treated.
ATT: Average treatment effect on the treated. E[delta_i|D_i=1].
ATU: Average treatment effect on the untreated. E[delta_i|D_i=0]. What the treatment effect would be for the control group if they were treated.
If randomization is successful, then ATE = ATT = ATU.
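A sketch of the three quantities on simulated data. It assumes both potential outcomes are visible for every unit, which real data never allows, and the effect sizes and selection rule (units with an above-average gain opt in) are illustrative assumptions.

```python
import random
from statistics import mean

rng = random.Random(1)   # fixed seed so the sketch is reproducible
n = 50000
y0 = [rng.gauss(0, 1) for _ in range(n)]            # Y_i^0
y1 = [a + 1.0 + rng.gauss(0, 1) for a in y0]        # Y_i^1
delta = [b - a for a, b in zip(y0, y1)]             # delta_i = Y_i^1 - Y_i^0

def effects(d):
    """Return (ATE, ATT, ATU) for a treatment assignment d (list of booleans)."""
    ate = mean(delta)
    att = mean([e for e, di in zip(delta, d) if di])
    atu = mean([e for e, di in zip(delta, d) if not di])
    return ate, att, atu

# Selection on gains: units that benefit more than average choose treatment.
selected = [e > 1.0 for e in delta]
# Randomization: a coin flip unrelated to the potential outcomes.
randomized = [rng.random() < 0.5 for _ in range(n)]

for label, d in (("self-selected", selected), ("randomized", randomized)):
    ate, att, atu = effects(d)
    print(f"{label}: ATE={ate:.2f} ATT={att:.2f} ATU={atu:.2f}")
```

With self-selection the three quantities should separate (ATT above ATE above ATU); with the coin flip they should be roughly equal.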
How can ATE be estimated?
An estimate from observed data differs from the true (unknown) ATE because of selection bias from non-random assignment.
We need some sort of random shock so that the control and treatment groups are similar on confounders.
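A sketch of why the random shock matters, comparing the simple difference in observed mean outcomes under self-selection and under a coin flip. The constant effect of 2 and the selection rule (units with a high Y^0 tend to opt in) are assumptions made for illustration.

```python
import random
from statistics import mean

rng = random.Random(2)   # fixed seed so the sketch is reproducible
n = 20000
TRUE_ATE = 2.0           # assumed constant treatment effect
y0 = [rng.gauss(0, 1) for _ in range(n)]   # Y_i^0
y1 = [a + TRUE_ATE for a in y0]            # Y_i^1

def sdo(d):
    """Simple difference in observed means; each unit reveals one potential outcome."""
    observed = [b if di else a for a, b, di in zip(y0, y1, d)]
    treated = [y for y, di in zip(observed, d) if di]
    control = [y for y, di in zip(observed, d) if not di]
    return mean(treated) - mean(control)

# Non-random selection: units with a high untreated outcome tend to take treatment.
self_selected = [a + rng.gauss(0, 1) > 0 for a in y0]
# Random shock: assignment by coin flip, unrelated to Y^0 and Y^1.
randomized = [rng.random() < 0.5 for _ in range(n)]

sdo_selected = sdo(self_selected)
sdo_randomized = sdo(randomized)
print(f"self-selected: {sdo_selected:.2f}, randomized: {sdo_randomized:.2f}, true ATE: {TRUE_ATE}")
```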
What is an estimator?
A mathematical rule that we apply to arrive at a specific value of interest (denoted with a hat, e.g. beta-hat).
What are three useful qualities of the beta coefficient?
It is unbiased, efficient and consistent
What is an unbiased estimator?
Its expected value equals the true population parameter (a confounder can make it biased).
E(x-bar) = mu
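A sketch of unbiasedness for the sample mean: draw many small samples and check that the estimates average out to the true parameter. The population values (mu = 5, sigma = 2) and the sample size of 10 are assumptions.

```python
import random
from statistics import mean

rng = random.Random(3)   # fixed seed so the sketch is reproducible
MU, SIGMA = 5.0, 2.0     # assumed population mean and standard deviation

# 5000 repeated samples of size 10; each gives one estimate x-bar.
sample_means = [
    mean([rng.gauss(MU, SIGMA) for _ in range(10)])
    for _ in range(5000)
]

# Single estimates scatter around MU, but their average should sit close to it.
average_of_estimates = mean(sample_means)
print(f"average of x-bar over repeated samples: {average_of_estimates:.3f} (mu = {MU})")
```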
What is an efficient estimator?
The variance around the mean is low (so an estimate is likely to be close to the true population parameter).
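A sketch of efficiency, comparing two estimators of the center of a normal population: the sample mean and the sample median. Both are centered on the true value here, but the mean varies less from sample to sample. The choice of the median as the comparison and the sample size of 25 are assumptions made for illustration.

```python
import random
from statistics import mean, median, variance

rng = random.Random(4)   # fixed seed so the sketch is reproducible

means, medians = [], []
for _ in range(4000):
    sample = [rng.gauss(0, 1) for _ in range(25)]
    means.append(mean(sample))       # estimator 1: sample mean
    medians.append(median(sample))   # estimator 2: sample median

# Spread of each estimator across repeated samples; lower means more efficient.
var_mean = variance(means)
var_median = variance(medians)
print(f"variance of mean: {var_mean:.4f}, variance of median: {var_median:.4f}")
```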
What is a consistent estimator?
As the sample size increases, the estimator converges to the true population parameter.
x-bar - mu –> 0 as n –> ∞ (law of large numbers)
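A sketch of consistency for the sample mean: the gap between x-bar and the true mean at three sample sizes. The population (mean 3, standard deviation 1) and the sample sizes are assumptions. A single run is random, so the gap need not shrink at every step, but it should be tiny at the largest n.

```python
import random
from statistics import mean

rng = random.Random(5)   # fixed seed so the sketch is reproducible
MU = 3.0                 # assumed true population mean

errors = {}
for n in (10, 1000, 100000):
    sample = [rng.gauss(MU, 1) for _ in range(n)]
    errors[n] = abs(mean(sample) - MU)   # |x-bar - mu| at this sample size

for n, err in errors.items():
    print(f"n = {n:>6}: |x-bar - mu| = {err:.4f}")
```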