6. Directed Acyclic Graphs and the Potential Outcomes Causal Model Flashcards
An introduction to directed acyclic graphs and the potential outcomes causal model
What is a directed acyclic graph (DAG)?
A Directed Acyclic Graph (DAG) is a visual representation of causal relationships, based on prior knowledge, logic, and reasoning. It uses nodes to represent variables and arrows to represent causal effects.
Features:
* Causality runs in one direction: Arrows indicate causal effects and always point forward in time; no cycles or loops are allowed.
* No reverse causality or simultaneity: A DAG cannot represent reverse causality or simultaneous causation (two or more variables influencing each other at the same time).
* Causality in terms of counterfactuals: A causal effect is a comparison of two states of the world, one that occurred and one that could have occurred under different conditions (the counterfactual).
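To make the "no cycles" rule concrete, here is a minimal sketch that checks whether a set of arrows forms a DAG, using the Python standard library's `graphlib` (Python 3.9+). The edge lists are hypothetical. A graph is acyclic exactly when its nodes can be arranged in a topological (causal) order, which `TopologicalSorter.static_order()` attempts, raising `CycleError` if a loop exists.

```python
from graphlib import CycleError, TopologicalSorter


def is_dag(edges):
    """Return True if the arrows (cause, effect) contain no directed cycle."""
    # TopologicalSorter takes a mapping: node -> set of its direct causes.
    graph = {}
    for cause, effect in edges:
        graph.setdefault(effect, set()).add(cause)
        graph.setdefault(cause, set())
    try:
        # static_order() yields a causal ordering, or raises CycleError on a loop.
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


# Hypothetical graph: X confounds D and Y, and D causes Y.
print(is_dag([("X", "D"), ("X", "Y"), ("D", "Y")]))              # True
# Adding Y -> D creates a feedback loop (D -> Y -> D), which a DAG rules out.
print(is_dag([("X", "D"), ("X", "Y"), ("D", "Y"), ("Y", "D")]))  # False
```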
What is an observed confounder, unobserved confounder, collider and mediator variable in a DAG?
An observed confounder is a measurable variable that influences both the treatment (D) and the outcome (Y), creating a spurious association. Example: In D←X→Y, X is an observed confounder. Observed confounders are included in the analysis to block (control for) the backdoor path, isolating the causal effect of D on Y.
An unobserved confounder is a variable that influences both the treatment (D) and the outcome (Y) but cannot be measured or included in the analysis.
Example: In D←U→Y, U is an unobserved confounder. Unobserved confounders leave backdoor paths open, introducing bias that cannot be directly eliminated.
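A small simulation can illustrate why an open backdoor path biases estimates. The sketch below assumes a hypothetical linear model (made-up coefficients, Gaussian noise) in which X confounds D and Y and the true effect of D on Y is 2. Regressing Y on D alone leaves D←X→Y open; partialling X out of both D and Y (the Frisch-Waugh-Lovell way of controlling for X in a regression) closes it.

```python
import random


def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def residualize(v, x):
    """Residuals from regressing v on x (with an intercept)."""
    b = slope(x, v)
    mx, mv = sum(x) / len(x), sum(v) / len(v)
    return [vi - mv - b * (xi - mx) for vi, xi in zip(v, x)]


random.seed(0)
n = 50_000
true_effect = 2.0
# Hypothetical linear DAG: D <- X -> Y plus D -> Y (coefficients are made up).
X = [random.gauss(0, 1) for _ in range(n)]
D = [1.5 * x + random.gauss(0, 1) for x in X]
Y = [true_effect * d + 3.0 * x + random.gauss(0, 1) for d, x in zip(D, X)]

# Backdoor path D <- X -> Y left open: biased (about 3.38 in large samples).
naive = slope(D, Y)
# Conditioning on X closes the backdoor path (partial X out of both D and Y).
adjusted = slope(residualize(D, X), residualize(Y, X))
print(f"naive: {naive:.2f}  adjusted for X: {adjusted:.2f}  truth: {true_effect}")
```

If X were unobserved (the U in D←U→Y), `residualize(..., X)` could not be computed, and only the biased naive estimate would be available.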
A collider is a variable that is influenced by two or more variables, where arrows converge on the same node. Example: In D→X←Y, X is a collider because both D and Y point to it. Colliders naturally block any backdoor path they lie on. However, conditioning on a collider (e.g., adjusting for X) opens the path and introduces bias.
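Collider bias can also be seen in a simulation. In this sketch (a hypothetical model with made-up coefficients), D and Y are generated independently and X = D + Y + noise, so X is a collider. "Conditioning" is done here by keeping only units with a high value of X, which is one common way it happens in practice (sample selection).

```python
import random


def corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


random.seed(1)
n = 50_000
# Hypothetical DAG D -> X <- Y: D and Y are generated independently.
D = [random.gauss(0, 1) for _ in range(n)]
Y = [random.gauss(0, 1) for _ in range(n)]
X = [d + y + random.gauss(0, 1) for d, y in zip(D, Y)]

# Collider left alone: the path is blocked and D, Y are (nearly) uncorrelated.
print(f"all units:   corr(D, Y) = {corr(D, Y):+.2f}")
# Conditioning on the collider, here by keeping only units with X > 1,
# opens the path and induces a spurious negative correlation.
d_sel, y_sel = zip(*[(d, y) for d, y, x in zip(D, Y, X) if x > 1.0])
print(f"only X > 1:  corr(D, Y) = {corr(d_sel, y_sel):+.2f}")
```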
A mediator is a variable that lies on the causal pathway between the treatment (D) and the outcome (Y). Example: In D→M→Y, M is the mediator.
Role: Mediators help decompose the total effect of D on Y into a direct effect (D→Y) and an indirect effect (D→M→Y).
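The decomposition can be computed directly from structural equations. This sketch assumes hypothetical linear equations with made-up coefficients and no D×M interaction; in that linear case the total effect splits exactly into the direct effect plus the indirect effect.

```python
# Hypothetical linear structural equations (coefficients are made up):
#   M = 2.0 * D              (D -> M)
#   Y = 1.5 * D + 0.5 * M    (D -> Y directly, and M -> Y)
def m_of(d):
    return 2.0 * d


def y_of(d, m):
    return 1.5 * d + 0.5 * m


# Total effect: switch D from 0 to 1 and let M respond.
total = y_of(1, m_of(1)) - y_of(0, m_of(0))
# Direct effect (D -> Y): switch D but hold M at the value it takes when D = 0.
direct = y_of(1, m_of(0)) - y_of(0, m_of(0))
# Indirect effect (D -> M -> Y): hold D at 1 and move M from its D=0 to its D=1 value.
indirect = y_of(1, m_of(1)) - y_of(1, m_of(0))

print(total, direct, indirect)  # 2.5 1.5 1.0
```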
What is the difference between a direct path and backdoor path in a DAG?
A direct path represents the causal effect of one variable on another. For example, in a DAG where an arrow points directly from D (the treatment) to Y (the outcome), D→Y indicates that D directly causes Y.
A backdoor path is a non-causal path between D and Y that begins with an arrow pointing into D, creating a spurious relationship between them. It arises because of a confounder. For example, in the path D←X→Y, X is a confounder that creates a backdoor path between D and Y. The backdoor path does not represent causation; it is a source of bias that can distort the estimate of the true causal effect of D on Y.
What is the backdoor criterion?
The backdoor criterion is a rule to identify which variables to condition on (control for) to remove confounding and estimate the true causal effect of a treatment (D) on an outcome (Y).
Backdoor paths can be closed in two ways:
1. By conditioning on confounders (variables that influence both D and Y), using methods like regression or matching.
2. By leaving colliders alone: a backdoor path that contains a collider (a variable where two arrows converge) is already blocked, but conditioning on the collider opens the path and introduces bias.
When all backdoor paths are closed (without conditioning on any variable that is itself caused by D), you meet the backdoor criterion, meaning you’ve isolated the true causal effect of D on Y.
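For small graphs the criterion can be checked mechanically. The sketch below is a brute-force illustration, not an efficient d-separation algorithm: it takes a hypothetical DAG as (cause, effect) pairs, lists every backdoor path from D to Y (a path whose first arrow points into D), and checks whether a conditioning set Z blocks each one. A path is blocked if it contains a non-collider that is in Z, or a collider that is not in Z and has no descendant in Z; Z must also contain no descendants of D. The example graph, with a second backdoor path D←A→C←B→Y running through the collider C, is made up.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following arrows."""
    children = {}
    for a, b in edges:
        children.setdefault(a, []).append(b)
    seen, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), []):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def backdoor_paths(edges, d, y):
    """Simple paths between d and y whose first edge points INTO d."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    paths = []

    def walk(path):
        if path[-1] == y:
            paths.append(path)
            return
        for nxt in sorted(nbrs.get(path[-1], ())):
            if nxt not in path:
                walk(path + [nxt])

    walk([d])
    edge_set = set(edges)
    return [p for p in paths if (p[1], d) in edge_set]


def is_blocked(edges, path, z):
    """True if some node on the path blocks it, given conditioning set z."""
    edge_set = set(edges)
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, node) in edge_set and (nxt, node) in edge_set
        if is_collider:
            # An untouched collider (and no conditioned descendant) blocks the path.
            if node not in z and not (descendants(edges, node) & z):
                return True
        elif node in z:
            # A conditioned confounder or chain node blocks the path.
            return True
    return False


def satisfies_backdoor(edges, d, y, z):
    z = set(z)
    if z & descendants(edges, d):
        return False  # never condition on something D causes
    return all(is_blocked(edges, p, z) for p in backdoor_paths(edges, d, y))


# Hypothetical DAG: D <- X -> Y, D -> Y, and D <- A -> C <- B -> Y.
edges = [("X", "D"), ("X", "Y"), ("D", "Y"),
         ("A", "D"), ("A", "C"), ("B", "C"), ("B", "Y")]
for z in [set(), {"X"}, {"X", "C"}, {"X", "C", "B"}]:
    print(sorted(z), satisfies_backdoor(edges, "D", "Y", z))
# [] False / ['X'] True / ['C', 'X'] False / ['B', 'C', 'X'] True
```

Conditioning on X alone is enough here because the path through C is already blocked by the collider; adding C reopens it, and adding B as well closes it again.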
What is the potential outcomes causal model?
The potential outcomes causal model is a framework for defining and estimating causal effects by comparing potential outcomes under different treatment conditions. The potential outcomes notation expresses causality in terms of counterfactuals.
Key components:
* Treatment (D): The intervention or variable whose causal effect is being studied. $D = 1$ means treated; $D = 0$ means not treated.
* Potential outcomes: $Y_i^1$ is the outcome if unit $i$ receives the treatment. $Y_i^0$ is the outcome if unit $i$ does not receive the treatment.
* Observed outcome: The observed outcome $Y_i$, which is distinct from the potential outcomes, is a function of them that depends on the treatment assignment (the switching equation): $Y_i = D_i Y_i^1 + (1 - D_i) Y_i^0$
* Unit-specific treatment effect (causal effect): The difference between the two states of the world is the treatment effect: $\delta_i = Y_i^1 - Y_i^0$
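The switching equation and the unit-level effect can be checked on a toy table. The six units and their potential outcomes below are hypothetical, made-up numbers; real data never contain both potential outcomes for the same unit.

```python
# Hypothetical potential outcomes for six units (made-up numbers).
y1 = [8, 7, 6, 8, 7, 9]   # Y_i^1: outcome if treated
y0 = [2, 3, 4, 5, 6, 7]   # Y_i^0: outcome if untreated
d = [1, 1, 1, 0, 0, 0]    # D_i: treatment actually received

# Switching equation: Y_i = D_i * Y_i^1 + (1 - D_i) * Y_i^0
y_obs = [di * a + (1 - di) * b for di, a, b in zip(d, y1, y0)]
# Unit-specific treatment effect: delta_i = Y_i^1 - Y_i^0
delta = [a - b for a, b in zip(y1, y0)]

print(y_obs)  # [8, 7, 6, 5, 6, 7]
print(delta)  # [6, 4, 2, 3, 1, 2]
```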
What, then, is the fundamental problem of causal inference?
In the potential outcomes causal model, each unit has two potential outcomes ($Y_i^1$ and $Y_i^0$), which we need to subtract from each other to get the treatment effect.
For any given unit, we can observe only one of these outcomes, depending on whether the treatment was applied or not; the counterfactual outcome is never observed. Since we observe only one of the two outcomes, we cannot directly calculate the causal effect for an individual –> the fundamental problem of causal inference.
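Starting from what a dataset actually records, $(D_i, Y_i)$, the sketch below (hypothetical, made-up numbers) shows that each unit reveals only one potential outcome, so $\delta_i$ cannot be computed for any unit.

```python
# Hypothetical observed data: only (D_i, Y_i) is recorded for each unit.
d = [1, 1, 1, 0, 0, 0]
y_obs = [8, 7, 6, 5, 6, 7]

# What the data reveal about each potential outcome; None marks the
# counterfactual that was never realized and so cannot be observed.
y1_seen = [y if di == 1 else None for di, y in zip(d, y_obs)]
y0_seen = [y if di == 0 else None for di, y in zip(d, y_obs)]

# delta_i = Y_i^1 - Y_i^0 needs both entries, so it is missing for every unit.
delta_seen = [None if a is None or b is None else a - b
              for a, b in zip(y1_seen, y0_seen)]

print(y1_seen)     # [8, 7, 6, None, None, None]
print(y0_seen)     # [None, None, None, 5, 6, 7]
print(delta_seen)  # [None, None, None, None, None, None]
```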
What are the three primary parameters used to describe treatment effects (ATE, ATT and ATU)?
ATE: The average causal effect of a treatment across the entire population. Measures the overall effect of the treatment if everyone in the population were treated versus if no one were treated.
* $ATE = E[\delta_i] = E[Y_i^1 - Y_i^0] = E[Y_i^1] - E[Y_i^0]$
ATT: The average causal effect of a treatment for those who received the treatment. Measures the impact of the treatment on those who were exposed to it, capturing the realized effect for the treatment group.
* $ATT = E[\delta_i \mid D_i = 1] = E[Y_i^1 \mid D_i = 1] - E[Y_i^0 \mid D_i = 1]$
ATU: The average causal effect of a treatment for those who did not receive the treatment. Reflects the expected impact of the treatment if it were applied to those who were not treated.
* $ATU = E[\delta_i \mid D_i = 0] = E[Y_i^1 \mid D_i = 0] - E[Y_i^0 \mid D_i = 0]$
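The three parameters can be computed on a hypothetical "science table" where, unlike in real data, both potential outcomes are known (the numbers are made up). Exact fractions are used so the results are not affected by floating-point rounding. The last line checks the identity $ATE = \pi \cdot ATT + (1 - \pi) \cdot ATU$, where $\pi$ is the share of units treated.

```python
from fractions import Fraction

# Hypothetical science table (made-up numbers): both potential outcomes are
# listed only for illustration; real data never reveal both for one unit.
y1 = [8, 7, 6, 8, 7, 9]   # Y_i^1
y0 = [2, 3, 4, 5, 6, 7]   # Y_i^0
d = [1, 1, 1, 0, 0, 0]    # D_i


def mean(values):
    """Exact mean of an iterable of integers or fractions."""
    values = list(values)
    return Fraction(sum(values), len(values))


delta = [a - b for a, b in zip(y1, y0)]                 # delta_i = Y_i^1 - Y_i^0
ate = mean(delta)                                       # E[delta_i]
att = mean(dl for dl, di in zip(delta, d) if di == 1)   # E[delta_i | D_i = 1]
atu = mean(dl for dl, di in zip(delta, d) if di == 0)   # E[delta_i | D_i = 0]
pi = mean(d)                                            # share of units treated

print(ate, att, atu)                     # 3 4 2
print(ate == pi * att + (1 - pi) * atu)  # True
```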
What is the simple difference in mean outcomes (SDO), and why is it problematic?
The simple difference in mean outcomes (SDO) is a method for estimating treatment effects by calculating the difference in the average observed outcomes of the treatment group ($D = 1$) and the control group ($D = 0$):
* $SDO = E[Y^1 \mid D = 1] - E[Y^0 \mid D = 0]$
The SDO can be problematic because it does not account for confounding factors, i.e., variables that influence both the treatment assignment (D) and the outcome (Y). As a result, it can differ from the ATE through two sources of bias:
* Selection bias: The treatment and control group may differ in ways that affect the outcome even in the absence of treatment –> introduces bias because the groups are not directly comparable.
* Heterogeneous treatment effect bias: The effect of the treatment may vary across individuals. If the treated benefit more (or less) from the treatment than the untreated would ($ATT \neq ATU$), the SDO is biased for the ATE even when there is no selection bias.
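Both bias terms come from a standard decomposition of the SDO, $SDO = ATE + \big(E[Y^0 \mid D=1] - E[Y^0 \mid D=0]\big) + (1 - \pi)(ATT - ATU)$, where $\pi$ is the share of units treated. The sketch below verifies the identity exactly on a hypothetical six-unit table (made-up numbers).

```python
from fractions import Fraction

# Hypothetical science table (made-up numbers).
y1 = [8, 7, 6, 8, 7, 9]   # Y_i^1
y0 = [2, 3, 4, 5, 6, 7]   # Y_i^0
d = [1, 1, 1, 0, 0, 0]    # D_i


def mean(values):
    values = list(values)
    return Fraction(sum(values), len(values))


def mean_where(values, d, treated):
    """Mean of `values` over units with D_i == treated."""
    return mean(v for v, di in zip(values, d) if di == treated)


y_obs = [di * a + (1 - di) * b for di, a, b in zip(d, y1, y0)]  # switching equation
delta = [a - b for a, b in zip(y1, y0)]
pi = mean(d)  # share of units treated

sdo = mean_where(y_obs, d, 1) - mean_where(y_obs, d, 0)
ate = mean(delta)
selection_bias = mean_where(y0, d, 1) - mean_where(y0, d, 0)
hte_bias = (1 - pi) * (mean_where(delta, d, 1) - mean_where(delta, d, 0))

print(sdo, ate, selection_bias, hte_bias)      # 1 3 -3 1
print(sdo == ate + selection_bias + hte_bias)  # True
```

In this made-up table the treated units would have done worse without treatment (negative selection bias) and also benefit more ($ATT > ATU$), so the SDO of 1 understates the ATE of 3.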
What are the assumptions for the SDO to credibly estimate the ATE?
Independence assumption: The treatment assignment is independent of the potential outcomes, $(Y^1, Y^0) \perp D$. The groups who do and do not receive the treatment are comparable in terms of their potential outcomes –> any differences in outcomes between the groups are due solely to the treatment, not to pre-existing differences.
Implications:
1. $E[Y^1 \mid D = 1] - E[Y^1 \mid D = 0] = 0$: the expected potential outcome under treatment ($Y^1$) is the same for the treatment and control groups.
2. $E[Y^0 \mid D = 1] - E[Y^0 \mid D = 0] = 0$: the expected potential outcome under no treatment ($Y^0$) is the same for the treatment and control groups.
Stable unit treatment value assumption (SUTVA):
* No variation in treatment dosage: Each unit receives the same “dose” of the treatment.
* No spillovers: The treatment of one individual should not affect the potential outcomes of another individual.
Under these assumptions, both the selection bias and the heterogeneous treatment effect bias vanish, so SDO = ATE.
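One way to see why independence matters: under complete randomization (a fixed number of units treated, every such assignment equally likely), the SDO averaged over all possible assignments equals the ATE. The sketch below checks this exactly on a hypothetical six-unit table (made-up numbers), assuming SUTVA so that each unit's potential outcomes do not depend on anyone else's assignment.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical science table (made-up numbers); SUTVA is assumed.
y1 = [8, 7, 6, 8, 7, 9]   # Y_i^1
y0 = [2, 3, 4, 5, 6, 7]   # Y_i^0
n, n_treated = len(y1), 3


def mean(values):
    values = list(values)
    return Fraction(sum(values), len(values))


ate = mean(a - b for a, b in zip(y1, y0))

# Complete randomization: every set of 3 treated units is equally likely,
# so treatment is independent of (Y^1, Y^0) by design.
sdos = []
for treated in combinations(range(n), n_treated):
    t = set(treated)
    # Treated units reveal Y^1, untreated units reveal Y^0.
    sdo = mean(y1[i] for i in t) - mean(y0[i] for i in range(n) if i not in t)
    sdos.append(sdo)

print(ate, mean(sdos), len(sdos))  # 3 3 20
```

Any single assignment can still produce an SDO above or below 3; the result is a statement about the average over all assignments the randomization could have produced.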
What is randomization inference?
Randomization inference is a statistical method to test whether the observed treatment effect in an experiment could have happened by chance. It focuses on the randomness of treatment assignment rather than sampling variability.
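By repeatedly reassigning treatment (shuffling the treatment labels) and recalculating the test statistic, it builds the distribution of the test statistic under the null hypothesis. The observed test statistic is then compared to this distribution to calculate a p-value, which is exact when every possible assignment is enumerated and approximate when only a random sample of shuffles is used.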
The p-value represents the probability of observing a test statistic at least as extreme as the one in your data, purely by chance, if the null hypothesis is true. A small p-value (e.g., $p < 0.05$) suggests that the observed effect is unlikely to be due to random chance, providing evidence against the null hypothesis.
What is Fisher’s sharp null hypothesis?
Fisher’s sharp null hypothesis states that no individual has a treatment effect: $H_0: Y_i^1 - Y_i^0 = 0$ for all $i$. This means each individual's outcome would be the same whether or not they were treated. This assumption lets researchers fill in the missing counterfactual outcome for every individual in the study, since $Y_i^1 = Y_i^0 = Y_i$.
Steps of randomization inference
- Assume a null hypothesis (e.g., Fisher's sharp null: the treatment has no effect for any unit).
- Calculate the observed test statistic (the difference in means between treatment and control groups).
- Shuffle the treatment assignments repeatedly to simulate what would happen under random chance.
- Recalculate the test statistic for each randomized treatment assignment.
- Compare the observed test statistic to the distribution of randomized test statistics to calculate the p-value.
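The steps above can be sketched as an exact randomization test. This is a minimal sketch under stated assumptions: the six observations are made up, the test statistic is the difference in means, and the p-value is two-sided. With so few units, every assignment that treats the same number of units can be enumerated, so no random shuffling is needed; for larger samples one would instead draw many random shuffles (e.g. with `random.shuffle`) and obtain an approximate p-value.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical observed data (made-up numbers): treatment indicator and outcome.
d = [1, 1, 1, 0, 0, 0]
y_obs = [8, 7, 6, 5, 6, 7]


def diff_in_means(y, treated):
    """Test statistic: mean outcome of treated units minus mean of the rest."""
    t = [y[i] for i in range(len(y)) if i in treated]
    c = [y[i] for i in range(len(y)) if i not in treated]
    return Fraction(sum(t), len(t)) - Fraction(sum(c), len(c))


# Step 1: under Fisher's sharp null (Y_i^1 = Y_i^0 for all i), each unit's
# outcome would be y_obs[i] under ANY assignment, so outcomes stay fixed.
# Step 2: the observed test statistic.
observed_treated = {i for i, di in enumerate(d) if di == 1}
t_obs = diff_in_means(y_obs, observed_treated)

# Steps 3-4: enumerate every way to treat the same number of units
# (here all C(6, 3) = 20 assignments) and recompute the statistic.
n_treated = len(observed_treated)
t_null = [diff_in_means(y_obs, set(c))
          for c in combinations(range(len(d)), n_treated)]

# Step 5: two-sided p-value = share of assignments at least as extreme as observed.
p_value = Fraction(sum(abs(t) >= abs(t_obs) for t in t_null), len(t_null))
print(t_obs, len(t_null), p_value)  # 1 20 1/2
```

Here the p-value of 1/2 gives no evidence against the sharp null. With only 20 possible assignments, the smallest two-sided p-value this design can produce is 2/20 = 0.1, which shows how coarse exact p-values are in very small experiments.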