Causal inference Flashcards
Study designs for causal inference
Randomised controlled trials
Natural or quasi experimental studies
Longitudinal studies
Analysis methods for causal inference
Confounder adjustment/stratification
Propensity Scores
RCT:
Manipulation: The scientist actively intervenes in the environment by modifying certain aspects (X) (following F. Bacon)
Randomization: Subjects are assigned to the active (X=1) or control condition (X=0) at random, i.e., regardless of their characteristics. After successful randomization, the samples assigned to the active and control conditions are expected to be equivalent in all aspects (Z) but the exposure (X).
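The balancing property of randomization can be illustrated with a small simulation. This is a minimal sketch, not part of the original cards: the baseline characteristic Z (an age-like variable with mean 50 and SD 10), the sample size, and the function names are all hypothetical choices made for illustration.

```python
import random
import statistics


def randomize(n_subjects, seed=0):
    """Simulate subjects with a baseline characteristic Z and a coin-flip assignment X."""
    rng = random.Random(seed)
    # Z: hypothetical baseline characteristic (e.g. age); values are made up.
    z = [rng.gauss(50, 10) for _ in range(n_subjects)]
    # X: assigned at random, independently of Z (1 = active, 0 = control).
    x = [rng.randint(0, 1) for _ in range(n_subjects)]
    return z, x


def mean_difference(z, x):
    """Difference in mean Z between the active and control groups."""
    active = [zi for zi, xi in zip(z, x) if xi == 1]
    control = [zi for zi, xi in zip(z, x) if xi == 0]
    return statistics.mean(active) - statistics.mean(control)


z, x = randomize(10_000)
balance_gap = mean_difference(z, x)
print(f"Mean Z (active) minus mean Z (control): {balance_gap:.3f}")
```

With a large sample the gap should be close to zero, which is the sense in which the two groups are equivalent in Z. In a small trial the groups can still differ by chance.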
Issues with RCTs
Lack of external validity/generalizability
Both groups balanced for confounders
Pyramid of evidence
Level 1: Systematic review of RCT
Level 2: Single RCT
Level 3: Systematic review of observational studies
Level 4: Single observational study
Level 5: Qualitative studies
Level 6: Expert opinion/case studies
Pros RCT
Treatment and control groups can be matched
Strong evidence for cause-effect relationships
Statistical methods are relatively straightforward
Cons of RCT
Expensive, take a long time
Potentially unethical in some circumstances
Prone to selection bias
It might not be generalizable to the real world (too artificial)
Natural or Quasi-experimental designs
Not everything can be studied with an RCT: it may not be possible or not be ethical
Can use longitudinal prospective samples
Chronology is used to parse out causality: you look at how the exposure at t0 affects the outcome at t1, but you cannot test the reverse path (t1 to t0) to confirm the direction of the effect
Twin studies
Popular because twins are matched for genes and home environment, almost like a natural RCT
Monozygotic twins are naturally “matched” for many common confounders
Grew up in the same family at the same time, and share genetic risk factors
Differences in outcomes associated with discordant exposures within twin pairs are often described as causal
Discordant twin design (twins with different in utero outcomes)
Limitation: the sample naturally consists only of twins, whose experience may be specific, so findings cannot be easily generalized
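The logic of the discordant twin design can be sketched in a simulation. This is a hedged illustration under assumptions that are not in the original cards: a single shared family factor (standing in for genes and home environment) raises both the chance of exposure and the outcome, the true effect is set to 2.0, and all names and numbers are hypothetical.

```python
import random
import statistics


def simulate_twin_pairs(n_pairs, true_effect=2.0, seed=1):
    """Each pair shares one family factor that drives both exposure and outcome."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        shared = rng.gauss(0, 1)  # shared genes + home environment (hypothetical)
        p_exposed = 0.8 if shared > 0 else 0.2
        twins = []
        for _ in range(2):
            x = 1 if rng.random() < p_exposed else 0
            y = true_effect * x + 3.0 * shared + rng.gauss(0, 1)
            twins.append((x, y))
        pairs.append(twins)
    return pairs


def naive_estimate(pairs):
    """Compare all exposed with all unexposed individuals, ignoring pairing."""
    exposed = [y for pair in pairs for x, y in pair if x == 1]
    unexposed = [y for pair in pairs for x, y in pair if x == 0]
    return statistics.mean(exposed) - statistics.mean(unexposed)


def within_pair_estimate(pairs):
    """Use only discordant pairs: exposed twin's outcome minus co-twin's outcome."""
    diffs = []
    for (x_a, y_a), (x_b, y_b) in pairs:
        if x_a != x_b:
            diffs.append(y_a - y_b if x_a == 1 else y_b - y_a)
    return statistics.mean(diffs)


pairs = simulate_twin_pairs(5_000)
naive = naive_estimate(pairs)
within = within_pair_estimate(pairs)
print(f"Naive estimate:       {naive:.2f}")
print(f"Within-pair estimate: {within:.2f} (simulated true effect: 2.0)")
```

The shared factor cancels out in each within-pair difference, so the within-pair estimate should sit near the simulated effect while the naive comparison is inflated. Factors that differ between co-twins are not removed by this design.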
Instrumental Variable design
No manipulation or randomization
Although there is no manipulation or randomization of the exposure variable (X), there is manipulation and/or randomization of an instrumental variable (I).
An instrumental variable (I) must:
be correlated with an exposure X (ideally explaining much of the variance in X),
NOT be correlated with the error Z,
be correlated with an outcome Y only through X
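The three conditions above are what allow a simple ratio estimator to work. The following is a minimal sketch, assuming a linear data-generating model that is not stated in the original cards: a randomized binary instrument I, an unmeasured factor Z that affects both X and Y, and a true effect of X on Y set to 1.5. The ratio cov(I, Y) / cov(I, X) is the standard IV (Wald-type) estimator; all variable names and coefficients are hypothetical.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate_iv_data(n, true_effect=1.5, seed=4):
    """I is randomized; Z is unmeasured and affects both X and Y."""
    rng = random.Random(seed)
    instrument, exposure, outcome = [], [], []
    for _ in range(n):
        i = rng.randint(0, 1)                  # instrument I, independent of Z
        z = rng.gauss(0, 1)                    # unmeasured factor Z
        x = 1.0 * i + z + rng.gauss(0, 1)      # I is correlated with X
        y = true_effect * x + 2.0 * z + rng.gauss(0, 1)  # I reaches Y only via X
        instrument.append(i)
        exposure.append(x)
        outcome.append(y)
    return instrument, exposure, outcome


instrument, exposure, outcome = simulate_iv_data(20_000)

# Naive regression slope of Y on X: biased because Z drives both.
naive_slope = covariance(exposure, outcome) / covariance(exposure, exposure)

# IV estimate: effect of I on Y divided by effect of I on X.
iv_estimate = covariance(instrument, outcome) / covariance(instrument, exposure)

print(f"Naive slope: {naive_slope:.2f}")
print(f"IV estimate: {iv_estimate:.2f} (simulated true effect: 1.5)")
```

If the instrument explained little of the variance in X, the denominator would be close to zero and the estimate would become unstable, which is why a strong correlation with X is listed as ideal.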
Counterfactual
Used when there are no twins, instruments or policies to exploit
No exposure is a causal factor in itself, in isolation.
In this context, causality is thought of as a difference between two groups that are otherwise identical; it exists only as a contrast
Causality may only be derived as part of a well-defined contrast between one condition (e.g., exposure) and an alternative condition (e.g., no exposure), while holding everything else constant.
Causal contrasts can be estimated by using substitutes for the counterfactual condition.
To the extent that substitutes are equivalent to the factual condition in all aspects but the exposure (i.e., they are exchangeable with the counterfactual condition), substitutes can be used to infer causality.
In epidemiology, substitutes are generally either a population other than the target population during the same etiological period or the target population observed at a time other than the etiological period
Cholera counterfactual case
In 1854, John Snow mapped cholera cases in Soho and found that most people who had died from cholera had drunk the water from the Broad Street Pump.
Snow argued for closing the pump; its handle was removed and the cholera outbreak ended.
We cannot actually be sure the pump was the cause, because this was not a formal manipulation, but…
Observed outcome (Y), cholera deaths
Exposure (X), Water pump
Two conditions:
0. Pump is closed (X=0), no water is coming out, the neighbourhood is unexposed to the potentially contaminated water
1. Pump is left open (X=1), water is coming out the pump, the neighbourhood is exposed to the potentially contaminated water.
Condition 0 is what actually happened (the factual condition)
Observed outcome, cholera deaths (Y), of what happened when exposure (X) was “closed” (X=0).
Y | X=closed = Y_{X=0} = Y0
Unknown potential outcome (Y) that would have happened if the exposure (X) had been – counter to the fact – left “open” (X=1).
Y | X=open = Y_{X=1} = Y1
Potential outcome: To identify the causal effect of closing the pump on mortality, we would need to compare:
Y0: The number of deaths (Y) when the pump was closed (X=0)
Y1: The number of deaths (Y) when the pump was left open (X=1)
FUNDAMENTAL PROBLEM OF CAUSAL INFERENCE
We can never know the potential outcome for a counterfactual exposure!
For each ‘unit of analysis’ (condition) we can only observe one potential outcome
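The missing-outcome structure can be made concrete in code. This is a sketch on simulated units only: in a simulation both potential outcomes can be generated, which is never possible with real data. The effect size of 3.0, the outcome scale, and all names are hypothetical.

```python
import random
import statistics


def simulate_units(n, seed=2):
    """Generate both potential outcomes per unit, then reveal only one."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y0 = rng.gauss(10, 2)   # potential outcome if unexposed (X=0)
        y1 = y0 + 3.0           # potential outcome if exposed (X=1); hypothetical effect
        x = rng.randint(0, 1)   # exposure actually received (randomized here)
        units.append({"x": x, "y0": y0, "y1": y1})
    return units


def as_observed(unit):
    """What an analyst sees: the counterfactual outcome is missing (None)."""
    return {
        "x": unit["x"],
        "y0": unit["y0"] if unit["x"] == 0 else None,
        "y1": unit["y1"] if unit["x"] == 1 else None,
    }


units = simulate_units(10_000)
observed = [as_observed(u) for u in units]

# Individual effects (y1 - y0) cannot be computed from `observed`:
# every row has exactly one missing potential outcome.
missing_per_row = [sum(v is None for v in (row["y0"], row["y1"])) for row in observed]

# Because exposure was randomized here, the unexposed group can stand in for
# the exposed group's counterfactual, so the AVERAGE effect is estimable.
mean_y1 = statistics.mean(row["y1"] for row in observed if row["x"] == 1)
mean_y0 = statistics.mean(row["y0"] for row in observed if row["x"] == 0)
average_effect = mean_y1 - mean_y0

print(f"Estimated average effect: {average_effect:.2f} (simulated true effect: 3.0)")
```

The group comparison works only because the two groups are exchangeable in this simulation. In the cholera case there is a single neighbourhood and a single observed condition, so no such substitute is available without further assumptions.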
Exchangeability
Causal contrasts can be estimated by using substitutes for the counterfactual condition.
To the extent that substitutes are equivalent to the factual condition in all aspects but the exposure (i.e., they are exchangeable with the counterfactual condition), substitutes can be used to infer causality.
Different statistical approaches can be taken to improve equivalence/exchangeability.
Statistical approaches for exchangeability
Stratification in regression
Propensity scores
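Both approaches can be sketched on the same simulated data. The setup is an assumption made for illustration, not something given in the original cards: one binary confounder Z raises both the chance of exposure and the outcome, and the true effect of X on Y is set to 1.0. The propensity score is estimated here as the proportion exposed within each level of Z; with several confounders it is more commonly estimated with a logistic regression. All names and numbers are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def simulate(n, true_effect=1.0, seed=3):
    """Z confounds the X-Y relationship: it raises both exposure and outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.randint(0, 1)
        p_exposed = 0.8 if z == 1 else 0.2
        x = 1 if rng.random() < p_exposed else 0
        y = true_effect * x + 4.0 * z + rng.gauss(0, 1)
        rows.append((z, x, y))
    return rows


def naive_estimate(rows):
    """Unadjusted difference in mean outcome between exposed and unexposed."""
    exposed = [y for _, x, y in rows if x == 1]
    unexposed = [y for _, x, y in rows if x == 0]
    return statistics.mean(exposed) - statistics.mean(unexposed)


def stratified_estimate(rows):
    """Compare exposed and unexposed within each level of Z, then average
    the stratum-specific differences weighted by stratum size."""
    strata = defaultdict(list)
    for z, x, y in rows:
        strata[z].append((x, y))
    total = 0.0
    for members in strata.values():
        exposed = [y for x, y in members if x == 1]
        unexposed = [y for x, y in members if x == 0]
        diff = statistics.mean(exposed) - statistics.mean(unexposed)
        total += diff * len(members) / len(rows)
    return total


def ipw_estimate(rows):
    """Inverse probability weighting using the propensity score P(X=1 | Z)."""
    counts = defaultdict(lambda: [0, 0])  # z -> [n in stratum, n exposed]
    for z, x, _ in rows:
        counts[z][0] += 1
        counts[z][1] += x
    propensity = {z: n_exposed / n for z, (n, n_exposed) in counts.items()}

    w1 = wy1 = w0 = wy0 = 0.0
    for z, x, y in rows:
        e = propensity[z]
        if x == 1:
            w1 += 1 / e
            wy1 += y / e
        else:
            w0 += 1 / (1 - e)
            wy0 += y / (1 - e)
    # Difference of weighted means (weights normalized within each group).
    return wy1 / w1 - wy0 / w0


rows = simulate(20_000)
naive = naive_estimate(rows)
stratified = stratified_estimate(rows)
ipw = ipw_estimate(rows)
print(f"Naive:      {naive:.2f}")
print(f"Stratified: {stratified:.2f}")
print(f"IPW:        {ipw:.2f} (simulated true effect: 1.0)")
```

With a single binary confounder and stratum-level propensity scores, the two adjusted estimates coincide, which shows that both are ways of making the exposed and unexposed groups exchangeable with respect to Z. Neither approach can adjust for confounders that were not measured.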