PROPENSITY SCORE Flashcards
What does a propensity score represent?
The predicted probability of exposure for a particular individual, given a set of relevant characteristics (confounders change the propensity to be exposed)
What is the role of the propensity score?
Estimates treatment effects by controlling for confounding in observational cohort studies
What’s the best distribution of exposure/outcome for a propensity score?
Common exposure, rare outcome
Which variables should we a) include and b) not include in a logistic regression to estimate the propensity score?
a) All variables related to the outcome, whether or not they are related to the exposure
b) UNLESS they are a consequence of the exposure (mediator or collider); also exclude variables unrelated to the outcome
What is the impact of including variables that are unrelated to the exposure despite being related to the outcome?
No harm, it’s fine: it decreases the variance of the effect estimate without increasing the bias.
What is the impact of including variables that are unrelated to the outcome despite being related to the exposure?
It increases the variance without decreasing the bias.
What are the main 2 steps of creating/using propensity scores?
- Model the exposure variable in a regression as a function of potential confounders. This calculates the predicted probability of exposure for every individual as a function of these covariates.
- Apply the propensity score by matching, stratifying, controlling or weighting.
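The first step above can be sketched as follows. This is a minimal illustration, not a recommended implementation: the toy cohort, the covariate names and the plain gradient-ascent fitter (`fit_logistic`) are all assumptions made for the example, and in practice a statistics package would be used to fit the logistic regression.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit a logistic regression by batch gradient ascent on the log-likelihood.

    Returns [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            resid = yi - sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Predicted probability of exposure for every individual."""
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi))) for xi in X]

# Hypothetical toy cohort: [age in decades (centred), comorbidity (0/1)]
X = [[-1.0, 0], [-0.5, 0], [0.0, 0], [0.5, 0], [1.0, 0],
     [-1.0, 1], [-0.5, 1], [0.0, 1], [0.5, 1], [1.0, 1]]
exposed = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

beta = fit_logistic(X, exposed)          # step 1: model the exposure
ps = propensity_scores(X, beta)          # one propensity score per individual
for xi, e, p in zip(X, exposed, ps):
    print(xi, e, round(p, 3))
```

The resulting `ps` list is what the second step (matching, stratifying, adjusting or weighting) takes as input.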
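For a given propensity score, what is allocation to the control or experimental arm equivalent to?
A random process: among patients with the same propensity score, allocation to the control or experimental arm is as good as random with respect to the measured covariates, provided each patient had a real chance of receiving either option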
What is the C statistic?
- Means: concordance
- A measure of model discrimination, but cannot judge adequacy
- Estimates the probability that a patient randomly selected from the treatment arm has a higher propensity score than a patient randomly selected from the control arm (a high value means the arms are well separated, not that confounding is controlled)
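The pairwise definition above can be computed directly. A minimal sketch, assuming made-up propensity scores; the function name `c_statistic` is hypothetical, and ties are counted as half-concordant, which is the usual convention for the area under the ROC curve.

```python
def c_statistic(ps_treated, ps_control):
    """Probability that a random treated patient has a higher propensity
    score than a random control patient (ties count as 0.5)."""
    concordant = 0.0
    for pt in ps_treated:
        for pc in ps_control:
            if pt > pc:
                concordant += 1.0
            elif pt == pc:
                concordant += 0.5
    return concordant / (len(ps_treated) * len(ps_control))

# Hypothetical propensity scores
separated = c_statistic([0.8, 0.6, 0.7], [0.3, 0.6, 0.2])
balanced = c_statistic([0.4, 0.6], [0.4, 0.6])
print(separated, balanced)
```

With identical score distributions in both arms (second call), the statistic is 0.5, i.e., no better than chance.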
What should the area under a ROC curve be for a propensity score?
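About 0.5 (no better than chance) once the arms are balanced, e.g., in the matched or weighted sample, as in a randomized trial; a value near 1 in the original sample signals little overlap between arms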
Summary of propensity score matching?
- Finding individuals with similar propensity scores in both arms and matching them (e.g., 1:1 nearest-neighbor matching within a caliper, i.e., a maximum distance, of 0.1)
- Calculate treatment effect with matched pair analysis
- Mimics randomization for the measured confounders (distributes them evenly between arms)
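A minimal sketch of 1:1 nearest-neighbor matching with a maximum allowed distance (caliper) of 0.1, followed by a matched-pair estimate. The greedy, without-replacement strategy, the function name `match_nearest` and all numbers are assumptions for illustration; other matching algorithms exist.

```python
def match_nearest(ps_treated, ps_control, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns (treated_index, control_index) pairs; treated individuals with
    no control within the caliper stay unmatched.
    """
    available = set(range(len(ps_control)))
    pairs = []
    for i, pt in enumerate(ps_treated):
        if not available:
            break
        # Closest remaining control (index breaks exact ties)
        j = min(available, key=lambda k: (abs(ps_control[k] - pt), k))
        if abs(ps_control[j] - pt) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs

# Hypothetical propensity scores and outcomes
ps_treated = [0.30, 0.55, 0.90]
ps_control = [0.28, 0.52, 0.60, 0.10]
y_treated = [12.0, 15.0, 20.0]
y_control = [10.0, 14.0, 13.0, 8.0]

pairs = match_nearest(ps_treated, ps_control)
# Matched-pair analysis: mean within-pair outcome difference
effect = sum(y_treated[i] - y_control[j] for i, j in pairs) / len(pairs)
print(pairs, effect)
```

The third treated individual (score 0.90) has no control within the caliper and is dropped, which illustrates why matching can discard data.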
Summary of propensity score adjusting?
- Including the propensity score as a covariate in a regression model (ideally, with the individual covariates, leading to a doubly robust model)
Summary of propensity score stratifying?
- Splitting the dataset on the basis of the propensity score alone, then estimating the treatment effect in each stratum and taking the weighted average for the overall effect
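A minimal sketch of stratification, assuming equal-sized strata, a mean-difference effect measure and weights proportional to stratum size; the function name `stratified_effect` and the data are made up for the example.

```python
def stratified_effect(ps, treated, outcome, n_strata=5):
    """Weighted average of within-stratum mean differences.

    Strata are equal-sized groups of the propensity-score ranking; each
    stratum is weighted by its number of individuals.
    """
    n = len(ps)
    order = sorted(range(n), key=lambda i: ps[i])
    total, weight_sum = 0.0, 0
    for s in range(n_strata):
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        y1 = [outcome[i] for i in idx if treated[i] == 1]
        y0 = [outcome[i] for i in idx if treated[i] == 0]
        if not y1 or not y0:
            continue  # a stratum without both arms gives no contrast
        total += len(idx) * (sum(y1) / len(y1) - sum(y0) / len(y0))
        weight_sum += len(idx)
    return total / weight_sum

# Hypothetical data: treatment is more frequent at high propensity scores
ps = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
treated = [0, 0, 1, 0, 1, 1, 0, 1]
outcome = [1, 2, 4, 3, 6, 8, 5, 7]

crude = (sum(y for y, t in zip(outcome, treated) if t == 1) / 4
         - sum(y for y, t in zip(outcome, treated) if t == 0) / 4)
adjusted = stratified_effect(ps, treated, outcome, n_strata=2)
print(crude, adjusted)
```

In this toy data the crude difference (3.5) shrinks to 2.0 once individuals are compared only within strata of similar propensity score.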
Summary of inverse probability weighting?
- Re-weighting individuals from the whole dataset to increase the weight of those with unexpected exposure
- Equivalent to producing additional observations where there are few observations
- Creates a pseudopopulation with near-perfect covariate balance
Weight given to treatment and control arm in inverse probability weighting? Any problem with that?
Tx: 1/PS
Control: 1/(1-PS)
Problem: the weights become extreme when the PS is close to 0 in the tx arm or close to 1 in the control arm
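A minimal sketch of these weights and of a weighted mean difference, assuming made-up propensity scores and outcomes; the function names and the normalised (weighted-mean) form of the estimate are choices made for the example.

```python
def ipw_weights(ps, treated):
    """1/PS for treated individuals, 1/(1-PS) for controls."""
    return [1.0 / p if t == 1 else 1.0 / (1.0 - p) for p, t in zip(ps, treated)]

def ipw_effect(ps, treated, outcome):
    """Difference between the weighted outcome means of the two arms."""
    w = ipw_weights(ps, treated)
    def weighted_mean(arm):
        num = sum(wi * y for wi, t, y in zip(w, treated, outcome) if t == arm)
        den = sum(wi for wi, t in zip(w, treated) if t == arm)
        return num / den
    return weighted_mean(1) - weighted_mean(0)

# Hypothetical data; the last individual was treated despite PS = 0.02
ps = [0.8, 0.8, 0.2, 0.5, 0.02]
treated = [1, 0, 0, 1, 1]
outcome = [10.0, 8.0, 6.0, 9.0, 7.0]

weights = ipw_weights(ps, treated)
effect = ipw_effect(ps, treated, outcome)
print([round(w, 2) for w in weights], round(effect, 3))
```

The treated individual with PS = 0.02 receives a weight of 50 and dominates the treated-arm mean, which is the instability described above.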
Which methods eliminate systematic differences the best?
- Matching
- IPW
What is a potential problem with propensity score adjustment?
- May bias the effect estimate if the assumed functional relationship between the propensity score and the outcome (e.g., linearity) is wrong
Which methods are potentially doubly robust? Which are not?
Yes: Propensity score adjustment (if the individual covariates are also added), IPW (when combined with an outcome regression model, e.g., augmented IPW)
No: Matching, stratifying (N is too small)
What are the pros and cons of traditional covariate adjustment?
Pros: performs well, provides prognostic model for outcome of interest
Cons: not good for small n and many covariates
What are the pros and cons of propensity score matching?
Pros: reliable, good balance of covariates, simple
Cons: unmatched subjects are not analysed, less precise due to small n
What are the pros and cons of propensity score adjustment?
Pros: performs well
Cons: very similar to traditional adjustment, without necessarily being better
What are the pros and cons of propensity score stratifying?
Pros: keeps all the data, can look at interactions
Cons: not as good with few outcome events, can leave residual confounding within strata when confounding is strong
What are the pros and cons of inverse probability weighting?
Pros: keeps all the data, easy, the pseudopopulation has a near-perfect covariate balance
Cons: Unstable when weights are extreme
What are the limitations to propensity score methods in general?
- Only binary treatment/exposures
- No balancing of unmeasured covariates
- Can’t get the effect of the covariates that are adjusted for
- Not for prediction
- Cannot handle time-varying exposures or confounders
4 sources of residual confounding
- Bad definition
- Imperfect proxy
- Not included
- Misclassified
What is an instrumental variable?
One that:
1) Causes (or is strongly associated with) the exposure (=’relevance’)
2) Affects the outcome only through the exposure (=’exclusion restriction’)
3) Is not associated with any confounders of the exposure-outcome relationship (=’independence assumption’)
Looking at instrumental variable is equivalent to…
Intent to treat estimate in RCTs
What are endogenous regressors? Exogenous regressors? And what are instrumental variables?
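Endogenous regressors are correlated with the error term of the outcome model (e.g., through unmeasured confounding), while exogenous regressors are not. Exogenous variables used as instruments are often proxies for the exposure, for example extreme weather for stress during pregnancy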
Instrumental variables are exogenous
An instrumental variable analysis trades…
The ignorability of the treatment variable
for
the ignorability of the instrument (that we think is more plausible)
What if the instrumental variable is weak?
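The estimate is imprecise (wide confidence intervals) and, in finite samples, biased toward the confounded conventional regression estimate; small violations of the exclusion or independence assumptions are also amplified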
What are the two stages to calculate IV estimates?
1. Each endogenous explanatory variable is regressed on all exogenous variables in the model (both in the equation of interest and the excluded instruments). We take the predicted values of these regressions.
2. We estimate the regression of interest, but we replace each endogenous covariate with the predicted value from the first stage.
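The two stages can be sketched for the simplest case: one instrument, one endogenous exposure and no other covariates. Everything here is an assumption for illustration: the data are constructed so that the true effect is 0.5, `u` plays the role of an unmeasured confounder, and the helper names are hypothetical. The sketch returns point estimates only; standard errors taken from a naive second-stage regression would not be valid.

```python
def ols(x, y):
    """Simple linear regression of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def two_stage_least_squares(z, x, y):
    """Stage 1: regress exposure x on instrument z and predict x.
    Stage 2: regress outcome y on the predicted exposure."""
    a0, a1 = ols(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    return ols(x_hat, y)

# Hypothetical data: u confounds x and y, z is unrelated to u
z = [0, 0, 0, 0, 1, 1, 1, 1]
u = [-1, 1, -1, 1, -1, 1, -1, 1]
x = [1 + 2 * zi + ui for zi, ui in zip(z, u)]        # exposure
y = [3 + 0.5 * xi + 2 * ui for xi, ui in zip(x, u)]  # true effect = 0.5

_, naive_slope = ols(x, y)                    # confounded by u
_, iv_slope = two_stage_least_squares(z, x, y)
print(naive_slope, iv_slope)
```

In this constructed example the naive regression gives 1.5, while the two-stage estimate recovers the effect of 0.5 that was built into the data.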