Week 2 - Propensity Score Estimation Flashcards
What are the steps of propensity score estimation?
1. Data preparation
2. Propensity score estimation
3. Propensity score method implementation
4. Covariate balance evaluation
5. Treatment effect estimation
6. Sensitivity analysis
How can the success of propensity score estimation be determined?
a) The propensity score estimation converged to a solution.
b) Common support is adequate for estimation of the treatment effect of interest.
c) Adequate covariate balance is obtained. (Checks for b and c are sketched below.)
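The convergence check (a) depends on the estimation routine, but (b) and (c) can be checked numerically. A minimal sketch of those two checks, assuming a pandas DataFrame df with hypothetical columns "treat" (binary treatment indicator) and "ps" (estimated propensity score), plus covariate names in a list covariates:

```python
import numpy as np
import pandas as pd

def common_support(df):
    """Region where treated and control propensity scores overlap."""
    treated = df.loc[df["treat"] == 1, "ps"]
    control = df.loc[df["treat"] == 0, "ps"]
    return (max(treated.min(), control.min()),
            min(treated.max(), control.max()))

def standardized_mean_differences(df, covariates):
    """Standardized mean difference per covariate; absolute values
    below roughly 0.1 are commonly taken to indicate balance."""
    treated = df[df["treat"] == 1]
    control = df[df["treat"] == 0]
    smd = {}
    for cov in covariates:
        pooled_sd = np.sqrt((treated[cov].var() + control[cov].var()) / 2)
        smd[cov] = (treated[cov].mean() - control[cov].mean()) / pooled_sd
    return pd.Series(smd)
```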
What are true confounders?
A true confounder is a covariate that has a direct effect on the probability of treatment assignment and a direct effect on the outcome.
What is the consequence of including variables that are related to the outcome but not to treatment assignment in the propensity score model?
Including variables related to the outcome but not to exposure (treatment assignment) in the PS model does not affect bias but decreases the variance of the estimated treatment effect.
What is the consequence of including variables that are related to the treatment assignment but not to the outcome in the propensity score model?
Including variables related to exposure (treatment assignment) but not to the outcome does not affect bias but increases the variance of the estimated treatment effect.
What are three strategies that can be used to select covariates for the propensity score model?
- Theoretical analysis of factors influencing the selection mechanism and their relationship with outcomes.
- Pilot study focused on identifying the selection mechanism.
- Expert reviews and interviews with participants and other persons knowledgeable about the selection process.
- Use a sub-sample of the original data.
Why is it important not to use the outcome data in the process of selecting covariates for the propensity score model?
To maintain researcher objectivity in the implementation of propensity score methods, and to parallel the design of randomized experiments.
What are two strategies to use multiple imputation to deal with missing data in the propensity score estimation process?
Multiple imputation of covariates, followed by averaging the multiple propensity scores to create a single propensity score vector (sketched below).
OR
Multiple imputation followed by separate propensity score analysis of each imputed dataset.
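A minimal sketch of the first strategy, assuming imputed_datasets is a list of pandas DataFrames produced by a multiple-imputation routine (same rows in the same order), each with a binary "treat" column and the covariates named in covariates; all names are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def averaged_propensity_scores(imputed_datasets, covariates):
    """Fit one propensity score model per imputed dataset and average
    the scores into a single propensity score vector."""
    scores = []
    for df in imputed_datasets:
        model = LogisticRegression(max_iter=1000)
        model.fit(df[covariates], df["treat"])
        scores.append(model.predict_proba(df[covariates])[:, 1])
    return np.mean(scores, axis=0)
```

The second strategy would instead carry each imputed dataset through the full propensity score analysis and pool the resulting treatment effect estimates.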
Identify three methods that can be used to estimate propensity scores.
- Logistic regression (stats)
- Probit regression (stats)
- Classification trees (data mining)
- Boosting (data mining)
- Bagging (data mining)
- Random forests (data mining)
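A minimal sketch contrasting one estimator from each family on toy data (all variable names hypothetical); in each case the estimated propensity score is the predicted probability of treatment:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                        # toy covariates
treat = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # toy treatment indicator

# Statistical approach: probit regression
probit = sm.Probit(treat, sm.add_constant(X)).fit(disp=0)
ps_probit = probit.predict(sm.add_constant(X))

# Data-mining approach: boosting
boost = GradientBoostingClassifier().fit(X, treat)
ps_boost = boost.predict_proba(X)[:, 1]
```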
What is the main challenge for using logistic regression to estimate propensity scores?
If the model includes a large number of variables and covariate balance is not achieved on one or more of them, determining the cause of the problem (e.g., missing interaction effects) and specifying an appropriate revised model can prove difficult.
How do classification trees produce estimates of propensity scores?
- At each step, a node is split on a variable into two child nodes that are more homogeneous than the parent node with respect to the outcome variable (treatment assignment, when estimating propensity scores).
- The algorithm calculates the impurity reduction for every possible split and selects the split yielding the largest reduction.
- Variables may be used more than once, so trees automatically capture interactions and non-linear effects.
- The algorithm iteratively splits nodes until a stopping criterion is met (see the sketch below).
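A minimal sketch on toy data, using scikit-learn's DecisionTreeClassifier as one possible tree implementation (all names hypothetical); the estimated propensity score for a unit is the proportion of treated cases in its leaf:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                        # toy covariates
treat = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # treatment depends on X[:, 0]

# max_depth and min_samples_leaf act as stopping criteria; the default
# "gini" criterion measures node impurity.
tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=20)
tree.fit(X, treat)
ps_tree = tree.predict_proba(X)[:, 1]  # proportion treated in each leaf
```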
What are the limitations of classification trees for propensity score estimation?
The results have a high level of variability when many covariates are used, and single trees frequently produce poor estimates of propensity scores.
Classification trees tend to overfit the data, producing trees with many branches that reflect random variation in the data and do not cross-validate to other datasets.
What is the difference between classification trees and bagging?
Bagging (bootstrap aggregation) improves upon classification trees by growing a large number of trees on bootstrap samples of the same size as the original sample, taken with replacement. These trees use all available variables and are grown without pruning. The results are then combined into a composite tree, which is less affected by random variability in the data than a single tree.
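A minimal sketch on the same kind of toy data as the classification-tree sketch above (all names hypothetical); the averaged class-1 probabilities form the composite propensity score:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                        # toy covariates
treat = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # toy treatment indicator

bag = BaggingClassifier(
    DecisionTreeClassifier(),  # unpruned trees using all variables
    n_estimators=500,
    bootstrap=True,            # bootstrap samples taken with replacement
    max_samples=1.0,           # each sample matches the original size
)
bag.fit(X, treat)
ps_bag = bag.predict_proba(X)[:, 1]  # composite propensity score
```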
What is the difference between bagging and random forests?
The random forest algorithm is similar to bagging, except that only a random subset of the complete set of variables is considered at each split.
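A minimal sketch; max_features is the only substantive change from the bagging sketch above (all names hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                        # toy covariates
treat = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # toy treatment indicator

forest = RandomForestClassifier(
    n_estimators=500,
    max_features="sqrt",  # random subset of variables considered at each split
)
forest.fit(X, treat)
ps_forest = forest.predict_proba(X)[:, 1]
```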
What are the advantages of random forests over bagging?
Random forests prevent any single variable from dominating the others and ensure that all variables participate in building at least some of the trees (Strobl et al., 2009).