PSM Flashcards
What is exact matching?
Comparing individuals for whom the values of x are identical
Rarely an option in practice, since it’s often difficult to find treatment (T) and comparison (C) units with identical values of x
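A minimal sketch of exact matching, assuming each unit is a hypothetical (x, treated, outcome) tuple with a hashable x. Cells without both treated and comparison units are dropped, which illustrates why exact matching rarely works with many covariates. Weighting cells by their number of treated units is an assumption made here to target the effect on the treated.

```python
from collections import defaultdict

def exact_match_att(units):
    """units: iterable of (x, treated, outcome); x must be hashable (e.g. a tuple)."""
    cells = defaultdict(lambda: {True: [], False: []})
    for x, treated, outcome in units:
        cells[x][bool(treated)].append(outcome)

    weighted_sum, n_treated = 0.0, 0
    for groups in cells.values():
        t, c = groups[True], groups[False]
        if not t or not c:
            continue  # no exact match exists in this cell, so it is dropped
        diff = sum(t) / len(t) - sum(c) / len(c)
        weighted_sum += diff * len(t)  # weight by number of treated units
        n_treated += len(t)
    return weighted_sum / n_treated
```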
What is the purpose of matching?
To reproduce the treatment group among the non-treated
What two conditions must be met to implement matching estimators?
- Conditional independence assumption (CIA): There exists a set x of observable covariates such that after controlling for these covariates, the potential outcomes are independent of T status
- Common support assumption: For each value of x, there is a positive probability of being both treated and untreated (you can find a treated unit to match with an untreated unit)
What is the CIA?
Conditional independence assumption (CIA): There exists a set x of observable covariates such that after controlling for these covariates, the potential outcomes are independent of T status
What is the common support assumption?
Common support assumption: For each value of x, there is a positive probability of being both treated and untreated (you can find a treated unit to match with an untreated unit)
How is the CIA used to construct a counterfactual for the treatment group?
It implies that after controlling for x, the assignment of units to T is “as good as random”
What assumption does the CIA require?
That all variables relevant to the probability of receiving treatment may be observed and included in x
Why is PSM called a “data-hungry” method?
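It needs a rich set of observed covariates to make the CIA plausible, and a large pool of untreated units so that close matches exist for each treated unit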
What is the propensity score?
The probability that a unit in the combined sample of treated and untreated units receives the T, given a set of observed variables
What does the propensity score theorem say?
You only need to control for the probability of treatment, because if conditional on x, Ti and (Y1i, Y0i) are independent, then conditional on the propensity score p(xi), Ti and (Y1i, Y0i) are independent
Three steps for estimating program impact using PSM?
- Estimate propensity score
- Choose matching algorithm
- Estimate impact of intervention with matched sample
True or false: Use flexible functional form to estimate propensity score
True–want to allow for possible nonlinearities in the participation model (i.e., include higher-order terms and interaction terms)
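A rough sketch of a flexible participation model, assuming two hypothetical covariates x1 and x2 and a logit fitted by plain gradient ascent. The feature set (squares and one interaction), the step size, and the number of steps are illustrative choices; in practice a statistics package would normally do the fitting.

```python
import math

def expand(x1, x2):
    # Flexible specification: intercept, main effects, squares, interaction
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(rows, treated, steps=2000, lr=0.1):
    """rows: list of (x1, x2); treated: list of 0/1. Returns logit coefficients."""
    feats = [expand(x1, x2) for x1, x2 in rows]
    beta = [0.0] * len(feats[0])
    n = len(feats)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for f, t in zip(feats, treated):
            p = sigmoid(sum(b * v for b, v in zip(beta, f)))
            for j, v in enumerate(f):
                grad[j] += (t - p) * v  # gradient of the log-likelihood
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity(beta, x1, x2):
    # Estimated probability of treatment given the covariates
    return sigmoid(sum(b * v for b, v in zip(beta, expand(x1, x2))))
```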
With or without replacement: which is better?
Without replacement: each comparison unit can be matched with only one treated unit
Estimators are more stable if a number of comparison cases are considered for each treated case, i.e. matching with replacement is usually preferable
What is nearest neighbor matching?
Individual from comparison group with closest propensity score is chosen–note that this can be done with or without replacement
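A small sketch of one-to-one nearest neighbor matching on the propensity score, assuming scores are already estimated and stored in plain lists. The function name and the tie-breaking rule (lowest index wins) are illustrative assumptions.

```python
def nn_match(treated_scores, control_scores, replacement=True):
    """Return, for each treated unit, the index of its matched comparison unit."""
    available = set(range(len(control_scores)))
    matches = []
    for ps in treated_scores:
        if not available:
            break  # without replacement, the comparison pool can run out
        # Closest score wins; ties are broken by the lower index
        j = min(available, key=lambda k: (abs(control_scores[k] - ps), k))
        matches.append(j)
        if not replacement:
            available.remove(j)  # each comparison unit is used at most once
    return matches
```

With replacement, the same comparison unit can serve several treated units; without replacement, later treated units may have to settle for a more distant match.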
What is radius matching?
Specify a caliper (maximum propensity score difference) and match each treated unit with all comparison units whose scores fall within it
Implication for bias and variance of reducing caliper?
Reduces the bias
Increases variance
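A brief sketch of radius matching, assuming propensity scores are held in plain lists. It returns every comparison unit inside the caliper for each treated unit, so tightening the caliper visibly shrinks the match sets (closer matches, fewer of them).

```python
def radius_match(treated_scores, control_scores, caliper):
    """For each treated unit, list the comparison indices within the caliper."""
    matches = []
    for ps in treated_scores:
        within = [j for j, pc in enumerate(control_scores)
                  if abs(pc - ps) <= caliper]
        matches.append(within)  # an empty list means the treated unit is unmatched
    return matches
```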
How do you implement kernel method?
Choose a kernel function, specify bandwidth parameter
Compare each treated unit to a weighted average of the outcomes of all untreated units, with higher weights placed on untreated units with scores closer to that of treated individual
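A minimal sketch of kernel matching, assuming a Gaussian kernel (one possible choice; the notes do not name a kernel) and units given as hypothetical (score, outcome) pairs. The bandwidth default is arbitrary.

```python
import math

def kernel_att(treated, controls, bandwidth=0.05):
    """treated, controls: lists of (propensity_score, outcome) pairs."""
    diffs = []
    for ps_t, y_t in treated:
        # Gaussian kernel weights: larger for comparison units with closer scores
        weights = [math.exp(-0.5 * ((ps_c - ps_t) / bandwidth) ** 2)
                   for ps_c, _ in controls]
        total = sum(weights)  # can underflow to 0 if no comparison unit is nearby
        y0_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        diffs.append(y_t - y0_hat)
    return sum(diffs) / len(diffs)
```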
Implications for bias and efficiency of choosing only one neighbor for nearest neighbor matching?
Minimize bias by using most similar observation
Ignore information–>reduced efficiency
Conventional method for calculating standard errors from PSM estimates?
Bootstrapping-sample from analysis sample with replacement, and replicate multiple times
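A sketch of the bootstrap idea, assuming the sample is a list of hypothetical (treated, outcome) pairs. The `mean_difference` function is only a stand-in estimator; in a real PSM application each replicate would typically repeat the full procedure (score estimation, matching, impact estimate).

```python
import random
import statistics

def mean_difference(sample):
    """Stand-in estimator: treated mean minus comparison mean (None if a group is empty)."""
    t = [y for d, y in sample if d == 1]
    c = [y for d, y in sample if d == 0]
    if not t or not c:
        return None
    return sum(t) / len(t) - sum(c) / len(c)

def bootstrap_se(sample, estimator, reps=200, seed=0):
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    estimates = []
    for _ in range(reps):
        # Draw a resample of the same size, with replacement
        resample = [rng.choice(sample) for _ in sample]
        est = estimator(resample)
        if est is not None:
            estimates.append(est)
    # Standard deviation across replicates approximates the standard error
    return statistics.stdev(estimates)
```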
To be sure that the measures used to generate the propensity score are not confounded with outcomes or anticipation of treatment, what types of measures should you use?
- stable over time or
- deterministic (ie age) or
- measured before participation
How to check specification of your model re CIA?
balancing tests (does the estimated propensity score adequately balance characteristics between T and C group units?)
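One commonly used balance diagnostic is the standardized mean difference of a covariate between matched T and C units. The sketch below assumes that diagnostic (the notes do not specify which balancing test to use) and takes the covariate values as two plain lists.

```python
import math
import statistics

def standardized_difference(x_treated, x_control):
    """Difference in covariate means, scaled by the pooled standard deviation."""
    mean_diff = statistics.mean(x_treated) - statistics.mean(x_control)
    pooled_sd = math.sqrt(
        (statistics.variance(x_treated) + statistics.variance(x_control)) / 2
    )
    return mean_diff / pooled_sd
```

Values close to zero after matching suggest the covariate is balanced; computing the statistic before and after matching shows how much the matching helped.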
How to check specification of your model re common support?
- visual inspection of densities of propensity scores
- comparison test such as Kolmogorov-Smirnov
- are there big differences between maxima and minima of density distributions?
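The minima and maxima comparison can be sketched as follows, assuming scores are in plain lists. Trimming units outside the overlapping range is one simple way to impose common support; the function names are illustrative.

```python
def common_support(treated_scores, control_scores):
    """Return the (low, high) range where both groups have propensity scores."""
    low = max(min(treated_scores), min(control_scores))
    high = min(max(treated_scores), max(control_scores))
    return low, high

def trim(scores, low, high):
    # Keep only units whose score lies inside the overlapping range
    return [s for s in scores if low <= s <= high]
```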
What are we doing when we use propensity score to calculate ATT?
For each propensity score, we calculate the difference in mean outcomes for the treated and untreated with that p(X)
We then take a weighted average of these over the different propensity score values
Two advantages of PSM over regression that controls for x?
- Matching does not require assumptions about functional form (eg linear relationship)
- Regression runs risk of extrapolating onto a space where there is little common support
5 requirements for covariate selection
- Choose x’s so that unconfoundedness holds
- Should be correlated with treatment (Di) and outcome
- Selection should be based on theory
- x’s should be measured before treatment and not affected by it
- x’s should not be too good at predicting treatment–we are relying on common support
Implications for bias and standard errors of implementing nearest neighbor with replacement?
better matches–> possibly less bias
higher standard errors
Implications for bias and standard errors of implementing nearest neighbor without replacement?
worse matches–>possibly more bias
lower standard errors
Why would NN matching without replacement lead to lower standard errors?
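Each comparison unit is used at most once, so the estimate draws on more distinct comparison units (more variation) instead of reusing the same ones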
Formula for propensity score matching estimator for ATT?
E[Y(1)|D=1, P(x)] - E[Y(0)|D=0, P(x)]
(treated-untreated)–note that the second term subs in for the unobserved term that we really want to know, which is E[Y(0)|D=1, P(x)]
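Written out with the averaging step made explicit, the estimator is the mean of the score-specific differences over the propensity score distribution of the treated. This is a sketch in the same notation as the card above; the symbol \tau_{ATT}^{PSM} is a label introduced here.

```latex
\tau_{ATT}^{PSM}
  = E_{P(x) \mid D=1}\Big\{ E[Y(1) \mid D=1, P(x)] - E[Y(0) \mid D=0, P(x)] \Big\}
```

Under the CIA and common support, E[Y(0)|D=0, P(x)] equals the unobserved E[Y(0)|D=1, P(x)], which is what justifies the substitution.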
What is stratification (interval) matching?
Partitions the common support into intervals (strata) and then calculates mean differences within these strata
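A minimal sketch of stratification matching, assuming equal-width strata on [0, 1] and hypothetical (score, treated, outcome) tuples. Weighting each stratum by its number of treated units is an assumption made to target the ATT; strata lacking either group are skipped.

```python
def stratified_att(units, n_strata=5):
    """units: iterable of (score, treated, outcome) with scores in [0, 1]."""
    strata = [{"t": [], "c": []} for _ in range(n_strata)]
    for score, treated, outcome in units:
        k = min(int(score * n_strata), n_strata - 1)  # a score of 1.0 goes in the last stratum
        strata[k]["t" if treated else "c"].append(outcome)

    weighted_sum, n_treated = 0.0, 0
    for s in strata:
        if not s["t"] or not s["c"]:
            continue  # no comparison possible within this stratum
        diff = sum(s["t"]) / len(s["t"]) - sum(s["c"]) / len(s["c"])
        weighted_sum += diff * len(s["t"])  # weight by treated count
        n_treated += len(s["t"])
    return weighted_sum / n_treated
```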