7. Matching and Subclassification Flashcards
Introduction to matching and subclassification
What are the three key matching strategies to satisfy the backdoor criterion?
To satisfy the backdoor criterion and isolate the causal effect of a treatment (D) on an outcome (Y), we need to block all confounding paths between D and Y. There are three key matching strategies to achieve this:
1. Subclassification
2. Exact matching
3. Approximate matching
What are the key assumptions in subclassification, exact matching, and approximate matching?
Conditional Independence Assumption (CIA): After conditioning on the confounder X, treatment assignment is independent of potential outcomes. This means treated and untreated groups within each stratum are comparable.
* (Y^1,Y^0)⊥D∣X
Common Support: For each value of the confounder X, there must be both treated (D=1) and untreated (D=0) units. This ensures sufficient overlap to make valid comparisons. Without common support, it would be impossible to make meaningful comparisons between treated and control units in certain strata, leading to biased estimates.
* 0<P(D=1∣X)<1
No unmeasured confounders: All relevant confounders that affect both the treatment assignment and the outcome must be observed and included in the matching process. Matching only balances observed covariates, so unobserved confounders can still bias the results.
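The common support condition can be checked directly: within every stratum of X, the share of treated units must lie strictly between 0 and 1. A minimal sketch with made-up data (the `age_group` labels and values are illustrative):

```python
import pandas as pd

# Hypothetical stratified sample: the "61+" stratum has no treated units,
# so the common support assumption 0 < P(D=1|X) < 1 fails there.
df = pd.DataFrame({
    "age_group": ["18-40"] * 4 + ["41-60"] * 4 + ["61+"] * 4,
    "D":         [0, 1, 0, 1,    0, 0, 1, 1,    0, 0, 0, 0],
})

share_treated = df.groupby("age_group")["D"].mean()
no_support = share_treated[(share_treated == 0) | (share_treated == 1)]
print(list(no_support.index))  # ['61+'] -> no treated units in that stratum
```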
What is the method of subclassification?
Subclassification is a method used in causal inference to adjust for confounding.
* The idea is to divide the dataset into strata based on the values of one or more confounders (e.g. age, income, or education).
* Within each stratum, we compare outcomes between treated and untreated units, assuming that within each stratum, the groups are balanced with respect to the confounder.
* These differences are then weighted by the proportion of the population in each stratum to estimate the overall causal effect (ATE).
By balancing treated and untreated groups within these strata, we create conditions that approximate a randomized experiment, satisfying the backdoor criterion and allowing for an unbiased estimate of the causal effect.
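The steps above can be sketched on simulated data; the data-generating process below (a binary confounder and a true effect of 2.0) is an illustrative assumption, not part of the method:

```python
import numpy as np
import pandas as pd

# Toy data: binary confounder X, treatment D depends on X, outcome Y.
rng = np.random.default_rng(0)
n = 1000
X = rng.integers(0, 2, n)                     # confounder stratum
D = rng.binomial(1, 0.3 + 0.4 * X)            # treatment depends on X
Y = 2.0 * D + 1.5 * X + rng.normal(0, 1, n)   # true effect of D is 2.0

df = pd.DataFrame({"X": X, "D": D, "Y": Y})

# Subclassification: within-stratum difference in means,
# weighted by each stratum's share of the population.
ate = 0.0
for x, g in df.groupby("X"):
    diff = g.loc[g.D == 1, "Y"].mean() - g.loc[g.D == 0, "Y"].mean()
    ate += diff * len(g) / len(df)

print(round(ate, 2))  # should land near the true effect of 2.0
```

A naive difference in means on the full sample would mix in the effect of X; the stratum weights undo that confounding.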
What are practical considerations when using subclassification?
- Choosing covariates: Covariates used for stratification must block all backdoor paths (satisfy the backdoor criterion). They should be measured before treatment (exogenous) and not be affected by treatment to avoid bias.
- Number of strata: The number of strata should be large enough to ensure balance within each stratum but not so large that there are too few units in each group.
- Unobserved confounders: Subclassification only adjusts for observed confounders. Bias can persist if important confounders are not included, because they are unobserved.
- Curse of dimensionality: Adding many confounders can create data sparsity, leading to some strata lacking treated or untreated units. This violates the common support assumption and makes estimation unreliable.
What is the method of exact matching?
Exact matching is a method used in causal inference to estimate the effect of a treatment by comparing treated and untreated units with identical values for observed covariates.
* The idea is to find control units that are exactly the same as treated units in terms of the covariates of interest.
* Impute missing counterfactuals for treated units using the outcomes of their matched control units.
* Compare the outcomes of matched treated and control units to estimate the causal effect (ATT or ATE)
By ensuring treated and control units are identical in relevant covariates, exact matching creates balance between the groups and isolates the effect of the treatment.
Using exact matching, we can estimate both the ATT and ATE. The ATT focuses on estimating the causal effect of the treatment specifically for treated units by matching them with control units, while the ATE estimates the treatment effect for the entire population by imputing counterfactuals for both treated and untreated units.
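A minimal sketch of the ATT steps above on simulated data (the discrete covariate and the true effect of 3.0 are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Toy data with a discrete covariate on which we can match exactly.
rng = np.random.default_rng(1)
n = 2000
X = rng.integers(0, 4, n)                     # discrete covariate (4 cells)
D = rng.binomial(1, 0.2 + 0.15 * X)
Y = 3.0 * D + X + rng.normal(0, 1, n)         # true effect is 3.0

df = pd.DataFrame({"X": X, "D": D, "Y": Y})
treated = df[df.D == 1]
controls = df[df.D == 0]

# Impute each treated unit's missing counterfactual Y^0 with the
# mean outcome of control units sharing exactly the same X.
control_means = controls.groupby("X")["Y"].mean()
y0_hat = treated["X"].map(control_means)

att = (treated["Y"] - y0_hat).mean()
print(round(att, 2))  # should land near the true effect of 3.0
```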
What are practical considerations when using exact matching?
- Curse of dimensionality: As the number of covariates increases, finding exact matches becomes harder due to data sparsity, which raises the likelihood of unmatched units.
- Statistical power: Exact matching often results in smaller matched samples, as unmatched units are excluded. This can reduce the statistical power of the analysis, making it harder to detect treatment effects.
- Feasibility with discrete variables: Exact matching is more feasible when the covariates are discrete because the possible combinations are limited.
What is the method of approximate matching?
Approximate matching is a method used in causal inference to estimate treatment effects when exact matches between treated and control units are not feasible due to high-dimensional or continuous covariates. Instead of requiring identical covariate values, approximate matching identifies treated and control units that are similar in terms of their covariates.
Common types of approximate matching are:
* Nearest neighbor covariate matching
* Propensity score matching
What is nearest neighbor covariate matching (approximate matching)?
Nearest neighbor covariate matching is a method to estimate treatment effects by pairing each treated unit with the most similar control unit(s) based on their covariates.
* The idea is to match treated units to control units that have the smallest distance in terms of covariates.
* You need to define a distance metric to measure how similar treated and control units are based on their covariates.
* Use the outcome of the matched control unit(s) as the counterfactual for the treated unit, so that you can compute the difference in outcomes and then average it to estimate the ATT or ATE.
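The procedure can be sketched with plain NumPy and a Euclidean distance (simulated data; the two covariates and the true effect of 1.5 are assumptions of the example):

```python
import numpy as np

# Toy data: two continuous covariates; treatment is likelier at high X.
rng = np.random.default_rng(2)
n = 500
X = rng.normal(0, 1, (n, 2))
D = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + X[:, 1]))))
Y = 1.5 * D + X[:, 0] + X[:, 1] + rng.normal(0, 1, n)  # true effect 1.5

treated = np.where(D == 1)[0]
controls = np.where(D == 0)[0]

# For each treated unit, find the control with the smallest Euclidean
# distance in covariates and use its outcome as the counterfactual.
diffs = X[treated][:, None, :] - X[controls][None, :, :]
dist = np.sqrt((diffs ** 2).sum(axis=2))     # shape (n_treated, n_control)
nearest = controls[dist.argmin(axis=1)]

att = (Y[treated] - Y[nearest]).mean()
print(round(att, 2))  # should be near 1.5; matching discrepancies remain
```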
What are different distance metrics in nearest neighbor covariate matching (approximate matching)?
Euclidean distance: Measures the straight-line distance between two units based on their covariates. It gives a raw sense of similarity but assumes all covariates are on the same scale, which can cause issues if some variables have much larger ranges than others.
* Does not consider either scale or correlation
* Use when covariates are on the same scale and uncorrelated
Standardized Euclidean distance: Improves upon Euclidean distance by adjusting for differences in covariate scales. Each squared difference is divided by the variance of the corresponding covariate, ensuring equal weight regardless of range.
* Does consider scale but not correlation
* Use when covariates have different scales but are uncorrelated.
Mahalanobis distance: Accounts for both the scale and correlation of covariates by using the covariance matrix. This metric adjusts for relationships between covariates and avoids overemphasizing correlated variables. If some covariates are highly correlated, it avoids over-counting their combined effect by normalizing the distances with the covariance matrix.
* Does consider scale and correlation
* Use when covariates are correlated or in high-dimensional spaces (datasets with a large number of covariates)
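The three metrics are easy to compare with `scipy.spatial.distance` (the three units and their age/income values below are made up to show the scale problem):

```python
import numpy as np
from scipy.spatial.distance import euclidean, seuclidean, mahalanobis

# Two covariates on very different scales: age (years), income ($1000s).
X = np.array([
    [25.0,  40.0],   # treated unit
    [27.0,  45.0],   # control a: similar in both covariates
    [25.0, 120.0],   # control b: same age, very different income
])

V = X.var(axis=0, ddof=1)          # per-covariate variances (for seuclidean)
VI = np.linalg.inv(np.cov(X.T))    # inverse covariance (for mahalanobis)

for j, name in [(1, "control a"), (2, "control b")]:
    print(name,
          round(euclidean(X[0], X[j]), 2),        # raw scale
          round(seuclidean(X[0], X[j], V), 2),    # variance-scaled
          round(mahalanobis(X[0], X[j], VI), 2))  # scale + correlation
```

On the raw Euclidean metric, control b looks far away purely because income has a much larger range; the scaled metrics weight both covariates comparably.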
What is propensity score matching (approximate matching)?
Propensity score matching (PSM) is a method of approximate matching used in causal inference to estimate treatment effects.
* The idea is to summarize each unit's covariates into a single value called the propensity score: the conditional probability of receiving treatment given the covariates, p(X)=Pr(D=1|X). It is typically estimated using a logit or probit model.
* Matching is then performed based on the propensity score, not individual covariates. Treated and control units with similar propensity scores are paired to create comparable groups.
* Covariate balance is assessed after matching to ensure treated and control units are similar.
* After matching, differences in outcomes between treated and matched control units are used to estimate treatment effects like ATT or ATE.
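The four steps above can be sketched with a logit for the score and 1-nearest-neighbor matching (simulated data; the covariates and the true effect of 2.0 are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: treatment probability depends on two covariates.
rng = np.random.default_rng(3)
n = 2000
X = rng.normal(0, 1, (n, 2))
D = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - X[:, 1]))))
Y = 2.0 * D + X[:, 0] - X[:, 1] + rng.normal(0, 1, n)  # true effect 2.0

# Step 1: estimate the propensity score p(X) = Pr(D=1|X) with a logit.
ps = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]

# Step 2: match each treated unit to the control with the closest score.
treated = np.where(D == 1)[0]
controls = np.where(D == 0)[0]
gaps = np.abs(ps[treated][:, None] - ps[controls][None, :])
nearest = controls[gaps.argmin(axis=1)]

# Step 3: the matched outcome difference estimates the ATT.
att = (Y[treated] - Y[nearest]).mean()
print(round(att, 2))  # should land near the true effect of 2.0
```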
What is the propensity score theorem and the balancing property of the propensity score?
The propensity score theorem states that if the CIA holds, conditioning on the propensity score (p(X)) is sufficient to balance covariates between treated and control groups:
* (Y^1,Y^0) ⊥ D|X  ⇒  (Y^1,Y^0) ⊥ D|p(X)
This simplifies matching by collapsing high-dimensional covariates (X) into a single scalar, avoiding the curse of dimensionality.
Balancing property of the propensity score: Within strata of the propensity score, the distribution of covariates (X) should be the same for treated and untreated groups:
* Pr(X∣D=1,p(X))=Pr(X∣D=0,p(X))
This property ensures that treated and control groups are comparable within propensity score strata. You check this empirically by assessing covariate balance after matching.
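The balancing property can be checked empirically, for example by comparing covariate means within propensity score strata (simulated data; quintile strata are an illustrative choice):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: treatment driven by the first covariate.
rng = np.random.default_rng(7)
n = 4000
X = rng.normal(0, 1, (n, 2))
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))

ps = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]

# Raw imbalance: treated units have higher X1 because it drives D.
raw_gap = X[D == 1, 0].mean() - X[D == 0, 0].mean()

# Within propensity score strata (quintiles), the gap should shrink a lot.
edges = np.quantile(ps, [0.2, 0.4, 0.6, 0.8])
strata = np.digitize(ps, edges)
within_gaps = [X[(strata == s) & (D == 1), 0].mean()
               - X[(strata == s) & (D == 0), 0].mean()
               for s in range(5)]
print(round(raw_gap, 2), round(float(np.mean(within_gaps)), 2))
```

The within-stratum gaps should be far smaller than the raw gap; if they are not, the score model is misspecified and needs to be revisited.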
How can ATE and ATT be estimated using propensity scores?
- Inverse probability weighting (IPW): Weight each individual’s outcome by the inverse of their propensity score to account for differences in treatment probability.
- Nearest-neighbor matching: Match each treated unit to one or more control units with the closest propensity scores. The outcome of the matched control units is used as the counterfactual for the treated unit.
- Coarsened exact matching: Group covariates into categories to create strata for exact matches. Units are matched within strata, and weights are assigned based on strata membership. Use these weights in a simple weighted regression to estimate treatment effects.
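The IPW estimator from the first bullet can be sketched in a few lines (simulated data; the true ATE of 2.0 is an assumption of the example):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: confounded treatment assignment via the first covariate.
rng = np.random.default_rng(4)
n = 5000
X = rng.normal(0, 1, (n, 2))
D = rng.binomial(1, 1 / (1 + np.exp(-0.8 * X[:, 0])))
Y = 2.0 * D + X[:, 0] + rng.normal(0, 1, n)   # true ATE is 2.0

# Estimate the propensity score with a logit.
ps = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]

# IPW: weight treated outcomes by 1/p(X) and control outcomes by
# 1/(1 - p(X)), then take the difference of the weighted means.
ate = np.mean(D * Y / ps - (1 - D) * Y / (1 - ps))
print(round(ate, 2))  # should land near the true ATE of 2.0
```

In practice the weights are often stabilized or trimmed, since scores near 0 or 1 make this estimator very noisy.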
What are matching discrepancies in approximate matching?
- Matching discrepancies occur when treated and control units are matched but differ on covariates or propensity scores.
- These discrepancies introduce bias because the matched control unit may not perfectly represent the counterfactual for the treated unit. The bigger the differences between matched units, the greater the bias in the estimated treatment effect.
- Larger sample sizes reduce the likelihood of large discrepancies by providing more potential matches.
How can bias from matching discrepancies be corrected?
Bias correction adjusts for residual differences in covariates between treated (Xi) and matched control units (Xj(i)) to address imperfections in matching.
A common method for bias correction is to adjust the estimated treatment effect by subtracting the difference in expected outcomes for the control group that is attributable to the mismatch in covariates. The expected outcome for the control group is typically estimated with methods like OLS.
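A sketch of this OLS-based correction on simulated data (one covariate; the control-group regression of Y on X stands in for the expected untreated outcome, and the true effect of 1.0 is an assumption of the example):

```python
import numpy as np

# Toy data: one continuous covariate, so matches are never exact.
rng = np.random.default_rng(5)
n = 400
X = rng.normal(0, 1, n)
D = rng.binomial(1, 1 / (1 + np.exp(-X)))
Y = 1.0 * D + 2.0 * X + rng.normal(0, 0.5, n)   # true effect 1.0

treated = np.where(D == 1)[0]
controls = np.where(D == 0)[0]
nearest = controls[np.abs(X[treated][:, None] - X[controls][None, :]).argmin(axis=1)]

# Fit E[Y | X, D=0] by OLS on the control sample only.
A = np.column_stack([np.ones(len(controls)), X[controls]])
beta = np.linalg.lstsq(A, Y[controls], rcond=None)[0]
mu0 = lambda x: beta[0] + beta[1] * x

# Bias-corrected ATT: subtract the part of each matched difference
# that is explained by the covariate mismatch X_i - X_j(i).
att_bc = np.mean(Y[treated] - Y[nearest] - (mu0(X[treated]) - mu0(X[nearest])))
print(round(att_bc, 2))  # should land near the true effect of 1.0
```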