Lecture 2 (Matching and related methods) Flashcards

1
Q

When do we say that something is identified?

A

“$\beta$ is identified in the sense that we can write $\beta$ as a function of observable variables only (population moments of them)”

E.g., Under random assignment

$$
ATE = E[Y|D=1]-E[Y|D=0]
$$

We say that $ATE$ is identified because it can be expressed in terms of the observed variables, $Y, D$.
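A minimal numerical sketch (simulated data, not from the lecture) of why this works: under random assignment, the simple difference in group means recovers the ATE.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
D = rng.integers(0, 2, size=n)        # random assignment
Y0 = rng.normal(0.0, 1.0, size=n)     # potential outcome without treatment
Y1 = Y0 + 2.0                         # true treatment effect = 2
Y = np.where(D == 1, Y1, Y0)          # observed outcome

# ATE identified as E[Y|D=1] - E[Y|D=0]: a plain difference in means
ate_hat = Y[D == 1].mean() - Y[D == 0].mean()
print(ate_hat)                        # close to the true effect of 2
```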

2
Q

What do we mean by partial identification?

A

Partial identification in econometrics is an approach to inference that recognizes that identification is not an all-or-nothing concept: models that do not point identify the parameters of interest can, and typically do, still contain valuable information about them. Identification should therefore not be reduced to a binary answer about whether a parameter is point identified.

In statistics and econometrics, set identification (or partial identification) extends the concept of identifiability (“point identification”) to situations where the distribution of observable variables does not pin down the exact value of a parameter but instead constrains it to lie in a strict subset of the parameter space.

Manski developed a method of worst-case bounds for accounting for selection bias. Unlike methods that make additional statistical assumptions, such as Heckman correction, the worst-case bounds rely only on the data to generate a range of supported parameter values.

Hence, we can create ATE bounds. Combining $ATE_{max}$ and $ATE_{min}$ gives the upper and lower bounds on the ATE. These bounds are wide, but they hold without any assumptions about the treatment assignment.
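A sketch of the worst-case bounds for an outcome bounded in $[0, 1]$ (simulated data; all names are illustrative): the unobserved counterfactual means are filled in with the smallest and largest values the outcome can take, giving bounds whose width equals the width of the outcome's support.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
D = rng.integers(0, 2, size=n)
Y = rng.uniform(0, 1, size=n)         # any outcome bounded in [0, 1]

y_min, y_max = 0.0, 1.0
p = D.mean()                          # P(D = 1)
ey1_d1 = Y[D == 1].mean()             # E[Y | D = 1], observed
ey0_d0 = Y[D == 0].mean()             # E[Y | D = 0], observed

# Fill the unobserved counterfactual means with the worst possible values.
ate_lo = (ey1_d1 * p + y_min * (1 - p)) - (ey0_d0 * (1 - p) + y_max * p)
ate_hi = (ey1_d1 * p + y_max * (1 - p)) - (ey0_d0 * (1 - p) + y_min * p)
print(ate_lo, ate_hi)                 # width is exactly y_max - y_min
```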

3
Q

What is unconfoundedness?

A

Rosenbaum and Rubin’s (1983) definition of unconfoundedness:

$$
D \perp Y(0),Y(1)|X
$$

Unconfoundedness requires that, conditional on observed covariates, there are no unobserved factors that are associated both with the assignment and with the potential outcomes.

4
Q

What is meant by overlap (common support)?

A

Formally, for identification, we also need a second assumption: common support (overlap).

$$
Pr(D=1|X=x) \in(0,1), \ \forall x
$$

That is, for all possible values of the covariates, there are both treated and control units.

5
Q

What is a propensity score?

A

Under unconfoundedness, just as we can condition on $X$, we can condition on $e(x)$, an individual’s propensity (score) to be treated given $X$. That is,

$$
e(x) = Pr(D=1|X=x)
$$

The propensity score is a “balancing score”: conditional on the propensity score, the distribution of the covariates is the same for treated and control units.

We typically never observe the true propensity score (i.e., know the assignment mechanism), so we usually have to estimate it, $\hat e(X)$, with e.g. a probit or logit model.

We can use the propensity score for many different things, e.g., weighting or matching.

6
Q

What is the inverse probability weighting (IPW) estimator?

A

$$
\hat \tau^{IPW}_{ATE} = \frac{1}{N} \sum_{i=1}^N \Big[ \frac{D_i Y_i}{\hat e(X_i)}-\frac{(1-D_i)Y_i}{1-\hat e(X_i)} \Big]
$$

See propensity score matching.

The propensity score is estimated with e.g., a probit or logit model.
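A minimal IPW sketch (simulated data; to keep it self-contained the true propensity score is plugged in where $\hat e(X_i)$ would normally be estimated):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))                  # true propensity score e(X)
D = (rng.uniform(size=n) < e).astype(float)
Y = 1.0 * D + X + rng.normal(size=n)      # true ATE = 1

# IPW: weight treated by 1/e(X) and controls by 1/(1 - e(X))
tau_ipw = np.mean(D * Y / e - (1 - D) * Y / (1 - e))
print(tau_ipw)                            # close to the true ATE of 1
```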

7
Q

What do we need when we do propensity score weighting (beyond data on the variables $x, y, z$, etc.)?

A

In total, this approach requires:

  • a correctly specified regression model
  • a correctly specified propensity score model

Misspecification in either model may lead to misleading estimates!

The solution is instead to use a “doubly robust” estimator.

8
Q

What does the procedure look like with propensity score matching?

A

The process is the following:

  1. Use the treatment variables and covariates to estimate the propensity score using a logit or probit model. This model then creates $\hat e (X_i)$.
  2. Find the closest match in terms of $\hat e (X_i)$ for each treated and untreated unit, i.e., the one with the most similar propensity score.
  3. Estimate ATE or ATT.
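The three steps above can be sketched as follows (simulated data; the hand-rolled Newton logit stands in for whatever logit/probit routine one would normally use):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000
X = rng.normal(size=(n, 2))
e_true = 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))
D = (rng.uniform(size=n) < e_true).astype(float)
Y = 2.0 * D + X[:, 0] + rng.normal(size=n)      # true effect = 2

# Step 1: estimate the propensity score with a logit (Newton-Raphson).
Z = np.column_stack([np.ones(n), X])
beta = np.zeros(Z.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-Z @ beta))
    W = p * (1 - p)
    beta += np.linalg.solve(Z.T @ (W[:, None] * Z), Z.T @ (D - p))
e_hat = 1 / (1 + np.exp(-Z @ beta))

# Step 2: for each treated unit, find the control with the closest e_hat.
treated = np.where(D == 1)[0]
controls = np.where(D == 0)[0]
nn = controls[np.abs(e_hat[treated][:, None]
                     - e_hat[controls][None, :]).argmin(axis=1)]

# Step 3: ATT = mean difference between treated and matched-control outcomes.
att = (Y[treated] - Y[nn]).mean()
print(att)                                      # close to the true effect of 2
```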
9
Q

What can we never do with propensity score matching?

A

With propensity score matching, we cannot use the bootstrap!

The reason is that one individual can be the closest neighbor to many other individuals; leaving that individual out of a bootstrap sample would then make a huge difference to our estimate. We should instead use Abadie and Imbens’s (2006) variance estimator, or kernel matching with the bootstrap.

10
Q

Is it bad or good to use propensity score matching?

A

While Dehejia and Wahba (2002) revisited the LaLonde paper and showed that propensity score matching could fix the bias problem, Smith and Todd (2005) showed that these results were very sensitive to the choice of control variables etc.

11
Q

What is the intuition of weighting?

A

The outcomes of the treated are weighted towards the full population. For instance, if there are fewer females among the treated than in the full population, the outcomes of the treated females are given larger weight to capture the gender distribution of the full population.
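With hypothetical numbers for the gender example (not from the lecture): weighting each treated group by its population share divided by its treated share restores the population gender mix.

```python
# The full sample is 50% female, but only 25% of the treated are female,
# so treated females get weight 0.50/0.25 = 2 and treated males 0.50/0.75.
pop_share = {"female": 0.50, "male": 0.50}
treated_share = {"female": 0.25, "male": 0.75}
weights = {g: pop_share[g] / treated_share[g] for g in pop_share}

# Weighted mean of treated outcomes, e.g. females at 10 and males at 20:
y_treated = {"female": 10.0, "male": 20.0}
weighted_mean = sum(treated_share[g] * weights[g] * y_treated[g]
                    for g in pop_share)
print(weighted_mean)   # ~15: the gender mix now matches the population
```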

12
Q

What is the intuition with the Doubly robust estimator?

A

Remember that for propensity weighting we needed
- a correctly specified regression model and
- a correctly specified propensity score model

The doubly robust approach combines the regression approach and the propensity weighting approach. This only requires that either the regression model or the propensity model is correctly specified. Therefore it is favored by theoretical work and simulations.

That is, it protects against both misspecification of the propensity score and the regression model so that we obtain correct estimates if only one of the two is correct.
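A sketch of one doubly robust (AIPW-style) estimator under these assumptions (simulated data, illustrative names): the propensity model is deliberately misspecified as a constant while the outcome regressions are correct, and the estimator still recovers the ATE.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))
D = (rng.uniform(size=n) < e).astype(float)
Y = 1.5 * D + 2.0 * X + rng.normal(size=n)      # true ATE = 1.5

# Correct outcome models: OLS of Y on X within each treatment arm.
def ols_fit_predict(x_fit, y_fit, x_all):
    Zf = np.column_stack([np.ones(len(x_fit)), x_fit])
    b = np.linalg.lstsq(Zf, y_fit, rcond=None)[0]
    return b[0] + b[1] * x_all

mu1 = ols_fit_predict(X[D == 1], Y[D == 1], X)
mu0 = ols_fit_predict(X[D == 0], Y[D == 0], X)

e_hat = np.full(n, D.mean())                    # misspecified propensity model

# AIPW: regression prediction plus propensity-weighted residual correction.
tau_dr = np.mean(mu1 - mu0
                 + D * (Y - mu1) / e_hat
                 - (1 - D) * (Y - mu0) / (1 - e_hat))
print(tau_dr)                                   # close to the true ATE of 1.5
```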

13
Q

Explain the intuition behind the nearest neighbor matching.

A

There are different kinds of matching procedures. One is nearest-neighbor matching. This estimator imputes the missing potential outcomes using only the outcomes of a few nearest neighbours of the opposite treatment group. They are “nearest neighbour” in the sense that $X$ are similar.

Using only a single match leads to the most credible inference with the least bias, at the cost of sacrificing some precision.

For each treated and non-treated we use the outcome of the nearest neighbor to impute the missing potential outcome.

14
Q

Show the matching estimator

A

$$
\hat \tau^{Match}_{ATE} = \frac{1}{N} \sum_{i=1}^N \big[ \hat Y_i(1) - \hat Y_i(0) \big]
$$

where the observed potential outcome is used directly and the missing one, e.g. $\hat Y_i(0)$ for a treated unit, is imputed from the nearest neighbor. The match could be in terms of propensity scores.
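A sketch of this estimator with single-nearest-neighbor matching on $X$ (simulated data; matching on the propensity score would work analogously):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4_000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))
D = (rng.uniform(size=n) < e).astype(float)
Y = 1.0 * D + X + rng.normal(size=n)    # true ATE = 1

treated = np.where(D == 1)[0]
controls = np.where(D == 0)[0]

Y1_hat = Y.copy()                       # observed Y(1) for the treated...
Y0_hat = Y.copy()                       # ...and observed Y(0) for controls
# Impute each missing counterfactual from the nearest neighbor on X
# in the opposite treatment group.
Y0_hat[treated] = Y[controls[np.abs(X[treated][:, None]
                                    - X[controls]).argmin(axis=1)]]
Y1_hat[controls] = Y[treated[np.abs(X[controls][:, None]
                                    - X[treated]).argmin(axis=1)]]

tau_match = np.mean(Y1_hat - Y0_hat)
print(tau_match)                        # close to the true ATE of 1
```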

15
Q

What are ATT unconfoundedness and conditional mean independence? How are they different?

A

Original unconfoundedness:

$$
D \perp Y(0),Y(1)|X
$$

ATT unconfoundedness:

$$
D \perp Y(0)|X
$$

Conditional mean independence:

$$
E[Y(0)|X,D] = E[Y(0)|X]
$$

Original unconfoundedness and ATT unconfoundedness are in theory different, but not in practice. The same goes for ATT unconfoundedness and conditional mean independence.

Choosing between the notation of ATT unconfoundedness and conditional mean independence depends on the school of thought: statistics vs. economics.

16
Q

What effect can control variables have in the matching world?

A

Too many control variables will create a problem with common support and increase the variance of the estimates.

17
Q

What checks do we need to do when working with matching?

A

When using matching, we like to check the balance of the covariates using a balance test (“standardized bias”). This shows whether our covariates are more similar after the matching procedure.

18
Q

What is one way to evaluate unconfoundedness assumption if we have panel data?

A

One way of assessing the unconfoundedness assumption is to use longitudinal data and do placebo checks on pseudo-outcomes.

19
Q

What is the difference between weighting and matching?

A

Matching directly matches most similar units in the treatment and control groups. Weighting simply assigns different weights to different observations depending on their probability of receiving the treatment.

20
Q

Which are the identifying assumptions for matching?

A

Matching relies only on two assumptions:

  • Conditional independence/unconfoundedness
  • Common support