F8 Regression discontinuity design Flashcards

1
Q

What is the running variable and cutoff?

A

The running variable is an observed confounder that determines treatment status: units are assigned to treatment when it crosses a specific value, the cutoff. It is typically continuous.

2
Q

What is a RDD?

A

Regression discontinuity design. It exploits a natural cutoff/threshold in the running variable that assigns units to either the control or the treatment group.

3
Q

What does RDD estimate?

A

Local average treatment effect: LATE.

4
Q

What is the challenge with LATE?

A

The effect applies only to the units around the cutoff. It cannot be generalized to the rest of the population without further assumptions.

Internal validity > external validity

5
Q

What is a key advantage of RDD?

A

It is superior in handling unobserved confounders: it convincingly eliminates selection bias if its assumptions hold.

6
Q

What is the key assumption?

A

Continuity. The potential outcomes are continuous functions of the running variable and pass smoothly through the cutoff.

All confounders are assumed to be continuous at the cutoff.

Treatment becomes independent of potential outcomes. D is the only variable that affects the outcome and jumps discontinuously at the cutoff.

No simultaneous treatments and the cutoff cannot be endogenous.

7
Q

What are examples of RDD?

A

Test scores (SAT), geographical borders, time, close elections and policy changes

8
Q

What must the assignment rule/cutoff be?

A

Known, precise and free of manipulation (if the RDD is sharp)

9
Q

Draw the DAG for RDD (both of them)

A

Squares.

D –> Y: The causal relation of interest.

X –> Y: Confounder. The running variable affects the outcome (independently of D). Its influence is neutralized under RDD.

U: Unobserved confounder causing bias. Its influence is neutralized under RDD.

10
Q

What is the mathematical formula?

A

Homogeneous effects: Y_i = α + βx_i + δD_i + ε_i (the treatment changes only the intercept).

Heterogeneous effects: Y_i = α + βx_i + δD_i + θD_i x_i + ε_i (the interaction term lets the slope differ on the two sides of the cutoff)
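
As a rough illustration, the specification with the interaction term can be estimated by ordinary least squares. The sketch below uses simulated data and makes two assumptions that are not stated on the card: units at or above the cutoff are treated, and the running variable is centered at the cutoff so that δ is the jump at the cutoff itself. The function name `sharp_rdd` and all simulated numbers are hypothetical.

```python
import numpy as np

def sharp_rdd(x, y, cutoff=0.0):
    """OLS of y on [1, x_c, D, D * x_c]; returns the coefficient on D (delta)."""
    # Assumption: centre the running variable so delta is the jump at the cutoff.
    xc = np.asarray(x, dtype=float) - cutoff
    # Assumption: units at or above the cutoff are treated (sharp design).
    d = (xc >= 0).astype(float)
    X = np.column_stack([np.ones_like(xc), xc, d, d * xc])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef[2]

# Simulated example: true jump of 2.0 and a slope that changes at the cutoff.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 2000)
d = (x >= 0).astype(float)
y = 1.0 + 0.5 * x + 2.0 * d + 0.8 * d * x + rng.normal(0, 0.1, x.size)

delta_hat = sharp_rdd(x, y)
print(f"estimated jump at the cutoff: {delta_hat:.2f}")
```

Dropping the `d * xc` column gives the homogeneous-effects version, where both sides are forced to share one slope.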

11
Q

What is the potential outcomes framework for RDD?

A

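The estimand is defined with limits at the cutoff c (taking units at or above c as treated): δ = lim_{x↓c} E[Y_i | x_i = x] − lim_{x↑c} E[Y_i | x_i = x] = E[Y_i(1) − Y_i(0) | x_i = c]. The jump in the conditional expectation of the outcome at the cutoff identifies the treatment effect there, provided the potential outcomes are continuous at c.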

12
Q

What are the two types of RDDs?

A

Sharp RDD: The probability of treatment changes from 0 to 1 at the cutoff (deterministic). No common support (relies on extrapolation).

Fuzzy RDD: The probability of treatment jumps discontinuously at the cutoff, but by less than 1. Some units on each side do not follow the assignment rule.

13
Q

What happens with the estimator in a fuzzy RDD?

A

It is scaled by the jump in the probability of being treated at the cutoff.

Wald estimator (a special case of the IV estimator: binary instrument and some degree of non-compliance).
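
A minimal sketch of the scaling idea, on simulated data: the jump in the outcome at the cutoff is divided by the jump in the treatment probability at the cutoff. Both jumps are estimated here with a simple linear fit on each side, which is an assumption made for illustration; the helper names `jump_at_cutoff` and `fuzzy_rdd_wald` and all simulated numbers are hypothetical.

```python
import numpy as np

def jump_at_cutoff(x, v, cutoff=0.0):
    """Jump in E[v | x] at the cutoff from a linear fit on each side."""
    xc = np.asarray(x, dtype=float) - cutoff
    z = (xc >= 0).astype(float)  # indicator for being above the cutoff
    X = np.column_stack([np.ones_like(xc), xc, z, z * xc])
    coef, *_ = np.linalg.lstsq(X, np.asarray(v, dtype=float), rcond=None)
    return coef[2]

def fuzzy_rdd_wald(x, d, y, cutoff=0.0):
    """Wald-type ratio: jump in the outcome over jump in treatment take-up."""
    return jump_at_cutoff(x, y, cutoff) / jump_at_cutoff(x, d, cutoff)

# Simulated example: take-up probability jumps by 0.5 at the cutoff,
# and the true treatment effect is 2.0.
rng = np.random.default_rng(1)
n = 20000
x = rng.uniform(-1, 1, n)
p = 0.2 + 0.1 * x + 0.5 * (x >= 0)
d = (rng.uniform(size=n) < p).astype(float)
y = 1.0 + 0.5 * x + 2.0 * d + rng.normal(0, 0.5, n)

late_hat = fuzzy_rdd_wald(x, d, y)
print(f"fuzzy RDD estimate: {late_hat:.2f}")
```

In a sharp design the denominator equals 1, so the ratio collapses to the plain jump in the outcome.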

14
Q

What is the trade-off in choosing the bandwidth?

A

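Narrow: Loss of statistical power (few observations close to the cutoff)
Broad: Risk of specification bias (observations far from the cutoff make the result depend on the functional form)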

15
Q

What is the main challenge to RDD?

A

Sorting.

If the cutoff is known, units can self-select into treatment or control by manipulating their value of the running variable. The continuity assumption then no longer holds.

16
Q

What are reasons for sorting?

A

The assignment rule is known in advance

Agents are interested in adjusting and they have time to adjust

The cutoff is endogenous to factors that affect the potential outcomes

Non-random heaping along the running variable

17
Q

How can possible sorting be inspected?

A

Covariate balance test

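McCrary’s density test (power-intensive: bin the running variable, treat the bin frequencies as the dependent variable and the running variable as the independent variable, fit this separately on each side of the cutoff, and check for a jump in the density at the cutoff)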
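
The binning idea behind the density test can be sketched as follows. This is a simplified illustration only, not McCrary's actual estimator or its standard errors: it counts observations in equal-width bins, fits a straight line to the counts on each side, and compares the fitted heights at the cutoff on a log scale. The function name `density_log_jump`, the bin settings and the simulated sorting pattern are all assumptions.

```python
import numpy as np

def density_log_jump(x, cutoff=0.0, bandwidth=0.5, n_bins=10):
    """Log difference in fitted bin counts just right vs. just left of the cutoff."""
    xc = np.asarray(x, dtype=float) - cutoff

    def height_at_cutoff(edges):
        # Bin frequencies are the dependent variable, bin midpoints the regressor.
        counts, _ = np.histogram(xc, bins=edges)
        mids = (edges[:-1] + edges[1:]) / 2
        slope, intercept = np.polyfit(mids, counts, 1)
        return intercept  # fitted count at xc = 0 (assumed positive here)

    left = height_at_cutoff(np.linspace(-bandwidth, 0.0, n_bins + 1))
    right = height_at_cutoff(np.linspace(0.0, bandwidth, n_bins + 1))
    return np.log(right) - np.log(left)

rng = np.random.default_rng(2)
x_clean = rng.uniform(-1, 1, 20000)

# Simulated sorting: half of the units just below the cutoff move just above it.
move = (x_clean >= -0.1) & (x_clean < 0) & (rng.uniform(size=x_clean.size) < 0.5)
x_sorted = np.where(move, -x_clean, x_clean)

print(f"no sorting:   {density_log_jump(x_clean):.2f}")
print(f"with sorting: {density_log_jump(x_sorted):.2f}")
```

A value near zero is consistent with a smooth density; a clearly positive value suggests bunching just above the cutoff.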

18
Q

What is the difference between matching and RDD?

A

Matching relies on observed confounders only. RDD handles both observed and unobserved confounders.

Matching estimates ATE while RDD estimates LATE.

19
Q

What is extrapolation?

A

Lack of common support. We compare units with different values of the running variable, so the comparison at the cutoff relies on extending the fitted function from each side.

20
Q

What are important considerations for specification of the function? Draw different examples.

A

The relationship between the running variable and the outcome could be linear or non-linear. A misspecified functional form can produce an estimated effect where none exists, and vice versa.

21
Q

What is the difference between parametric and nonparametric specification?

A

Parametric: Specify the functional form beforehand.

Nonparametric: Data-driven, without prior assumptions on the functional form (local E[Y] on the running variable, which could be linear, quadratic or lowess)

22
Q

How do you avoid overfitting?

A

Start with a linear model, then try a quadratic polynomial (allowing one turning point).

Gelman & Imbens (2019) have shown that adding higher-order polynomial terms introduces bias

23
Q

What is an example of a nonparametric approach?

A

Kernel regression (observations closer to the cutoff get higher weights, so the bandwidth is phased in gradually rather than applied as a hard window)
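
One way to picture the weighting is a local linear regression with a triangular kernel, where the weight falls linearly from 1 at the cutoff to 0 at the edge of the bandwidth. The kernel choice, the function name `local_linear_rdd` and the simulated data are assumptions made for this sketch; the card itself does not name a specific kernel.

```python
import numpy as np

def local_linear_rdd(x, y, cutoff=0.0, bandwidth=0.5):
    """Kernel-weighted linear fit on each side; returns the jump at the cutoff."""
    xc = np.asarray(x, dtype=float) - cutoff
    y = np.asarray(y, dtype=float)
    # Triangular kernel (assumption): weight 1 at the cutoff, 0 at the bandwidth edge.
    w = np.clip(1.0 - np.abs(xc) / bandwidth, 0.0, None)
    keep = w > 0
    xc, y, w = xc[keep], y[keep], w[keep]
    d = (xc >= 0).astype(float)
    X = np.column_stack([np.ones_like(xc), xc, d, d * xc])
    # Weighted least squares via rescaling rows by the square root of the weights.
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[2]

# Simulated example: non-linear relationship with a true jump of 1.5.
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 5000)
d = (x >= 0).astype(float)
y = np.sin(3 * x) + 1.5 * d + rng.normal(0, 0.1, x.size)

jump_hat = local_linear_rdd(x, y, bandwidth=0.3)
print(f"kernel-weighted estimate of the jump: {jump_hat:.2f}")
```

Shrinking `bandwidth` makes the linear approximation more plausible but leaves fewer observations with positive weight.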

24
Q

Should you or should you not let the functions differ for the control and treatment group?

A

Always let them differ according to Lee & Lemieux (2010)

25
Q

Should you cluster on the running variable?

A

No, never. Use honest confidence intervals or robust standard errors.

26
Q

How does the continuity assumption slightly change in fuzzy RDDs?

A

The conditional expectation of the potential outcomes changes smoothly through the cutoff.

27
Q

What do you estimate with fuzzy RDD?

A

LATE for compliers (those whose treatment status changed right around the cutoff)

28
Q

What is nonrandom heaping on the running variable? And what is the solution?

A

Individuals disproportionately report certain values of a variable, often due to convenience, cognitive biases, or intentional behavior. E.g. heaping at specific or rounded values, such as babies’ birth weights.

Solution: Donut-hole RDD (drop the observations in a small window around the cutoff and estimate on the rest).
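
A minimal sketch of the donut-hole idea on simulated data: observations within a small window around the cutoff are dropped before the regression is fitted. The simulation assumes, purely for illustration, that a heap of systematically different units sits exactly at the cutoff; the function name `rdd_jump`, the window size and all numbers are hypothetical.

```python
import numpy as np

def rdd_jump(x, y, cutoff=0.0, donut=0.0):
    """Linear fit on each side, dropping units with |x - cutoff| < donut."""
    xc = np.asarray(x, dtype=float) - cutoff
    y = np.asarray(y, dtype=float)
    keep = np.abs(xc) >= donut  # donut=0.0 keeps every observation
    xc, y = xc[keep], y[keep]
    d = (xc >= 0).astype(float)
    X = np.column_stack([np.ones_like(xc), xc, d, d * xc])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[2]

rng = np.random.default_rng(4)

# Regular units: true jump of 2.0 at the cutoff.
x = rng.uniform(-1, 1, 2000)
y = 1.0 + 0.5 * x + 2.0 * (x >= 0) + rng.normal(0, 0.2, x.size)

# Heaped units (assumption): piled exactly at the cutoff, with outcomes 3.0 higher.
x_heap = np.zeros(200)
y_heap = 1.0 + 2.0 + 3.0 + rng.normal(0, 0.2, x_heap.size)

x_all = np.concatenate([x, x_heap])
y_all = np.concatenate([y, y_heap])

naive = rdd_jump(x_all, y_all)
donut = rdd_jump(x_all, y_all, donut=0.05)
print(f"naive: {naive:.2f}, donut-hole: {donut:.2f}")
```

The price of the donut is more extrapolation, since the fitted lines now have to bridge the excluded window to reach the cutoff.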

29
Q

What is a popular design in RDD?

A

The close-election design. The restriction to “the margins of a close race” is crucial: the idea is that it is only at the margins of a close race that the distribution of voter preferences is the same.

30
Q

What is a regression kink design?

A

The slope of the relationship changes at the cutoff (a jump in the first derivative rather than in the level), e.g. the linear trend flattens out once government benefits reach a threshold.