Causal Inference Flashcards

1
Q

Why is it difficult to establish causality between variables?

A
  1. There could be omitted variables affecting both X and Y.
  2. Simultaneity might exist, where X affects Y and Y also affects X.
  3. Measurement errors in variables could distort the estimated relationships.
  4. Selection bias can arise when units choose, or are assigned to, treatment based on unobserved factors that also influence the outcome.
2
Q

What is the main goal of causal inference in empirical research?

A

The main goal is to identify the causal effects of one or more explanatory variables on a dependent variable, rather than just finding correlations.

3
Q

What is endogeneity and why does it cause problems in OLS regression?

A

Endogeneity occurs when an explanatory variable is correlated with the error term, leading to biased and inconsistent estimates in OLS regression.
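
A small simulation can make the bias concrete. This is only a sketch under assumed numbers: the data-generating process, the coefficients (2.0 and 3.0) and the names `u`, `beta_short` and `beta_long` are hypothetical and not part of the card. The confounder `u` drives both x and y, so leaving it out pushes it into the error term and makes x endogenous.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data: u is an unobserved confounder of x and y.
u = rng.normal(size=n)
x = u + rng.normal(size=n)                   # x depends on u
y = 2.0 * x + 3.0 * u + rng.normal(size=n)   # true effect of x is 2.0

# Short regression: u is omitted, so it ends up in the error term
# and x is correlated with that error.
beta_short = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Long regression: u is included as a regressor, so x is exogenous again.
X = np.column_stack([np.ones(n), x, u])
beta_long = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"OLS without u: {beta_short:.2f}")
print(f"OLS with u:    {beta_long:.2f}")
```

Under these assumptions the short regression converges to 2 + 3 * cov(x, u) / var(x) = 3.5 rather than 2, and collecting more data does not shrink the gap. That is what "inconsistent" means here.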

4
Q

What are three common causes of endogeneity?

A

Omitted variables - relevant variables not included in the model
Simultaneity - mutual causation between x and y
Measurement error - when variables are inaccurately measured

5
Q

What is the purpose of Instrumental Variables estimation?

A

IV estimation is used to solve the problem of endogeneity by finding an external variable (instrument) that affects the endogenous variable but is uncorrelated with the error term.

6
Q

What two conditions must an instrument satisfy in IV estimation?

A

Relevance - the instrument is correlated with the endogenous variable.
Exogeneity - the instrument is uncorrelated with the error term in the outcome equation.
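
Of the two conditions, only relevance can be checked directly in the data, because the error term is never observed. The sketch below uses simulated data with an assumed first-stage coefficient of 0.5, and the variable names are hypothetical. It computes the first-stage correlation and the matching F statistic for a single instrument. A common rule of thumb treats a first-stage F below about 10 as a sign of a weak instrument.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical instrument z and endogenous regressor x.
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)

# With one instrument, the first-stage F statistic follows from the
# squared correlation between z and x.
r = np.corrcoef(z, x)[0, 1]
first_stage_f = (r**2 / (1 - r**2)) * (n - 2)

print(f"corr(z, x) = {r:.3f}, first-stage F = {first_stage_f:.1f}")
```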

7
Q

Explain the Two-Stage-Least-Squares (2SLS) method.

A

Stage 1: Regress the endogenous variable x on the instrument z (and any exogenous controls) to get predicted values of x.
Stage 2: Regress the dependent variable y on the predicted x (and the same controls); the coefficient on predicted x is the estimate of the causal effect.
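
The two stages can be run by hand with ordinary least squares. This is a minimal sketch on simulated data: the data-generating process, the true effect of 2.0 and the helper `fit` are assumptions made for illustration. It uses one instrument and no additional controls.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical data: u makes x endogenous; z shifts x but not y directly.
u = rng.normal(size=n)
z = rng.normal(size=n)
x = 1.0 * z + u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)   # true effect of x is 2.0

def fit(X, target):
    """Return least-squares coefficients of target on the columns of X."""
    return np.linalg.lstsq(X, target, rcond=None)[0]

const = np.ones(n)

# Stage 1: regress x on the instrument and keep the fitted values.
Z = np.column_stack([const, z])
x_hat = Z @ fit(Z, x)

# Stage 2: regress y on the fitted values of x.
beta_2sls = fit(np.column_stack([const, x_hat]), y)[1]

print(f"2SLS estimate: {beta_2sls:.2f}")
```

The point estimate from this manual procedure is the 2SLS estimate, but the standard errors reported by a plain second-stage regression are not valid, because they ignore that the fitted values of x are themselves estimated. Dedicated IV routines correct for this.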

8
Q

What is the key difference between OLS and IV estimation?

A

OLS assumes that the explanatory variables are exogenous, while IV is used when one or more explanatory variables are endogenous and allows for consistent estimation by using an external instrument.

9
Q

What is the Difference-In-Differences (DD) method?

A

DD is a technique that estimates the causal effect of a treatment by comparing the changes in outcomes over time between a treatment group and a control group. It mimics a randomized experiment using observational data.
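
With two groups and two periods, the DD estimate is simple arithmetic on four group means. The numbers below are made up for illustration. The sketch also shows that the same value is the interaction coefficient in a regression of the outcome on a treatment dummy, a post-period dummy and their product.

```python
import numpy as np

# Hypothetical mean outcomes (illustrative numbers only).
treat_pre, treat_post = 10.0, 15.0
control_pre, control_post = 8.0, 10.0

# Change in the treatment group minus change in the control group.
dd = (treat_post - treat_pre) - (control_post - control_pre)

# Equivalent regression on the four cell means:
# y = a + b*treat + c*post + d*(treat*post), where d is the DD estimate.
X = np.array([
    [1, 0, 0, 0],   # control, pre
    [1, 0, 1, 0],   # control, post
    [1, 1, 0, 0],   # treatment, pre
    [1, 1, 1, 1],   # treatment, post
], dtype=float)
y = np.array([control_pre, control_post, treat_pre, treat_post])
a, b, c, d = np.linalg.solve(X, y)

print(f"DD from means: {dd}, interaction coefficient: {d:.1f}")
```

Here the control group's change of 2 stands in for what the treatment group would have experienced without treatment, so 3 of the treatment group's change of 5 is attributed to the treatment.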

10
Q

What is the Parallel Trends Assumption in DD?

A

The parallel trends assumption states that, in the absence of treatment, the treatment and control groups would have followed the same trend in the outcome variable over time. This is crucial for DD to produce unbiased estimates.

11
Q

Why is the parallel trends assumption important but untestable?

A

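It’s important because DD relies on this assumption to establish a valid counterfactual, but it’s untestable because we can never observe what would have happened to the treatment group if they hadn’t received the treatment. Comparing the two groups’ trends before treatment is a useful plausibility check, but similar pre-treatment trends do not prove the assumption holds afterwards.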

12
Q

What is Regression Discontinuity Design (RDD) and how does it work?

A

RDD exploits a cut-off in a continuous (running) variable that determines treatment assignment. It compares individuals just above and just below the threshold, assuming those near the cut-off are similar except for receiving treatment.
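
One common way to make the comparison is a local linear regression in a window around the cut-off, with a separate slope on each side. The sketch below runs on simulated data. The cut-off at 0, the bandwidth `h = 0.25` and the true jump of 2.0 are assumptions chosen for illustration, and bandwidth choice in real applications needs more care.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
cutoff = 0.0

# Hypothetical running variable r; treatment switches on at the cut-off.
r = rng.uniform(-1, 1, size=n)
d = (r >= cutoff).astype(float)
y = 1.0 + 0.5 * r + 2.0 * d + rng.normal(scale=0.5, size=n)  # true jump is 2.0

# Keep only observations within bandwidth h of the cut-off.
h = 0.25
near = np.abs(r - cutoff) <= h
rc = r[near] - cutoff   # running variable centred at the cut-off

# Regress y on treatment, the centred running variable and their
# interaction; the coefficient on treatment is the jump at the cut-off.
X = np.column_stack([np.ones(rc.size), d[near], rc, d[near] * rc])
jump = np.linalg.lstsq(X, y[near], rcond=None)[0][1]

print(f"Estimated jump at the cut-off: {jump:.2f}")
```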

13
Q

What are the two key assumptions for RDD to produce valid causal estimates?

A

Cut-off assignment - treatment assignment is determined by a known threshold.
Continuity - in the absence of treatment, outcomes would vary smoothly across the threshold, so the only thing that changes sharply at the cut-off is treatment assignment.

14
Q

How does fuzzy RDD differ from sharp RDD?

A

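In sharp RDD, treatment is fully determined by the threshold: everyone on one side is treated and no one on the other. In fuzzy RDD, crossing the threshold only changes the probability of treatment, so not all individuals above the threshold receive treatment, and some below the threshold may receive it.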
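
In the fuzzy case, a standard approach is to divide the jump in the outcome at the threshold by the jump in the treatment rate, which amounts to using the threshold as an instrument for treatment. The sketch below uses simulated data. The treatment probabilities (0.2 below, 0.8 above), the bandwidth and the helper `jump_at_cutoff` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Hypothetical running variable with the cut-off at 0.
r = rng.uniform(-1, 1, size=n)

# Crossing the cut-off raises the probability of treatment from 0.2 to 0.8.
p_treat = np.where(r >= 0, 0.8, 0.2)
d = (rng.uniform(size=n) < p_treat).astype(float)
y = 1.0 + 0.5 * r + 2.0 * d + rng.normal(scale=0.5, size=n)  # true effect is 2.0

def jump_at_cutoff(v, r, h=0.25):
    """Local linear estimate of the jump in v at r = 0 within bandwidth h."""
    near = np.abs(r) <= h
    above = (r[near] >= 0).astype(float)
    X = np.column_stack([np.ones(above.size), above, r[near], above * r[near]])
    return np.linalg.lstsq(X, v[near], rcond=None)[0][1]

jump_y = jump_at_cutoff(y, r)   # jump in the outcome
jump_d = jump_at_cutoff(d, r)   # jump in the treatment rate
effect = jump_y / jump_d

print(f"Jump in treatment rate: {jump_d:.2f}, estimated effect: {effect:.2f}")
```

In a sharp design the treatment rate jumps from 0 to 1, so the denominator is 1 and the ratio reduces to the jump in the outcome.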

15
Q

What is the 2SLS estimator in IV, and what does it measure?

A

The 2SLS estimator measures the causal effect of the endogenous variable on the dependent variable. It is computed by taking the predicted values of the endogenous variable from the first stage and regressing the dependent variable on them in the second stage.

16
Q

Why do we compare OLS and IV estimates?

A

Comparing OLS and IV estimates helps to understand the direction and magnitude of the bias in OLS. If the two estimates differ substantially, and the instrument is valid, it suggests that endogeneity is biasing the OLS estimate.
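
The comparison can be illustrated on simulated data where the true effect is known. This is a sketch under assumptions: the data-generating process and the true effect of 2.0 are hypothetical, and the simple covariance-ratio formulas apply only with one regressor and one instrument.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

# Hypothetical data: u is unobserved and makes x endogenous; z is a valid instrument.
u = rng.normal(size=n)
z = rng.normal(size=n)
x = z + u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)   # true effect of x is 2.0

# OLS slope: cov(x, y) / var(x).
beta_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# IV slope with a single instrument: cov(z, y) / cov(z, x).
beta_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

# The gap between the two indicates the direction and size of the OLS bias.
gap = beta_ols - beta_iv

print(f"OLS: {beta_ols:.2f}, IV: {beta_iv:.2f}, gap: {gap:.2f}")
```

Here OLS is biased upwards because the omitted factor raises both x and y. In real data the gap also reflects sampling noise, which is typically larger for IV than for OLS.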