Causal Inference Flashcards
Why is it difficult to establish causality between variables?
- There could be omitted variables affecting both X and Y.
- Simultaneity might exist, where X affects Y and Y also affects X.
- Measurement errors in variables could distort the estimated relationships.
- Selection bias can arise when units select into treatment based on unobserved factors that also affect the outcome.
What is the main goal of causal inference in empirical research?
The main goal is to identify the causal effects of one or more explanatory variables on a dependent variable, rather than just finding correlations.
What is endogeneity and why does it cause problems in OLS regression?
Endogeneity occurs when an explanatory variable is correlated with the error term, leading to biased and inconsistent estimates in OLS regression.
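A minimal simulation sketch of the problem described above, using only the Python standard library. The data-generating process, the coefficient values, and the helper `ols_slope` are illustrative assumptions, not part of the original card.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (intercept included)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx

random.seed(0)
n = 20_000
TRUE_BETA = 2.0  # assumed causal effect of x on y

u = [random.gauss(0, 1) for _ in range(n)]   # unobserved confounder
x = [ui + random.gauss(0, 1) for ui in u]    # x is partly driven by u
y = [TRUE_BETA * xi + 3.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# u is left out of the regression, so it sits in the error term
# and is correlated with x: x is endogenous.
ols_estimate = ols_slope(x, y)
print(f"true effect: {TRUE_BETA}, OLS estimate: {ols_estimate:.2f}")
```

Under these assumptions the OLS slope converges to 2 + 3 * Cov(x, u) / Var(x) = 3.5 rather than 2, and collecting more data does not remove the gap.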
What are three common causes of endogeneity?
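- Omitted variables: relevant variables that are correlated with the regressors are left out of the model.
- Simultaneity: mutual causation between x and y.
- Measurement error: an explanatory variable is measured inaccurately, so the observed regressor is correlated with the error term.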
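The measurement-error case is the least intuitive of the three, so here is a small simulation sketch of it. It assumes classical measurement error (noise independent of the true value); the variable names and parameter values are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (intercept included)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx

random.seed(1)
n = 20_000
TRUE_BETA = 2.0  # assumed causal effect of the true regressor

x_true = [random.gauss(0, 1) for _ in range(n)]
y = [TRUE_BETA * xt + random.gauss(0, 1) for xt in x_true]

# We only observe x with noise of the same variance as x_true.
x_observed = [xt + random.gauss(0, 1) for xt in x_true]

slope_true_x = ols_slope(x_true, y)       # close to TRUE_BETA
slope_noisy_x = ols_slope(x_observed, y)  # pulled toward zero
print(f"with true x: {slope_true_x:.2f}, with noisy x: {slope_noisy_x:.2f}")
```

With equal signal and noise variances, the slope on the noisy regressor converges to half the true effect (attenuation bias).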
What is the purpose of Instrumental Variables estimation?
IV estimation is used to solve the problem of endogeneity by finding an external variable (instrument) that affects the endogenous variable but is uncorrelated with the error term.
What two conditions must an instrument satisfy in IV estimation?
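- Relevance: the instrument is correlated with the endogenous variable.
- Exogeneity: the instrument is uncorrelated with the error term in the outcome equation, so it affects the outcome only through the endogenous variable.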
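Of the two conditions, only relevance can be checked directly from the data. The sketch below computes the first-stage F-statistic for a single instrument, using the identity F = r² (n − 2) / (1 − r²) for a simple regression. The simulated data and the helper name `first_stage_f` are assumptions for illustration; a common rule of thumb treats F below about 10 as a sign of a weak instrument.

```python
import random

def first_stage_f(z, x):
    """F-statistic for the slope in a simple regression of x on z."""
    n = len(z)
    mean_z, mean_x = sum(z) / n, sum(x) / n
    s_zx = sum((a - mean_z) * (b - mean_x) for a, b in zip(z, x))
    s_zz = sum((a - mean_z) ** 2 for a in z)
    s_xx = sum((b - mean_x) ** 2 for b in x)
    r_squared = s_zx ** 2 / (s_zz * s_xx)
    return r_squared * (n - 2) / (1 - r_squared)

random.seed(2)
n = 1_000
z = [random.gauss(0, 1) for _ in range(n)]     # candidate instrument
x = [0.5 * zi + random.gauss(0, 1) for zi in z]  # endogenous variable

f_stat = first_stage_f(z, x)
print(f"first-stage F: {f_stat:.1f}")
```

Exogeneity has no comparable check with a single instrument; it has to be argued from how the instrument arises.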
Explain the Two-Stage Least Squares (2SLS) method.
Stage 1: Regress the endogenous variable x on the instrument z (and any exogenous controls) to get predicted values of x.
Stage 2: Regress the dependent variable y on the predicted x (and the same controls); the coefficient on predicted x is the estimate of the causal effect.
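The two stages can be sketched by hand as follows. This is a minimal illustration with one instrument and no controls; the simulated data, coefficient values, and helper functions are assumptions, not a reference implementation.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

def ols(x, y):
    """Return (intercept, slope) from a simple regression of y on x."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope

random.seed(3)
n = 20_000
TRUE_BETA = 2.0  # assumed causal effect of x on y

z = [random.gauss(0, 1) for _ in range(n)]   # instrument
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved confounder
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [TRUE_BETA * xi + 3.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Stage 1: regress x on z and keep the fitted values.
a1, b1 = ols(z, x)
x_hat = [a1 + b1 * zi for zi in z]

# Stage 2: regress y on the fitted values.
_, beta_2sls = ols(x_hat, y)

# Naive OLS for comparison (biased because u is omitted).
_, beta_ols = ols(x, y)
print(f"OLS: {beta_ols:.2f}, 2SLS: {beta_2sls:.2f}, truth: {TRUE_BETA}")
```

Running the second stage by hand gives the right point estimate, but its reported standard errors would be wrong because they ignore that the predicted x is itself estimated. Dedicated IV routines correct for this.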
What is the key difference between OLS and IV estimation?
OLS assumes that the explanatory variables are exogenous, while IV is used when one or more explanatory variables are endogenous and allows for consistent estimation by using an external instrument.
What is the Difference-in-Differences (DD) method?
DD is a technique that estimates the causal effect of a treatment by comparing the changes in outcomes over time between a treatment group and a control group. It mimics a randomized experiment using observational data.
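In the simplest two-group, two-period case, the DD estimate is just a difference of two before/after changes. A small sketch with made-up numbers (all values and the function name `did_estimate` are hypothetical):

```python
def mean(values):
    return sum(values) / len(values)

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical outcome data for each group and period.
treated_pre = [9.0, 10.0, 11.0]
treated_post = [14.0, 15.0, 16.0]
control_pre = [7.0, 8.0, 9.0]
control_post = [9.0, 10.0, 11.0]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(effect)  # 3.0
```

The treated group rose by 5 and the control group by 2, so the control group's change stands in for what would have happened without treatment and the estimated effect is 3.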
What is the Parallel Trends Assumption in DD?
The parallel trends assumption states that, in the absence of treatment, the treatment and control groups would have followed the same trend in the outcome variable over time. This is crucial for DD to produce unbiased estimates.
Why is the parallel trends assumption important but untestable?
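It’s important because DD relies on this assumption to establish a valid counterfactual, but it cannot be tested directly because we can never observe what would have happened to the treatment group if they hadn’t received the treatment. Similar trends in the two groups before treatment make the assumption more plausible, but they do not prove it.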
What is Regression Discontinuity Design (RDD) and how does it work?
RDD exploits a cut-off in a continuous running variable that determines treatment assignment. It compares individuals just above and just below the threshold, assuming those near the cut-off are similar except for receiving treatment.
What are the two key assumptions for RDD to produce valid causal estimates?
- Cut-off assignment: treatment assignment is determined by a known threshold.
- Continuity: in the absence of treatment, expected outcomes would change smoothly through the cut-off, so any jump in outcomes at the threshold can be attributed to the treatment.
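A sketch of a sharp RDD estimate under these two assumptions: fit a separate straight line on each side of the cut-off within a bandwidth, then take the gap between the two fitted values at the cut-off. The simulated data, the bandwidth of 0.25, and the helper `fit_line` are illustrative assumptions; choosing the bandwidth well is a topic of its own.

```python
import random

def fit_line(x, y):
    """Return (intercept, slope) from a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

random.seed(4)
n = 20_000
CUTOFF = 0.0
BANDWIDTH = 0.25
TRUE_JUMP = 2.0  # assumed treatment effect at the cut-off

running = [random.uniform(-1, 1) for _ in range(n)]
treated = [r >= CUTOFF for r in running]  # sharp assignment rule
outcome = [1.0 + 0.5 * r + TRUE_JUMP * t + random.gauss(0, 0.5)
           for r, t in zip(running, treated)]

# Keep observations near the cut-off and centre the running variable,
# so each fitted intercept is the predicted outcome at the cut-off.
below = [(r - CUTOFF, y) for r, y in zip(running, outcome)
         if CUTOFF - BANDWIDTH <= r < CUTOFF]
above = [(r - CUTOFF, y) for r, y in zip(running, outcome)
         if CUTOFF <= r <= CUTOFF + BANDWIDTH]

intercept_below, _ = fit_line(*zip(*below))
intercept_above, _ = fit_line(*zip(*above))
rdd_estimate = intercept_above - intercept_below
print(f"estimated jump at cut-off: {rdd_estimate:.2f}")
```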
How does fuzzy RDD differ from sharp RDD?
In sharp RDD, treatment assignment is strictly determined by the threshold. In fuzzy RDD, crossing the threshold only changes the probability of treatment: not all individuals above the threshold receive treatment, and some below the threshold may receive it.
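One common way to handle the fuzzy case is to divide the jump in the outcome at the cut-off by the jump in the treatment rate, which amounts to using the threshold as an instrument for treatment. The sketch below assumes simulated data in which treatment take-up is unrelated to the outcome noise; the take-up rates (0.2 and 0.8), bandwidth, and helper names are hypothetical.

```python
import random

def intercept_of_fit(x, y):
    """Intercept from a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x

def jump_at_cutoff(running, values, cutoff, bandwidth):
    """Gap between the fitted lines on each side, evaluated at the cut-off."""
    below_x, below_v, above_x, above_v = [], [], [], []
    for r, v in zip(running, values):
        distance = r - cutoff
        if -bandwidth <= distance < 0:
            below_x.append(distance)
            below_v.append(v)
        elif 0 <= distance <= bandwidth:
            above_x.append(distance)
            above_v.append(v)
    return intercept_of_fit(above_x, above_v) - intercept_of_fit(below_x, below_v)

random.seed(5)
n = 20_000
CUTOFF, BANDWIDTH = 0.0, 0.25
TRUE_EFFECT = 2.0  # assumed effect of actually receiving treatment

running = [random.uniform(-1, 1) for _ in range(n)]
# Treatment probability jumps from 0.2 to 0.8 at the cut-off.
treated = [1.0 if random.random() < (0.8 if r >= CUTOFF else 0.2) else 0.0
           for r in running]
outcome = [1.0 + 0.5 * r + TRUE_EFFECT * d + random.gauss(0, 0.5)
           for r, d in zip(running, treated)]

jump_outcome = jump_at_cutoff(running, outcome, CUTOFF, BANDWIDTH)
jump_treatment = jump_at_cutoff(running, treated, CUTOFF, BANDWIDTH)
fuzzy_estimate = jump_outcome / jump_treatment
print(f"jump in treatment rate: {jump_treatment:.2f}, "
      f"fuzzy RDD estimate: {fuzzy_estimate:.2f}")
```

In a sharp design the jump in the treatment rate is exactly 1, so the ratio reduces to the jump in the outcome.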
What is the 2SLS estimator in IV, and what does it measure?
The 2SLS estimator measures the causal effect of the endogenous variable on the dependent variable. It is the coefficient obtained by regressing the dependent variable on the predicted values of the endogenous variable from the first stage.
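For the special case of one endogenous variable x, one instrument z, and no other controls, the two-stage procedure collapses to a ratio of sample covariances. The notation below is an assumption chosen to match the x, y, z used in the earlier cards.

```latex
\hat{\beta}_{\text{2SLS}}
  = \frac{\widehat{\operatorname{Cov}}(z, y)}{\widehat{\operatorname{Cov}}(z, x)}
```

The numerator is how much y moves with the instrument, and the denominator is how much x moves with it. A denominator near zero signals a weak instrument.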