Causal analysis Flashcards
When does an OLS estimator capture a causal relationship? When does it not and what type of relationship is this instead?
An OLS estimator captures a causal relationship when the exogeneity assumption holds, i.e. when the independent variable (regressor) is uncorrelated with the error term.
If the exogeneity assumption is violated then we are only able to interpret a relationship as an association or correlation.
Define an exogenous regressor and write this out mathematically.
An exogenous regressor is one which is not correlated with the error term, i.e. Xi and ui are not correlated.
E(ui | Xi) = 0. This implies the weaker condition Cov(Xi, ui) = 0, which is often used as an alternative statement of exogeneity.
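The link between the conditional-mean statement and the covariance statement can be sketched with the law of iterated expectations. Assuming E(ui | Xi) = 0 (the subscripted symbols below are just the LaTeX form of Xi and ui):

```latex
\begin{align*}
E(u_i) &= E\big[\,E(u_i \mid X_i)\,\big] = 0, \\
\operatorname{Cov}(X_i, u_i) &= E(X_i u_i) - E(X_i)\,E(u_i)
  = E\big[\,X_i \, E(u_i \mid X_i)\,\big] - 0 = 0 .
\end{align*}
```

The converse does not hold in general: zero covariance only rules out a linear relationship between Xi and ui, so it is the weaker of the two conditions.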
Define an endogenous regressor.
An endogenous regressor is one where Xi and ui are correlated, for example when higher values of Xi are systematically associated with higher values of ui.
What are four possible reasons for endogeneity?
Reasons for endogeneity:
- Omitted variable bias
- Measurement error
- Simultaneity bias
- Selection bias
Define omitted variable bias.
Give an example of when omitted variable bias may occur.
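Omitted variable bias arises when a relevant variable is left out of the regression model. To cause bias, the omitted variable must both affect the dependent variable and be correlated with one of the included regressors; if either condition fails, the coefficient on that regressor is not biased by the omission.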
For example, regressing wages on education may lead to endogeneity due to omitted variable bias. Ability is likely to affect wages and is correlated with education. Because ability raises wages and is positively correlated with education, the estimated coefficient on education will be biased upwards: it partially captures the effect of ability on wages.
The correlation is positive because those who have higher ability are more likely to choose to stay on in education.
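The wages, education and ability example can be illustrated with a small simulation. This is only a sketch: the data-generating process, its coefficients (a true education effect of 1.0) and the helper `ols` are all assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Illustrative data-generating process (all coefficients are assumptions).
ability = rng.normal(size=n)                     # unobserved in practice
education = 0.5 * ability + rng.normal(size=n)   # higher ability -> more education
wage = 1.0 * education + 2.0 * ability + rng.normal(size=n)


def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


beta_short = ols(wage, education)[1]           # ability omitted
beta_long = ols(wage, education, ability)[1]   # ability included

print(f"education coefficient, ability omitted:  {beta_short:.2f}")
print(f"education coefficient, ability included: {beta_long:.2f}")
```

Under these assumed numbers the short regression should overstate the effect of education (in large samples roughly 1 + 2 × Cov(education, ability) / Var(education) = 1.8), while including ability should recover a value close to 1.0.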
Define measurement error.
Give an example of when measurement error may occur.
Measurement error is where there is substantial error involved in measuring Xi, the independent variable.
For example, when testing the permanent income hypothesis, measurement error may arise if current income is used as a measure of permanent income.
However, only use this answer if substantial measurement error is necessary.
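A minimal sketch of why a badly measured regressor matters, assuming classical measurement error (noise that is independent of the true value and of the error term). The variables are in made-up standardised units, and the true slope of 2.0 and the helper `slope` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Assumed set-up: current income = permanent income + independent noise.
permanent_income = rng.normal(size=n)                    # true regressor
current_income = permanent_income + rng.normal(size=n)   # noisy measure of it
consumption = 2.0 * permanent_income + rng.normal(size=n)


def slope(x, y):
    """OLS slope from a regression of y on x (with an intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


slope_true = slope(permanent_income, consumption)   # regressor measured correctly
slope_noisy = slope(current_income, consumption)    # regressor measured with error

print(f"slope using permanent income: {slope_true:.2f}")
print(f"slope using current income:   {slope_noisy:.2f}")
```

With equal variances for the true regressor and the noise, the slope on the noisy measure should be pulled roughly halfway towards zero (attenuation bias), which is why the problem matters most when the measurement error is substantial.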
Define simultaneity bias.
Give an example of simultaneity bias.
Simultaneity bias is where the dependent variable affects the independent variable, as well as the other way around. In other words, we are unsure whether X causes Y, Y causes X, or both. The two may be determined simultaneously.
For example, supply and demand; or police expenditure and crime rate.
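The supply and demand case can be sketched as follows. The structural slopes (-1.0 for demand, +1.0 for supply) and the shock distributions are assumptions chosen only to make the algebra simple.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

demand_shock = rng.normal(size=n)
supply_shock = rng.normal(size=n)

# Assumed structural equations:
#   demand: quantity = -1.0 * price + demand_shock
#   supply: quantity = +1.0 * price + supply_shock
# Solving the two equations jointly gives the equilibrium we would observe.
price = (demand_shock - supply_shock) / 2.0
quantity = (demand_shock + supply_shock) / 2.0

# Naive OLS of quantity on price.
ols_slope = np.cov(price, quantity)[0, 1] / np.var(price, ddof=1)

# Price is correlated with the demand equation's error term, so it is endogenous.
corr_price_shock = np.corrcoef(price, demand_shock)[0, 1]

print(f"OLS slope of quantity on price: {ols_slope:.2f}")
print(f"corr(price, demand shock):      {corr_price_shock:.2f}")
```

Because price and quantity are determined together, the OLS slope here should be close to zero: it recovers neither the demand slope (-1.0) nor the supply slope (+1.0).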
Define selection bias.
What may cause selection bias to occur?
Selection bias is where the sample used in estimating a regression is not representative of the population, leading to estimates that are biased upwards or downwards.
Selection bias may occur because the observations in the sample are self-selected or skewed in some way.
Because some observations are missing, we are unable to conclude whether the relationship is causal, and the estimates we derive are misleading.
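A small simulation of one way this can happen. The selection rule (only units with a positive outcome are observed) and the true slope of 1.0 are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

x = rng.normal(size=n)
y = 1.0 * x + rng.normal(size=n)   # assumed true slope of 1.0


def slope(x, y):
    """OLS slope from a regression of y on x (with an intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


# Hypothetical selection rule: we only observe units whose outcome is positive.
observed = y > 0

slope_full = slope(x, y)                          # whole population
slope_selected = slope(x[observed], y[observed])  # non-representative sample

print(f"slope, full population: {slope_full:.2f}")
print(f"slope, selected sample: {slope_selected:.2f}")
```

In this set-up the selected sample should give a slope well below 1.0, even though nothing about the underlying relationship has changed.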
What are three possible solutions to the problem of endogeneity?
- Randomisation
- Instrumental variables
- Regression discontinuity design
What is randomisation and how does this help to alleviate the problem of endogeneity?
Randomisation is when the variable of interest, Xi, is randomly assigned using, for example, a random number generator, the roll of a die, or the flip of a coin.
This makes it more likely that the regressor is exogenous, because random assignment means Xi is not systematically correlated with unobserved background characteristics, i.e. randomisation balances background characteristics across the groups on average.
What is the terminology used for randomisation in medical trials or by economists?
If Xi is the randomly assigned binary variable then:
Xi = 1 represents the treatment group (the treatment occurs)
Xi = 0 represents the control group
The coefficient of Xi captures the treatment effect, which is the average change in outcome due to the treatment compared to the control group.
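A sketch comparing random assignment with self-selection into treatment. The true treatment effect of 2.0, the role of ability as the unobserved background characteristic, and the helper `diff_in_means` are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
TRUE_EFFECT = 2.0   # assumed treatment effect

ability = rng.normal(size=n)   # unobserved background characteristic
noise = rng.normal(size=n)


def diff_in_means(treated, outcome):
    """Mean outcome of the treatment group minus that of the control group."""
    return outcome[treated == 1].mean() - outcome[treated == 0].mean()


# (a) Random assignment: Xi is a coin flip, unrelated to ability.
x_random = rng.integers(0, 2, size=n)
y_random = TRUE_EFFECT * x_random + 3.0 * ability + noise

# (b) Self-selection: higher-ability units are more likely to take the treatment.
x_self = (ability + rng.normal(size=n) > 0).astype(int)
y_self = TRUE_EFFECT * x_self + 3.0 * ability + noise

effect_random = diff_in_means(x_random, y_random)
effect_self = diff_in_means(x_self, y_self)

# With a binary regressor, the OLS slope equals the difference in group means.
ols_slope = np.polyfit(x_random, y_random, 1)[0]

print(f"estimated effect, random assignment: {effect_random:.2f}")
print(f"estimated effect, self-selection:    {effect_self:.2f}")
```

Under random assignment the estimate should be close to the assumed effect of 2.0; under self-selection it should be much larger, because the treatment group also has higher ability.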
What are four problems or limitations that may be associated with randomised experiments?
- It can be unethical to force people to take part in some experiments, for example giving health care to some and not others.
- Groups will often be self-selecting and will therefore share certain background characteristics, meaning assignment is no longer random.
- True and complete randomisation is often impossible, for example you cannot randomly assign gender, ethnicity, or having health insurance. The results often reflect having the CHOICE of receiving treatment rather than actual effect of the treatment.
- Experiments are often time-limited and short, making it difficult to find long-term effects.
Define an instrumental variable.
Explain how instrumental variables help with endogeneity problems.
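An instrumental variable, Zi, is a variable used in place of a regressor, Xi, which is suspected to be endogenous. It is not a proxy for Xi in the usual sense: it must be correlated with Xi but uncorrelated with the error term.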
The instrument helps with endogeneity problems because it allows us to generate exogenous variation in Xi which is uncorrelated with the error terms, ui.
What three assumptions must the instrument(s) satisfy?
The instrument(s), Z, must satisfy three assumptions:
1. Relevance assumption - Z must be correlated with X after accounting for Wi, the other exogenous variables in the regression,
i.e. the relationship must be non-zero.
Cov(Xi, Zi) ≠ 0
2. Exogeneity assumption/exclusion restriction/validity. This says that Z must not be correlated with the error term, ui.
Cov(Zi, ui) = 0
3. Z cannot be another regressor in the model, it must be an additional variable
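Why relevance and exogeneity are both needed can be seen from a short derivation. This is a sketch assuming the simplest case: one regressor, one instrument, no other exogenous variables Wi, and a model written as Y_i = β0 + β1 X_i + u_i (this notation is an assumption, not taken from the cards).

```latex
\begin{align*}
\operatorname{Cov}(Z_i, Y_i)
  &= \beta_1 \operatorname{Cov}(Z_i, X_i) + \operatorname{Cov}(Z_i, u_i) \\
\Rightarrow\quad
\beta_1 &= \frac{\operatorname{Cov}(Z_i, Y_i)}{\operatorname{Cov}(Z_i, X_i)}
\quad \text{if } \operatorname{Cov}(Z_i, u_i) = 0
\text{ and } \operatorname{Cov}(Z_i, X_i) \neq 0 .
\end{align*}
```

Exogeneity removes the second term on the first line, and relevance ensures that the ratio on the second line is well defined.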
What method is used to estimate a regression when using instrumental variables? Describe briefly (in a sentence) how to carry out this method.
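Two Stage Least Squares (2SLS) is used to estimate a regression when using instrumental variables. This involves two consecutive OLS regressions: first regress the endogenous regressor Xi on the instrument(s) Zi (and any other exogenous variables Wi) to obtain fitted values, then regress the dependent variable on those fitted values (and Wi).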
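The two stages can be sketched by hand with numpy. The simulated data, the coefficients (a true effect of 1.0 and a first-stage coefficient of 0.8) and the helper `ols` are assumptions for illustration, with a single instrument and no other exogenous variables.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Illustrative data-generating process (all coefficients are assumptions).
z = rng.normal(size=n)         # instrument: affects x, unrelated to the error
ability = rng.normal(size=n)   # unobserved confounder, part of the error term
x = 0.8 * z + ability + rng.normal(size=n)         # endogenous regressor
y = 1.0 * x + 2.0 * ability + rng.normal(size=n)   # true effect of x is 1.0


def ols(y, X):
    """Return OLS coefficients for the design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


const = np.ones(n)

# Stage 1: regress x on the instrument and keep the fitted values.
Z = np.column_stack([const, z])
first_stage = ols(x, Z)
x_hat = Z @ first_stage

# Stage 2: regress y on the fitted values from stage 1.
beta_2sls = ols(y, np.column_stack([const, x_hat]))[1]

# Naive OLS for comparison.
beta_ols = ols(y, np.column_stack([const, x]))[1]

print(f"first-stage coefficient on z: {first_stage[1]:.2f}")
print(f"OLS estimate:  {beta_ols:.2f}")
print(f"2SLS estimate: {beta_2sls:.2f}")
```

Under these assumptions the 2SLS estimate should be close to 1.0 while OLS is biased upwards. Running the second stage by hand gives the right coefficient but not the right standard errors, so in practice a dedicated 2SLS routine is normally used for inference.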