Lecture 20 Flashcards
Internal vs external validity
Internal - does the design identify a causal effect?
External - can we generalise this effect to other settings?
Internal validity
Is the causal effect you’re estimating actually valid for this sample? The main threats are:
1. OVB
2. Incorrect model functional form
3. Measurement error
4. Sample selection issues
5. Simultaneity
Each of these violates exogeneity (E[ui|xi] = 0 fails), so OLS is biased and inconsistent
What is OVB?
OVB arises when you leave out a variable that is:
1. A determinant of the outcome variable y
2. Correlated with the regressor of interest x, so exogeneity fails
How do controls help OVB?
If you include controls wi that soak up the influence of the OV, then the key condition becomes:
- E[ui|xi,wi] = E[ui|wi]
- so B1 is still estimated consistently as long as the OV is uncorrelated with xi conditional on wi
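A compact restatement of this condition in regression form (the remark that the coefficient on the control need not be causal is the standard caveat, added here for completeness):

```latex
\[
  y_i = \beta_0 + \beta_1 x_i + \beta_2 w_i + u_i,
  \qquad
  \mathbb{E}[u_i \mid x_i, w_i] = \mathbb{E}[u_i \mid w_i].
\]
% Under conditional mean independence, \hat{\beta}_1 is consistent for the
% causal effect of x_i, while \hat{\beta}_2 need not have a causal interpretation.
```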
OVB in different models
OLS - classical OVB setup, fails if OV affects y and is correlated with x
IV - the OV must not be correlated with the instrument; if it is, the instrument is invalid
Panel data - fixed effects control for unobserved heterogeneity, but if the OV varies within the fixed-effects unit (e.g. over time), OVB remains
Sign of the OVB
The direction of the bias depends on:
- the effect of the OV on y
- the correlation between the OV and x
Bias = (effect of OV on y) x (correlation between OV and x), so its sign is the product of the two signs, as the simulation below illustrates
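A minimal simulation of this sign rule (the variable names, coefficients, and correlation are invented purely for illustration; numpy assumed available): the omitted variable z raises y and is positively correlated with x, so the short regression overstates the true slope of 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Omitted variable z: a positive determinant of y, positively correlated with x
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)                    # corr(x, z) > 0
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)    # true slope on x is 2

def ols(X, y):
    """OLS coefficients for a design matrix X that already includes an intercept column."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_short = ols(np.column_stack([ones, x]), y)        # omits z -> positive bias
b_long = ols(np.column_stack([ones, x, z]), y)      # includes z -> consistent

print(f"short regression slope (biased upward): {b_short[1]:.2f}")
print(f"long regression slope (close to 2):     {b_long[1]:.2f}")
```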
Solutions to OVB
- Include the OV if observable
- Use controls if they proxy well for the OV
- Use panel data with fixed effects to eliminate unobserved time-invariant factors
- Use IVs if the OV can’t be measured, but you still have a valid instrument
- Run an experiment - randomisation breaks correlation between regressors and OVs
Functional form misspecification
When the regression model does not correctly capture the true relationship between the variables
- the estimated coefficients are biased
Solutions to functional form misspecification
- Use appropriate nonlinear transformations (logs, polynomials, interactions)
- Let the data guide you (plot the data and residuals, compare specifications)
- Use shrinkage methods like ridge/LASSO when there are many candidate regressors
- Model binary/censored outcomes correctly: logit/probit for binary outcomes, Tobit for censored ones
- Accept some uncertainty
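A small sketch of the first two bullets (the quadratic data-generating process is invented for the example; statsmodels assumed available): a straight-line fit misses the curvature, while adding an x² term recovers the true coefficients.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5_000

x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.5 * x + 0.3 * x**2 + rng.normal(size=n)   # true relationship is quadratic

# Misspecified model: linear in x only
linear = sm.OLS(y, sm.add_constant(x)).fit()

# Correct functional form: include the quadratic term
X_quad = sm.add_constant(np.column_stack([x, x**2]))
quadratic = sm.OLS(y, X_quad).fit()

print("linear fit R^2:   ", round(linear.rsquared, 3))
print("quadratic fit R^2:", round(quadratic.rsquared, 3))
print("quadratic coefficients:", quadratic.params.round(2))   # roughly [1.0, 0.5, 0.3]
```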
Errors-in-variables bias
- measurement error in a regressor makes it correlated with the error term, so OLS is biased
- classical measurement error (noise uncorrelated with the true value and with ui) causes attenuation bias: the estimated slope is biased towards zero
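The standard classical errors-in-variables result behind “biased towards zero”, writing the observed regressor as the true value plus noise:

```latex
\[
  \tilde{x}_i = x_i + w_i, \qquad
  \operatorname{plim}\hat{\beta}_1
  = \beta_1 \, \frac{\sigma_x^2}{\sigma_x^2 + \sigma_w^2},
\]
% where w_i is classical measurement error (uncorrelated with x_i and u_i),
% so the probability limit is smaller than \beta_1 in absolute value.
```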
Best-guess measurement error
A special case of measurement error where bias does not occur
- imagine a person doesn’t remember their income xi, but they make a best guess based on their education wi
- the reported value is xi^ = E[xi|wi]; the regression error then becomes vi = ui + B1(xi - xi^), and under two key assumptions:
Cov(xi^, xi - xi^) = 0
Cov(xi^, ui) = 0
we get Cov(xi^, vi) = 0, so the OLS coefficient on xi^ is consistent
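A short derivation of why the coefficient survives in the best-guess case, substituting the reported value into the regression:

```latex
\[
  y_i = \beta_0 + \beta_1 \hat{x}_i + v_i,
  \qquad v_i = u_i + \beta_1 (x_i - \hat{x}_i),
\]
\[
  \operatorname{Cov}(\hat{x}_i, v_i)
  = \operatorname{Cov}(\hat{x}_i, u_i)
  + \beta_1 \operatorname{Cov}(\hat{x}_i, x_i - \hat{x}_i)
  = 0,
\]
% so OLS of y on the best-guess regressor \hat{x}_i is consistent for \beta_1.
```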
Solutions to measurement error
- Get better data
- Model the measurement error process, if you know something about the source of error
- Use IVs, good if instruments are valid
- Errors in y? If uncorrelated with x, they do not bias B1^, only increase variance
Sample selection bias
3 types of missing data:
- missing at random - no bias
- missing based on x - no bias in a linear model, variation in x is reduced, which increases SEs and may reduce external validity
- missing based on y or u - does cause bias; this is true sample selection bias
Benign missing data
- Data missing at random: if you took a random sample of 100 students but lost 20 of the records at random (say, in the wind), that is equivalent to taking a random sample of 80 students, so there is no bias
- Data missing based on the value of one of the x’s: suppose you study the effect of the student-teacher ratio (STR) on test scores but restrict attention to districts with STR < 20; in a linear model, focusing on a subset of the x range does not cause bias, it only reduces the variation in x, which increases standard errors and may reduce external validity
Data missing on y or u
When selection into the sample depends on the outcome or on unobserved factors, OLS estimates become biased (see the simulation below)
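A simulation contrasting the three missing-data cases (cutoffs and coefficients invented for illustration; numpy assumed available): dropping observations at random or based on x leaves the slope near its true value of 2, while dropping observations based on y pulls it away.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

x = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x + u                    # true slope = 2

def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

keep_random = rng.random(n) < 0.5        # missing completely at random
keep_on_x = x < 1.0                      # missing based on x
keep_on_y = y < 1.0                      # missing based on y: sample selection

print("full sample:       ", round(slope(x, y), 2))
print("missing at random: ", round(slope(x[keep_random], y[keep_random]), 2))
print("missing based on x:", round(slope(x[keep_on_x], y[keep_on_x]), 2))
print("missing based on y:", round(slope(x[keep_on_y], y[keep_on_y]), 2))   # biased
```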
Truncation and incidental truncation
Truncation: observations are included only if the outcome meets a cutoff (e.g. yi < ci); handle it by writing down the likelihood conditional on selection and estimating by MLE
Incidental truncation: y is observed only for a non-random subset selected on another variable; handle it with the Heckman two-step selection model (sketched below)
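A sketch of the Heckman two-step procedure on simulated incidentally truncated data (the wage/education setup, the instrument, and all coefficients are invented for the example; statsmodels and scipy assumed available):

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 50_000

# Simulated incidental truncation: wages are observed only for people who work,
# and the selection error is correlated with the wage error.
educ = rng.normal(size=n)
z = rng.normal(size=n)                                # instrument: affects selection only
errors = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n)
e_sel, e_wage = errors[:, 0], errors[:, 1]

works = (0.5 * educ + 1.0 * z + e_sel > 0).astype(int)    # selection equation
wage = 1.0 + 2.0 * educ + e_wage                          # outcome equation, true slope = 2

# Step 1: probit of the selection indicator on educ and the excluded instrument z
Z = sm.add_constant(np.column_stack([educ, z]))
probit = sm.Probit(works, Z).fit(disp=False)
index = Z @ probit.params
imr = norm.pdf(index) / norm.cdf(index)               # inverse Mills ratio

# Step 2: OLS of wage on educ plus the inverse Mills ratio, selected sample only
# (the point estimate is consistent; the second-stage standard errors need correction)
sel = works == 1
X = sm.add_constant(np.column_stack([educ[sel], imr[sel]]))
heckman = sm.OLS(wage[sel], X).fit()

naive = sm.OLS(wage[sel], sm.add_constant(educ[sel])).fit()
print("naive OLS slope on the selected sample (biased):", round(naive.params[1], 2))
print("Heckman two-step slope (close to 2):            ", round(heckman.params[1], 2))
```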
Solutions to sample selection bias
- Design better sampling
- Run randomised experiments to avoid selection altogether
- Model the selection process explicitly
Simultaneous causality bias
We usually assume that x causes y, but sometimes y also causes x
- reverse causality makes xi correlated with ui, so the exogeneity assumption fails and B1^ is biased
Solutions to SCB
- Run a randomised controlled experiment
- Model both directions of causality
- Use IVs
A note on using IV regression
IV solves 3 major problems:
1. OVB
2. Measurement Error
3. Simultaneous Causality
BUT:
- adds its own challenges: the instrument must be exogenous and relevant
- if exogeneity fails, the IV estimate is still biased; if relevance is weak, weak-instrument bias can make IV worse than OLS (see the 2SLS sketch below)
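A minimal two-stage least squares sketch matching the three problems above (instrument and coefficients invented; the manual second stage reproduces the 2SLS point estimate, but its reported standard errors are not the correct 2SLS ones; statsmodels assumed available):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 100_000

z = rng.normal(size=n)                       # instrument: relevant and exogenous by construction
u = rng.normal(size=n)
x = 0.5 * z + 0.8 * u + rng.normal(size=n)   # x is endogenous: correlated with u
y = 1.0 + 2.0 * x + u                        # true slope = 2

# OLS is biased because Cov(x, u) != 0
ols = sm.OLS(y, sm.add_constant(x)).fit()

# Stage 1: regress x on the instrument; the first-stage F statistic checks relevance
stage1 = sm.OLS(x, sm.add_constant(z)).fit()
x_hat = stage1.fittedvalues

# Stage 2: regress y on the fitted values from stage 1
stage2 = sm.OLS(y, sm.add_constant(x_hat)).fit()

print("OLS slope (biased upward):", round(ols.params[1], 2))
print("2SLS slope (close to 2):  ", round(stage2.params[1], 2))
print("first-stage F (rule of thumb: want > 10):", round(stage1.fvalue, 1))
```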
Threats to internal validity of experiments
- Failure to randomise
- Failure to follow treatment protocol
- Attrition, people drop out in a way which is related to their potential outcomes
- Experimental effects: experimenter bias (researchers treat groups differently), Hawthorne effects (subjects change behaviour just from being studied)
Threats to internal validity for quasi experiments
- Failure to randomise
- Failure to follow treatment protocol
- Attrition
- Experimental effects - generally not a concern, since subjects in a quasi-experiment are usually unaware they are being studied
- Instrument invalidity: the quasi-random (“as-if random”) variation used as an instrument must be exogenous and relevant
Two main dimensions of external validity
- Different populations
- Different settings
External validity is more contextual than internal validity: it is not about statistical assumptions but about how similar other populations and settings are to the one studied.
Threats to external validity in experiments
- Non-representative sample, participants in your study might not reflect the broader population
- Non-representative treatment, way a treatment is implemented in an experiment may be too artificial or costly for real-world application
- General equilibrium effects, effect of a program can change when it’s scaled up
Quasi-experiments often improve external validity because they study real-world programs