Lecture 20 Flashcards

1
Q

Internal vs external validity

A

Internal - does the design identify a causal effect?
External - can we generalise this effect to other settings?

2
Q

Internal validity

A

Whether the causal effect you’re estimating is actually valid for the sample studied. Threats:
1. OVB
2. Incorrect functional form of the model
3. Measurement error
4. Sample selection issues
5. Simultaneity
Each of these violates exogeneity, so OLS will be biased

3
Q

What is OVB?

A

OVB arises when you leave out a variable which is:
1. A determinant of the outcome variable y
2. Correlated with the regressor of interest x
When both hold, exogeneity fails

4
Q

How do controls help OVB?

A

If you include controls wi that soak up the influence of the OV, then the key condition becomes conditional mean independence:
- E[ui|xi,wi] = E[ui|wi]
- so you’re fine as long as, conditional on the controls, the OV is uncorrelated with xi (see the sketch below)
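A minimal simulation sketch of this point (all numbers and variable names here are invented for illustration; uses numpy and statsmodels): the short regression omitting w is biased, while including w as a control recovers the true coefficient on x.

```python
# Sketch: OVB when the control w is omitted vs included.
# All coefficients below are hypothetical choices for the simulation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100_000
w = rng.normal(size=n)                 # the would-be omitted variable
x = 0.8 * w + rng.normal(size=n)       # x is correlated with w
u = rng.normal(size=n)                 # error term, independent of x and w
y = 1.0 + 2.0 * x + 1.5 * w + u        # true effect of x on y is 2.0

short = sm.OLS(y, sm.add_constant(x)).fit()                       # omits w
long = sm.OLS(y, sm.add_constant(np.column_stack([x, w]))).fit()  # controls for w

print(short.params[1])  # approx 2.0 + 1.5 * Cov(x, w)/Var(x) ≈ 2.73 (biased)
print(long.params[1])   # approx 2.0 (unbiased once w is controlled for)
```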

5
Q

OVB in different models

A

OLS - classical OVB setup: fails if the OV affects y and is correlated with x
IV - the OV must not be correlated with the instrument; otherwise the instrument is invalid
Panel Data - use fixed effects to control for unobserved heterogeneity, but if the OV varies within the fixed-effects unit, you still get OVB

6
Q

Sign of the OVB

A
The sign of the bias depends on:
  • the sign of the effect of the OV on y
  • the sign of the correlation between the OV and x
Sign of bias = sign(effect of OV on y) x sign(correlation of OV with x) (see the formula below)
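As a worked equation (the standard omitted-variable formula; here β2 is assumed to denote the effect of the omitted wi on y, and δ1 the slope from an auxiliary regression of wi on xi):

```latex
% Probability limit of the short-regression slope when w_i is omitted:
% the bias is the OV's effect on y scaled by how strongly w_i co-moves with x_i.
\[
\hat{\beta}_1 \;\xrightarrow{p}\; \beta_1 + \beta_2\,\delta_1,
\qquad
\delta_1 = \frac{\operatorname{Cov}(x_i, w_i)}{\operatorname{Var}(x_i)},
\]
\[
\operatorname{sign}(\text{bias})
  = \operatorname{sign}(\beta_2)\cdot\operatorname{sign}\big(\operatorname{Cov}(x_i, w_i)\big).
\]
```
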
7
Q

Solutions to OVB

A
  1. Include the OV if observable
  2. Use controls if they proxy well for the OV
  3. Use Panel Data with fixed effects to eliminate unobserved time-invariant factors
  4. Use IVs if the OV can’t be measured, but you still have a valid instrument
  5. Run an experiment - randomisation breaks correlation between regressors and OVs
8
Q

Functional form misspecification

A

When the regression model doesn’t correctly capture the relationship between the variables (e.g. fitting a straight line to a nonlinear relationship)
- the estimated coefficients will be biased (see the sketch below)
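A minimal sketch (invented numbers, same numpy/statsmodels setup as above): the true relationship is quadratic, so a linear specification gives a misleading coefficient, while the correctly specified model recovers the true parameters.

```python
# Sketch: fitting a linear model to a quadratic relationship.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 4, size=50_000)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(size=x.size)  # true model is quadratic

linear = sm.OLS(y, sm.add_constant(x)).fit()
quadratic = sm.OLS(y, sm.add_constant(np.column_stack([x, x**2]))).fit()

print(linear.params[1])      # one "slope" that does not describe the true, varying effect
print(quadratic.params[1:])  # recovers roughly [0.5, 2.0]
```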

9
Q

Solutions to functional form misspecification

A
  1. Use appropriate nonlinear transformations (e.g. logs, polynomials, interactions)
  2. Let the data guide you (plot the data and residuals, compare specifications)
  3. Use shrinkage methods like ridge/LASSO
  4. Model binary/censored data correctly, e.g. logit/probit vs Tobit
  5. Accept some uncertainty
10
Q

Errors-in-variables bias
- measurement error in regressors can lead to bias

A

Classical measurement error in a regressor causes attenuation bias: the estimated slope is biased towards zero (see the formula below)
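As a worked equation (classical errors-in-variables; notation assumed here: the true regressor is x*i, we observe xi = x*i + wi, and the noise wi is uncorrelated with x*i and ui):

```latex
% The OLS slope converges to the true slope scaled down by the reliability
% ratio of the mismeasured regressor, hence "attenuation" towards zero.
\[
\hat{\beta}_1 \;\xrightarrow{p}\;
\beta_1 \cdot \frac{\sigma^2_{x^*}}{\sigma^2_{x^*} + \sigma^2_{w}},
\qquad
0 < \frac{\sigma^2_{x^*}}{\sigma^2_{x^*} + \sigma^2_{w}} < 1 .
\]
```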

11
Q

Best-guess measurement error

A

A special case of measurement error where bias does not occur
- imagine a person doesn’t remember their income xi, but reports a best guess based on their education wi
- the reported value is xi^ = E[xi|wi]; then under two key assumptions:
Cov(xi^, xi - xi^) = 0
Cov(xi^, ui) = 0
the composite error ui + B1(xi - xi^) is uncorrelated with xi^, so the coefficient is unbiased (see the derivation below)
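A short derivation sketch, under the card's two assumptions, for the model yi = B0 + B1 xi + ui estimated on the best guess x̂i = E[xi|wi]:

```latex
% Substitute x_i = \hat{x}_i + (x_i - \hat{x}_i) into the model; the composite
% error v_i is uncorrelated with \hat{x}_i by the two covariance assumptions.
\[
y_i = \beta_0 + \beta_1 \hat{x}_i + v_i,
\qquad
v_i = u_i + \beta_1\,(x_i - \hat{x}_i),
\]
\[
\operatorname{Cov}(\hat{x}_i, v_i)
  = \underbrace{\operatorname{Cov}(\hat{x}_i, u_i)}_{=\,0}
  + \beta_1 \underbrace{\operatorname{Cov}(\hat{x}_i, x_i - \hat{x}_i)}_{=\,0}
  = 0,
\]
so OLS on the best guess is consistent for \(\beta_1\).
```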

12
Q

Solutions to measurement error

A
  1. Get better data
  2. Model the measurement error process, if you know something about the source of error
  3. Use IVs, good if instruments are valid
  4. Errors in y? If uncorrelated with x, they do not bias B1^, only increase variance
13
Q

Sample selection bias

A

3 types of missing data:
- missing at random - no bias
- missing based on x - no bias in a linear model, but the variation in x is reduced, which increases SEs and may reduce external validity
- missing based on y or u - does cause bias; this is actual sample selection bias (see the sketch below)
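A minimal simulation sketch (invented numbers, same numpy/statsmodels setup as above) contrasting selection on x, which leaves the slope roughly intact, with selection on y, which biases it:

```python
# Sketch: benign selection on x vs harmful selection on y.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200_000
x = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x + u              # true slope is 2.0

def slope(keep):
    """OLS slope of y on x using only the observations flagged in `keep`."""
    return sm.OLS(y[keep], sm.add_constant(x[keep])).fit().params[1]

print(slope(x < 1.0))              # selection on x: still roughly 2.0
print(slope(y < 1.0))              # selection on y: clearly below 2.0 (biased)
```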

14
Q

Benign missing data

A
  1. Data missing at random: if you took a random sample of 100 students but lost 20 of them at random, that is equivalent to taking a random sample of 80 students, so there is no bias
  2. Data missing based on a value of one of the x’s: suppose you are studying the effect of the student-teacher ratio (STR) on test scores but restrict your attention to school districts with STR < 20; in a linear model, focusing on this subset does not cause bias, it only reduces the variation in x
15
Q

Data missing on y or u

A

When selection into the sample depends on past outcomes or unobserved factors, OLS estimates become biased

16
Q

Truncation and incidental truncation

A

Truncation: occurs when observations are only included if they meet a cutoff (e.g. yi < ci); solve by writing down the likelihood conditional on selection and estimating via MLE (see the formula below)

Incidental truncation: y is only observed for a non-random subset; solve using the Heckman two-step model
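A sketch of the truncated likelihood, assuming a normal linear model yi = xi'β + ui with ui ~ N(0, σ²) and observations kept only when yi < ci:

```latex
% Each retained observation contributes its density divided by the probability
% of being observed, i.e. the density of y_i conditional on y_i < c_i.
\[
f(y_i \mid y_i < c_i)
  = \frac{\tfrac{1}{\sigma}\,\phi\!\left(\tfrac{y_i - x_i'\beta}{\sigma}\right)}
         {\Phi\!\left(\tfrac{c_i - x_i'\beta}{\sigma}\right)},
\qquad
\log L(\beta, \sigma) = \sum_i \log f(y_i \mid y_i < c_i).
\]
```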

17
Q

Solutions to sample selection bias

A
  1. Design better sampling
  2. Run randomised experiments to avoid selection altogether
  3. Model the selection process explicitly
18
Q

Simultaneous causality bias

A

We usually assume x -> y, but sometimes y also causes x
- this means xi becomes correlated with ui, so the exogeneity assumption fails and B1^ is biased

19
Q

Solutions to SCB

A
  1. Run a randomised controlled experiment
  2. Model both directions of causality
  3. Use IVs
20
Q

A note on using IV regression

A

IV solves 3 major problems:
1. OVB
2. Measurement error
3. Simultaneous causality
BUT:
- it adds its own challenges: the instrument must be exogenous and relevant
- if exogeneity fails, IV is still biased; if the instrument is only weakly relevant, weak-instrument bias can be worse than OLS (see the sketch below)
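A minimal sketch (invented numbers; 2SLS is computed by hand as two OLS stages with numpy/statsmodels, so the printed second-stage standard errors would not be valid): an endogenous x makes OLS biased, while a relevant and exogenous instrument z recovers the true coefficient.

```python
# Sketch: OLS vs two-stage least squares with a valid instrument z.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200_000
z = rng.normal(size=n)                        # instrument: relevant, exogenous
u = rng.normal(size=n)
x = 1.0 * z + 0.8 * u + rng.normal(size=n)    # x is correlated with the error u
y = 1.0 + 2.0 * x + u                         # true effect of x is 2.0

ols = sm.OLS(y, sm.add_constant(x)).fit()
x_hat = sm.OLS(x, sm.add_constant(z)).fit().fittedvalues    # first stage
tsls = sm.OLS(y, sm.add_constant(x_hat)).fit()              # second stage

print(ols.params[1])   # biased upward, roughly 2.3
print(tsls.params[1])  # roughly 2.0
```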

21
Q

Threats to internal validity of experiments

A
  1. Failure to randomise
  2. Failure to follow treatment protocol
  3. Attrition: people drop out in a way that is related to their potential outcomes
  4. Experimental effects: experimenter bias (researchers treat groups differently) and Hawthorne effects (subjects change behaviour just from being studied)
22
Q

Threats to internal validity for quasi experiments

A
  1. Failure to randomise
  2. Failure to follow treatment protocol
  3. Attrition
  4. Experimental effects: generally not applicable for quasi-experiments
  5. Instrument invalidity: the quasi-random instrument must be exogenous and relevant
23
Q

Two main dimensions of external validity

A
  1. Different populations
  2. Different settings

External validity is more contextual than internal validity: it is not about statistical assumptions, but about how similar other populations and settings are.

24
Q

Threats to external validity in experiments

A
  1. Non-representative sample: participants in your study might not reflect the broader population
  2. Non-representative treatment: the way a treatment is implemented in an experiment may be too artificial or costly for real-world application
  3. General equilibrium effects: the effect of a program can change when it is scaled up

Quasi-experiments often improve external validity because they study real-world programs