exam 3 Flashcards
Quadratic population regression model
test score = β0 + β1·income + β2·income^2 + u
Note that we can test if the linear specification is true against the
alternative that the quadratic specification is true by testing
H0: β2 = 0 vs. H1: β2 ≠ 0
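The test of the linear against the quadratic specification can be sketched on simulated data. Everything below is an assumption for illustration: the coefficients, the income range, and the helper names are made up, and the standard error is the homoskedasticity-only formula rather than a robust one. The coefficient on income^2 is obtained by partialling the constant and income out of both income^2 and the test score.

```python
import random

def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def residuals(x, y):
    """Residuals from regressing y on a constant and x."""
    a, b = simple_ols(x, y)
    return [yi - a - b * xi for xi, yi in zip(x, y)]

rng = random.Random(0)
n = 2000
income = [rng.uniform(5, 55) for _ in range(n)]
# Hypothetical "true" model, concave in income (made-up coefficients).
score = [600 + 4.0 * x - 0.04 * x * x + rng.gauss(0, 10) for x in income]
income_sq = [x * x for x in income]

# Partial the constant and income out of income^2 and of the test score.
r_sq = residuals(income, income_sq)
r_y = residuals(income, score)
sxx = sum(a * a for a in r_sq)
beta2_hat = sum(a * b for a, b in zip(r_sq, r_y)) / sxx

# Homoskedasticity-only standard error with n - 3 degrees of freedom.
ssr = sum((b - beta2_hat * a) ** 2 for a, b in zip(r_sq, r_y))
se_beta2 = (ssr / (n - 3) / sxx) ** 0.5
t_stat = beta2_hat / se_beta2

print("beta2_hat:", beta2_hat, "t:", t_stat)
print("reject H0 at 5%:", abs(t_stat) > 1.96)
```

With real data you would normally use a regression package and heteroskedasticity-robust standard errors; the hand-rolled version only shows where the t-statistic comes from.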
Relationship b/w Y&X is nonlinear
- Effect on Y of a change in X depends on X (marginal effect of X isn’t constant)
- linear regression is mis-specified: the functional form is wrong
- the estimator of the effect on Y of X is biased
- the solution is to estimate a regression function that is nonlinear in X
Internal validity
the statistical inferences about causal effects are valid for the population being studied
External validity
statistical inferences can be generalized from the population and setting studied to other populations and settings
setting= legal, policy, physical environment
Threats to external validity
Assessing threats to external validity requires detailed substantive knowledge and judgment on a case-by-case basis
How far can we generalize class size results from California?
– Differences in populations
* California in 2011?
* Massachusetts in 2011?
* Mexico in 2011?
– Differences in settings
* different legal requirements (e.g. special education)
* different treatment of bilingual education
* differences in teacher characteristics
Internal validity Threats
SOWES
Sample selection bias
Omitted variable bias
Wrong functional form
Errors-in-variables bias
Simultaneous causality bias
All of these imply that E(ui|X1i,…,Xki) ≠ 0 (or that conditional mean independence fails), meaning OLS is biased and inconsistent.
Omitted variable bias arises if the omitted variable is
1. a determinant of Y, and
2. correlated with at least one included regressor
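A small simulation can make the two conditions concrete. This is a sketch under made-up assumptions: W is a determinant of Y (coefficient 2) and is correlated with X, the true effect of X is 1, and all names and numbers are hypothetical. Leaving W out pushes the slope on X away from 1; controlling for W (done here by partialling W out of X) recovers it.

```python
import random

def slope(x, y):
    """OLS slope from regressing y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(x, w):
    """Residuals from regressing x on a constant and w."""
    n = len(x)
    b = slope(w, x)
    a = sum(x) / n - b * sum(w) / n
    return [xi - a - b * wi for xi, wi in zip(x, w)]

rng = random.Random(1)
n = 20000
w = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * wi + rng.gauss(0, 1) for wi in w]          # X correlated with W
y = [1.0 * xi + 2.0 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

short_slope = slope(x, y)                  # W omitted
long_slope = slope(residualize(x, w), y)   # W controlled for

# Large-sample value of the short regression: beta1 + beta2 * cov(X, W) / var(X)
predicted_short = 1.0 + 2.0 * 0.8 / (0.8 ** 2 + 1.0)

print("omitting W:   ", short_slope, "(predicted", predicted_short, ")")
print("controlling W:", long_slope, "(true effect 1.0)")
```

If either condition is switched off (set the coefficient on W in y to 0, or the 0.8 in x to 0), the two slopes should agree up to sampling noise.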
A control variable W is a variable that is correlated with, and controls for, an omitted causal factor in the regression of Y on X, but which itself does not necessarily have a causal effect on Y.
- If the multiple regression includes control variables, ask whether there are omitted factors that are not adequately controlled for,
- that is, whether the error term is correlated with the variable of interest even after we have included the control variables.
What are solutions to omitted variable bias?
- Include the omitted causal variable as another regressor
- If you have data on one or more control variables and they are adequate, include the control variables
- Use panel data, in which each entity is observed more than once
- If the omitted variable can’t be measured, use instrumental variables regression, which uses only the variation in the regressor that is uncorrelated with the error
- Run a randomized controlled experiment: if X is randomly assigned, then X necessarily will be distributed independently of u; thus E(u|X = x) = 0.
Wrong functional form
- if functional form is incorrect
ex: an interaction term is incorrectly omitted;
then inferences on causal effects will be biased.
Solution
1. Continuous dependent variable: use the “appropriate”
nonlinear specifications in X (logarithms, interactions,
etc.)
2. Discrete (example: binary) dependent variable: need an
extension of multiple regression methods (“probit” or
“logit” analysis for binary dependent variables).
Errors-in-variables bias
So far we have assumed that X is measured without error.
In reality, economic data often have measurement error.
Lessons in classical measurement error
- The amount of bias in beta hat depends on the nature of the measurement error
- If there is pure noise added to Xi, then beta hat is biased towards 0
- The potential importance of measurement error bias depends
on how the data are collected.
– administrative data (e.g. # teachers in a school) are often quite accurate.
– Survey data on sensitive questions (how much do you earn?)
often have considerable measurement error
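The "pure noise" lesson can be checked with a simulation. This is a sketch under assumptions that are not in the cards: the true slope is 2, the true X and the noise both have variance 1, and the noise is independent of X and u (classical measurement error). Under those assumptions the slope on the noisy regressor converges to β·var(X)/(var(X) + var(noise)), which is half the true slope here.

```python
import random

def slope(x, y):
    """OLS slope from regressing y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(2)
n = 20000
beta = 2.0                                            # hypothetical true slope
x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [beta * xi + rng.gauss(0, 1) for xi in x_true]
x_noisy = [xi + rng.gauss(0, 1) for xi in x_true]     # pure noise added to X

b_true = slope(x_true, y)     # regression on the correctly measured X
b_noisy = slope(x_noisy, y)   # regression on the mismeasured X

# Attenuation factor var(X) / (var(X) + var(noise)) = 1 / (1 + 1)
predicted = beta * 1.0 / (1.0 + 1.0)

print("slope with true X: ", b_true)
print("slope with noisy X:", b_noisy, "(predicted", predicted, ")")
```

Other kinds of measurement error (for example, error correlated with X) give different biases, which is why the amount of bias depends on the nature of the error.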
Solutions to errors-in-variables bias
- Obtain better data
- Develop a specific model of measurement error process
- instrumental variables regression
Missing data + sample selection bias
1. Data are missing at random.
2. Data are missing based on the value of one or more X’s.
3. Data are missing based in part on the value of Y or u.
Cases 1 and 2 don’t introduce bias: the standard errors are larger than they would be if the data weren’t missing, but β̂ is unbiased.
Case 3 introduces “sample selection” bias.
Case 1: data are missing at random
Suppose you took a simple random sample of 100 workers, and your dog ate 20 of the response sheets (selected at random) before you could enter them into the computer.
- This is equivalent to your having taken a simple random sample of 80 workers, so your dog didn’t introduce any bias.
Case 2: data are missing based on the value of one of the X’s
Suppose you restrict your analysis to the subset of school districts with STR < 20.
By only considering districts with small class sizes you won’t be able to say anything about districts with large class sizes, but focusing on just the small-class districts doesn’t introduce bias.
This is equivalent to having missing data, where the data are missing if STR ≥ 20. More generally, if data are missing based only on values of X’s, the fact that data are missing doesn’t bias the OLS estimator.
Case 3: data are missing based in part on the value of Y or u
In general this type of missing data does introduce bias into the OLS estimator.
- This is called sample selection bias.
Sample selection bias arises when a selection process:
- influences the availability of data and
- is related to the dependent variable
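The three missing-data cases can be compared side by side in a simulation. This is a sketch with made-up numbers: the true slope is 2, case 1 drops 20% of observations at random, case 2 keeps only observations with X below a cutoff, and case 3 keeps only observations with Y above a cutoff. The cutoffs and names are hypothetical.

```python
import random

def slope(x, y):
    """OLS slope from regressing y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def fit(rows):
    """Slope of y on x for a list of (x, y) pairs."""
    xs, ys = zip(*rows)
    return slope(xs, ys)

rng = random.Random(3)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]   # hypothetical true slope 2
data = list(zip(x, y))

full = fit(data)
case1 = fit([r for r in data if rng.random() < 0.8])  # missing at random
case2 = fit([r for r in data if r[0] < 0.0])          # missing based on X
case3 = fit([r for r in data if r[1] > 1.0])          # missing based on Y

print("full sample:      ", full)
print("case 1 (random):  ", case1)
print("case 2 (on X):    ", case2)
print("case 3 (on Y):    ", case3, "<- sample selection bias")
```

In cases 1 and 2 the slope stays near 2 with fewer observations behind it; in case 3 the selection rule depends on u through Y, so the kept observations have errors correlated with X.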
Simultaneous causality bias
X causes Y and Y causes X
- large u means large Y, which implies large X (if Y has a positive effect on X)
- corr(X, u) ≠ 0
- beta hat is biased and inconsistent
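The feedback loop can be simulated directly. This is a sketch assuming a hypothetical two-equation system, Y = β1·X + u and X = γ1·Y + v, with β1 = γ1 = 0.5 and independent standard normal u and v; none of these numbers come from the cards. Solving the two equations gives X and Y in terms of u and v, and OLS of Y on X then misses β1.

```python
import random

def cov(a, b):
    """Sample covariance (dividing by n)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n

rng = random.Random(4)
n = 20000
beta1, gamma1 = 0.5, 0.5          # hypothetical structural coefficients
u = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]

# Reduced form obtained by solving Y = beta1*X + u and X = gamma1*Y + v.
d = 1.0 - beta1 * gamma1
x = [(gamma1 * ui + vi) / d for ui, vi in zip(u, v)]
y = [(ui + beta1 * vi) / d for ui, vi in zip(u, v)]

cov_xu = cov(x, u)                # nonzero because u feeds into X through Y
ols_slope = cov(x, y) / cov(x, x)

print("cov(X, u):", cov_xu)
print("OLS slope:", ols_slope, "vs true beta1 =", beta1)
```

With these parameter values the OLS slope settles near 0.8 rather than 0.5, and adding more observations does not close the gap, which is what inconsistency means.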
Solution to simultaneous causality bias
- Run a randomized controlled experiment. Because Xi is chosen at random by the experimenter, there is no feedback from the outcome variable to Xi (assuming perfect compliance).
- Develop and estimate a complete model of both directions of causality. This is the idea behind many large macro models (e.g. Federal Reserve Bank-US). This is extremely difficult in practice.
- Use instrumental variables regression to estimate the causal effect of interest (effect of X on Y, ignoring effect of Y on X).
Internal and External Validity
- Forecasting and estimation of causal effects are quite different objectives.
- For forecasting,
– adjusted R² matters (a lot!)
– Omitted variable bias isn’t a problem!
– Interpreting coefficients in forecasting models is not important; the important thing is a good fit and a model you can “trust” to work in your application
– External validity is paramount: the model estimated using historical data must hold into the (near) future
– More on forecasting when we take up time series data
Internal and External Validity
- forecasting and estimation of causal effects are different objectives
- adjusted R^2 matters
- omitted variable bias isn’t a problem
- external validity is paramount
Omitted Variable Bias
- after including control variables, is the error term uncorrelated with STR?
Some evidence that the control variables might be doing their job:
– The STR coefficient doesn’t change much when the control
variables specifications change
– The results for California and Massachusetts are similar – so if
there is OV bias remaining, that OV bias would need to be
similar in the two data sets
Why study experiments?
- Ideal randomized controlled experiments provide a conceptual benchmark for assessing observational studies
- Actual experiments are rare but influential
- Experiments can overcome the threats to internal validity of observational studies; however, they have their own threats to internal and external validity.
* Thinking about experiments helps us to understand quasi-experiments, or “natural experiments,” in which “natural” variation induces “as if” random assignment.
- An experiment is designed and implemented consciously by human researchers. An experiment randomly assigns subjects to treatment and control groups (think of clinical drug trials).
- A quasi-experiment or natural experiment has a source of randomization that is “as if” randomly assigned, but this variation was not the result of an explicit randomized treatment and control design.
- Program evaluation is aimed at evaluating the effect of a program or policy
ex: ad campaign to cut smoking, or a job training program.
A treatment has a causal effect for a given individual
Potential Outcome
outcome for an individual under a potential treatment or potential non-treatment
Average treatment effect
the population mean value of the
individual treatment effects
Wi = control variables
- If X is randomly assigned, then Xi is uncorrelated with W, so there will not be omitted variable bias if W is left out. Including W still helps: it reduces the error variance and gives smaller standard errors.
- If the probability of assignment depends on W, so that X is randomly assigned given W, then omitting W can lead to OV bias. Including W eliminates that OV bias.
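Both bullets can be illustrated with one simulation. This is a sketch under made-up assumptions: the treatment effect is 2, W enters the outcome with coefficient 3, and in the second design the probability of treatment is 0.8 when W > 0 and 0.2 otherwise. The helper names are hypothetical, and the standard errors are homoskedasticity-only.

```python
import random

def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def partial_out(v, w):
    """Residuals from regressing v on a constant and w."""
    vd, wd = demean(v), demean(w)
    b = sum(p * q for p, q in zip(vd, wd)) / sum(q * q for q in wd)
    return [p - b * q for p, q in zip(vd, wd)]

def slope_and_se(x, y, w=None):
    """Coefficient on x (optionally controlling for w) and its standard error."""
    xt = demean(x) if w is None else partial_out(x, w)
    yt = demean(y) if w is None else partial_out(y, w)
    sxx = sum(a * a for a in xt)
    b = sum(a * c for a, c in zip(xt, yt)) / sxx
    k = 2 if w is None else 3                    # number of estimated coefficients
    ssr = sum((c - b * a) ** 2 for a, c in zip(xt, yt))
    return b, (ssr / (len(x) - k) / sxx) ** 0.5

rng = random.Random(5)
n = 5000
w = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]

def outcome(x):
    # Hypothetical model: treatment effect 2, effect of W equal to 3.
    return [2.0 * xi + 3.0 * wi + ui for xi, wi, ui in zip(x, w, u)]

# Design 1: X assigned purely at random.
x_random = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
b_short, se_short = slope_and_se(x_random, outcome(x_random))
b_long, se_long = slope_and_se(x_random, outcome(x_random), w)

# Design 2: X assigned at random given W (probability depends on W).
x_given_w = [1.0 if rng.random() < (0.8 if wi > 0 else 0.2) else 0.0 for wi in w]
b_cond_short, _ = slope_and_se(x_given_w, outcome(x_given_w))
b_cond_long, _ = slope_and_se(x_given_w, outcome(x_given_w), w)

print("random X, no W:  ", b_short, "SE", se_short)
print("random X, with W:", b_long, "SE", se_long)
print("X given W, no W: ", b_cond_short, "<- OV bias")
print("X given W, with W:", b_cond_long)
```

In the first design both estimates are near 2 and only the standard error changes; in the second design the estimate without W is far from 2 and adding W brings it back.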
Threats to Internal Validity
FAFE
threats show corr(X, u) ≠ 0
so OLS is biased
- FAILURE to randomize
- ATTRITION: some subjects drop out
- FAILURE to follow treatment protocol
- EXPERIMENTAL effects: experimenter or subject bias
Threats to External Validity
- Nonrepresentative sample
- Nonrepresentative “treatment” (that is, program or policy)
- General equilibrium effects (effect of a program can depend on its scale; e.g. admissions counseling)
Quasi/Natural Experiment
source of randomization that is “as if” randomly assigned
2 Types of Quasi Experiment
- Treatment X “as if” randomly assigned
- Variable (Z) which influences receipt of
treatment (X) is “as if” randomly assigned
Potential Problems w/ Quasi Experiment
Threats to internal validity
1. Failure to randomize
2. Attrition
3. Failure to follow treatment protocol
4. Experimental effects
5. Instrument invalidity: the instrument must be both relevant and exogenous
Threats to external validity of a quasi experiment
- Nonrepresentative sample
- Nonrepresentative treatment
Ideal experiments and potential outcomes
* The average treatment effect is the population mean
of the individual treatment effect, which is the
difference in potential outcomes when treated and
not treated.
* The treatment effect estimated in an ideal
randomized controlled experiment is unbiased for
the average treatment effect.
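The link between potential outcomes and the experimental estimate can be sketched in a simulation. Everything here is assumed for illustration: each individual gets a made-up untreated outcome and a made-up individual treatment effect (mean 3), treatment is assigned by a coin flip, and only one of the two potential outcomes is observed per person.

```python
import random

rng = random.Random(6)
n = 20000

# Potential outcomes for each individual (hypothetical distributions).
y0 = [rng.gauss(10, 2) for _ in range(n)]          # outcome if not treated
effect = [rng.gauss(3, 1) for _ in range(n)]       # individual treatment effect
y1 = [a + e for a, e in zip(y0, effect)]           # outcome if treated

# Average treatment effect in this simulated population.
ate = sum(effect) / n

# Ideal experiment: random assignment, perfect compliance, no attrition.
treated = [rng.random() < 0.5 for _ in range(n)]
observed = [b if t else a for a, b, t in zip(y0, y1, treated)]

n_treated = sum(treated)
mean_treated = sum(y for y, t in zip(observed, treated) if t) / n_treated
mean_control = sum(y for y, t in zip(observed, treated) if not t) / (n - n_treated)
diff_in_means = mean_treated - mean_control

print("average treatment effect:", ate)
print("difference in means:     ", diff_in_means)
```

The difference in means uses only observed outcomes, yet it lands close to the average of the individual effects, which are never observed for any one person.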
Actual experiment
- have threats to internal validity
- Depending on the threat, these can be addressed by:
– panel data regression (differences-in-differences)
– multiple regression (including control variables), and
– IV (using initial assignment as an instrument, possibly with control variables)
- External validity also can be an important threat to the validity of experiments
Quasi Experiment
- have threats to internal validity
- have an “as-if” randomly assigned source of variation.
* generate:
– Xi which plausibly satisfies E(ui|Xi) = 0 (so estimation proceeds using OLS); or
– instrumental variable(s) which plausibly satisfy E(ui|Zi) = 0 (so estimation proceeds using TSLS)
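The second case, where an instrument Z is “as if” randomly assigned, can be sketched with one regressor and one instrument. The data-generating numbers are assumptions made up for this example: the true effect of X is 2, Z shifts X, and the error u is correlated with X but not with Z. With a single instrument, two stage least squares reduces to cov(Z, Y) / cov(Z, X).

```python
import random

def cov(a, b):
    """Sample covariance (dividing by n)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n

rng = random.Random(7)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]              # "as if" randomly assigned
v = [rng.gauss(0, 1) for _ in range(n)]
u = [0.8 * vi + 0.6 * rng.gauss(0, 1) for vi in v]   # u correlated with X via v
x = [zi + vi for zi, vi in zip(z, v)]                # Z is relevant: it moves X
y = [2.0 * xi + ui for xi, ui in zip(x, u)]          # hypothetical true effect 2

ols = cov(x, y) / cov(x, x)                          # biased: corr(X, u) != 0
iv = cov(z, y) / cov(z, x)                           # IV estimator

# Same thing in two stages: regress X on Z, then Y on the fitted values.
pi_hat = cov(z, x) / cov(z, z)
x_hat = [pi_hat * zi for zi in z]    # intercept omitted; it does not affect the slope
tsls = cov(x_hat, y) / cov(x_hat, x_hat)

print("OLS: ", ols)
print("IV:  ", iv)
print("TSLS:", tsls)
```

The sketch only reports point estimates; standard errors from a naive second-stage regression would be wrong, so in practice a dedicated IV routine is the safer choice.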