assessing studies based on multiple regression Flashcards
what is internal validity?
the statistical inferences about causal effects are valid for the population being studied
what is external validity?
the statistical inferences can be generalized from the population and setting studied to other populations and settings, where the “setting” refers to the legal, policy, and physical environment and related salient features.
what does assessing threats to external validity require?
it requires detailed substantive knowledge and judgement on a case-by-case basis
what are the 5 threats to internal validity of regression studies?
omitted variable bias
wrong functional form
errors in variables bias
sample selection bias
simultaneous causality bias
what do all the 5 threats to internal validity imply?
they all imply that the expected value of the error term given the regressors is not zero, E(u | X1, ..., Xk) ≠ 0 (or that conditional mean independence fails), in which case the OLS estimator is biased and inconsistent
what are the 5 solutions to omitted variable bias?
1) if the omitted causal variable can be measured, include it as a regressor
2) if you have data on one or more control variables and they are adequate (i.e. conditional mean independence plausibly holds), include the control variables
3) use panel data in which each entity is observed more than once
4) if the omitted variable cannot be measured, use instrumental variables regression
5) run a randomised controlled experiment
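A minimal simulation sketch of solution 1, using only the Python standard library. The data-generating process (Y = 1 + 2X + 3W + e, with X correlated with W), the sample size, and all function and variable names are assumptions made up for illustration. When W is omitted, the slope on X should settle near 2 + 3·cov(X, W)/var(X) = 3.2 under these assumed numbers; when W is included, it should settle near the true value of 2.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, x):
    """Residuals from a simple OLS regression of y on x (with an intercept)."""
    slope = ols_slope(x, y)
    intercept = mean(y) - slope * mean(x)
    return [b - intercept - slope * a for a, b in zip(x, y)]


rng = random.Random(0)
n = 20000

# Assumed data-generating process: Y = 1 + 2*X + 3*W + e, with X correlated with W.
w = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * wi + rng.gauss(0, 1) for wi in w]
y = [1 + 2 * xi + 3 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

# Short regression: W is omitted, so it sits in the error term and E(u | X) != 0.
slope_omitted = ols_slope(x, y)

# Long regression: the coefficient on X when W is included as a regressor,
# obtained by partialling W out of both X and Y (Frisch-Waugh-Lovell theorem).
slope_included = ols_slope(residuals(x, w), residuals(y, w))

print("slope on X, W omitted: ", round(slope_omitted, 3))
print("slope on X, W included:", round(slope_included, 3))
```

The partialling-out step is used here only to avoid a matrix library; a regression package that accepts several regressors would give the same coefficient on X.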
what is a bad control?
variables that are themselves outcome variables in the notional experiment at hand. for example, when asking what would happen to your wages if you got a college degree, controlling for a variable that is itself affected by the degree means the coefficient on college won't have a causal interpretation, even if college degrees were randomly assigned
what is wrong functional form?
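it arises if the functional form of the regression is incorrect, for example if an interaction term is incorrectly omitted; inferences on causal effects will then be biased
what are the solutions to functional form misspecification?
1) continuous dependent variables: use the appropriate nonlinear specifications in X
2) discrete dependent variables: need an extension of multiple regression methods (for example probit or logit analysis for binary dependent variables)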
what is errors-in-variables bias?
economic data can often have measurement errors
e.g. data entry errors in administrative data, recollection errors in surveys, ambiguous questions
this leads to correlation between the error term and the measured variable
what are the solutions to errors-in-variables bias?
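A minimal simulation sketch of this bias, using only the Python standard library. It assumes classical measurement error (the measured regressor equals the true X plus independent noise), which is one specific case and is not stated in the cards; the data-generating process and all names are made up for illustration. Under these assumed numbers (var(X) = 1, noise variance = 1, true slope = 2), the slope on the mismeasured regressor should be pulled toward zero, to roughly 2 · 1/(1 + 1) = 1.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)
n = 20000

# Assumed data-generating process: Y = 1 + 2*X + e, with X measured without error.
x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x_true]

# Assumed classical measurement error: we only observe X plus independent noise.
x_noisy = [xi + rng.gauss(0, 1) for xi in x_true]

# Regressing Y on the true X recovers a slope near 2.
slope_true_x = ols_slope(x_true, y)

# Regressing Y on the noisy X pushes the noise into the error term, which is
# then correlated with the measured regressor, so the slope is biased.
slope_noisy_x = ols_slope(x_noisy, y)

print("slope using true X: ", round(slope_true_x, 3))
print("slope using noisy X:", round(slope_noisy_x, 3))
```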
obtain better data
develop a specific model of the measurement error process. this is only possible if a lot is known about the nature of the measurement error
instrumental variables regression
what is missing data and sample selection bias?
data are often missing; sometimes the missing data introduce bias
what are the three cases of missing data?
data are missing at random
data are missing based on the value of one or more X’s
data are missing based in part on the value of Y or u
what cases of missing data don't introduce bias and why?
when data are missing at random or when data are missing based on the value of one or more X’s. neither case introduces bias: the OLS estimator remains unbiased, although the standard errors are larger than they would be if the data weren't missing, because the sample is smaller
what case of missing data does cause bias?
when data are missing based in part on the value of Y or u; this bias is called sample selection bias
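A minimal simulation sketch of the three missing-data cases, using only the Python standard library. The data-generating process (Y = 1 + 2X + u), the selection rules (keep half at random, keep only X > 0, keep only Y > 1), and all names are assumptions made up for illustration. The first two slopes should stay near the true value of 2, while selecting on Y should pull the slope noticeably below 2.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slope_on_subsample(pairs):
    """OLS slope of y on x using only the (x, y) pairs that were kept."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return ols_slope(xs, ys)


rng = random.Random(2)
n = 20000

# Assumed data-generating process: Y = 1 + 2*X + u.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]
data = list(zip(x, y))

# Case 1: data missing at random (each observation kept with probability 0.5).
slope_random = slope_on_subsample([p for p in data if rng.random() < 0.5])

# Case 2: data missing based on the value of X (keep only X > 0).
slope_select_x = slope_on_subsample([p for p in data if p[0] > 0])

# Case 3: data missing based on the value of Y (keep only Y > 1).
slope_select_y = slope_on_subsample([p for p in data if p[1] > 1])

print("missing at random:  ", round(slope_random, 3))
print("selected on X:      ", round(slope_select_x, 3))
print("selected on Y (bias):", round(slope_select_y, 3))
```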