Assessing Studies Based on Multiple Regression Flashcards
What is internal validity?
The statistical inferences about causal effects are valid for the population being studied.
What is external validity?
The statistical inferences can be generalized from the population and setting studied to other populations and settings, where the “setting” refers to the legal, policy, and physical environment and related salient features.
What does assessing threats to external validity require?
It requires detailed, substantiated knowledge and judgement on a case-by-case basis.
What are the 5 threats to the internal validity of regression studies?
1) Omitted variable bias
2) Wrong functional form
3) Errors-in-variables bias
4) Sample selection bias
5) Simultaneous causality bias
What do all 5 threats to internal validity imply?
They imply that E(u_i | X_1i, ..., X_ki) ≠ 0, or that conditional mean independence fails. In either case the OLS estimator is biased and inconsistent.
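With a single regressor, the consequence can be written out using the standard large-sample result for OLS. Here ρ_Xu is the correlation between X_i and u_i, and σ_u and σ_X are their standard deviations:

```latex
\hat{\beta}_1 - \beta_1
  = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})\,u_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}
  \xrightarrow{\;p\;} \frac{\operatorname{cov}(X_i, u_i)}{\operatorname{var}(X_i)}
  = \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}
```

The limit is zero only when ρ_Xu = 0. Whenever one of the threats makes X_i and u_i correlated, β̂_1 does not converge to β_1.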
What are the 5 solutions to omitted variable bias?
1) If the omitted causal variable can be measured, include it as a regressor.
2) If you have data on one or more controls and they are adequate (i.e., conditional mean independence plausibly holds), include the control variables.
3) Use panel data, in which each entity is observed more than once.
4) If the omitted variable cannot be measured, use instrumental variables regression.
5) Run a randomised controlled experiment.
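Solution 1 can be illustrated with a minimal simulation sketch. The data-generating process is an assumption chosen for illustration: the true effect of X is 2, the omitted variable W has coefficient 3, and X = 0.8W + noise. The helpers `ols_slope` and `ols_two` are hypothetical names written for this sketch, not a library API.

```python
import random

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def ols_two(x1, x2, y):
    """Coefficients on x1 and x2 from OLS of y on a constant, x1 and x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    # Solve the 2x2 normal equations for the demeaned data
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(0)
n = 100_000
w = [rng.gauss(0, 1) for _ in range(n)]          # omitted (but measurable) variable
x = [0.8 * wi + rng.gauss(0, 1) for wi in w]     # X is correlated with W
y = [1 + 2 * xi + 3 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

b_short = ols_slope(x, y)        # W omitted, so it sits in the error term
b_long, b_w = ols_two(x, w, y)   # solution 1: include W as a regressor
print(f"W omitted:  {b_short:.2f}")  # near 2 + 3 * 0.8 / 1.64, about 3.46
print(f"W included: {b_long:.2f}")   # near the true effect, 2
```

The short regression absorbs part of W's effect through its correlation with X. Including W as a regressor removes that bias.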
What is a bad control?
A variable that is itself an outcome variable in the notional experiment at hand. For example, when asking what would happen to your wages if you got a college degree, a variable that is itself affected by the degree (such as occupation) is a bad control. If it is included, the coefficient on college won't have a causal interpretation, even if the college degree were randomly assigned.
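A minimal simulation sketch of the college/occupation example follows. Every number is hypothetical: unobserved ability raises both occupation and wages, college is randomly assigned, and occupation depends on both college and ability. The helper names are made up for this sketch.

```python
import random

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def ols_two(x1, x2, y):
    """Coefficients on x1 and x2 from OLS of y on a constant, x1 and x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(1)
n = 100_000
ability = [rng.gauss(0, 1) for _ in range(n)]                      # unobserved
college = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]   # randomly assigned
# Occupation index is itself an outcome of college (and of ability)
occ = [c + a + rng.gauss(0, 1) for c, a in zip(college, ability)]
wage = [0.3 * c + 0.5 * o + a + rng.gauss(0, 1)
        for c, o, a in zip(college, occ, ability)]

total = ols_slope(college, wage)                # total effect: 0.3 + 0.5 * 1 = 0.8
b_college, b_occ = ols_two(college, occ, wage)  # bad control: occupation included
print(f"college only:       {total:.2f}")
print(f"college + occupation: {b_college:.2f}")  # about -0.2 in this setup
```

Because college is randomised, the regression on college alone recovers the total effect. Within a given occupation, however, college graduates tend to have lower ability. Adding occupation therefore pulls the college coefficient to about -0.2, which matches neither the total effect (0.8) nor the direct effect (0.3).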
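What is wrong functional form?
It arises when the functional form of the estimated regression is incorrect, e.g., a linear specification is used when the population regression function is nonlinear. The omitted nonlinear terms then end up in the error term, which biases the OLS estimator.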
What are the solutions to functional form misspecification?
1) Continuous dependent variable: use the appropriate nonlinear specifications in X.
2) Discrete (e.g., binary) dependent variable: an extension of multiple regression methods is needed (e.g., probit or logit).
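Solution 1 can be shown with a small simulation. The sketch assumes a hypothetical quadratic truth, Y = 4X - X² + noise with X ~ Uniform(0, 4). It compares a linear fit with one that adds X² as a regressor (a polynomial specification in X).

```python
import random

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def ols_two(x1, x2, y):
    """Coefficients on x1 and x2 from OLS of y on a constant, x1 and x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    dy = [c - my for c in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(2)
n = 100_000
x = [rng.uniform(0, 4) for _ in range(n)]
y = [4 * xi - xi ** 2 + rng.gauss(0, 1) for xi in x]  # true regression is quadratic

b_linear = ols_slope(x, y)                     # misspecified: linear in X
b1, b2 = ols_two(x, [xi ** 2 for xi in x], y)  # adds X^2 as a regressor
# Predicted effect of raising X from 1 to 2 (true value: 1)
effect_1_to_2 = b1 * (2 - 1) + b2 * (2 ** 2 - 1 ** 2)
print(f"linear slope: {b_linear:.2f}")  # near 0: wrongly suggests X has no effect
print(f"quadratic: {b1:.2f}, {b2:.2f}; effect 1->2: {effect_1_to_2:.2f}")
```

In this setup the linear slope is close to zero even though X clearly affects Y. The quadratic specification recovers coefficients near 4 and -1.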
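What is errors-in-variables bias?
Economic data often contain measurement error,
e.g., data entry errors in administrative data, recall errors in surveys, or ambiguous survey questions.
When a regressor is measured with error, that error becomes part of the regression error term. This leads to correlation between the error term and the measured regressor, so OLS is biased.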
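The effect can be illustrated under the assumption of classical measurement error, where the measured regressor is X + w and w is independent of X and u. Under that assumption, OLS on the mismeasured regressor satisfies β̂_1 →p [σ_X² / (σ_X² + σ_w²)] β_1, so the estimate is biased toward zero. The sketch below uses hypothetical values σ_X² = σ_w² = 1 and β_1 = 2.

```python
import random

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

rng = random.Random(3)
n = 100_000
x_true = [rng.gauss(0, 1) for _ in range(n)]             # var(X) = 1
y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x_true]      # true slope 2
x_obs = [xi + rng.gauss(0, 1) for xi in x_true]          # measurement error, var(w) = 1

b_true = ols_slope(x_true, y)  # about 2
b_obs = ols_slope(x_obs, y)    # about 2 * 1 / (1 + 1) = 1 (attenuation)
print(f"true X: {b_true:.2f}, mismeasured X: {b_obs:.2f}")
```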
What are the solutions to errors-in-variables bias?
Obtain better data.
Develop a specific model of the measurement error process. This is only possible if a lot is known about the nature of the measurement error.
Use instrumental variables regression.
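The last two solutions can be sketched under stated assumptions. For the measurement-error model, assume the error variance σ_w² is known, so the attenuation factor can be estimated and undone. For IV, assume a second, independent mismeasurement of the same X is available to serve as an instrument. Both the known variance and the second measurement are hypothetical conveniences, not features of any particular dataset.

```python
import random

def cov(a, b):
    """Sample covariance (dividing by n)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n

rng = random.Random(4)
n = 100_000
sigma2_w = 1.0                                        # assumed known error variance
x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x_true]   # true slope 2
x_obs = [xi + rng.gauss(0, sigma2_w ** 0.5) for xi in x_true]  # regressor we observe
x_alt = [xi + rng.gauss(0, 1) for xi in x_true]       # second, independent measurement

b_obs = cov(x_obs, y) / cov(x_obs, x_obs)             # attenuated OLS, about 1

# (a) Measurement-error model: divide by the estimated ratio var(X) / var(X_obs)
reliability = (cov(x_obs, x_obs) - sigma2_w) / cov(x_obs, x_obs)
b_corrected = b_obs / reliability

# (b) IV: the second measurement is correlated with X but not with u or w
b_iv = cov(x_alt, y) / cov(x_alt, x_obs)
print(f"OLS {b_obs:.2f}, corrected {b_corrected:.2f}, IV {b_iv:.2f}")
```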
What are missing data and sample selection bias?
Data are often missing, and sometimes the way they are missing introduces bias.
What are the three cases of missing data?
1) Data are missing at random.
2) Data are missing based on the value of one or more X's.
3) Data are missing based in part on the value of Y or u.
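Which cases of missing data don't introduce bias, and why?
When data are missing at random, or when data are missing based on the value of one or more X's. These cases don't introduce bias because whether an observation is missing is unrelated to u given the X's, so E(u_i | X_i) = 0 still holds in the remaining sample. The cost is that the standard errors are larger than they would be if the data weren't missing.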
Which case of missing data does cause bias?
When data are missing based in part on the value of Y or u. This bias is called sample selection bias.
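The three cases can be compared in one minimal simulation sketch. It assumes a hypothetical model Y = 1 + 2X + u with standard normal X and u, then drops observations (1) at random, (2) when X ≤ 0, and (3) when Y ≤ 1.

```python
import random

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def slope_of(rows):
    """OLS slope for a list of (x, y) pairs."""
    xs, ys = zip(*rows)
    return ols_slope(xs, ys)

rng = random.Random(5)
n = 200_000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]   # true slope 2
data = list(zip(x, y))

at_random = slope_of([r for r in data if rng.random() < 0.5])  # case 1
on_x = slope_of([r for r in data if r[0] > 0])                 # case 2: kept if X > 0
on_y = slope_of([r for r in data if r[1] > 1])                 # case 3: kept if Y > 1
print(f"{at_random:.2f} {on_x:.2f} {on_y:.2f}")
```

Cases 1 and 2 stay close to the true slope of 2, but case 3 falls well below it (roughly 1.5 in this setup). Keeping only high-Y observations selects on u, so E(u | X) ≠ 0 in the retained sample.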
When does sample selection bias arise during a selection process?
When the selection process influences the availability of data and is related to the dependent variable.
What is survivorship bias?
Survivorship bias is a special form of sample selection bias in which the only units sampled are those that survived some event. An example is comparing actively managed mutual funds with buy-and-hold market funds. Only the managed funds that survived are observed, and these tend to be the ones that outperformed in the past, so their measured performance is biased upward.
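The mutual fund example can be sketched as a simulation under deliberately extreme assumptions: no fund has any skill, each period's excess return is pure noise with standard deviation 0.05, and a fund survives only if its past return was positive. The survival rule and all numbers are hypothetical.

```python
import random
from statistics import fmean

rng = random.Random(6)
n_funds = 10_000
# No skill: every fund's excess return is independent noise in each period
past = [rng.gauss(0.0, 0.05) for _ in range(n_funds)]
future = [rng.gauss(0.0, 0.05) for _ in range(n_funds)]

# Hypothetical survival rule: only funds with a positive past return stay open
survivors = [i for i in range(n_funds) if past[i] > 0]

avg_past_all = fmean(past)                                 # about 0
avg_past_survivors = fmean(past[i] for i in survivors)     # about 0.04: looks like skill
avg_future_survivors = fmean(future[i] for i in survivors) # about 0: no persistence
print(f"{avg_past_all:.4f} {avg_past_survivors:.4f} {avg_future_survivors:.4f}")
```

A study that only sees the surviving funds' track records would conclude that managed funds beat the market, even though none of them have any skill here.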
What are the solutions to sample selection bias?
Collect the sample in a way that avoids sample selection:
- obtain a true random sample, or measure at the beginning of the period (before the selection has taken place)
What is simultaneous causality bias?
If X causes Y and Y also causes X, then a large u_i means a larger Y_i, which in turn means a larger X_i (if Y has a positive effect on X). So X_i and u_i are correlated, and OLS is biased.
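The feedback loop can be shown with a small simulation. The sketch assumes a hypothetical two-equation system, Y = β_1 X + u and X = γ_1 Y + v, with β_1 = 1, γ_1 = 0.5, and independent standard normal u and v. Each observation is the equilibrium solution of both equations.

```python
import random

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

rng = random.Random(7)
n = 100_000
beta1, gamma1 = 1.0, 0.5   # hypothetical: Y = beta1*X + u and X = gamma1*Y + v
x, y = [], []
for _ in range(n):
    u, v = rng.gauss(0, 1), rng.gauss(0, 1)
    # Solve the two equations jointly for the equilibrium values
    yi = (beta1 * v + u) / (1 - beta1 * gamma1)
    xi = gamma1 * yi + v
    x.append(xi)
    y.append(yi)

b_ols = ols_slope(x, y)
print(f"OLS: {b_ols:.2f} (true beta1 = {beta1})")  # about 1.2 here
```

Because u feeds into Y and Y feeds back into X, cov(X, u) > 0 and OLS overstates β_1.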
What are the solutions to simultaneous causality bias?
Run a randomised controlled experiment: if X is chosen at random, there is no feedback from the outcome variable Y to X.
Develop and estimate a complete model of both directions of causality. This is extremely difficult in practice.
Use instrumental variables regression to estimate the causal effect of interest.
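The IV solution can be sketched by extending the same kind of hypothetical system with an instrument Z. Z shifts X directly (X = γ_1 Y + Z + v) but is excluded from the Y equation and independent of u. The IV estimator with one instrument is the ratio cov(Z, Y) / cov(Z, X). The instrument and all coefficient values are assumptions made for illustration.

```python
import random

def cov(a, b):
    """Sample covariance (dividing by n)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n

rng = random.Random(8)
n = 100_000
beta1, gamma1 = 1.0, 0.5   # hypothetical: Y = beta1*X + u and X = gamma1*Y + Z + v
x, y, z = [], [], []
for _ in range(n):
    u, v, zi = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    # Equilibrium values; Z enters only through the X equation
    yi = (beta1 * (zi + v) + u) / (1 - beta1 * gamma1)
    xi = gamma1 * yi + zi + v
    x.append(xi)
    y.append(yi)
    z.append(zi)

b_ols = cov(x, y) / cov(x, x)  # still biased by feedback, about 1.11 here
b_iv = cov(z, y) / cov(z, x)   # IV estimate, close to beta1 = 1
print(f"OLS {b_ols:.2f}, IV {b_iv:.2f}")
```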
What is the external validity requirement for a prediction model?
The data used to estimate the prediction model must come from the same distribution as the out-of-sample observations for which the predictions are made.
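The requirement can be illustrated with a simulation. The sketch assumes a hypothetical truth Y = X² + noise and fits a linear prediction model on X drawn from [0, 1]. It then scores the model on new data from the same range and on data from [2, 3]. The function names are made up for this sketch.

```python
import random

rng = random.Random(9)

def draw(lo, hi, n):
    """Draw (x, y) lists with x ~ Uniform(lo, hi) and y = x**2 + noise."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return xs, [xi ** 2 + rng.gauss(0, 0.1) for xi in xs]

def fit_line(xs, ys):
    """Intercept and slope of an OLS line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return my - slope * mx, slope

def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line."""
    a, b = model
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys)) / len(xs)

model = fit_line(*draw(0, 1, 10_000))           # estimation sample: X in [0, 1]
mse_same = mse(model, *draw(0, 1, 10_000))      # out of sample, same distribution
mse_shifted = mse(model, *draw(2, 3, 10_000))   # out of sample, different distribution
print(f"same: {mse_same:.3f}, shifted: {mse_shifted:.1f}")
```

The linear model predicts well for draws from the estimation distribution. It breaks down badly once the new observations come from a different range of X.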