STATS Flashcards
omitted variable bias
omitting a variable that affects the dependent variable and is correlated with an included independent variable
if corr(x2,x3) > 0 and the omitted x3 has a positive effect on y, β2 is biased upwards (x2 picks up part of the effect of x3)
if corr(x2,x3) < 0 and the omitted x3 has a positive effect on y, β2 is biased downwards (x2 absorbs the negative correlation); both directions flip if x3 has a negative effect on y
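A simulation sketch of the sign rule above, under assumed values (true β2 = 1, β3 = 2, and x3 built to be positively correlated with x2; none of these numbers come from the flashcards). The slope of y on x2 alone should land near β2 + β3·δ, where δ is the slope of x3 on x2.

```python
import random

def slope(x, y):
    """OLS slope from a simple regression of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
n = 20_000
beta2, beta3 = 1.0, 2.0  # assumed true coefficients (illustrative only)

x2 = [rng.gauss(0, 1) for _ in range(n)]
# x3 is positively correlated with x2 by construction
x3 = [0.5 * a + rng.gauss(0, 1) for a in x2]
y = [beta2 * a + beta3 * b + rng.gauss(0, 1) for a, b in zip(x2, x3)]

short_slope = slope(x2, y)         # regression that omits x3
delta = slope(x2, x3)              # slope of the omitted x3 on the included x2
predicted = beta2 + beta3 * delta  # what the omitted-variable formula predicts

print(f"true beta2 = {beta2}, estimate without x3 = {short_slope:.3f}")
print(f"beta2 + beta3 * delta = {predicted:.3f}")
```

Changing the 0.5 to a negative number makes x3 negatively correlated with x2, and the estimate then falls below the true β2.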
reverse causality
when the direction of the causal relationship is unclear (e.g. murder rate and unemployment)
solution: Granger causality test
check whether the previous year’s unemployment rate has an effect on today’s murder rate (today’s murder rate cannot have an effect on last year’s unemployment rate)
murder rate_t = β0 + β1(unemployment rate_{t-1}) + β2(murder rate_{t-1}) + ε_t
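A minimal sketch of the lagged regression above on simulated data. The series, the coefficients (2.0, 0.5, 0.3) and the helper names `ols` and `rss` are assumptions made for illustration. The F-statistic compares the model with and without lagged unemployment, which is the one-lag form of the Granger test.

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X)b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def rss(X, y, b):
    """Residual sum of squares for coefficients b."""
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
               for row, yi in zip(X, y))

rng = random.Random(1)
T = 5000
unemp = [rng.gauss(5, 1) for _ in range(T)]
murder = [6.0]
for t in range(1, T):
    # Simulated truth: last year's unemployment does affect this year's murder rate
    murder.append(2.0 + 0.5 * unemp[t - 1] + 0.3 * murder[t - 1] + rng.gauss(0, 1))

y = murder[1:]
X_full = [[1.0, unemp[t - 1], murder[t - 1]] for t in range(1, T)]
X_restricted = [[1.0, murder[t - 1]] for t in range(1, T)]  # drops lagged unemployment

b0, b1, b2 = ols(X_full, y)
rss_full = rss(X_full, y, [b0, b1, b2])
rss_restricted = rss(X_restricted, y, ols(X_restricted, y))

n = len(y)
# One restriction (β1 = 0), three parameters in the full model
f_stat = (rss_restricted - rss_full) / (rss_full / (n - 3))
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}, F = {f_stat:.1f}")
```

A large F suggests lagged unemployment helps predict the murder rate beyond its own lag. Running the mirror-image regression (unemployment on lagged murder rate) checks the other direction.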
measurement error
random error:
- error that varies unpredictably from one measurement to the next
- SE of the slope is larger, so the coefficient is less significant and the null hypothesis is harder to reject
systematic error:
- error that pushes measurements consistently in one direction throughout the experiment (e.g. older people tend to be more risk averse, women tend to underestimate)
- makes the coefficient biased
solution: draw as random a sample as possible, including many people from different groups
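The two kinds of error can be contrasted in a small simulation. This is a sketch under assumed numbers: the true slope of 2, the noise levels, and the specific form of the systematic error (under-reporting of y that grows with x) are all illustrative choices, not part of the flashcards.

```python
import random

def fit(x, y):
    """Return (slope, standard error of slope) for a simple OLS regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    a0 = my - b * mx
    resid_ss = sum((c - a0 - b * a) ** 2 for a, c in zip(x, y))
    return b, (resid_ss / (n - 2) / sxx) ** 0.5

rng = random.Random(2)
n = 5000
x = [rng.uniform(0, 10) for _ in range(n)]
y_true = [1.0 + 2.0 * a + rng.gauss(0, 1) for a in x]

# Random error: extra zero-mean noise in y that is unrelated to x
y_random = [v + rng.gauss(0, 3) for v in y_true]
# Systematic error (assumed form): under-reporting that grows with x
y_system = [v - 0.3 * a for v, a in zip(y_true, x)]

b_true, se_true = fit(x, y_true)
b_rand, se_rand = fit(x, y_random)
b_sys, se_sys = fit(x, y_system)

print(f"clean:      slope = {b_true:.3f}, SE = {se_true:.4f}")
print(f"random:     slope = {b_rand:.3f}, SE = {se_rand:.4f}")
print(f"systematic: slope = {b_sys:.3f}, SE = {se_sys:.4f}")
```

In this setup the random error leaves the slope close to 2 but inflates its SE, while the systematic error shifts the slope itself by the 0.3 built into the under-reporting.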
simultaneity
when there is a bidirectional causal relationship between two variables (x affects y and y affects x at the same time)
multicollinearity
when two or more independent variables in the regression model are highly correlated with each other
this makes it difficult to distinguish the true individual effect of each variable
symptom: a high overall F-statistic but low individual t-statistics in the regression output
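A sketch of why that pattern appears, using simulated data in which x2 is almost a copy of x1 (all values are assumptions for illustration). The two-regressor OLS formulas are written out on mean-centered data so that only the standard library is needed.

```python
import random

rng = random.Random(3)
n = 100
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [a + rng.gauss(0, 0.02) for a in x1]  # nearly identical to x1
y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

def centered(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

c1, c2, cy = centered(x1), centered(x2), centered(y)
s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
s1y, s2y, syy = dot(c1, cy), dot(c2, cy), dot(cy, cy)

# Closed-form OLS for two regressors plus an intercept
det = s11 * s22 - s12 ** 2  # close to zero when x1 and x2 are collinear
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

resid_ss = syy - b1 * s1y - b2 * s2y
sigma2 = resid_ss / (n - 3)
se_b1 = (sigma2 * s22 / det) ** 0.5
se_b2 = (sigma2 * s11 / det) ** 0.5
t1, t2 = b1 / se_b1, b2 / se_b2
f_stat = ((syy - resid_ss) / 2) / sigma2  # joint test of both slopes
corr = s12 / (s11 * s22) ** 0.5

print(f"corr(x1, x2) = {corr:.4f}")
print(f"F = {f_stat:.1f}, t1 = {t1:.2f}, t2 = {t2:.2f}")
print(f"SE(b1) = {se_b1:.2f}, SE(b2) = {se_b2:.2f}")
```

The joint F is large because x1 and x2 together explain y well, but the standard errors of the individual slopes are inflated because the data cannot tell the two regressors apart.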
sampling bias
occurs when the selected sample is not representative of the larger population, leading to inaccurate or misleading conclusions
simple random sampling
- create a list of all members of the population
- generate n random numbers and include the corresponding members in the sample
problem: it is often difficult or impossible to obtain a list of all members of a population
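The two steps can be sketched with Python's standard `random` module. The population list and its member names are hypothetical.

```python
import random

rng = random.Random(4)  # seeded only so the example is reproducible

# Step 1: a list of all members of the population (hypothetical names)
population = [f"member_{i}" for i in range(1, 501)]

# Step 2: draw n members at random, without replacement
n = 25
sample = rng.sample(population, n)

print(sample[:5])
```

`rng.sample` gives every member the same chance of being chosen and never picks the same member twice.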
stratified random sampling
- find the population proportion for each stratum
- sample subjects from each stratum such that each stratum makes up the same proportion of the sample as it does of the population
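A minimal sketch of proportional allocation, assuming a hypothetical population of 1,000 split into strata A, B and C (60%, 30%, 10%) and a target sample of 50.

```python
import random
from collections import defaultdict

rng = random.Random(5)

# Hypothetical population: (stratum, member id) pairs
population = ([("A", i) for i in range(600)]
              + [("B", i) for i in range(300)]
              + [("C", i) for i in range(100)])
n = 50

# Group members by stratum
by_stratum = defaultdict(list)
for stratum, member in population:
    by_stratum[stratum].append((stratum, member))

sample = []
for stratum, members in by_stratum.items():
    # Allocate sample slots in proportion to the stratum's population share.
    # With less convenient numbers, rounding may make the total differ from n.
    k = round(n * len(members) / len(population))
    sample.extend(rng.sample(members, k))

sample_counts = {s: sum(1 for t, _ in sample if t == s) for s in by_stratum}
print(sample_counts)
```

Here the sample holds 30, 15 and 5 members from A, B and C, matching the 60/30/10 split of the population.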
cluster sampling
random sampling of whole groups (clusters) of population members, typically geographic groups; every member of each selected cluster is included in the sample
- can be used when it is difficult to obtain a complete list of the population
- is cheaper
- increases sampling error, as members within the same geographic area tend to be more similar to one another than to the rest of the population
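A sketch of the procedure, assuming ten hypothetical districts of 20 people each. Only a list of clusters is needed up front, not a list of every individual.

```python
import random

rng = random.Random(6)

# Hypothetical clusters: district name -> people living there
clusters = {
    f"district_{d}": [f"d{d}_person_{p}" for p in range(20)]
    for d in range(10)
}

# Randomly choose whole clusters, not individuals
chosen = rng.sample(sorted(clusters), 3)

# Everyone in a chosen cluster is sampled
sample = [person for name in chosen for person in clusters[name]]

print(chosen, len(sample))
```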
sampling errors
the difference between a sample statistic and the true population parameter that arises by chance because only part of the population is observed
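A small simulation of this gap, using an assumed population of 10,000 values and comparing sample means against the population mean for two sample sizes. The helper name `mean_abs_error` is illustrative.

```python
import random

rng = random.Random(7)

# Hypothetical population with a known mean
population = [rng.gauss(50, 10) for _ in range(10_000)]
pop_mean = sum(population) / len(population)

def mean_abs_error(n, repeats=200):
    """Average |sample mean - population mean| over repeated random samples."""
    total = 0.0
    for _ in range(repeats):
        sample = rng.sample(population, n)
        total += abs(sum(sample) / n - pop_mean)
    return total / repeats

err_small = mean_abs_error(10)
err_large = mean_abs_error(1000)

print(f"average error with n=10:   {err_small:.3f}")
print(f"average error with n=1000: {err_large:.3f}")
```

The error is present even though each sample is drawn at random, and it shrinks as the sample size grows.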
non-sampling errors
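non-response bias: occurs when the individuals who do not respond to a survey differ systematically from those who do (e.g. they come mostly from one particular group)
under-coverage: occurs when some groups in the population are left out of, or underrepresented in, the sample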
wording effect: confusing or leading questions can lead to skewed data
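under-coverage can be reduced by random sampling (ensuring every member of the population has an equal chance of being selected) or stratified sampling (dividing the population into subgroups and drawing a random sample from each, so that every subgroup is adequately represented); non-response bias and wording effects are not fixed by the sampling method and instead call for following up with non-respondents and using clear, neutral questions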