Week 5 - Foundations for inference Flashcards
sampling variability
our point estimate (p-hat) changes from sample to sample, so it is rarely exactly equal to the true proportion p
bias
a systematic tendency to underestimate or overestimate the true value
central limit theorem
as the sample size grows, the distribution of the sample proportion can be approximated by a normal distribution (provided the observations are independent and the sample is large enough)
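A minimal simulation sketch of the central limit theorem card above. The values p = 0.3, n = 200 and the number of repetitions are illustrative assumptions, not from the notes. It checks that the simulated sample proportions centre on p, that their spread is close to sqrt(p(1-p)/n), and that roughly 95% of them fall within 1.96 standard errors of p, as a normal approximation would suggest.

```python
import random
import statistics

def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` samples of size `n` from a population with true
    proportion `p` and return the sample proportion of each sample."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.3, 200  # assumed illustrative values
p_hats = simulate_sample_proportions(p, n, reps=5000)

mean_p_hat = statistics.mean(p_hats)      # should sit close to p
sd_p_hat = statistics.stdev(p_hats)       # should sit close to the standard error
theoretical_se = (p * (1 - p) / n) ** 0.5

# Share of simulated proportions within 1.96 standard errors of p
within95 = sum(abs(ph - p) <= 1.96 * theoretical_se for ph in p_hats) / len(p_hats)

print(f"mean of p-hat: {mean_p_hat:.4f} (true p = {p})")
print(f"sd of p-hat:   {sd_p_hat:.4f} (theory: {theoretical_se:.4f})")
print(f"share within 1.96 SE: {within95:.3f}")
```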
hypothesis testing
another way of conducting statistical inference - a formal way of pitting 2 mutually exclusive statements against each other
2 hypothesis statements
Null hypothesis - there is nothing going on
- defendant is innocent
Alternative hypothesis - there is something going on
- defendant is guilty
general rule - we assume the null is true until the evidence says otherwise. if the evidence is weak we "fail to reject" the null, but we never accept the null hypothesis
one sided hypothesis test vs two sided
one sided = only checking for an effect in one direction
two sided = checking for an effect in either direction
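A small sketch of how the choice of one sided vs two sided changes the p value, assuming the test statistic is a z score with a standard normal null distribution. The function name `p_values` and the z score of 1.8 are made up for illustration.

```python
from statistics import NormalDist

def p_values(z):
    """Return one sided and two sided p values for a z score,
    assuming a standard normal null distribution."""
    std_normal = NormalDist()
    upper = 1 - std_normal.cdf(z)       # alternative: effect is positive
    lower = std_normal.cdf(z)           # alternative: effect is negative
    two_sided = 2 * min(upper, lower)   # alternative: effect in either direction
    return {"upper": upper, "lower": lower, "two_sided": two_sided}

result = p_values(1.8)  # hypothetical z score
print(result)
```

With z = 1.8 the upper one sided p value is about 0.036 and the two sided p value is about 0.072, so the same data would be significant at 0.05 in the one sided test but not in the two sided test.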
significance level
the threshold (α) that the p value is compared against: the null hypothesis is rejected when the p value falls below it. 0.05 is the most common choice
the null distribution
the distribution of possible outcomes under the null hypothesis
type 1 error
when we reject a null hypothesis that is true
the probability of a type 1 error is denoted by α (alpha)
when the null is true, you will observe a p value below α about 100 × α% of the time
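A rough simulation sketch of the type 1 error card above. It assumes a two sided z test on normal data with known standard deviation 1 and a null hypothesis (mean = 0) that really is true; the sample size, repetition count and function name are illustrative. The share of simulated p values below α should land close to α.

```python
import random
from statistics import NormalDist

def type1_error_rate(alpha=0.05, n=30, reps=4000, seed=1):
    """Simulate repeated two sided z tests when the null hypothesis is true
    and return the fraction of tests that (wrongly) reject it."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(0, 1) for _ in range(n)]  # null is true: mean = 0
        z = (sum(sample) / n) * n ** 0.5              # sd is known to be 1
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1
    return rejections / reps

rate = type1_error_rate()
print(f"fraction of false rejections: {rate:.3f}")
```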
type 2 error
when we fail to reject a false null hypothesis
the probability of making a type 2 error is denoted by β (beta)
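A sketch of how the type 2 error probability could be computed in one simple setting: a one sided z test (alternative: mean is larger than the null value) with known standard deviation. The function name and every number below are assumptions for illustration. It shows that, for the same true effect, a larger sample makes a type 2 error less likely.

```python
from statistics import NormalDist

def type2_error_prob(true_mean, null_mean, sigma, n, alpha=0.05):
    """Probability of failing to reject the null in a one sided z test
    when the true mean is `true_mean` (so the null is false)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)                # rejection cutoff for z
    shift = (true_mean - null_mean) * n ** 0.5 / sigma    # where z is centred in reality
    return std_normal.cdf(z_crit - shift)                 # chance z stays below the cutoff

# Hypothetical scenario: true mean 0.5, null mean 0, sd 1
beta_small_n = type2_error_prob(true_mean=0.5, null_mean=0.0, sigma=1.0, n=25)
beta_large_n = type2_error_prob(true_mean=0.5, null_mean=0.0, sigma=1.0, n=100)
print(f"n = 25:  {beta_small_n:.3f}")
print(f"n = 100: {beta_large_n:.4f}")
```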
significance level
the probability of rejecting the null when it's actually true (the type 1 error rate, α)
it is not the probability that the result happened by chance
margin of error
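the largest distance we expect between the sample statistic and the true population parameter at a given confidence level. the sample statistic plus or minus the margin of error gives the range that is likely to contain the true population parameter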
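A minimal sketch of a margin of error calculation for a sample proportion, assuming the normal approximation applies and using the usual formula z* × sqrt(p-hat(1 - p-hat)/n). The survey numbers (p-hat = 0.52, n = 1000) and the function name are made up.

```python
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Margin of error for a sample proportion under the normal approximation."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    standard_error = (p_hat * (1 - p_hat) / n) ** 0.5
    return z_star * standard_error

p_hat, n = 0.52, 1000  # hypothetical survey result
moe = margin_of_error(p_hat, n)
interval = (p_hat - moe, p_hat + moe)
print(f"margin of error: {moe:.3f}")
print(f"interval: ({interval[0]:.3f}, {interval[1]:.3f})")
```

Here the margin of error is about 0.031, so the range runs from roughly 0.489 to 0.551.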
chi square goodness of fit test
used to determine if the data in your sample is drawn from some hypothesized distribution, i.e. do the observed counts in each category match the expected counts.
chi square considerations
independence
at least 2 degrees of freedom
at least 5 expected cases in each category
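A sketch of the goodness of fit calculation done by hand, assuming four equally likely categories under the null. The observed counts and the function name are made up. The cutoff 7.815 is the standard table value for 3 degrees of freedom at the 0.05 level.

```python
def chi_square_gof(observed, expected_props):
    """Return the chi square statistic, degrees of freedom and expected
    counts for a goodness of fit test."""
    n = sum(observed)
    expected = [n * prop for prop in expected_props]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df, expected

observed = [35, 20, 25, 20]               # hypothetical counts
expected_props = [0.25, 0.25, 0.25, 0.25]  # hypothesized distribution

statistic, df, expected = chi_square_gof(observed, expected_props)
counts_ok = all(e >= 5 for e in expected)  # expected count condition

CRITICAL_DF3_05 = 7.815                    # chi square table value, df = 3, alpha = 0.05
reject_null = statistic > CRITICAL_DF3_05

print(f"chi square = {statistic:.2f}, df = {df}, conditions ok: {counts_ok}")
print("reject null" if reject_null else "fail to reject null")
```

For these counts the statistic is 6.0 with 3 degrees of freedom, which is below 7.815, so we fail to reject the null.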
the t distribution
the t distribution is used when we don't know the population standard deviation and have to estimate it from the sample. it accounts for the extra uncertainty of another estimated value.
it's very similar to the normal distribution but is more spread out (thicker tails) to reflect the uncertainty, especially for small samples
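A simulation sketch of why the t distribution is more spread out. It assumes small samples (n = 5) of normal data with true mean 0 and true standard deviation 1; all numbers and the function name are illustrative. Each sample is standardized twice: once with the true standard deviation and once with the estimated one. The second version ends up beyond ±1.96 far more often than 5% of the time.

```python
import random
import statistics

def tail_fractions(n=5, reps=10000, cutoff=1.96, seed=2):
    """Fraction of standardized sample means beyond +/- cutoff when the
    sd is known (z) versus estimated from the sample (t)."""
    rng = random.Random(seed)
    beyond_z = beyond_t = 0
    for _ in range(reps):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean * n ** 0.5 / 1.0                       # uses the true sd
        t = sample_mean * n ** 0.5 / statistics.stdev(sample)  # uses the estimated sd
        beyond_z += abs(z) > cutoff
        beyond_t += abs(t) > cutoff
    return beyond_z / reps, beyond_t / reps

tail_z, tail_t = tail_fractions()
print(f"beyond 1.96 with known sd:     {tail_z:.3f}")
print(f"beyond 1.96 with estimated sd: {tail_t:.3f}")
```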