24. Statistical Inference Flashcards
statistical inference
- using a sample to make statements about the population
- needed because it is usually impractical or impossible to measure an effect in an entire population
- process of drawing conclusions about effects in a population using data from a sample drawn from that population
- PATTERNS revealed through analysis of the sample data -> generalized to the population
why do we use random sampling?
- each member of the population has an equal chance of being chosen
- makes it likely that the study sample is representative of the population
- gives the greatest probability that findings in the sample will closely approximate those in the overall population
- findings can be “generalized”
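A minimal sketch of simple random sampling, assuming a hypothetical simulated population of 100,000 ages (normal, mean 50, SD 12; not real data). Every member has the same chance of being drawn by `random.Random.sample`, and the sample mean lands close to the population mean.

```python
import random
import statistics

# Hypothetical population: 100,000 simulated ages (illustrative assumption, not real data)
rng = random.Random(42)
population = [rng.gauss(50, 12) for _ in range(100_000)]

# Simple random sample: every member has the same chance of being selected
sample = rng.sample(population, k=500)

pop_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(f"population mean: {pop_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

Re-running with a different seed gives a slightly different sample mean each time; that sample-to-sample variability is what the statistical methods in the later cards account for.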
what are the three main explanations for any observed effect?
- the effect is due to bias or confounding
- the effect is due to chance (this is where statistics comes in)
- the effect is real
bias vs confounding
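bias is a systematic error in the design or implementation of a study: it can create an association that is not real
confounding is an association that is real, but potentially misleading: it is partly or wholly explained by a third factor (a confounder) linked to both the exposure and the outcome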
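To illustrate confounding, here is a small simulation under made-up assumptions: a hypothetical confounder Z (think of something like smoking) raises both the chance of exposure X and the chance of outcome Y, while X itself has no effect on Y. The crude risk ratio shows a real association, but within each level of Z it disappears.

```python
import random

rng = random.Random(0)

# Hypothetical scenario (illustrative probabilities, not real data):
# Z raises the chance of exposure X and, separately, the chance of outcome Y.
# X has NO causal effect on Y.
rows = []
for _ in range(200_000):
    z = rng.random() < 0.3
    x = rng.random() < (0.7 if z else 0.2)
    y = rng.random() < (0.3 if z else 0.05)
    rows.append((z, x, y))

def risk_ratio(data):
    """Risk of Y among exposed divided by risk of Y among unexposed."""
    exposed = [y for _, x, y in data if x]
    unexposed = [y for _, x, y in data if not x]
    return (sum(exposed) / len(exposed)) / (sum(unexposed) / len(unexposed))

crude = risk_ratio(rows)
within_z = risk_ratio([r for r in rows if r[0]])
within_not_z = risk_ratio([r for r in rows if not r[0]])
print(f"crude RR: {crude:.2f}")
print(f"RR within Z=1: {within_z:.2f}, within Z=0: {within_not_z:.2f}")
```

With these probabilities the crude risk ratio comes out around 2.4, while both stratum-specific ratios sit close to 1: the association is real in the data, but it is produced by Z rather than by X.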
hypothesis testing
involves choosing between 2 propositions:
- null hypothesis: no real difference between groups; the observed effect is due to chance
- alternative hypothesis: a real difference exists between groups
we are looking to “reject” the null hypothesis (we want to show that the observed effect is greater than what we would expect based on chance alone)
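the null hypothesis may still be true even when an effect is observed in the sample: chance alone can produce an effect in the sample when there is no effect in the entire population it is meant to represent (rejecting the null hypothesis in this case is a false positive, or type I error)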
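A sketch of how chance alone produces "effects" in samples, assuming hypothetical groups of 50 drawn from the SAME normal population (mean 100, SD 15), so the null hypothesis is true by construction. The helper `two_sample_z_p_value` is illustrative and uses a normal (z) approximation rather than a t-test; roughly 5% of the simulated studies still come out "significant" at 0.05.

```python
import math
import random
import statistics

rng = random.Random(1)

def two_sample_z_p_value(a, b):
    """Two-sided P value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))

# Both groups come from the same population: there is no real difference
n_studies = 2000
false_positives = 0
for _ in range(n_studies):
    group_a = [rng.gauss(100, 15) for _ in range(50)]
    group_b = [rng.gauss(100, 15) for _ in range(50)]
    if two_sample_z_p_value(group_a, group_b) < 0.05:
        false_positives += 1

rate = false_positives / n_studies
print(f"'significant' results with no real effect: {rate:.3f}")
```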
P value
- the probability of obtaining an effect at least as large as the one observed if chance alone were at work
- i.e. probability of obtaining the observed results (or more extreme ones) if the null hypothesis were true
- P value of 0.05 means that, if there were truly no effect, there would be only a 5% probability of obtaining a result at least as extreme as the one observed (it does NOT mean there is a 95% probability that the observed result is real)
- the lower the P value, the stronger the evidence against the null hypothesis
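The definition above can be made concrete with a permutation test, one way (among several) to compute a P value. This is a sketch under stated assumptions: the data are hypothetical, `permutation_p_value` is an illustrative helper, and the test statistic is the absolute difference in means. Under the null hypothesis the group labels are interchangeable, so shuffling them shows how often chance alone yields a difference at least as large as the observed one.

```python
import random
import statistics

def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation P value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        # Shuffle labels: under H0 any split of the pooled data is equally likely
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0
    return (count + 1) / (n_perm + 1)

# Hypothetical measurements (illustrative only)
treated = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5, 16.0, 14.4]
control = [11.2, 12.5, 11.9, 12.8, 13.1, 12.0, 11.5, 12.6]
print(f"P = {permutation_p_value(treated, control):.4f}")
```

Here very few relabelings produce a difference as large as the observed one (about 2.1 units), so P is well below 0.01.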
P value and significance
- if the P value is less than a preset threshold (usually 0.05), we conclude that chance alone is unlikely to explain the effect we see
- therefore we REJECT the null hypothesis
- we can call result “statistically significant”
- 0.05 is called the alpha level (so P < 0.05 is statistically significant)
P > 0.05
- does NOT mean that there is no difference between the two groups or that the two are equivalent
- it just means that we cannot rule out chance as the explanation for the effect we observed
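A minimal sketch of the decision rule in the two cards above, assuming a z (standard normal) test statistic; `two_sided_p_from_z` and `decide` are illustrative names. Note that the non-significant branch says "fail to reject", not "no difference".

```python
import math

ALPHA = 0.05  # conventional alpha level

def two_sided_p_from_z(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def decide(p, alpha=ALPHA):
    # P below alpha: reject H0. Otherwise we only FAIL to reject H0,
    # which is not evidence that the groups are equivalent.
    return "reject H0" if p < alpha else "fail to reject H0"

for z in (2.5, 1.0):
    p = two_sided_p_from_z(z)
    print(f"z = {z}: P = {p:.4f} -> {decide(p)}")
```

A z of 2.5 gives P of about 0.012 (reject), while a z of 1.0 gives P of about 0.32 (fail to reject); z = 1.96 sits right at the 0.05 boundary.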
confidence intervals
- estimated range of values likely to include an unknown population effect
- the “level” of confidence (usually 95%) is the probability that an interval produced by the statistical method includes the true value of the population effect
- i.e. if we repeated the sampling many times, X% of the intervals constructed this way would cover the true value; this accounts for variability from sample to sample
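The "X% of intervals" interpretation can be checked by simulation. This sketch assumes a hypothetical normal population with a known mean (120) and known SD (15), so the simple z interval (mean ± 1.96 × SD / √n) applies; with an unknown SD a t interval would be used instead. Roughly 95% of the intervals from repeated samples contain the true mean.

```python
import math
import random
import statistics

Z_95 = 1.96  # standard normal quantile for a 95% interval

def mean_ci_95(sample, sigma):
    """95% CI for a population mean when the population SD (sigma) is known."""
    m = statistics.mean(sample)
    half_width = Z_95 * sigma / math.sqrt(len(sample))
    return m - half_width, m + half_width

# Hypothetical population: normal with known mean and SD (assumption)
TRUE_MEAN, SIGMA = 120.0, 15.0
rng = random.Random(7)

reps, hits = 2000, 0
for _ in range(reps):
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(30)]
    low, high = mean_ci_95(sample, SIGMA)
    if low <= TRUE_MEAN <= high:
        hits += 1

coverage = hits / reps
print(f"share of intervals containing the true mean: {coverage:.3f}")
```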
null value for difference in means? relative risk? odds ratio?
difference in means = 0
RR = 1
OR = 1
confidence intervals around differences between groups?
- if the confidence interval does NOT contain the NULL value, then we can say with X% confidence that the observed effect is not due to chance alone
- then the result is statistically significant (P < 0.05 for a 95% confidence interval)
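A sketch combining the null values and the CI check, assuming a hypothetical 2x2 table (30 of 100 exposed vs 15 of 100 unexposed with the outcome). The relative risk CI uses the common log-scale approximation; `relative_risk_ci` and `excludes_null` are illustrative helpers.

```python
import math

# Null values: no difference / no association
NULL_VALUE = {"difference in means": 0.0, "relative risk": 1.0, "odds ratio": 1.0}

def relative_risk_ci(a, b, c, d, z=1.96):
    """RR and approximate 95% CI from a 2x2 table (log method).

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)

def excludes_null(low, high, measure):
    """True if the interval does not contain the measure's null value."""
    null = NULL_VALUE[measure]
    return not (low <= null <= high)

# Hypothetical counts (illustrative only)
rr, low, high = relative_risk_ci(a=30, b=70, c=15, d=85)
print(f"RR = {rr:.2f}, 95% CI {low:.2f} to {high:.2f}")
print("statistically significant:", excludes_null(low, high, "relative risk"))
```

The RR is 2.0 with an interval of roughly 1.15 to 3.48; since 1 lies outside it, the result is significant at the 0.05 level.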
confidence interval vs P-value?
a confidence interval is more informative than a P value alone
in addition to indicating statistical significance (as the P value does), the CI also shows how large or how small the effect is likely to be
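A sketch of why the CI adds information, assuming two hypothetical studies summarized only by an estimate and its standard error, with a normal approximation (`summarize` is an illustrative helper). Both have the same z statistic and therefore the same P value, but their intervals describe effects of very different size.

```python
import math

def summarize(estimate, se, z_crit=1.96):
    """Two-sided z-test P value and 95% CI from an estimate and its standard error."""
    p = math.erfc(abs(estimate / se) / math.sqrt(2))
    return p, estimate - z_crit * se, estimate + z_crit * se

# Two hypothetical studies (illustrative numbers): same z = 2.5, so same P value
for label, est, se in [("study A", 0.5, 0.2), ("study B", 5.0, 2.0)]:
    p, low, high = summarize(est, se)
    print(f"{label}: P = {p:.4f}, 95% CI {low:.2f} to {high:.2f}")
```

Both report P of about 0.012, yet study A's interval runs from about 0.1 to 0.9 while study B's runs from about 1.1 to 8.9; the P value alone cannot tell them apart.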
what are the two types of quantitative variables?
continuous (measurement) and discrete
what are the two types of qualitative variables?
nominal and ordinal
what type of variable is age?
continuous (quantitative)