Week 10 Kuracloud: Statistical Inference Flashcards
Inferential statistics
=
= the methods we use to infer the results from a study sample to the wider population.
Population/Target population
=
= group from which sample is drawn
Sample
=
= people who take part in study (participants)
For the sample to be representative of the population:
each person in the population of interest should have an equal chance of being selected in the sample
Sampling Frame
=
= list of all people/objects in target population
Sampling distribution
=, 3
= frequency distribution of the means of all possible samples (of the same size) drawn from a population
- approximately normal distribution
- mean of sampling distribution is same as mean of population
- SD of sampling distribution = standard error of the mean
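The three properties above can be checked with a small simulation. This is only a sketch: the skewed population, the sample size of 50 and the helper name `sample_means` are illustrative assumptions, not part of the course material.

```python
import math
import random
import statistics

def sample_means(population, n, draws, seed=0):
    """Draw `draws` random samples of size n and return the mean of each."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(draws)]

# Hypothetical, deliberately skewed population (exponential values).
rng = random.Random(1)
population = [rng.expovariate(1.0) for _ in range(10_000)]

n = 50
means = sample_means(population, n=n, draws=2_000)

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)

# Mean of the sampling distribution vs. population mean.
print(round(statistics.mean(means), 3), round(pop_mean, 3))
# SD of the sampling distribution vs. population SD / sqrt(n).
print(round(statistics.stdev(means), 3), round(pop_sd / math.sqrt(n), 3))
```

Even though the population is skewed, a histogram of `means` would look roughly bell-shaped, and the two printed pairs should be close to each other.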
Standard error
=
= variation in sample means across multiple sets of measurements; equal to the SD of the sampling distribution
- can be estimated from single set of measurements:
SE = (standard deviation)/sqrt(n)
(standard deviation divided by square root of sample size)
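A minimal sketch of the formula above, using made-up measurements and a hypothetical helper name `standard_error`:

```python
import math
import statistics

def standard_error(values):
    """SE of the mean = sample standard deviation / sqrt(sample size)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical single set of measurements.
measurements = [4.0, 6.0, 8.0, 10.0, 12.0]
se = standard_error(measurements)
print(round(se, 4))  # SD is sqrt(10) and n is 5, so SE = sqrt(2), about 1.4142
```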
Confidence Interval
=, 2
= range of values within which we are reasonably confident (e.g. 95% confident) that the population parameter lies (e.g. mean, median, difference of means, difference of medians)
Depends on:
- variation within population (more variation gives a wider confidence interval)
- size of sample (larger sample gives a narrower confidence interval; width shrinks with the square root of the sample size)
Methods for calculating confidence intervals
3
- informal
- traditional normal-based formulas: stated level of confidence affects confidence interval length
- bootstrapping
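As an illustration of the bootstrapping method, here is a rough sketch of a percentile bootstrap interval for a mean. The data, the number of resamples and the function name `bootstrap_ci` are assumptions made for the example; other bootstrap variants exist.

```python
import random
import statistics

def bootstrap_ci(values, stat=statistics.mean, reps=5000, level=0.95, seed=0):
    """Percentile bootstrap: resample with replacement, recompute the statistic,
    and take the middle `level` share of the resampled estimates."""
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(reps))
    alpha = (1 - level) / 2
    lower = estimates[int(alpha * reps)]
    upper = estimates[int((1 - alpha) * reps) - 1]
    return lower, upper

# Hypothetical sample of 12 measurements.
sample = [5.1, 6.3, 4.8, 7.2, 5.9, 6.1, 5.5, 6.8, 4.9, 7.0, 5.2, 6.4]

lo95, hi95 = bootstrap_ci(sample, level=0.95)
lo90, hi90 = bootstrap_ci(sample, level=0.90)
print("95%:", round(lo95, 2), round(hi95, 2))
print("90%:", round(lo90, 2), round(hi90, 2))
```

With the same resamples, the 90% interval sits inside the 95% interval, which shows how the stated level of confidence affects interval length.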
95% confidence interval (95%CI)
=,
= sample mean +/- 1.96 x SE (1.96 SE above and below the sample mean)
because the sampling distribution of the mean is approximately normal -> 95% of sample means lie within 1.96 SE of the population mean
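A short sketch of the normal-based calculation, with hypothetical blood pressure readings and a hypothetical helper `mean_ci95`. For small samples a t-distribution multiplier is often used instead of 1.96; that refinement is not shown here.

```python
import math
import statistics

def mean_ci95(values):
    """Normal-based 95% CI for a mean: sample mean +/- 1.96 x SE."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean - 1.96 * se, mean + 1.96 * se

# Hypothetical systolic blood pressure readings (mmHg).
readings = [118, 125, 130, 121, 135, 128, 119, 140, 132, 126,
            122, 129, 137, 124, 131, 127, 120, 133, 138, 123]

lower, upper = mean_ci95(readings)
print(round(lower, 1), round(upper, 1))
```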
95% confidence interval for sample proportions
=,
= sample proportion +/- 1.96 x SE
calculated from SE of a sample proportion
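The card does not give the SE formula for a proportion; the sketch below assumes the usual large-sample formula SE = sqrt(p(1 - p)/n). The counts and the name `proportion_ci95` are illustrative.

```python
import math

def proportion_ci95(successes, n):
    """95% CI for a proportion, assuming SE = sqrt(p * (1 - p) / n)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - 1.96 * se, p + 1.96 * se

# Hypothetical: 20 of 100 participants have the outcome.
lower, upper = proportion_ci95(20, 100)
print(round(lower, 4), round(upper, 4))  # SE = 0.04, so 0.2 +/- 0.0784
```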
Comparing population means or proportions
2
e.g. cancer rates in smokers vs non-smokers in the population
- estimate differences in population means or proportions using sample means or proportions
- calculate 95% confidence interval of measure of difference using standard error of differences in means or proportions
95% confidence interval = mean difference +/- (1.96 x SE)
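A sketch of the calculation above for two independent groups. The SE formula used here, sqrt(s1^2/n1 + s2^2/n2), is one common (unpooled) choice and is an assumption, as are the summary statistics and the function name.

```python
import math

def mean_difference_ci95(mean1, sd1, n1, mean2, sd2, n2):
    """95% CI for a difference in means of two independent groups,
    assuming SE of the difference = sqrt(sd1^2/n1 + sd2^2/n2)."""
    diff = mean1 - mean2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff, diff - 1.96 * se, diff + 1.96 * se

# Hypothetical summary statistics for two groups.
diff, lower, upper = mean_difference_ci95(10.0, 3.0, 36, 8.0, 4.0, 64)
print(diff, round(lower, 3), round(upper, 3))  # SE = sqrt(0.5), about 0.707
```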
Methods of comparing 2 proportions
4,
- absolute difference in 2 proportions
- risk ratio
- odds ratio
- prevalence ratio
if SE known, 95% confidence interval can be calculated for each
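The measures above can be sketched from a 2x2 table. The SE formulas below are the standard large-sample ones (on the log scale for the ratios) and are an assumption here, since the card does not state them; the counts and the name `compare_proportions` are made up. A prevalence ratio is calculated like the risk ratio, using prevalences.

```python
import math

def compare_proportions(a, b, c, d, z=1.96):
    """a, b = exposed with / without outcome; c, d = unexposed with / without.
    Returns (estimate, lower, upper) for each measure."""
    p1, p0 = a / (a + b), c / (c + d)

    rd = p1 - p0
    se_rd = math.sqrt(p1 * (1 - p1) / (a + b) + p0 * (1 - p0) / (c + d))

    rr = p1 / p0
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    return {
        "risk difference": (rd, rd - z * se_rd, rd + z * se_rd),
        # Ratio CIs are built on the log scale, then exponentiated back.
        "risk ratio": (rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)),
        "odds ratio": (odds_ratio, odds_ratio * math.exp(-z * se_log_or),
                       odds_ratio * math.exp(z * se_log_or)),
    }

# Hypothetical counts: 30/100 exposed and 15/100 unexposed have the outcome.
results = compare_proportions(30, 70, 15, 85)
for name, (estimate, lower, upper) in results.items():
    print(name, round(estimate, 3), round(lower, 3), round(upper, 3))
```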
Null hypothesis (H0)
=
= no relationship between exposure and outcome
Alternative hypothesis (HA)
=
= the null hypothesis is not true: there is a relationship between exposure and outcome
p-value
=
= probability of getting the observed result (or one more extreme) if the null hypothesis is true, i.e. the chance that the observed estimate is the result of sampling variation alone
p > 0.05: not significant: no/weak evidence against H0
p < 0.05: significant: evidence against H0 (justify rejection)
p < 0.01: highly significant: strong evidence against H0
p < 0.001: very highly significant: very strong evidence against H0
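A small sketch of how a p-value and its label could be obtained from an estimate and its SE. It assumes a two-sided test based on the normal distribution (a z-test); the function names are hypothetical.

```python
from statistics import NormalDist

def two_sided_p(estimate, se, null_value=0.0):
    """Two-sided p-value from a z statistic, assuming the estimate is
    approximately normally distributed around the null value under H0."""
    z = (estimate - null_value) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def evidence_label(p):
    """Map a p-value to the conventional labels used on this card."""
    if p < 0.001:
        return "very highly significant"
    if p < 0.01:
        return "highly significant"
    if p < 0.05:
        return "significant"
    return "not significant"

# An estimate exactly 1.96 SE from the null value gives p of about 0.05.
p = two_sided_p(estimate=1.96, se=1.0)
print(round(p, 3))
```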
Statistical tests for numerical outcome variables (non-parametric alternatives in brackets)
4
- 2 groups, paired observations: paired t-test (Wilcoxon signed-rank test)
- 2 independent groups: two-sample t-test, linear regression (Mann-Whitney test)
- > 2 groups: ANOVA, linear regression (Kruskal-Wallis test)
- numerical exposure variable: Pearson correlation, linear regression (Kendall’s rank correlation, Spearman’s correlation)
Statistical tests for binary outcome variables
3
- 2 exposure groups: chi-squared test, logistic regression
- > 2 exposure groups: chi-squared test, logistic regression
- > 2 ordered exposure groups: chi-squared test for trend, logistic regression
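For the simplest case (binary outcome, 2 exposure groups), a chi-squared test can be sketched by hand. This version assumes no continuity correction and uses the fact that, with 1 degree of freedom, the p-value equals erfc(sqrt(chi2 / 2)); the counts and the function name are illustrative.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared test for a 2x2 table (no continuity correction).
    table = [[a, b], [c, d]] with rows = exposure groups, columns = outcome yes/no."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: 30/100 exposed vs 15/100 unexposed have the outcome.
chi2, p_value = chi_squared_2x2([[30, 70], [15, 85]])
print(round(chi2, 3), round(p_value, 4))
```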
If the 95% CI does not contain the null value:
p<0.05
Value of no difference
- comparing means: 0
- comparing proportions using ratio: 1
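The link between the 95% CI and p < 0.05 can be illustrated for a difference (null value 0). This sketch assumes a normal-based interval and a two-sided z-test; the estimates, the SE and the name `ci_and_p` are made up. For ratios the same check would be done on the log scale, where the null value 1 becomes log(1) = 0.

```python
from statistics import NormalDist

def ci_and_p(estimate, se, null_value=0.0):
    """Return the normal-based 95% CI and the two-sided z-test p-value."""
    lower, upper = estimate - 1.96 * se, estimate + 1.96 * se
    z = (estimate - null_value) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return (lower, upper), p

# Two hypothetical mean differences with the same SE.
for estimate in (2.0, 1.0):
    (lower, upper), p = ci_and_p(estimate, se=0.7)
    excludes_null = not (lower <= 0.0 <= upper)
    print(estimate, (round(lower, 2), round(upper, 2)), round(p, 3),
          "CI excludes 0:", excludes_null, "p < 0.05:", p < 0.05)
```

In both cases the two checks agree: the interval excludes 0 exactly when p < 0.05.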
Sample estimate/effect size
=
= difference between 2 groups
- difference of means
- difference of proportions
- ratio of proportions or odds