Statistics Flashcards
Numerical data types
Discrete - whole numbers only
Continuous - any value, scale - weight, height, length
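What % of a normally distributed population lies within 1 SD of the mean?
68%
What % of a normally distributed population lies within 2 SD of the mean?
95%
What % of a normally distributed population lies within 3 SD of the mean?
99.7%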
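The three percentages above can be checked numerically. This is a minimal sketch using the standard library's NormalDist; the helper name fraction_within is made up for illustration.

```python
from statistics import NormalDist


def fraction_within(k_sd: float) -> float:
    """Fraction of a normal population lying within k_sd SDs of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k_sd) - standard_normal.cdf(-k_sd)


for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545 and 0.9973
    print(f"within {k} SD: {fraction_within(k):.4f}")
```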
Parametric data
- criteria?
- Advantages
Continuous numerical data
Population has normal distribution
Groups being compared have similar variance and SD (homogeneity of variance)
Parametric tests have better power when these criteria are met
Features of non-parametric tests
Emphasis on rank
Doesn’t require specific distribution
Less power
Central Limit theorem
For a skewed population, if N > 30 per sample, the distribution of sample means can be assumed to be approximately normal
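A small simulation can make the theorem concrete. The sketch below assumes an exponential population (right-skewed, mean 1, SD 1) purely as an example; the sample size of 30 and the number of repeats are arbitrary choices, not part of the card.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30     # N in each sample
NUM_SAMPLES = 5000   # number of samples drawn


def sample_mean(n: int) -> float:
    # expovariate(1.0) draws from a right-skewed population (mean 1, SD 1)
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
expected_se = 1.0 / SAMPLE_SIZE ** 0.5  # population SD / sqrt(N)

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1.0)")
print(f"SD of sample means:   {sd_of_means:.3f} (expected {expected_se:.3f})")
```

The sample means cluster around the population mean with a spread close to SD / sqrt(N), even though the individual values are skewed.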
2 groups
Unpaired
Parametric
What tests?
Equal variance - Student's t test
Unequal variance - Welch's t test
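To see how the two tests differ, here is a rough sketch of both statistics written with the standard library only. The function names and the two small data sets are invented for illustration; the formulas are the usual pooled-variance t and the Welch t with Welch-Satterthwaite degrees of freedom.

```python
import math
import statistics


def student_t(a, b):
    """Pooled-variance t statistic and degrees of freedom (equal variance)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.fmean(a) - statistics.fmean(b)) / se, n1 + n2 - 2


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    q1 = statistics.variance(a) / n1
    q2 = statistics.variance(b) / n2
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df


group_a = [1, 2, 3, 4, 5]    # made-up data, variance 2.5
group_b = [2, 4, 6, 8, 10]   # made-up data, variance 10

t_student, df_student = student_t(group_a, group_b)
t_welch, df_welch = welch_t(group_a, group_b)
print(f"Student: t = {t_student:.3f}, df = {df_student}")
print(f"Welch:   t = {t_welch:.3f}, df = {df_welch:.2f}")
```

With equal group sizes the two t values coincide; the Welch test is more conservative here because it uses fewer degrees of freedom.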
2 groups
Unpaired
Non-parametric
Mann-Whitney U test
2 groups
Paired
Parametric
Paired t test
2 groups
Paired
Non-parametric
Wilcoxon signed rank test
3 or more groups
Unpaired
Parametric
One way ANOVA
3 or more groups
Unpaired
Non-parametric
Kruskal-Wallis test
3 or more groups
Paired
Parametric
One-way repeated measures ANOVA
3 or more groups
Paired
Non-parametric
Friedman test
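The eight group-comparison cards above amount to a lookup on three questions: how many groups, paired or unpaired, parametric or not. A minimal sketch of that lookup follows; the function name choose_test and its parameters are invented for illustration.

```python
# Key: (three_or_more_groups, paired, parametric) -> test named on the cards
TEST_TABLE = {
    (False, False, True): "Student's t test (equal variance) or Welch's t test (unequal variance)",
    (False, False, False): "Mann-Whitney U test",
    (False, True, True): "Paired t test",
    (False, True, False): "Wilcoxon signed rank test",
    (True, False, True): "One-way ANOVA",
    (True, False, False): "Kruskal-Wallis test",
    (True, True, True): "One-way repeated measures ANOVA",
    (True, True, False): "Friedman test",
}


def choose_test(num_groups: int, paired: bool, parametric: bool) -> str:
    """Return the test the flashcards give for this study design."""
    if num_groups < 2:
        raise ValueError("need at least 2 groups to compare")
    return TEST_TABLE[(num_groups >= 3, paired, parametric)]


print(choose_test(2, paired=True, parametric=False))
print(choose_test(4, paired=False, parametric=True))
```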
Test association between 2 qualitative variables
N > 50
Chi-squared test
Test association between 2 qualitative variables
N < 50
Fisher's exact test
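Both tests work on a table of counts. The sketch below computes the chi-squared statistic from observed and expected counts, and a two-sided Fisher exact p value for a 2x2 table by summing the hypergeometric probabilities of all tables no more likely than the observed one (one common two-sided convention, assumed here). The function names and the counts are made up.

```python
from math import comb


def chi_squared_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k in the top-left cell
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small tolerance so ties with the observed table are included
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= p_observed * (1 + 1e-9))


large_table = [[10, 20], [20, 10]]   # made-up counts, N = 60
chi2 = chi_squared_statistic(large_table)
print(f"chi-squared = {chi2:.3f} on 1 degree of freedom")

fisher_p = fisher_exact_two_sided(3, 1, 1, 3)   # made-up counts, N = 8
print(f"Fisher exact p = {fisher_p:.4f}")
```

For 1 degree of freedom the chi-squared statistic is compared against 3.841 at a 0.05 significance level.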
Test linear relationship between 2 variables
Parametric
Pearson’s correlation
Test relationship between 2 variables (monotonic, not necessarily linear)
Non-parametric
Spearman’s rank correlation
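The difference between the two coefficients shows up on data that rises steadily but not in a straight line. This is a sketch with hand-written helpers (names invented here); the ranking helper assumes there are no tied values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def ranks(values):
    """Rank each value from 1 (smallest). Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # y = x squared: always increasing, but curved

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r = {r:.4f}, Spearman rho = {rho:.4f}")
```

Spearman's coefficient is 1 because the ranks agree perfectly, while Pearson's r falls short of 1 because the points do not lie on a straight line.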
Difference between test statistic and p value
Test statistic - standardised value used for hypothesis testing
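p value - probability of getting a test statistic at least as extreme as the one observed if the null hypothesis is true. It is compared against the significance level (alpha), which is the type 1 error probability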
When to use Z statistic
Known population mean + SD
Sample size > 30
Z statistic = z score. Needs the population mean and SD
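As a worked example of a Z statistic and its p value, the sketch below tests a sample mean against a known population mean and SD. The function name and the numbers (population mean 100, SD 15, sample of 36 with mean 105) are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, pop_mean, pop_sd, n):
    """Z statistic for a sample mean and its two-sided p value."""
    standard_error = pop_sd / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


z, p = z_test(sample_mean=105, pop_mean=100, pop_sd=15, n=36)
print(f"Z = {z:.2f}, two-sided p = {p:.4f}")   # Z = 2.00, p about 0.0455
```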
When to use t statistic
Population mean and SD unknown
When to use F statistic
ANOVA
Statistical Power
- What is it
Probability study will detect predetermined difference between 2 groups
= probability of correctly rejecting the null hypothesis when the alternative hypothesis is true
1 - power = chance of false negative = probability of type 2 error
Changes that will increase power
Increase sample size
Increase significance level (e.g. from 0.05 to 0.1)
Increase the difference to be detected
Reduce SD
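The four changes above can be tried out with an approximate power formula. This sketch assumes a two-sided comparison of two equal-sized groups with a known common SD and uses the normal approximation (the negligible contribution of the opposite tail is ignored); the function name and all numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(difference, sd, n_per_group, alpha=0.05):
    """Approximate power to detect `difference` between two group means."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2 / n_per_group)
    return std_normal.cdf(abs(difference) / standard_error - z_crit)


baseline = approx_power(difference=5, sd=10, n_per_group=64)
print(f"baseline power:          {baseline:.3f}")
print(f"larger sample (128):     {approx_power(5, 10, 128):.3f}")
print(f"alpha raised to 0.1:     {approx_power(5, 10, 64, alpha=0.1):.3f}")
print(f"larger difference (8):   {approx_power(8, 10, 64):.3f}")
print(f"smaller SD (8):          {approx_power(5, 8, 64):.3f}")
```

Each of the four changes gives a power above the baseline of about 0.81.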
Deciding significance level
If consequences of type 1 error are serious - use small significance level, reduce type 1 error chance
If consequences of false negative are serious - use higher significance level, increase power, reduce type 2 error chance
Drawback of post-hoc analysis
Type 1 error chance increases (selectively looking for positives; the chance of a false positive compounds with each extra comparison)
Drawback of trying to mitigate type 1 error risk in post-hoc analysis
Significance level for each comparison has to be made smaller
This raises the sample size needed for the same power - if N is not increased, type 2 error chance increases
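The trade-off in the last two cards can be put in numbers. The sketch below assumes the comparisons are independent and uses the Bonferroni correction as one example of making the significance level smaller; neither assumption comes from the cards, and the function names are invented.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_alpha(alpha: float, num_tests: int) -> float:
    """Per-test significance level that keeps the overall level near alpha."""
    return alpha / num_tests


for m in (1, 5, 10):
    fwer = familywise_error_rate(0.05, m)
    per_test = bonferroni_alpha(0.05, m)
    print(f"{m:>2} tests: type 1 error chance {fwer:.3f}, "
          f"Bonferroni per-test level {per_test:.4f}")
```

Ten uncorrected comparisons at 0.05 give roughly a 40% chance of at least one false positive, while the corrected per-test level of 0.005 makes each individual comparison harder to pass, which is where the loss of power comes from.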
Event rate is also called?
Absolute risk
NNT formula
1/ARR
i.e. 1 divided by absolute risk reduction
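A short worked example of the formula, taking absolute risk reduction as control event rate minus treated event rate. The event rates are made up, the function name is invented, and rounding the result up to a whole patient is a common convention assumed here.

```python
import math


def number_needed_to_treat(control_event_rate: float,
                           treated_event_rate: float) -> int:
    """NNT = 1 / ARR, rounded up to a whole number of patients."""
    # Round to avoid floating point noise such as 0.05000000000000002
    arr = round(control_event_rate - treated_event_rate, 10)
    if arr <= 0:
        raise ValueError("treatment does not reduce the event rate")
    return math.ceil(round(1 / arr, 10))


# Hypothetical trial: 20% of controls and 15% of treated patients have the event
nnt = number_needed_to_treat(0.20, 0.15)
print(f"ARR = 0.05, NNT = {nnt}")   # NNT = 20
```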