Statistical Tests Flashcards
What is a hypothesis?
- states your idea in a specific, testable form
- describes a working theory about the data sets you’re considering
Define ‘null hypothesis’
- a working assumption that there’s no difference between the data sets you wish to compare
- i.e. there is no difference/relationship/association
Define ‘alternative hypothesis’
- assumption that there is a difference between data sets
- i.e. there is a difference/relationship/association
Define ‘significance’
- a measure of how likely the observed result would be if the NULL hypothesis were true; the smaller the value, the stronger the evidence against the null
Define ‘two-sided hypothesis’
- states the difference could be in either direction
- null = NO difference between methods A and B
- alternative = A difference between methods A and B
Define ‘one-sided hypothesis’
- states a difference in a specific direction
- alternative = test results using method A are HIGHER/LOWER than those using method B
- null = test results using method A are NOT HIGHER/LOWER than those using method B
What is a ‘type I error’?
- false positive (reject the NH when it is true)
What is a ‘type II error’?
- false negative (fail to reject the NH when it is false)
What is the type II error rate (β)?
- the probability of incorrectly retaining a false null hypothesis
What is meant by ‘power’?
- the probability of correctly rejecting a false null hypothesis
- power = 1 - β, where β is the type II error rate (illustrated below)
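A minimal sketch of the power/type II relationship by simulation, assuming two normal populations whose means truly differ by 0.5 SD and a two-sample t-test at alpha = 0.05 (the values and the use of scipy are illustrative assumptions, not from the flashcards):

```python
# Estimate power by simulation: power = 1 - beta (the type II error rate).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, true_shift, n_sims = 0.05, 30, 0.5, 5000

rejections = 0
for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n)          # control sample
    b = rng.normal(true_shift, 1.0, n)   # sample shifted by the true effect
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1                  # correctly rejected a false null

power = rejections / n_sims
beta = 1 - power                         # estimated type II error rate
print(f"estimated power = {power:.3f}, beta = {beta:.3f}")
```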
How can we reduce the chance of type I error?
- choose a lower probability threshold, i.e. a higher level of significance
- e.g. P = 0.01 rather than 0.05
What is the drawback of choosing a higher level of significance?
- critical value of test statistic increases
- thus probability of type II error increases
What are the conventional significance levels (P-values) and what do they mean?
- P > 0.05 = not significant
- P ≤ 0.05 = significant
- P ≤ 0.01 = highly significant
- P ≤ 0.001 = very highly significant
What is a parametric test?
- makes particular assumptions about the mathematical nature of the population distribution from which the samples were taken
- better able to distinguish between true and marginal differences between samples (has greater ‘power’)
What is a non-parametric test?
- doesn’t assume that data fit a particular pattern, but may assume some characteristics of their distributions
What is ‘effect size’ (ES)?
- measures the strength of a result and is solely magnitude-based
- it does not depend on sample size (see the sketch below)
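One common effect-size measure is Cohen’s d (the flashcards do not name a specific metric, so this choice and the data are assumptions); a sketch:

```python
# Cohen's d: standardised mean difference using the pooled standard deviation.
# It depends only on the magnitude of the difference, not on the sample size.
import numpy as np

def cohens_d(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

print(cohens_d([5.1, 4.9, 5.3, 5.0], [4.2, 4.5, 4.1, 4.4]))
```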
What is meant by ‘A priori power analysis’?
- done in planning phase to determine N (study size)
- necessary to justify project resources in funding processes
- minimise use of animals or risk to patients in clinical research
- involves estimating the sample size required for a study based on predetermined maximum tolerable Type I and II error rates and the minimum effect size that would be clinically, practically, or theoretically meaningful
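A sketch of an a priori calculation using statsmodels (one possible tool; the library, effect size d = 0.5, alpha = 0.05 and power = 0.8 are all assumptions, not given in the flashcards):

```python
# Solve for the per-group sample size N needed to detect the chosen minimum
# effect size at the chosen type I error rate (alpha) and power (1 - beta).
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05,
                                          power=0.8, alternative='two-sided')
print(f"required sample size per group: {n_per_group:.1f}")
```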
What is meant by ‘Post-hoc power analysis’?
- done on completion of study to determine Observed Power
- Necessary to check that your expected and measured ES align well
- i.e. Did you have sufficient subjects to detect differences reliably?
What is meant by ‘Sensitivity power analysis’?
- done in planning phase when the sample size is predetermined by study constraints e.g. if there are only 20 subjects available in a pilot study
- instead we determine what level of effect we might be able to find, referred to as the minimal detectable effect (MDE) or minimum clinically important difference (MCID)
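A sensitivity-analysis sketch with statsmodels, using the 20-subject example above and assuming alpha = 0.05 and power = 0.8 (the library and the alpha/power values are assumptions):

```python
# With the group size fixed, solve for the minimal detectable effect (MDE).
from statsmodels.stats.power import TTestIndPower

mde = TTestIndPower().solve_power(effect_size=None, nobs1=20, alpha=0.05,
                                  power=0.8, ratio=1.0)
print(f"minimal detectable effect (Cohen's d): {mde:.2f}")
```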
How does sample size affect the detectable effect size?
- as sample size increases, the minimum effect size that can be reliably detected decreases
What is the relationship between sample size and power?
- an increase in sample size increases power
- but NOT a linear relationship (see the sketch below)
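A sketch of the diminishing-returns relationship, assuming a fixed effect size d = 0.5, alpha = 0.05, and the use of statsmodels (illustrative choices):

```python
# Power rises quickly at first, then flattens as the sample size grows.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n in (10, 20, 40, 80, 160):
    p = analysis.power(effect_size=0.5, nobs1=n, alpha=0.05)
    print(f"n per group = {n:4d}  power = {p:.2f}")
```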
When would you use a t-test?
- comparing means from TWO independent samples
What other situations would you use a t-test?
- comparing means of paired data
- comparing a sample mean with a chosen value
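A sketch of the three t-test situations using scipy (the library choice and data are illustrative assumptions):

```python
from scipy import stats

a = [5.1, 4.9, 5.3, 5.0, 5.2]
b = [4.6, 4.8, 4.5, 4.9, 4.7]

print(stats.ttest_ind(a, b))      # two independent samples
print(stats.ttest_rel(a, b))      # paired data (same subjects measured twice)
print(stats.ttest_1samp(a, 5.0))  # one sample mean vs a chosen value (5.0)
```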
When would you use ANOVA?
- comparing means from TWO OR MORE samples
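A one-way ANOVA sketch with scipy’s f_oneway (illustrative data; the tool choice is an assumption):

```python
from scipy import stats

g1 = [5.1, 4.9, 5.3, 5.0]
g2 = [4.6, 4.8, 4.5, 4.9]
g3 = [5.5, 5.4, 5.6, 5.2]

print(stats.f_oneway(g1, g2, g3))  # compares the means of two or more samples
```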
Similarity between t-test and ANOVA
- both assume the data follow a Gaussian (normal) distribution and that the variances of the samples are homogeneous (a check is sketched below)
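One possible way to check these shared assumptions (not prescribed by the flashcards) is a Shapiro-Wilk test for normality and Levene’s test for equal variances:

```python
from scipy import stats

g1 = [5.1, 4.9, 5.3, 5.0, 5.2]
g2 = [4.6, 4.8, 4.5, 4.9, 4.7]

print(stats.shapiro(g1))     # normality of each sample
print(stats.shapiro(g2))
print(stats.levene(g1, g2))  # homogeneity of variances
```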
What are the 2 chief non-parametric tests for comparing locations of 2 samples?
- Mann-Whitney U-test
- Kolmogorov-Smirnov test
What does the Mann-Whitney U-test assume, and what sample size does it require?
- assumes that the frequency distributions of samples are similar
- sample’s size must more or equal to 4
What sample’s size must Kolmogorov-Smirnov test have?
- more or equal to 4
- samples must have equal sizes
Significant differences found with the Kolmogorov-Smirnov test could be due to what?
- differences in location
- or shape of distribution
- or both
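A two-sample Kolmogorov-Smirnov sketch with scipy, using equal sample sizes as required above (illustrative data):

```python
from scipy import stats

a = [12, 15, 11, 18, 14, 16]
b = [22, 19, 25, 21, 20, 23]

# A significant result may reflect a difference in location, in distribution
# shape, or both.
print(stats.ks_2samp(a, b))
```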
What are the 2 suitable non-parametric comparisons of location of paired data?
- Wilcoxon’s signed rank test
- Dixon and Mood’s sign test
What is the Wilcoxon’s signed rank test?
- used for quantitative data
- assumes that distributions have similar shape
- sample size must be at least 6 pairs
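A Wilcoxon signed-rank sketch with scipy for paired quantitative data (illustrative data, 6 pairs; the tool choice is an assumption):

```python
from scipy import stats

before = [7.1, 6.8, 7.4, 6.9, 7.2, 7.0]
after = [6.5, 6.6, 6.9, 6.4, 6.8, 6.7]

print(stats.wilcoxon(before, after))
```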
What is the Dixon and Mood’s sign test used for?
- paired data scores where one variable is recorded as ‘greater than/better than’ the other
- sample size must be at least 6 pairs
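The sign test can be computed as a binomial test on the signs of the paired comparisons; a sketch using scipy’s binomtest (scipy ≥ 1.7), where the data and this particular formulation are assumptions rather than the flashcards’ own worked method:

```python
from scipy import stats

# +1 where method A was rated 'better than' B, -1 where it was rated worse
signs = [+1, +1, -1, +1, +1, +1, -1, +1]
n_plus = sum(1 for s in signs if s > 0)

# Under the null, 'better' and 'worse' are equally likely (p = 0.5).
print(stats.binomtest(n_plus, n=len(signs), p=0.5))
```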
What are the 2 suitable non-parametric comparisons of location for 3 or more samples?
- Kruskal-Wallis H-test
- Friedman S-test
What is the Kruskal-Wallis H-test?
- the number of samples is unlimited and they can be unequal in size
- underlying distributions are assumed to be similar
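A Kruskal-Wallis sketch with scipy using unequal group sizes, with Friedman’s test alongside for repeated-measures data (illustrative data; the tool choice is an assumption):

```python
from scipy import stats

g1 = [12, 15, 11, 18]
g2 = [22, 19, 25, 21, 20]
g3 = [14, 16, 13]

print(stats.kruskal(g1, g2, g3))          # groups may differ in size

# Friedman's S-test compares 3+ related samples (same subjects per condition),
# so the groups must be of equal length.
print(stats.friedmanchisquare([7, 9, 6, 8], [5, 6, 4, 7], [8, 9, 7, 9]))
```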