non-parametric tests Flashcards
parametric tests
parametric methods assume a specific distribution
the distribution has parameters
e.g. normal, with mu (mean) and sigma (standard deviation) as parameters
non-parametric
minimal assumptions
use properties that any distribution has, like the median
based on ranking data
medians and ranks are more robust to outliers than means, so rank-based tests are more robust than t-tests
if the data are normal, t-tests are more powerful
assumes continuous density (no tied ranks)
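A small sketch of why rank-based methods are robust, using a made-up sample (the numbers and the helper name `ranks` are illustrative assumptions, not part of the cards). Corrupting the largest value moves the mean a long way but leaves the median and the ranks unchanged:

```python
import statistics

def ranks(values):
    """1-based rank of each value (assumes no ties, as in the continuous case)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

clean = [2.0, 3.0, 4.0, 5.0, 6.0]            # hypothetical sample
with_outlier = [2.0, 3.0, 4.0, 5.0, 600.0]   # same sample, largest value corrupted

# The mean is pulled towards the outlier.
print(statistics.mean(clean), statistics.mean(with_outlier))
# The median stays where it was.
print(statistics.median(clean), statistics.median(with_outlier))
# The ranks are identical, so any rank-based statistic is unchanged.
print(ranks(clean) == ranks(with_outlier))
```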
paired data: Wilcoxon signed-rank test
- paired data has a one-to-one correspondence
- measurements within a pair are dependent on each other, e.g. before and after tests in an experiment
hypotheses in the Wilcoxon signed-rank test
H0: differences between pairs have a symmetric distribution about 0
Ha: differences between pairs do not have a symmetric distribution about 0
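test procedure: Wilcoxon signed-rank test
- match pairs
- work out the difference within each pair, Di
- rank the absolute differences |Di| from smallest to largest, keeping track of the +/- signs
- where ties occur, average the ranks
- work out W+ (sum of ranks of the positive differences) and W- (sum of ranks of the negative differences)
- check: W+ + W- = n(n+1)/2
- then use the normal approximation to the distribution of W+ under H0 to calculate the p-value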
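The procedure above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names and the data are hypothetical, pairs with a zero difference are simply dropped, and the p-value uses the standard large-sample result that under H0 W+ has mean n(n+1)/4 and variance n(n+1)(2n+1)/24, with no tie or continuity correction.

```python
import math

def average_ranks(values):
    """Rank values from smallest to largest (1-based), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_signed_rank(before, after):
    """Return (W+, W-, z, two-sided p) using the normal approximation."""
    diffs = [a - b for b, a in zip(before, after)]
    diffs = [d for d in diffs if d != 0]          # drop zero differences
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided normal tail area
    return w_plus, w_minus, z, p

# Hypothetical before/after scores for 10 subjects.
before = [10, 12, 9, 14, 11, 13, 8, 15, 12, 10]
after = [12, 15, 8, 18, 16, 19, 15, 23, 21, 20]
w_plus, w_minus, z, p = wilcoxon_signed_rank(before, after)
print(w_plus, w_minus)  # 54.0 1.0, which sum to 10 * 11 / 2 = 55
print(z, p)
```

For small n the normal approximation is rough; exact tables or a statistics library would normally be used instead.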
Wilcoxon signed-rank test: zero differences
with a continuous density, no difference is exactly zero
ordinal data might have exact zeros; remove the offending pairs and reduce n accordingly
Two-sample tests: Mann-Whitney-Wilcoxon test characteristics
two independent (unpaired) samples; the samples might not be the same size
Two-sample tests: Mann-Whitney-Wilcoxon test hypotheses
H0: the two populations have the same distribution (or the same median, etc.)
Ha: two-sided or one-sided
Two-sample tests: Mann-Whitney-Wilcoxon test basic idea
- combine data in a pooled sample
- look at the positions of the X’s relative to the Y’s in the ordered pooled sample
- XXXYYY suggests the Y’s lie to the right of the X’s
- XYXYXY suggests no real difference
Test procedure: Two-sample tests: Mann-Whitney-Wilcoxon test
- order pooled samples smallest to largest
- Wx and Wy are the sums of the ranks of the X’s and of the Y’s in the pooled sample
- very small values of Wx or Wy would suggest H0 isn’t true
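A minimal sketch of the rank sums, assuming Wx and Wy mean the sums of pooled ranks for each sample (the function name and data are hypothetical, and tied values share their average rank):

```python
def pooled_rank_sums(x, y):
    """Return (Wx, Wy): the sums of pooled ranks for samples x and y."""
    pooled = sorted(list(x) + list(y))
    # Map each distinct value to the average of its 1-based positions,
    # so tied observations share the same rank.
    rank_of = {}
    for value in set(pooled):
        positions = [i + 1 for i, v in enumerate(pooled) if v == value]
        rank_of[value] = sum(positions) / len(positions)
    w_x = sum(rank_of[v] for v in x)
    w_y = sum(rank_of[v] for v in y)
    return w_x, w_y

# Pattern XXXYYYY: every X is below every Y, so Wx is as small as it can be.
separated = pooled_rank_sums([1.1, 2.3, 3.0], [4.2, 5.5, 6.1, 7.7])
print(separated)     # (6.0, 22.0)

# Pattern XYXYXY: the samples interleave, so the rank sums are close.
interleaved = pooled_rank_sums([1, 3, 5], [2, 4, 6])
print(interleaved)   # (9.0, 12.0)
```

In both cases Wx + Wy equals N(N+1)/2, where N is the pooled sample size, which is a quick arithmetic check.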
Two-sample tests: Mann-Whitney-Wilcoxon test distribution theory
under H0, if the sample sizes m and n are both at least 10, Wx and Wy are approximately normal
use equations to work out the test statistic
for smaller samples, compute the rank sum for the smaller sample only
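The cards do not give the equations, so the sketch below uses the standard large-sample result, assuming m is the size of the X sample and n the size of the Y sample: under H0, Wx has mean m(m+n+1)/2 and variance mn(m+n+1)/12. The function name is hypothetical, and no tie or continuity correction is applied.

```python
import math

def rank_sum_z_test(w_x, m, n):
    """Return (z, two-sided p) for the rank sum Wx via the normal approximation."""
    mean = m * (m + n + 1) / 2
    sd = math.sqrt(m * n * (m + n + 1) / 12)
    z = (w_x - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p

# With m = n = 10, Wx = 105 is exactly the H0 mean, so there is no evidence against H0.
z_mid, p_mid = rank_sum_z_test(105, 10, 10)
print(z_mid, p_mid)

# Wx = 55 is the smallest possible value (the X's take ranks 1..10).
z_low, p_low = rank_sum_z_test(55, 10, 10)
print(z_low, p_low)
```

For a one-sided alternative, the p-value would be the single tail in the direction of Ha rather than both tails.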