Non-Parametric Tests Flashcards
Problems with parametric tests
Strong assumptions, e.g. normality, or n large enough to invoke the CLT.
What are non-parametric tests?
Tests that are valid over a wide range of distributions and can be carried out with far fewer assumptions about the random variable.
What is the simplest non-parametric test called?
The sign test
What type of data does the sign test analyse?
Matched pairs
Briefly describe the process of setting up the sign test
Assign + if 1st value > 2nd
Assign - if 1st value < 2nd
Treat each pair as a Bernoulli trial
Under H0, p = 0.5; repeated independent Bernoulli trials give a binomial
W ~ B(n, 0.5), where W is the observed number of + (or -) signs
For the sign test, how do we calculate the p value if the test is two-sided?
We work out the probability using the binomial, e.g. with n = 9 and W = 1: P(W ≤ 1) = 9C0 (0.5)^9 + 9C1 (0.5)^9 = 10/512 ≈ 0.02
P value for a two-sided test = 2 x 0.0195 ≈ 0.04
0.04 < 0.05 (alpha), therefore we reject H0
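A minimal Python sketch of the exact sign test described above, using only the standard library. The function name `sign_test` and the paired data are hypothetical, chosen so that after discarding the one zero difference n = 9 and there is a single + sign, which reproduces the 9C0 + 9C1 calculation.

```python
from math import comb

def sign_test(first, second):
    """Exact two-sided sign test for matched pairs (illustrative sketch)."""
    # Differences for each pair; zero differences are discarded, reducing n.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    n = len(diffs)
    plus = sum(1 for d in diffs if d > 0)
    # Take W as the smaller sign count so the lower tail is the extreme one.
    w = min(plus, n - plus)
    # Under H0, W ~ B(n, 0.5): P(W <= w) = sum of nCk (0.5)^n for k = 0..w
    one_sided = sum(comb(n, k) for k in range(w + 1)) * 0.5 ** n
    two_sided = min(1.0, 2 * one_sided)   # double for a two-sided test
    return n, w, one_sided, two_sided

# Hypothetical matched pairs (the 4th pair is a zero difference).
first = [12, 10, 9, 15, 11, 8, 13, 14, 10, 7]
second = [14, 13, 12, 15, 13, 11, 12, 16, 12, 9]
print(sign_test(first, second))   # (9, 1, 0.01953125, 0.0390625)
```

The two-sided p value of about 0.039 is below 0.05, matching the reject decision on the card.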
For the sign test, if n > 25, how do we work out the probability?
Invoke the CLT: W bar = W/n is approximately normal with mean mu = 0.5 and variance sigma^2 = 0.5^2 / n.
Z = (W/n - 0.5) / sqrt(0.5^2 / n) = (W/n - 0.5) / (0.5 / sqrt(n))
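The large-sample version can be sketched the same way. This assumes the plain CLT formula on the card (no continuity correction); `sign_test_normal` and the counts W = 24, n = 36 are illustrative only.

```python
from math import erfc, sqrt

def sign_test_normal(w, n):
    """Large-sample (n > 25) sign test using the normal approximation."""
    w_bar = w / n                  # observed proportion of + signs
    se = sqrt(0.5 ** 2 / n)        # standard error, equal to 0.5 / sqrt(n)
    z = (w_bar - 0.5) / se
    # Two-sided p value: 2 * P(Z > |z|) = erfc(|z| / sqrt(2))
    p_two_sided = erfc(abs(z) / sqrt(2))
    return z, p_two_sided

z, p = sign_test_normal(24, 36)
```

Here z is 2.0 and the two-sided p value is about 0.0455, so at alpha = 0.05 we would reject H0.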
Problem with the sign test
Ignores magnitude: it treats a large negative difference the same as a small negative difference. Collapsing everything to 0 or 1 throws away a lot of information, so the test has low power when n is small. It is more likely to make a type 2 error (failing to reject H0 when it is false), so we often find an insignificant test statistic.
What do we do with zero differences in the sign test?
We discard them and reduce n by 1 for each zero difference.
The sign test & sign rank test are only applicable for…
Matched pairs
How does the sign rank test differ from the sign test?
It accounts for the magnitude of each difference as well as its sign
Describe the sign rank test
Rank the absolute differences in ascending order of magnitude
If two values have the same magnitude, assign them the average rank
Sum the ranks of the positive differences (R+) and of the negative differences (R-) separately
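A sketch of the ranking step in Python. Dropping zero differences before ranking is an assumption carried over from the sign test; `average_ranks` and `signed_rank_sums` are hypothetical helper names, and the differences in the example are made up.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2          # average of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_sums(diffs):
    """Return (R+, R-, T) for the sign rank test (zero differences dropped)."""
    nonzero = [d for d in diffs if d != 0]
    ranks = average_ranks([abs(d) for d in nonzero])
    r_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    r_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    return r_plus, r_minus, min(r_plus, r_minus)   # T = min{R+, R-}
```

For example, `signed_rank_sums([3, -1, 4, -1, 5, 2])` gives R+ = 18, R- = 3 and T = 3; the two tied absolute differences of 1 would be ranks 1 and 2, so each gets 1.5.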
What is our test statistic for the sign rank test?
T = MIN {R+, R-}
Under H0 for the sign rank test, what are E(T) and V(T)?
E(T) = n(n+1)/4, V(T) = n(n+1)(2n+1)/24
When n < 25, how do we work out our CVs for the sign rank test?
Use the tables given in the formula sheet, reading off the correct value of alpha, which depends on whether the test is one-sided or two-sided.
When do we reject H0 for the sign rank test? Why?
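If our test statistic is less than the CV
Because T is the minimum of R+ and R-, an unusually small T means one sign dominates, which is evidence against H0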
If n>25, what do we do for the sign rank test?
Invoke the CLT: T is approximately normal
Our test statistic is Z = [T - n(n+1)/4] / sqrt[n(n+1)(2n+1)/24]
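A sketch of the large-sample statistic, using E(T) = n(n+1)/4 and V(T) = n(n+1)(2n+1)/24 from the card; `signed_rank_z` and the values T = 120, n = 30 are made up for illustration.

```python
from math import erfc, sqrt

def signed_rank_z(t, n):
    """Normal approximation for the sign rank statistic T (n > 25)."""
    mean = n * (n + 1) / 4                    # E(T) under H0
    var = n * (n + 1) * (2 * n + 1) / 24      # V(T) under H0
    z = (t - mean) / sqrt(var)
    p_two_sided = erfc(abs(z) / sqrt(2))      # 2 * P(Z > |z|)
    return z, p_two_sided

z, p = signed_rank_z(120, 30)
```

With these numbers z is about -2.31 and the two-sided p value is about 0.021, so H0 would be rejected at the 5% level.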
Limitation of the sign rank test
Ignores the spread of the data: if the largest absolute difference is 2, it gets rank n; if the largest is 100, it still gets rank n. Ranking can therefore compress or stretch the data. It is less powerful than a parametric test (when the parametric assumptions hold), but generally more powerful than the sign test.
When n > 25 for the sign rank test, when do we reject H0?
Reject if p value is less than the significance level (same as usual hypothesis testing)
When is the Mann-Whitney test applicable?
We can use it even if we don’t have matched pairs: it is used for two independent random samples, to test for a difference in location (often described as a difference in medians) between the two populations.
How do we rank equal magnitudes in the sign rank test?
Average rank, e.g. if two numbers would be ranked 4 and 5, give them both rank (4+5)/2 = 4.5
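Describe the Mann-Whitney test
Rank all n1 + n2 observations together, but preserve the colour (keep track of which sample each observation came from)
Equal values are given an average rank
Sum the ranks of sample 1 to get R1
Work out U (see formula sheet), E(U) and V(U), and then the test statistic
If n > 25, U is approximately normal.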
When do we reject H0 for the Mann-Whitney test?
If n > 25, U is approximately normal, so we use a Z test
For a two-sided test, double the one-tailed p value
Reject if the p value is less than the significance level
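A sketch of the Mann-Whitney steps in Python. The formula sheet's version of U is not reproduced here, so this assumes the common form U = R1 - n1(n1+1)/2 with E(U) = n1 n2 / 2 and V(U) = n1 n2 (n1 + n2 + 1) / 12, with no tie correction; `mann_whitney` is a hypothetical name. The normal approximation is only appropriate for large samples; the tiny samples below just check the arithmetic.

```python
from math import erfc, sqrt

def mann_whitney(sample1, sample2):
    """Mann-Whitney U with a normal approximation (illustrative sketch)."""
    n1, n2 = len(sample1), len(sample2)
    pooled = sorted(sample1 + sample2)        # rank all n1 + n2 together
    first, last = {}, {}
    for pos, v in enumerate(pooled, start=1):
        first.setdefault(v, pos)              # first position of value v
        last[v] = pos                         # last position of value v
    # Tied values share the average of their positions.
    rank = {v: (first[v] + last[v]) / 2 for v in first}
    r1 = sum(rank[v] for v in sample1)        # "preserve the colour": sample 1 only
    u = r1 - n1 * (n1 + 1) / 2                # ASSUMED form of U
    e_u = n1 * n2 / 2
    v_u = n1 * n2 * (n1 + n2 + 1) / 12
    z = (u - e_u) / sqrt(v_u)
    p_two_sided = erfc(abs(z) / sqrt(2))      # double the one-tailed p value
    return r1, u, z, p_two_sided
```

For example, `mann_whitney([1, 2], [2, 3])` gives R1 = 3.5 (the two 2s share rank 2.5) and U = 0.5.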
When do we use the goodness of fit test?
When we have discrete outcomes falling into k categories (it can also be used for continuous data, but the data must first be grouped into discrete categories)
Describe the goodness of fit test
Calculate Ei = n pi for each of the k categories
Calculate the test statistic using the simpler version of the formula on the formula sheet
Under H0 it approximately follows a chi squared distribution.
Reject if test stat > CV given by chi squared
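A sketch of the goodness of fit calculation. It assumes the "simpler version" on the formula sheet is ΣOi²/Ei - n, which is algebraically identical to Σ(Oi - Ei)²/Ei because ΣEi = ΣOi = n. The die counts are hypothetical; the 5% critical values are standard chi squared table values for 1 to 5 DOF, and the dictionary would need extending for more categories.

```python
# 5% upper-tail chi squared critical values, from standard tables.
CHI2_CV_5PC = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def goodness_of_fit(observed, probs):
    """Chi squared goodness of fit test at the 5% level (sketch)."""
    n = sum(observed)
    expected = [n * p for p in probs]                     # Ei = n pi
    if min(expected) < 5:
        raise ValueError("some Ei < 5: aggregate categories first")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    simpler = sum(o * o / e for o, e in zip(observed, expected)) - n
    dof = len(observed) - 1                               # K - 1
    reject = stat > CHI2_CV_5PC[dof]
    return stat, simpler, dof, reject

# Hypothetical: 60 rolls of a die, H0 = all faces equally likely (pi = 1/6).
result = goodness_of_fit([5, 8, 9, 8, 10, 20], [1 / 6] * 6)
```

Every Ei is 10, both forms of the statistic give 13.4 with 5 DOF, and 13.4 > 11.070, so H0 would be rejected.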
When do we reject H0 for the goodness of fit test?
If the test statistic is greater than the CV given by chi squared distribution
What distribution do we get the CV from for a goodness of fit test?
Chi squared
DOF for the CV in the goodness of fit test =
DOF = K - 1 where K is the number of different categories
What is a condition for the goodness of fit test to be appropriate? How can we solve it?
Ei should not be < 5 for any category; if it is, aggregate that category with another one.
What is H0 usually for the goodness of fit test?
H0: all outcomes are equally likely
So pi = 1/k
Ei = n/k for each category
What data are contingency tables used for?
When we have a two-way table with K categories for variable A and H categories for variable B, giving KH cross-classifications.
Why don’t we use hypothesis testing or ANOVA instead of contingency tables?
Standard hypothesis tests (e.g. a difference in means) are limited to two groups
ANOVA allows >2 groups but requires assumption of normality.
What are contingency tables another form of?
A goodness of fit test, but with a two-way table rather than a one-way table
What is H0 for contingency table?
H0: the variables are not related (independent); H1: the variables are related
How do we work out the expected values for contingency tables?
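Eij = n pij
Since the variables are independent under H0, pij = p(i) x p(j), where p(i) is estimated by (row i total)/n and p(j) by (column j total)/n, so Eij = (row i total x column j total)/n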
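The expected-value step written as a short derivation, with R_i and C_j used here (as assumed notation) for the row i and column j totals:

```latex
\hat{p}(i) = \frac{R_i}{n}, \qquad \hat{p}(j) = \frac{C_j}{n}
\qquad\Longrightarrow\qquad
E_{ij} = n\,\hat{p}(i)\,\hat{p}(j) = n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} = \frac{R_i\,C_j}{n}
```

For instance, a cell whose row total is 40 and column total is 30 in a table with n = 100 has E = 40 x 30 / 100 = 12.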
What distribution do we get CVs from for contingency tables?
Chi squared
How do we work out degrees of freedom for contingency tables?
DOF = (r - 1)(c - 1)
r = number of rows
c = number of columns
When do we reject H0 for contingency tables?
If test statistic is greater than CV given by chi squared distribution
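A sketch of the whole contingency table test in Python, using Eij = (row total x column total)/n and DOF = (r - 1)(c - 1). The 2x2 counts are hypothetical, no Yates continuity correction is applied, and 3.841 is the 5% chi squared critical value for 1 DOF; `contingency_test` is a hypothetical name.

```python
def contingency_test(table):
    """Chi squared test of independence for an r x c table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_tot[i] * col_tot[j] / n       # Eij under H0 (independence)
            stat += (o - e) ** 2 / e
    dof = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return stat, dof

stat, dof = contingency_test([[30, 10], [20, 40]])
print(round(stat, 3), dof, stat > 3.841)   # 16.667 1 True
```

The expected counts here are 20, 20, 30 and 30, and since 16.667 > 3.841 we would reject H0 that the two variables are unrelated.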