lecture 7 - wilcoxon matched pairs test Flashcards
when do we use a wilcoxon matched-pairs test?
Within-subjects test of differences for two-condition experiments with ordinal data
Behaviours - differences vs relationships - differences?
- Are clinically depressed patients less depressed after taking a new drug than a placebo? (IV: discrete, DV: continuous)
- Are people more likely to order a salad in a restaurant with a green colour scheme than a blue one? (IV: discrete, DV: discrete)
- Do energy drinks result in better memory performance than coffee? (IV: discrete, DV: ? Probably continuous)
- Does giving a significant other roses make them more likely to be happy than giving them tulips?
relationships?
- Is the therapeutic effectiveness (continuous variable) of an antidepressant drug related to the dosage (continuous variable)?
- Does the brightness of the lights in a restaurant (continuous) relate to how fast people eat (continuous)?
- Does the amount of caffeine a person consumes in an energy drink (continuous) relate to memory performance (continuous)?
- Does the number of roses given to a significant other relate to their happiness?
differences vs relations
not all that different and quite closely related
The key point here is that relationships between things, e.g. height and foot size, also imply some kind of difference, e.g. height differs depending on foot size. And differences also imply some kind of relationship.
tests of differences between two conditions
within-subjects tests - also called matched-pairs tests
Ordinal data: Wilcoxon matched-pairs test
Interval data: within-subjects t-test
Between-subjects tests
Ordinal data: Wilcoxon rank-sum test
Interval data: between-subjects t-test
Important note. “Within-subjects” tests have data split into related “pairs”, whereas “between-subjects” tests don’t.
“Within-subjects” tests, AKA matched-pairs tests, can be applied when a given participant does both conditions, but they can also be applied to situations where the data are paired in other ways, e.g., getting pairs of judgments from both partners in a couple.
two condition experiment example
you forgot your significant other’s birthday. Which is likely to make them less angry with you (DV): tulips or roses (IV)?
Which test should you use?
What did you measure?
How much useful information do your measurements contain?
Who did you measure it on?
Were the data points in the two conditions paired?
Yes = within-subjects design; No = between-subjects, independent samples design
We’ll start by assuming within-subjects designs
nominal (or maybe ordinal) DV (within subjects): sign test
measurement information: categories: I LOVE YOU versus I DO NOT LOVE YOU (umm I HATE you)
[I don’t know whether it’s more than anything, e.g. a trip to the dentist, or how much more…. sorry…]
Ordinal DV (within subjects): “nonparametric” Wilcoxon matched pairs test
Measurement information: I love you MORE than….
[e.g. than a trip to the dentist but I don’t know how much more…. sorry ….]
Interval/ratio DV: t-test
Measurement information: I love you THIS MUCH more….
[e.g. I love you three times as much as a trip to the dentist….
And zero really does mean the total absence of my love….. sorry…..]
Note. A sign test can also be applied to ordinal data, but it basically discards the information about the rank ordering of the pairs: every pair is reduced to either + or –, and pairs with a difference of 0 are discarded.
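The decision steps above (what was measured, and whether the data points are paired) can be sketched as a small lookup. This is a hypothetical helper, `choose_test`, written for illustration only: the level names and test names come from the lists in these notes, and any combination the notes don't cover (e.g. nominal data between subjects) returns `None`.

```python
from typing import Optional

# Hypothetical helper: maps the notes' two decision questions to a test name.
WITHIN_SUBJECTS = {
    "nominal": "sign test",
    "ordinal": "Wilcoxon matched-pairs test",
    "interval": "within-subjects t-test",
    "ratio": "within-subjects t-test",
}
BETWEEN_SUBJECTS = {
    "ordinal": "Wilcoxon rank-sum test",
    "interval": "between-subjects t-test",
    "ratio": "between-subjects t-test",
}


def choose_test(level: str, paired: bool) -> Optional[str]:
    """Suggest a two-condition test of differences.

    level  -- measurement level of the DV: "nominal", "ordinal", "interval" or "ratio"
    paired -- True if the data points in the two conditions are paired
    Returns None for combinations these notes don't cover.
    """
    table = WITHIN_SUBJECTS if paired else BETWEEN_SUBJECTS
    return table.get(level)


print(choose_test("ordinal", paired=True))  # the tulips/roses design
```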
could use a sign test as it's a within-subjects design
sign test 2-tailed
M = no of +ve or -ve differences whichever is smaller
eg no +ve = 5
no -ve = 1
M = 1
call N the total no of non-zero differences
eg N= 5+1 = 6
find critical value from table
eg for N = 6, M must be 0 or less to be significant (p < 0.05, 2-tailed), so M = 1 is not significant
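A minimal Python sketch of the two-tailed sign test, assuming the exact binomial distribution with + and – equally likely (50:50) under the null. The function names `sign_test` and `critical_m` are hypothetical. Only the signs matter, so the example uses +1/–1 placeholders for the 5 positive and 1 negative differences rather than the real data.

```python
from math import comb


def _two_tailed_p(n, m):
    """P(X <= m) for X ~ Binomial(n, 0.5), doubled for two tails and capped at 1."""
    one_tail = sum(comb(n, k) for k in range(m + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


def sign_test(differences):
    """Return (N, M, p) for a two-tailed exact sign test.

    N = number of non-zero differences (zero differences are discarded),
    M = the smaller of the number of + and - differences,
    p = exact two-tailed p-value.
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    n_pos = sum(1 for d in nonzero if d > 0)
    m = min(n_pos, n - n_pos)
    return n, m, _two_tailed_p(n, m)


def critical_m(n, alpha=0.05):
    """Largest M that is significant (two-tailed p <= alpha) for N pairs, or None."""
    crit = None
    for m in range(n // 2 + 1):
        if _two_tailed_p(n, m) <= alpha:
            crit = m
    return crit


n, m, p = sign_test([+1, +1, +1, +1, +1, -1])  # 5 +'s and 1 -
print(f"N = {n}, M = {m}, two-tailed p = {p:.3f}, critical M = {critical_m(n)}")
```

For N = 6 this gives M = 1 with p ≈ 0.219 and a critical value of 0, matching the table value above. For N = 5 no value of M reaches p ≤ 0.05, so `critical_m` returns `None`.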
find SPSS output in notes
- first number is N
- second number down is M
- bottom number is the P value
sign test assumptions
Pairs of related data points are randomly and independently sampled from their populations
randomly sampling an individual data point….
independently sampling data points….
It’s the pairs of data points that need to be sampled independently of each other, not the two components of the pairs
In practice, most psychology research is based on data that aren’t truly randomly sampled from a population, nor are the data completely independent of each other. But we still do research, and assess that research, based on these sampling assumptions, on the grounds that we haven’t violated these assumptions too badly…..
Sign test conceptualisation -
repeat the experiment many times assuming the null hypothesis is true
* Assuming +’s and –’s are equally likely (50:50),
* Randomly generate a sample of n = 6 (e.g. by flipping a coin).
* Count the number of +’s.
* Repeat experiment many times.
* Generate the histogram of the number of +’s.
* Find the number of +’s corresponding to <= 2.5% of the samples in each tail: 0 +’s and 6 +’s.
The more repeats, the closer the estimates are to the true values (the black dots).
This process ends up doing the same thing as picking the smaller of the number of +’s and –’s and checking it against the left tail of the distribution as on the previous slide.
Also note. These numbers are not just magic: the probability of getting zero heads if you flip a fair coin six times is just 0.5 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5 = 0.0156, the value of the first black dot above the 0 bar.
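The coin-flipping procedure above can be sketched as a short simulation. This is an illustrative sketch, not the code behind the lecture's histogram; the number of repeats (100,000) and the random seed are arbitrary choices. With many repeats, the simulated proportion of samples with zero +'s should sit close to the exact value 0.5 ** 6 = 0.015625.

```python
import random
from collections import Counter


def simulate_sign_null(n=6, repeats=100_000, seed=1):
    """Simulate the sign test's null distribution by flipping a fair coin.

    Each repeat flips a coin n times (+ and - equally likely) and records
    how many +'s came up. Returns the proportion of repeats for each
    possible count of +'s, 0..n.
    """
    rng = random.Random(seed)
    counts = Counter(
        sum(rng.random() < 0.5 for _ in range(n))  # number of +'s in one sample
        for _ in range(repeats)
    )
    return [counts[k] / repeats for k in range(n + 1)]


props = simulate_sign_null()
for k, prop in enumerate(props):
    print(f"{k} +'s: {prop:.4f}")
print(f"exact P(0 +'s) = {0.5 ** 6:.4f}")
```

Both tails (0 +'s and 6 +'s) come out near 0.0156 each, which is below 2.5%, while 1 + and 5 +'s are each around 0.09, so only 0 and 6 fall in the rejection region.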
why not stop at the sign test and just use it everywhere?
because a nonsignificant result does not support the null hypothesis. you just fail to reject it.
possible reasons for the nonsignificant result -
there is no difference in reality
or
there is a difference in reality
but your experiment didn’t detect it eg because it has low “power” or you were unlucky
the problem is that having done this analysis, you/we don’t know which of these is true. all we know is that the result isn’t significant.
possible solutions to clarify a nonsignificant Sign test
- Run a better experiment with more sensitive measures
e.g. replace hate you versus love you response options with a rating:
how much do you love me on a scale from 0 = I don’t love you at all . . . to . . . 50 = I really, really love you.
- Choose a better/more powerful statistic
e.g. a Wilcoxon matched-pairs test or a within-subjects t-test
- Formally evaluate “statistical power”(?)
e.g. if power is actually high and the result is not significant, that may provide some support for the null hypothesis.
- Do Bayesian statistics, which can (sometimes) support the null
practical advice
- Don’t do stats until you ‘know’ what they will show!
- Formal stats are hard
- Appropriate research conclusions are harder
- Intuitively understand data by making pictures of it
Then confirm that understanding with formal stats
sign test to wilcoxon matched pairs
look at notes for more detail
using tulips and roses example -
These difference results relate to the sign test’s +’s and –’s. The sign test was using less information: instead of +7 it just had +, instead of +3, it just had +. So the sign test is treating +7 and +3 as the same thing.
Similarly, the differences can be converted to ordinal data by first ignoring the sign of each difference, putting the differences in order from smallest to biggest, and then replacing the differences with their respective ranks. Putting the differences in order from smallest to biggest, ignoring the signs, gives
1, 3, 4, 5, 7, 12. Replacing these by their ranks in that order gives 1, 2, 3, 4, 5, 6.
Remember that the sign test discarded information about everything except whether the differences were positive or negative. So it discards information about the magnitudes of the differences (treating +12 and +7 both as just +). The Wilcoxon matched-pairs test discards somewhat less information, in that it does treat +12 as ordinally more than +7, but it discards information about how much more. Similarly, it treats +7 as just more than +5 but doesn’t know how much more.
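A sketch of the conversion described above, from raw differences to what each test keeps. The helper `rank_abs` is hypothetical; tied absolute differences get the average of the ranks they span (a common convention that these notes don't cover). The notes give the sizes 1, 3, 4, 5, 7, 12 and say 5 differences are + and 1 is –; since T = 1 for this example, the – difference must be the one ranked 1. So the signs below are inferred from those counts rather than taken from the original data, and the order of the entries is arbitrary.

```python
def rank_abs(differences):
    """Rank non-zero differences by absolute size (1 = smallest), ignoring sign.

    Zero differences are dropped first. Tied absolute values share the
    average of the ranks they span. Returns (difference, rank) pairs in
    ascending order of absolute size.
    """
    nonzero = sorted((d for d in differences if d != 0), key=abs)
    ranked = []
    i = 0
    while i < len(nonzero):
        j = i
        # extend j over a run of equal absolute values (a tie)
        while j + 1 < len(nonzero) and abs(nonzero[j + 1]) == abs(nonzero[i]):
            j += 1
        avg_rank = (i + 1 + j + 1) / 2  # ranks are 1-based
        ranked.extend((d, avg_rank) for d in nonzero[i:j + 1])
        i = j + 1
    return ranked


diffs = [7, 3, -1, 12, 4, 5]  # signs inferred, see lead-in
for d, r in rank_abs(diffs):
    sign = "+" if d > 0 else "-"
    print(f"difference {d:+g}: sign test keeps '{sign}', Wilcoxon keeps signed rank {sign}{r:g}")
```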
wilcoxon matched-pairs test (AKA signed-ranks test)
work out the sum of the positive ranks
work out the sum of the negative ranks
T is the smaller of the total positive or total negative ranks, so in this example T = 1
look at the critical value table: find the row for the correct number of pairs of scores, ignoring any tied pairs (pairs with a zero difference), and check whether T is in the required range (at or below the critical value). If it's not, the result is not significant (p > 0.05).
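A minimal sketch of the whole Wilcoxon matched-pairs calculation, under stated assumptions: zero differences are dropped, tied absolute differences get average ranks, and the exact two-tailed p-value is found by enumerating every way of giving ranks 1..N a + or – sign (each equally likely under the null). The enumeration assumes no tied ranks and is only practical for small N; all function names are hypothetical. The example differences are the inferred ones for the tulips/roses example (–1, +3, +4, +5, +7, +12).

```python
from itertools import product


def signed_ranks(differences):
    """Drop zeros, rank |d| (ties get the average rank), then reattach each sign."""
    nonzero = [d for d in differences if d != 0]
    abs_sorted = sorted(abs(d) for d in nonzero)

    def avg_rank(a):
        first = abs_sorted.index(a) + 1         # 1-based position of first tie
        last = first + abs_sorted.count(a) - 1  # 1-based position of last tie
        return (first + last) / 2

    return [avg_rank(abs(d)) * (1 if d > 0 else -1) for d in nonzero]


def wilcoxon_t(differences):
    """Return (N, T): N non-zero pairs, T = smaller of the + and - rank sums."""
    ranks = signed_ranks(differences)
    t_plus = sum(r for r in ranks if r > 0)
    t_minus = -sum(r for r in ranks if r < 0)
    return len(ranks), min(t_plus, t_minus)


def exact_p_two_tailed(n, t):
    """Exact two-tailed p for T with n pairs, assuming no tied ranks."""
    ranks = range(1, n + 1)
    hits = 0
    for signs in product((0, 1), repeat=n):  # all 2**n sign patterns
        w_plus = sum(r for r, s in zip(ranks, signs) if s)
        if w_plus <= t:
            hits += 1
    return min(1.0, 2 * hits / 2 ** n)  # null distribution is symmetric


n, t = wilcoxon_t([7, 3, -1, 12, 4, 5])
print(f"N = {n}, T = {t:g}, two-tailed p = {exact_p_two_tailed(n, t):.4f}")
```

This reproduces T = 1 for N = 6, with an exact two-tailed p of 0.0625, so p > 0.05; at this N only T = 0 (p ≈ 0.031) would be significant.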
wilcoxon matched pairs - reporting the result
eg T(6) = 1, p > 0.05
T = type of statistic
6 = number of nonzero pairs
1 = value of statistic
0.05 = significance level
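The reporting format above can be written as a tiny formatting helper. `report_wilcoxon` is a hypothetical name, and it assumes you already have the p-value (from a critical value table or software such as SPSS); it only compares p with the chosen significance level.

```python
def report_wilcoxon(t, n, p, alpha=0.05):
    """Format a result in the notes' style, e.g. 'T(6) = 1, p > 0.05'.

    t     -- value of the statistic T
    n     -- number of non-zero pairs
    p     -- p-value obtained elsewhere (table or software)
    alpha -- significance level
    """
    if p < alpha:
        relation = "<"
    elif p > alpha:
        relation = ">"
    else:
        relation = "="
    return f"T({n}) = {t:g}, p {relation} {alpha:g}"


print(report_wilcoxon(1, 6, 0.0625))
```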
SPSS output in notes