lecture 7 - wilcoxon matched pairs test Flashcards

1
Q

when do we use a wilcoxon matched-pairs test?

A

Within-subjects test of differences for two-condition experiments with ordinal data

2
Q

Behaviours - differences vs relationships - differences?

A
  • Are clinically depressed patients less depressed after taking a new drug than a placebo? (IV: discrete, DV: continuous)
  • Are people more likely to order a salad in a restaurant with a green colour scheme than a blue one? (IV: discrete, DV: discrete)
  • Do energy drinks result in better memory performance than coffee? (IV: discrete, DV: probably continuous)
  • Does giving a significant other roses make them more likely to be happy than giving them tulips?
3
Q

relationships?

A
  • Is the therapeutic effectiveness (continuous variable) of an antidepressant drug related to the dosage (continuous variable)?
  • Does the brightness of the lights in a restaurant (continuous) relate to how fast people eat (continuous)?
  • Does the amount of caffeine a person consumes in an energy drink (continuous) relate to memory performance (continuous)?
  • Does the number of roses given to a significant other relate to their happiness?
4
Q

differences vs relations

A

not all that different and quite closely related

The key point here is that relationships between things, e.g. height and foot size, also imply some kind of difference, e.g. height is different depending on foot size. And differences also imply some kind of relationship.

5
Q

tests of differences between two conditions

A

Within-subjects tests (also called matched-pairs tests)
Ordinal data: Wilcoxon matched-pairs test
Interval data: within-subjects t-test
Between-subjects tests
Ordinal data: Wilcoxon rank-sum test
Interval data: between-subjects t-test
Important note: “within-subjects” tests have data split into related “pairs”, whereas “between-subjects” tests don’t.
“Within-subjects” tests (AKA matched-pairs tests) can be applied when a given participant does both conditions, but they can also be applied to situations where the data are paired in other ways, e.g. getting pairs of judgments from both partners in a couple.

6
Q

two condition experiment example

A

you forgot your significant other’s birthday. which will likely make them less angry with you (DV), tulips or roses (IV)?

Which test should you use?
What did you measure?
How much useful information do your measurements contain?
Who did you measure it on?
Were the data points in the two conditions paired?
Yes = within-subjects design
No = between-subjects (independent samples) design
We’ll start by assuming within-subjects designs
Nominal (or maybe ordinal) DV (within subjects): sign test
Measurement information: categories, I LOVE YOU versus I DO NOT LOVE YOU (umm I HATE you)
[I don’t know whether it’s more than anything, e.g. a trip to the dentist, or how much more…. sorry…]
Ordinal DV (within subjects): “nonparametric” Wilcoxon matched-pairs test
Measurement information: I love you MORE than….
[e.g. than a trip to the dentist but I don’t know how much more…. sorry ….]
Interval/ratio DV: t-test
Measurement information: I love you THIS MUCH more….
[e.g. I love you three times as much as a trip to the dentist….
And zero really does mean the total absence of my love….. sorry…..]
Note: a sign test can also be applied to ordinal data, but it discards information about the rank ordering of pairs, so all pairs are reduced to either + or –, and it discards pairs with a difference of 0.

We could use a sign test here, as it’s a within-subjects design.

7
Q

sign test 2-tailed

A

M = number of +ve or -ve differences, whichever is smaller
e.g. number of +ve = 5
number of -ve = 1
M = 1

Call N the total number of non-zero differences
e.g. N = 5 + 1 = 6

Find the critical value from the table
e.g. for N = 6, M must be 0 or less to be significant, so M = 1 is not significant

Find the SPSS output in the notes
- first number is N
- second number down is M
- bottom number is the p value
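
The hand calculation above can be checked with a short script. This is only a sketch: the function name `sign_test_two_tailed` is made up for illustration, and it assumes zero differences have already been dropped and that + and – are equally likely under the null hypothesis.

```python
from math import comb

def sign_test_two_tailed(n_pos, n_neg):
    """Return (N, M, p) for a two-tailed sign test (zero differences already removed)."""
    n = n_pos + n_neg          # N: number of non-zero differences
    m = min(n_pos, n_neg)      # M: the smaller of the two counts
    # P(count <= M) for a fair coin, doubled for two tails and capped at 1
    tail = sum(comb(n, k) for k in range(m + 1)) / 2 ** n
    return n, m, min(1.0, 2 * tail)

N, M, p = sign_test_two_tailed(n_pos=5, n_neg=1)
print(N, M, round(p, 4))
```

For the 5 versus 1 example the two-tailed p value comes out above 0.05, which agrees with the table lookup (M would need to be 0).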

8
Q

sign test assumptions

A

Pairs of related data points are randomly and independently sampled from their populations
randomly sampling an individual data point….

independently sampling data points….

It’s the pairs of data points that need to be sampled independently of each other not the two components of the pairs

In practice, most psychology research is based on data that aren’t truly randomly sampled from a population, nor are the data completely independent of each other. But we still do research and assess that research based on these sampling assumptions, on the grounds that we haven’t violated them too badly…..

9
Q

Sign test conceptualisation -

A

repeat the experiment many times assuming the null hypothesis is true
* Assuming +’s and –’s are equally likely (50:50),
* Randomly generate a sample of n = 6 (e.g. by flipping a coin).
* Count the number of +’s.
* Repeat experiment many times.
* Generate the histogram of the number of +’s.
* Find the number of +’s corresponding to <= 2.5% of the samples in each tail: 0 +’s and 6 +’s.
The more repeats, the closer the estimates are to the true values (the black dots).

This process ends up doing the same thing as picking the smaller of the number of +’s and –’s and checking it against the left tail of the distribution as on the previous slide.
Also note: these numbers are not just magic. The probability of getting zero heads if you flip a fair coin six times is just 0.5 × 0.5 × 0.5 × 0.5 × 0.5 × 0.5 = 0.5^6 ≈ 0.0156, the value of the first black dot above the 0 bar.
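
The "repeat the experiment many times" idea can be sketched as a small simulation. The seed and the number of repeats are arbitrary choices made for this illustration, not values from the lecture.

```python
import random
from math import comb

random.seed(1)                   # fixed seed so reruns give the same counts
n_pairs, n_repeats = 6, 100_000  # sample size from the card; repeats chosen arbitrarily

counts = [0] * (n_pairs + 1)     # counts[k] = number of simulated samples with k pluses
for _ in range(n_repeats):
    pluses = sum(random.random() < 0.5 for _ in range(n_pairs))
    counts[pluses] += 1

# Compare the simulated proportion with the exact binomial probability
for k, c in enumerate(counts):
    exact = comb(n_pairs, k) / 2 ** n_pairs
    print(k, round(c / n_repeats, 4), round(exact, 4))
```

The simulated proportions for 0 and 6 pluses should each sit close to 0.5^6, and they get closer as the number of repeats grows.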

10
Q

why not quit with the sign test / just use it everywhere?

A

Because a nonsignificant result does not support the null hypothesis. You just fail to reject it.

Possible reasons for the nonsignificant result:
there is no difference in reality
or
there is a difference in reality
but your experiment didn’t detect it, e.g. because it has low “power” or you were unlucky
The problem is that, having done this analysis, we don’t know which of these is true. All we know is that the result isn’t significant.

11
Q

possible solutions to clarify a nonsignificant Sign test

A
  1. Run a better experiment with more sensitive measures
    e.g. replace the "hate you" versus "love you" response options with a rating:
    how much do you love me on a scale from
    0 = I don’t love you at all . . . to . . . 50 = I really, really love you.
  2. Choose a better/more powerful statistic
    e.g. a Wilcoxon matched-pairs test or a within-subjects t-test
  3. Formally evaluate “statistical power”
    e.g. if power is actually high and the result is not significant, that may provide some support for the null hypothesis.
  4. Do Bayesian statistics, which can (sometimes) support the null
12
Q

practical advice

A
  • Don’t do stats until you ‘know’ what they will show!
  • Formal stats are hard
  • Appropriate research conclusions are harder
  • Intuitively understand data by making pictures of it,
    then confirm that understanding with formal stats
13
Q

sign test to wilcoxon matched pairs

A

Look at notes for more detail.

Using the tulips and roses example:
These difference results relate to the sign test’s +’s and –’s. The sign test was using less information: instead of +7 it just had +, and instead of +3 it just had +. So the sign test treats +7 and +3 as the same thing.
The differences can instead be converted to ordinal data by first ignoring the sign of each difference, putting the differences in order from smallest to biggest, and then replacing the differences with their respective ranks. Putting the differences in order from smallest to biggest, ignoring the signs, gives
1, 3, 4, 5, 7, 12. Replacing these with their ranks gives 1, 2, 3, 4, 5, 6.

Remember that the sign test discarded information about everything except whether the differences were positive or negative. So it discards information about the magnitudes of the differences (treating both +12 and +7 as just +). The Wilcoxon matched-pairs test discards somewhat less information, in that it does treat +12 as ordinally more than +7, but it discards information about how much more. Similarly, it treats +7 as just more than +5 but doesn’t know how much more.
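
The ranking step can be sketched as follows. The magnitudes 1, 3, 4, 5, 7, 12 come from the card, but the order of the pairs and the choice of –1 as the single negative difference are assumptions made for illustration (chosen to match 5 positive and 1 negative difference).

```python
# Assumed signed differences; only the magnitudes are taken from the card
diffs = [+7, +3, -1, +12, +5, +4]

# Rank the absolute differences: rank 1 = smallest magnitude (no ties in this example)
order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
ranks = [0] * len(diffs)
for rank, i in enumerate(order, start=1):
    ranks[i] = rank

signs = ["+" if d > 0 else "-" for d in diffs]                     # all the sign test keeps
signed_ranks = [r if d > 0 else -r for d, r in zip(diffs, ranks)]  # what the Wilcoxon test keeps

print(signs)
print(signed_ranks)
```

The sign test only sees the list of + and – symbols, whereas the Wilcoxon test also sees that +12 outranks +7, though not by how much.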

14
Q

wilcoxon matched-pairs test (AKA signed-ranks test)

A

Work out the sum of the positive ranks
Work out the sum of the negative ranks
T is the smaller of the total positive or total negative ranks, so in this example T = 1

Look at the critical value table: find the row for the correct number of pairs of scores (ignoring any tied pairs) and see if the T value is in the required range. If it’s not, the result is not significant (p > 0.05)
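
As a rough check on the table lookup, T and an exact two-tailed p value can be computed by listing every possible pattern of signs. The signed ranks below are an assumed example consistent with T = 1 and six non-zero pairs, and the enumeration assumes no tied ranks.

```python
from itertools import product

signed_ranks = [5, 2, -1, 6, 4, 3]   # assumed example: 6 pairs, one negative rank

pos_sum = sum(r for r in signed_ranks if r > 0)    # sum of positive ranks
neg_sum = sum(-r for r in signed_ranks if r < 0)   # sum of negative ranks
T = min(pos_sum, neg_sum)

# Under the null hypothesis each rank is equally likely to be + or -,
# so count the sign patterns that give a T at least as small as the observed one.
n = len(signed_ranks)
total = n * (n + 1) // 2
as_extreme = 0
for pattern in product([0, 1], repeat=n):
    w_plus = sum(rank for rank, s in zip(range(1, n + 1), pattern) if s)
    if min(w_plus, total - w_plus) <= T:
        as_extreme += 1
p = as_extreme / 2 ** n

print(T, p)
```

With six pairs there are only 64 sign patterns, so even T = 1 cannot reach p < 0.05 two-tailed, which matches the non-significant result on the card.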

15
Q

wilcoxon matched pairs - reporting the result

A

e.g. T(6) = 1, p > 0.05

T = type of statistic
6 = number of nonzero pairs
1 = value of statistic
0.05 = significance level

SPSS output in notes

16
Q

warning on computers

A

Computers make calculating stats easy but that doesn’t mean they’re right or appropriate for supporting your research conclusions
SPSS doesn’t know where your data came from or what you’re trying to do with it, so it mostly won’t stop you from doing inappropriate things.
Statistical tests have assumptions that need to be met to support valid conclusions.
Certain tests support some research conclusions but not others.
Analogy: A robot surgeon may soon do an excellent job of taking out your gallbladder but that’s not very helpful if it was your appendix that needed removing…
SPSS isn’t smart enough to keep you from doing dumb things….
For example, your research question might be whether tulips and roses produce different ratings of happiness, but SPSS won’t stop you from assessing the degree of relationship between those variables with a correlation coefficient.

17
Q

how do you know data is definitely just ordinal?

A

Most measurements in psychology fall in a grey realm between being
Definitely just ordinal
and
Definitely at least interval
So what if this data really is interval?

18
Q

non parametric tests

A

Use if you have a small sample and can’t rely on the central limit theorem to get you out of trouble.
Four of the most common non-parametric procedures: the Mann–Whitney test, the Wilcoxon signed-rank test, Friedman’s test and the Kruskal–Wallis test. All four tests overcome distributional problems by ranking the data: that is, finding the lowest score and giving it a rank of 1, then finding the next highest score and giving it a rank of 2, and so on. This process results in high scores being represented by large ranks, and low scores being represented by small ranks. The model is then fitted to the ranks and not the raw scores. By using ranks we eliminate the effect of outliers.
Ranking the data reduces the impact of outliers and weird distributions, but the price you pay is to lose information about the magnitude of differences between scores. Consequently, non-parametric tests can be less powerful than their parametric counterparts.

19
Q

to define power of a test

A

The problem is that to define the power of a test we need to be sure that it controls the Type I error rate (the proportion of times a test will find a significant effect when there is no effect to find); this error rate is normally set at 5%. When the sampling distribution is normally distributed, the Type I error rate of tests based on this distribution is indeed 5%, and so we can work out the power. However, when the sampling distribution is not normal, the Type I error rate of tests based on this distribution won’t be 5% (in fact we don’t know what it is because it will depend on the shape of the distribution), and so we have no way of calculating power (because it is linked to the Type I error rate). So, if someone tells you that non-parametric tests have less power than parametric tests, tell them that this is true only if the sampling distribution is normally distributed.

20
Q

comparing two independent conditions

A

There are two choices to compare the distributions in two conditions containing scores from different entities: the Mann–Whitney test (Mann & Whitney, 1947) and the Wilcoxon rank-sum test (Wilcoxon, 1945). Both tests are equivalent, and to add to the confusion there’s a second Wilcoxon test that does something different.

when the groups have unequal numbers of participants in them, the test statistic (Ws) for the Wilcoxon rank-sum test is simply the sum of ranks in the group that contains the fewer people; when the group sizes are equal it’s the value of the smaller summed rank

21
Q

how to do a wilcoxon rank-sum

A

we arrange the scores in ascending order and attach a label to remind us from which group each score came. Starting at the lowest score, we assign potential ranks starting with 1 and going up to the number of scores we have.

I’ve called these ‘potential ranks’ because sometimes the same score occurs more than once in a data set (e.g., in these data a score of 6 occurs twice, and a score of 35 occurs three times). These are called tied ranks, and we rank them with the value of the average potential rank for those scores. For example, our two scores of 6 would’ve been ranked as 3 and 4, so we assign a rank of 3.5, the average of these values

Once we’ve ranked the data, we add the ranks for the two groups. First we add the ranks of the scores from the alcohol group (you should find the sum is 59) and then add the ranks of the scores from the ecstasy group (this value is 151). Our test statistic is the lower of these sums, which for the Wednesday data is the sum for the alcohol group, Ws = 59.
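
The ranking-with-ties procedure can be sketched like this. The scores in `group_a` and `group_b` are made up for illustration (they are not the alcohol/ecstasy data), and `average_ranks` is a hypothetical helper name.

```python
def average_ranks(scores):
    """Rank scores from lowest (rank 1) upward; tied scores share the mean of their potential ranks."""
    ordered = sorted(scores)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Potential ranks for this run of equal scores are i+1 .. j; use their mean
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[s] for s in scores]

group_a = [2, 6, 9, 12]    # made-up scores
group_b = [4, 6, 15, 18]   # made-up scores; the two 6s tie for potential ranks 3 and 4

ranks = average_ranks(group_a + group_b)
sum_a = sum(ranks[:len(group_a)])
sum_b = sum(ranks[len(group_a):])
Ws = min(sum_a, sum_b)     # equal group sizes, so take the smaller rank sum

print(ranks, sum_a, sum_b, Ws)
```

Both scores of 6 receive the rank 3.5, the average of potential ranks 3 and 4, just as in the description above.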

22
Q

how do we determine if a wilcoxon rank sum is significant?

A

the mean and standard error of this test statistic can be calculated from the sample sizes of each group (n1 is the sample size of group 1 and n2 is the sample size of group 2)

mean of Ws = n1(n1 + n2 + 1) / 2

standard error of Ws = √[n1 n2 (n1 + n2 + 1) / 12]

Use these to calculate a z-score: z = (Ws − mean of Ws) / standard error of Ws
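
A worked example of the z-score, using the standard mean and standard error for the rank-sum statistic. Ws = 59 is the value from the rank-sum card; the group sizes of 10 each are an assumption, inferred from the two rank sums (59 + 151 = 210, the total of the ranks 1 to 20).

```python
from math import sqrt

n1, n2 = 10, 10   # assumed group sizes (ranks 1..20 sum to 210 = 59 + 151)
Ws = 59           # rank-sum test statistic from the worked example

mean_Ws = n1 * (n1 + n2 + 1) / 2
se_Ws = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (Ws - mean_Ws) / se_Ws

print(mean_Ws, round(se_Ws, 2), round(z, 2))
```

A z-score beyond ±1.96 is significant at p < 0.05 (two-tailed), so under these assumptions the difference between the groups would count as significant.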

23
Q

calculating an effect size

A

r = z/√N

z is the z-score and N is the size of the study (the total number of observations)
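
A small sketch of the effect size calculation. The inputs (z = −3.48, N = 20) are illustrative values carried over from the rank-sum example under the assumption of 10 scores per group; `effect_size_r` is a hypothetical helper name.

```python
from math import sqrt

def effect_size_r(z, n_total):
    """Effect size r = z / sqrt(N), where N is the total number of observations."""
    return z / sqrt(n_total)

r = effect_size_r(z=-3.48, n_total=20)   # illustrative inputs, not SPSS output
print(round(r, 2))
```

The sign of r only reflects which group had the lower ranks; the size of the effect is judged from its absolute value.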

24
Q

comparing two related conditions - the wilcoxon signed-rank test

A

used in situations where you want to compare two sets of scores that are related in some way

based on ranking the differences between scores in the two conditions you’re comparing. Once these differences have been ranked (just like in Section 7.4.1), the sign of the difference (positive or negative) is assigned to the rank

Find the significance by hand by looking at the mean and standard error

mean of T = n(n + 1) / 4

standard error of T = √[n(n + 1)(2n + 1) / 24]

Then use these to find the z-score: z = (T − mean of T) / standard error of T
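
The same calculation as a short script, reusing n = 6 and T = 1 from the matched-pairs example purely for illustration. With so few pairs the normal approximation is rough, so the critical value table (or an exact test) is the safer guide.

```python
from math import sqrt

n = 6   # number of non-zero pairs (from the matched-pairs example)
T = 1   # smaller of the positive and negative rank sums

mean_T = n * (n + 1) / 4
se_T = sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (T - mean_T) / se_T

print(mean_T, round(se_T, 2), round(z, 2))
```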

25
Q

calculating an effect size for the wilcoxon signed-rank test

A

The effect size can be calculated in the same way as for the Mann–Whitney test