A priori and Post-hoc comparisons Flashcards
Error rates:
There are two different ways of specifying error rates (the probability of making a Type I error):
- Error rate per comparison (PC): α = α’, the probability of making a Type I error on any given comparison.
- Familywise error rate (FW): the probability of at least one Type I error occurring in any given experiment:
$FW = 1 - (1 - \alpha')^{c}$
where α’ = per comparison error rate and c = number of comparisons.
To derive the formula for familywise error rates, we start from the probability of a Type I error for a single comparison (α’). From this we can work out the probability of not making a Type I error on a single comparison, which is 1 – α’.
For example, if the probability of an error is .05, the probability of not making an error is .95. Therefore, assuming the comparisons are independent, the probability of making no Type I errors at all across c comparisons is $(1 - \alpha')^{c}$.
For example, if there are 4 comparisons for which the PC error rate is .05, then:
$(1 - \alpha')^{c} = (.95)^{4} = .95 \times .95 \times .95 \times .95 = .815$
From this we can calculate the probability of getting at least one Type I error, which is $1 - (1 - \alpha')^{c}$. For the above example, FW = 1 – .815 = .185. Note that this formula is only true if the comparisons are independent of each other.
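The calculation above can be checked with a short Python sketch. The function name `familywise_error_rate` is our own, and, like the formula, it assumes the c comparisons are independent.

```python
def familywise_error_rate(alpha_pc: float, c: int) -> float:
    """Exact familywise error rate for c independent comparisons,
    each tested at the per comparison rate alpha_pc."""
    p_no_error = (1 - alpha_pc) ** c  # no Type I error on any of the c comparisons
    return 1 - p_no_error             # at least one Type I error


fw = familywise_error_rate(0.05, 4)
print(f"P(no Type I error) = {1 - fw:.3f}")  # P(no Type I error) = 0.815
print(f"FW error rate      = {fw:.3f}")      # FW error rate      = 0.185
```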
From this formula it is easy to calculate FW from PC, but usually we want to work out PC from FW, and that is hard to do. Therefore we use an approximation of this formula which is:
$FW \approx c\alpha'$
From this it is easy to see that:
$\alpha' \approx FW / c$
This approximation overestimates the FW error rate whenever more than one comparison is made, but for small numbers of comparisons and small per comparison error rates, c × PC is a reasonable approximation of FW (and is easier to calculate). For example, if 4 comparisons are made with a PC error rate of .05:
the approximate FW will be 4 x .05 = .20,
whereas the actual FW will be $1 - (1 - .05)^{4} = 1 - .815 = .185$
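A small Python sketch makes the gap between the approximation and the exact formula visible as the number of comparisons grows. The function names are hypothetical, and the exact formula again assumes independent comparisons.

```python
def exact_fw(alpha_pc: float, c: int) -> float:
    """Exact FW for c independent comparisons."""
    return 1 - (1 - alpha_pc) ** c


def approx_fw(alpha_pc: float, c: int) -> float:
    """Approximation FW ≈ c * alpha_pc."""
    return c * alpha_pc


def approx_pc(fw: float, c: int) -> float:
    """Rearranged approximation alpha_pc ≈ FW / c."""
    return fw / c


for c in (1, 2, 4, 10, 20):
    print(f"c = {c:2d}: approx FW = {approx_fw(0.05, c):.3f}, "
          f"exact FW = {exact_fw(0.05, c):.3f}")
```

With c = 4 the two values are .200 and .185, as in the text; by c = 20 the approximation reaches 1.000 while the exact value is about .642, which is why the approximation is only reasonable for small c and small α’.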
The whole point of distinguishing between different error rates is not some perverse form of mathematical masochism. If you make multiple comparisons, you need to be aware that you are increasing the likelihood of a Type I error. If you test each of several comparisons at the alpha level you set originally, the probability of making at least one Type I error across the whole set is the familywise error rate, which is larger than that alpha. The issue is how to combat this error-rate problem. One approach is to use ‘a priori’ or planned comparisons, with a correction to the alpha level that takes into account the number of comparisons planned. This approach will be considered in the next section. A second approach uses ‘post hoc’ tests, which test all pairwise comparisons of means and hold your familywise error rate to a nominated alpha level.
A Priori versus Post Hoc comparisons:
A priori comparisons are chosen before the data are collected, and only a few comparisons are made. Post hoc comparisons are chosen after the experimenter has examined the data, and often all possible pairs of means are compared. Planning comparisons in advance and conducting only a subset of them is a way of reducing the probability of a Type I error.
There is some controversy over whether an overall significant F is needed before conducting multiple comparisons. Judd and McClelland (1995) make a strong case that researchers should carefully consider their research questions and tailor their analyses to address them, ignoring additional analyses that are unrelated to those questions (even if such analyses traditionally accompany the needed one). On that basis, they would generally advocate going straight to planned comparisons rather than starting with the overall F test.
Overall, it is accepted that planned comparisons can be carried out whether or not the overall F-value is significant.
‘A Priori’ comparisons or Planned Contrasts: Linear contrasts
Although Jamovi can help with the significance test for group differences in a priori and post hoc comparisons, it is really important to understand how we set up contrasts (because you need to feed this into Jamovi anyway), and it is also useful to understand what is done by Jamovi after receiving our numbers!
A post hoc approach is fine if all you want to do is compare pairs of treatment conditions. For example, you carry out an experiment to evaluate the effectiveness of three methods of treating depression: (a) psychoanalytic therapy, (b) behaviour modification therapy, and (c) drug therapy. You could do three post hoc tests ((a) v. (b), (a) v. (c), and (b) v. (c)). However, many research questions can be answered more directly (with fewer comparisons) by comparing one condition, or set of conditions, with another set of conditions. Given the error-rate problems which result from multiple comparisons, this more direct statistical route is to be preferred. In addition, with this approach the connection between the hypotheses under test and the statistics carried out is made explicit: Only those comparisons necessary to evaluate the research questions are performed.
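In order to compare one group or set of groups with another group or set of groups, we need to use linear combinations of the group means (we will spend time teaching the following in your seminars this week). A linear contrast is a weighted sum of the group means, $L = \sum_j a_j \bar{X}_j$, in which the coefficients sum to zero ($\sum_j a_j = 0$).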
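A linear contrast is simply a weighted sum of group means whose weights sum to zero. Below is a minimal Python sketch; the function name `linear_contrast` and the group means (10, 12 and 8 for psychoanalytic, behaviour modification and drug therapy) are hypothetical, chosen only to illustrate comparing drug therapy with the average of the two psychological therapies. `Fraction` keeps weights such as 1/3 exact, so the zero-sum check is not upset by rounding.

```python
from fractions import Fraction


def linear_contrast(coefficients, means):
    """Return L = sum(a_j * mean_j) after checking it is a valid contrast."""
    if len(coefficients) != len(means):
        raise ValueError("need exactly one coefficient per group mean")
    if sum(coefficients) != 0:
        raise ValueError("coefficients of a linear contrast must sum to zero")
    return sum(a * m for a, m in zip(coefficients, means))


# Hypothetical means: psychoanalytic, behaviour modification, drug therapy
means = [10, 12, 8]
# Average of the two psychological therapies vs drug therapy
a = [Fraction(1, 2), Fraction(1, 2), -1]
print(linear_contrast(a, means))  # 3
```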
‘A Priori’ comparisons or Planned Contrasts: Sum of squares for contrasts:
The linear contrasts are then converted to sums of squares. With n participants in each group, the formula is $SS_{contrast} = \frac{n L^{2}}{\sum_j a_j^{2}}$.
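The conversion from a contrast to a sum of squares can be sketched in Python as follows. This assumes the standard weighted form $SS_{contrast} = L^{2} / \sum_j (a_j^{2}/n_j)$, which reduces to $n L^{2} / \sum_j a_j^{2}$ when every group has n participants; the unequal-n form is our addition for generality. The function name, means and group sizes are hypothetical.

```python
from fractions import Fraction


def ss_contrast(coefficients, means, ns):
    """Sum of squares for a linear contrast.

    Uses SS = L**2 / sum(a_j**2 / n_j); with equal n in every group this
    reduces to n * L**2 / sum(a_j**2).
    """
    L = sum(Fraction(a) * m for a, m in zip(coefficients, means))
    denominator = sum(Fraction(a) ** 2 / n for a, n in zip(coefficients, ns))
    return L ** 2 / denominator


a = [Fraction(1, 2), Fraction(1, 2), -1]
means = [10, 12, 8]  # hypothetical group means
ns = [5, 5, 5]       # hypothetical, equal group sizes
print(ss_contrast(a, means, ns))  # 30
```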
‘A Priori’ comparisons or Planned Contrasts: The choice of coefficients:
You can also set up coefficients for comparison groups if you have more than three groups. You need to form two sets of coefficients which when separately summed, add up to 1 (or -1 as the case may be). For instance, if we want to compare groups 1 and 2 against groups 3 and 4, the sum of coefficients for groups 1 and 2 would equal 1, and the sum of coefficients for groups 3 and 4 would equal -1 (or vice versa). This then means that the two summed coefficients (-1 and +1) would add up to zero – our magical number from before for linear contrasts.
To get a sum of 1 for the groups that you are combining, give each group in a set the reciprocal of the number of groups in that set (1/number of groups), and give one of the sets a minus sign. For instance, suppose you have five groups and want to compare the first three groups with the last two groups. The first set contains three groups, so each gets 1/3. The other set contains two groups, so each gets 1/2, and these are given a minus sign:
Means:  X̄1    X̄2    X̄3    X̄4     X̄5
aj:     1/3   1/3   1/3   -1/2   -1/2
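The reciprocal rule can be written as a tiny Python helper. The name `set_coefficients` is hypothetical, and it assumes the groups in each set are adjacent, with the set that gets positive weights listed first.

```python
from fractions import Fraction


def set_coefficients(first_set_size, second_set_size):
    """Coefficients comparing the first `first_set_size` groups with the next
    `second_set_size` groups: each group gets 1/(size of its set), and the
    second set is given the minus sign."""
    first = [Fraction(1, first_set_size)] * first_set_size
    second = [Fraction(-1, second_set_size)] * second_set_size
    return first + second


coefs = set_coefficients(3, 2)
print([str(a) for a in coefs])                     # ['1/3', '1/3', '1/3', '-1/2', '-1/2']
print(sum(coefs[:3]), sum(coefs[3:]), sum(coefs))  # 1 -1 0
```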
The test of significance:
Once you have calculated the sum of squares for a linear contrast, it can be used to compute an F. The mean square for a linear contrast always equals its sum of squares, because a contrast has 1 degree of freedom:
$MS_{contrast} = SS_{contrast}$
The contrast has 1 degree of freedom because two sets of means are being compared, as in a t-test.
The mean square is then divided by $MS_{residual}$ to obtain an F, which is evaluated on 1 and $df_{residual}$ degrees of freedom:
$F = MS_{contrast} / MS_{residual}$
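Putting the steps together, here is a minimal Python sketch that computes F for a single contrast from raw scores. The scores are hypothetical, the function name `contrast_f` is ours, $MS_{residual}$ is taken to be the pooled within-group mean square from a one-way design, and the p-value is left to Jamovi or an F table because the Python standard library has no F distribution.

```python
from fractions import Fraction


def contrast_f(groups, coefficients):
    """F ratio (and its df) for one linear contrast in a one-way design."""
    ns = [len(g) for g in groups]
    means = [Fraction(sum(g), len(g)) for g in groups]

    # Contrast value and its sum of squares (1 df, so MS = SS)
    L = sum(Fraction(a) * m for a, m in zip(coefficients, means))
    ss_contrast = L ** 2 / sum(Fraction(a) ** 2 / n for a, n in zip(coefficients, ns))
    ms_contrast = ss_contrast

    # Residual (within-groups) mean square
    ss_residual = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_residual = sum(ns) - len(groups)
    ms_residual = ss_residual / df_residual

    return ms_contrast / ms_residual, (1, df_residual)


groups = [[9, 10, 11], [11, 12, 13], [7, 8, 9]]  # hypothetical scores
F, df = contrast_f(groups, [Fraction(1, 2), Fraction(1, 2), -1])
print(F, df)  # 18 (1, 6)
```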
Orthogonal contrasts:
The two comparisons in the above worked example are independent of one another and are called “orthogonal contrasts”. This means that they contain no overlapping information. For example, knowing that Group 2 is greater than the average of Groups 1 and 3 tells us nothing about whether Group 1 is different from Group 3. If a complete set of contrasts (one per treatment degree of freedom) is orthogonal and the groups are of equal size, then the sums of squares of the linear contrasts add up to SStreat.
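The additivity property can be checked numerically. This Python sketch assumes three equal-sized groups with hypothetical means (4, 7, 6) and n = 5, and uses the pair of contrasts from the example: Group 2 against the average of Groups 1 and 3, and Group 1 against Group 3. The function names are our own.

```python
from fractions import Fraction


def ss_contrast(coefficients, means, n):
    """Equal-n sum of squares for a contrast: n * L**2 / sum(a_j**2)."""
    L = sum(Fraction(a) * m for a, m in zip(coefficients, means))
    return n * L ** 2 / sum(Fraction(a) ** 2 for a in coefficients)


def ss_treat(means, n):
    """Treatment sum of squares for equal group sizes."""
    grand_mean = Fraction(sum(means), len(means))
    return n * sum((m - grand_mean) ** 2 for m in means)


means = [4, 7, 6]  # hypothetical group means
n = 5              # hypothetical size of every group
c1 = [Fraction(-1, 2), 1, Fraction(-1, 2)]  # Group 2 vs average of Groups 1 and 3
c2 = [1, 0, -1]                             # Group 1 vs Group 3

total = ss_contrast(c1, means, n) + ss_contrast(c2, means, n)
print(total, ss_treat(means, n))  # 70/3 70/3
```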
In order for contrasts to be orthogonal, the coefficients need to meet the following three conditions:
- $\sum_j a_j = 0$
- $\sum_j a_j b_j = 0$, where $a_j$ and $b_j$ are the sets of coefficients for two different contrasts
- Number of comparisons = number of df for treatments
The examples in the reading above use orthogonal contrasts. Field (2018) specifies that if you multiply the weights that two contrasts give to each group and then add these products across all groups, the total should be zero to ensure the contrasts are orthogonal.
Suppose we had five treatment groups and were first interested in comparing the combination of treatments 1 and 2 with the combination of treatments 3, 4 and 5. This splits the treatments into two branches. We then compare the treatments within each branch (1 with 2, and 3 with 4 and 5), and so on. To keep the contrasts orthogonal, we never compare treatments on one branch with treatments on the other branch. This method is illustrated below:
          (1, 2, 3, 4, 5)
                 ↓
      (1, 2)  vs  (3, 4, 5)
        ↓              ↓
   (1) vs (2)     (3) vs (4, 5)
                          ↓
                      (4) vs (5)
The coefficients that would correspond to the above comparisons are:
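Contrast               Group 1   Group 2   Group 3   Group 4   Group 5
(1, 2) vs (3, 4, 5)      1/2       1/2      -1/3      -1/3      -1/3
(1) vs (2)                1        -1         0         0         0
(3) vs (4, 5)             0         0         1       -1/2      -1/2
(4) vs (5)                0         0         0         1        -1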
Note that the negative signs can be assigned to either side of each comparison. Also, if we had begun with a different first contrast (e.g., 1 vs 2, 3, 4 and 5), we would have ended up with a different set of orthogonal contrasts.
To show that a set of coefficients is orthogonal, we need to show that, for every pair of contrasts, the products of their coefficients sum to zero. The following sums of products show that the set of coefficients produced above is orthogonal:
(1) (1/2)(1) + (1/2)(-1) + (-1/3)(0) + (-1/3)(0) + (-1/3)(0) = 0
(2) (1/2)(0) + (1/2)(0) + (-1/3)(1) + (-1/3)(-1/2) + (-1/3)(-1/2) = 0
(3) (1/2)(0) + (1/2)(0) + (-1/3)(0) + (-1/3)(1) + (-1/3)(-1) = 0
(4) (1)(0) + (-1)(0) + (0)(1) + (0)(-1/2) + (0)(-1/2) = 0
(5) (1)(0) + (-1)(0) + (0)(0) + (0)(1) + (0)(-1) = 0
(6) (0)(0) + (0)(0) + (1)(0) + (-1/2)(1) + (-1/2)(-1) = 0
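The six hand calculations, together with the other two conditions, can be automated. This Python sketch hard-codes the four contrasts from the table above; the function name `is_orthogonal_set` is hypothetical. Exact fractions are used so that sums such as −1/3 + 1/6 + 1/6 come out as exactly zero.

```python
from fractions import Fraction
from itertools import combinations

half, third = Fraction(1, 2), Fraction(1, 3)

# Coefficients from the table above (groups 1 to 5)
contrasts = {
    "(1, 2) vs (3, 4, 5)": [half, half, -third, -third, -third],
    "(1) vs (2)": [1, -1, 0, 0, 0],
    "(3) vs (4, 5)": [0, 0, 1, -half, -half],
    "(4) vs (5)": [0, 0, 0, 1, -1],
}


def is_orthogonal_set(contrasts, df_treat):
    """Check the three conditions: each contrast sums to zero, every pair of
    contrasts has cross-products summing to zero, and there is one contrast
    per treatment df."""
    sums_to_zero = all(sum(a) == 0 for a in contrasts.values())
    pairs_zero = True
    for (name_a, a), (name_b, b) in combinations(contrasts.items(), 2):
        cross = sum(x * y for x, y in zip(a, b))
        print(f"{name_a}  x  {name_b}: {cross}")
        pairs_zero = pairs_zero and cross == 0
    return sums_to_zero and pairs_zero and len(contrasts) == df_treat


print(is_orthogonal_set(contrasts, df_treat=4))  # True (5 groups, so 4 df)
```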
The property of orthogonal contrasts (that they give non-redundant information which neatly adds up to the total sum of squares) makes them very appealing, especially to statisticians who like that sort of thing. Some statisticians go so far as to suggest that contrasts that don’t have these ‘nice’ properties should never be made. However, others are more sensible: they remember that the point of carrying out contrasts is to find out what the results of an experiment mean, and so recommend doing whatever contrasts make sense, whether or not they are orthogonal.
Control of Familywise error rate:
The fact that multiple comparisons are made means that we still have the problem of inflated Type I error rates. There are different views about how to control the familywise error rate for a priori comparisons. Some experimenters make no adjustment because the comparisons are planned; others simply try to run as few comparisons as possible.
If we want to keep the familywise error rate strictly to .05, then we need a more conservative alpha level for each individual contrast. In fact, as we saw in section 6.2, the error rate per contrast should be PC = FW/c, where c is the number of comparisons. This quite stringent adjustment to the alpha level is termed the Bonferroni t or Dunn test.
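A quick Python sketch shows the Bonferroni adjustment and checks what familywise rate it actually delivers. The function names are hypothetical, and the exact-FW check assumes independent comparisons.

```python
def bonferroni_alpha(fw: float, c: int) -> float:
    """Per comparison alpha for the Bonferroni t (Dunn) test: PC = FW / c."""
    return fw / c


def exact_fw(alpha_pc: float, c: int) -> float:
    """Exact FW for c independent comparisons at alpha_pc each."""
    return 1 - (1 - alpha_pc) ** c


for c in (2, 4, 10):
    pc = bonferroni_alpha(0.05, c)
    print(f"c = {c:2d}: PC = {pc:.4f}, resulting exact FW = {exact_fw(pc, c):.4f}")
```

Each resulting FW lands just under .05, and for c = 10 the per comparison level is .005, which illustrates why the correction is considered conservative.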
Some statisticians (e.g. Keppel) regard Bonferroni t as too strict for a priori comparisons. For instance, if we wish to make 10 comparisons, our adjusted alpha level (using the Bonferroni correction) is .005, making it quite difficult to obtain a significant result! Accordingly, Keppel and others recommend a slightly less conservative procedure than this (note that a conservative test is one which is so strict that it is harder to find a significant result):
- if the number of a priori comparisons is no more than the df for treatments, do not bother adjusting the per comparison error rate at all.
- if the number of comparisons planned is greater than the df for treatments, use a larger FW error rate, obtained by multiplying the usual FW error rate by the df, and divide it by the number of comparisons to get the per comparison error rate. This is called the modified Bonferroni procedure.
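The modified Bonferroni rule can be sketched in Python as below. The function name is hypothetical, and the final division by the number of comparisons applies the PC = FW/c rule from the Bonferroni t to the enlarged FW rate. The design (4 groups, so df = 3, with all 6 pairwise comparisons planned) is a made-up illustration.

```python
def modified_bonferroni_alpha(alpha: float, n_comparisons: int, df_treat: int) -> float:
    """Per comparison alpha under the modified Bonferroni rule."""
    if n_comparisons <= df_treat:
        return alpha               # no more comparisons than df: no adjustment
    fw = alpha * df_treat          # enlarged familywise error rate
    return fw / n_comparisons      # shared equally across the comparisons


# Hypothetical design: 4 groups (df = 3) and all 6 pairwise comparisons planned
print(f"{modified_bonferroni_alpha(0.05, 6, 3):.4f}")  # 0.0250 (modified Bonferroni)
print(f"{0.05 / 6:.4f}")                               # 0.0083 (ordinary Bonferroni)
```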