MMB-STATS-Dictionary Flashcards
1.
Define Contrasts
- A contrast is a series of numbers (weights) that encodes a comparison of group means more flexible than a simple pairwise comparison.
- The numbers must all add to zero.
- The order of the numbers is the order of your coding of the groups. For example, the contrast (1, 1, 1, -3) compares groups 1, 2, and 3 collectively, with group 4. The contrast (-1, -1, 1, 1, 0) compares groups 1 and 2 together with groups 3 and 4 together, ignoring group 5.
- There must be as many numbers in the contrast as there are groups. If you want to ignore any groups, they must have a zero in the appropriate place.
- You can use contrasts to compare just two groups. For example, (1, 0, 0, -1) would compare groups 1 and 4.
2.
When might contrasts be useful?
Contrasts are especially useful if you have a small number of a priori hypotheses that you want to test over a number of groups in a “one way” design, i.e. where there is just one IV with several levels (each level corresponds to one group). If you are testing more than one contrast, a Bonferroni correction (or other appropriate method of statistical adjustment for multiple comparisons) must be applied.
The statistical test of significance is a one-sample t-test against zero. In effect, the contrast (-1, 2, -1), for example, tests whether Mean 2 minus the average of means 1 and 3 is significantly different from zero. Some statistical packages (such as SPSS) and textbooks report contrasts as F tests; in most cases this is identical to the t-test, because t(df)² = F(1, df). The t statistic has the advantage that it can be used one-tailed, unlike an F statistic.
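To make the arithmetic concrete, here is a minimal sketch in Python (numpy and scipy; all data invented for illustration) of testing the contrast (-1, 2, -1) by hand:

    import numpy as np
    from scipy import stats

    # Invented data: three groups measured on some DV.
    groups = [np.array([4.1, 5.2, 3.8, 4.9]),
              np.array([7.0, 6.4, 7.8, 6.9]),
              np.array([4.5, 3.9, 5.1, 4.4])]
    c = np.array([-1, 2, -1])   # contrast weights; they must sum to zero

    means = np.array([g.mean() for g in groups])
    ns = np.array([len(g) for g in groups])
    df_error = ns.sum() - len(groups)

    # Pooled within-groups variance (the MSE from a one-way ANOVA).
    mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / df_error

    L = (c * means).sum()                    # value of the contrast
    se = np.sqrt(mse * (c ** 2 / ns).sum())  # its standard error
    t = L / se
    p = 2 * stats.t.sf(abs(t), df_error)     # two-tailed p-value
    print(f"L = {L:.3f}, t({df_error}) = {t:.3f}, p = {p:.4f}")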
3.
Define Qualitative interaction
In a 2x2 ANOVA, an interaction where the slopes of the two lines representing the two levels of one IV are of opposite sign.
If the plot is redrawn with the variables swapped, so that the IV whose levels defined the two lines is placed on the horizontal axis and vice versa, a qualitative interaction will appear as a cross-over interaction.
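A minimal numeric illustration in Python (the cell means are invented): the slopes of the two lines have opposite signs, which is what makes the interaction qualitative.

    import numpy as np

    # Invented cell means for a 2x2 design: rows = levels of IV A (the
    # two lines), columns = levels of IV B (the horizontal axis).
    means = np.array([[2.0, 6.0],    # line A1: slope +4
                      [7.0, 3.0]])   # line A2: slope -4

    slopes = means[:, 1] - means[:, 0]
    qualitative = slopes[0] * slopes[1] < 0   # opposite signs?
    print(slopes, "qualitative interaction:", qualitative)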
4.
Omnibus F test
A significance test in analysis of variance which tests the null hypothesis that all the population means are equal to one another. A significant result allows us to reject this null hypothesis, but in general it does not tell us which means differ, or in which direction.
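As an illustration, scipy runs the one-way omnibus test directly (the data below are invented); note that the output says nothing about which of the three means differ:

    from scipy import stats

    # Invented scores for three groups.
    g1 = [12, 15, 11, 14]
    g2 = [18, 17, 20, 19]
    g3 = [13, 12, 16, 14]

    F, p = stats.f_oneway(g1, g2, g3)
    print(f"F = {F:.2f}, p = {p:.4f}")  # significant => not all means are equal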
5.
Levene’s Test
A test for HOV (homogeneity of variance) in between groups factorial designs. The null hypothesis is that the within groups variance for all the cells is equal, which is one of the assumptions for the most basic form of the between groups t-test, and for ANOVA.
A significant result for Levene’s test implies that this assumption is violated. For the between groups t-test, there is a simple alternative version of the test which does not assume equal variances (Welch’s correction), and which is automatically run in SPSS. For ANOVA, there are post-hoc tests, such as the Games-Howell test, which do not assume HOV. If you are applying a priori tests such as contrasts, HOV is not an assumption, so Levene’s test is not applicable.
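A minimal sketch of Levene’s test in Python with scipy (invented data). One caveat: scipy’s default centres each group on its median (the Brown-Forsythe variant); the mean-centred form requested below is, as far as I know, the version SPSS reports.

    from scipy import stats

    # Invented data: two groups with visibly different spreads.
    g1 = [5.1, 5.0, 4.9, 5.2, 5.0]
    g2 = [3.0, 7.5, 1.2, 9.0, 4.3]

    W, p = stats.levene(g1, g2, center='mean')
    print(f"W = {W:.2f}, p = {p:.4f}")  # significant => HOV assumption in doubt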
6.
Sphericity
Sphericity is the assumption that the variances of the difference scores between every pair of levels of the repeated measures factor are equal. It is the analogue in repeated measures designs of HOV for between groups designs (see Levene’s Test). A significant result in Mauchly’s test means that the result of any omnibus test in RM ANOVA must be interpreted with caution (in practice, a correction to the degrees of freedom, such as the Greenhouse-Geisser correction, is usually applied).
This test may be inappropriate if the RM variable is time, in which case trend analysis may be more suitable than omnibus testing.
For more on this test, see
https://statistics.laerd.com/statistical-guides/sphericity-statistical-guide.php
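Mauchly’s test itself is fiddly to compute by hand, but the condition it examines is easy to inspect. Here is a sketch in Python (invented data) of the variances of the pairwise difference scores, which sphericity requires to be equal:

    import numpy as np
    from itertools import combinations

    # Invented repeated measures data: rows = subjects, columns = 3 levels.
    scores = np.array([[8, 7, 6],
                       [9, 9, 5],
                       [6, 5, 4],
                       [7, 6, 6],
                       [8, 6, 5]])

    # Sphericity holds when these variances are (roughly) equal.
    for i, j in combinations(range(scores.shape[1]), 2):
        d = scores[:, i] - scores[:, j]
        print(f"var(level {i+1} - level {j+1}) = {d.var(ddof=1):.2f}")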
7.
Bivariate correlation tests
There are two main tests for correlation between two continuous variables, i.e. where both variables are on interval or ratio scales (excluding nominal and ordinal variables). The first involves a statistic called Pearson’s product-moment coefficient, often written r, and the second uses a statistic named Spearman’s rank correlation coefficient, written rho (the Greek letter r). Recall that ‘statistic’ refers to a number calculated from the dataset in a sample.
Both r and rho test the null hypothesis that there is no relationship between the two variables (a linear relationship in the case of r, a monotonic one in the case of rho), namely that in the population at large, the corresponding parameter is zero: recall that a parameter is the value of something for the whole population, as opposed to a statistic, which refers to a particular sample. For the test to reach significance, r (or rho) in the actual sample needs to be far enough away from zero; how far is “far enough” depends on the size of the sample. For a sample of, say, 100, a Pearson’s correlation of 0.2 will be just significant. For a sample of 30, r will have to be greater than 0.37 (or less than -0.37) to be significant at the .05 level.
Pearson’s r involves a parametric test and certain assumptions are required for its validity (a linear relationship between the variables and bivariate normality); the Spearman test is its non-parametric equivalent, and is calculated by rank-ordering the data on each of the two variables and applying Pearson’s formula to the ranks.
Note. There is a possibility for confusion arising from the fact that “rho” can have two different meanings in the context of correlation. Rho is sometimes used to refer to the value of Pearson’s r in the population, with r reserved for the Pearson statistic in a sample. This is the same convention that uses M to refer to a sample mean and the Greek equivalent (mu) to refer to a population mean. Spearman’s rho, on the other hand, is a statistic calculated from the sample. In practice, it will usually be obvious from the context which meaning is intended.
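Both statistics are available in scipy; a minimal sketch with invented data:

    from scipy import stats

    # Invented paired observations on two continuous variables.
    x = [2.1, 3.4, 4.0, 5.6, 6.1, 7.3, 8.0, 9.2]
    y = [1.8, 3.9, 3.5, 6.0, 5.8, 7.9, 7.7, 9.5]

    r, p_r = stats.pearsonr(x, y)       # parametric: linear relationship
    rho, p_rho = stats.spearmanr(x, y)  # non-parametric: Pearson on the ranks
    print(f"r = {r:.3f} (p = {p_r:.4f}); rho = {rho:.3f} (p = {p_rho:.4f})")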
8.
Regression towards the mean
This is a statistical effect which arises when a measure that is subject to random error is taken twice. Individuals who had an extreme score on the first occasion will tend to score closer to the mean on the second occasion, whence the name (regression means ‘stepping back’: in this case, stepping back towards the mean).
The effect is caused by the fact that individuals at an extreme end of a group, whether at the high or low end of the measure, will tend to be those for whom the measurement error reinforced the tendency towards an extreme score. The effect is most obvious if all in the group in fact have the same true score on some measure, in which case when the measurement is taken, the highest scorer will be the person for whom the measurement error was highest. On the next occasion, given that the mean for measurement error is generally zero, that person’s measured score will probably show a lower value.
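The effect is easy to demonstrate by simulation. In the following Python sketch (all numbers invented), everyone has exactly the same true score, yet the top scorers at time 1 appear to ‘decline’ at time 2:

    import numpy as np

    rng = np.random.default_rng(1)

    # Everyone's true score is 100; each measurement adds fresh random error.
    n = 10_000
    time1 = 100 + rng.normal(0, 15, n)
    time2 = 100 + rng.normal(0, 15, n)  # independent error on the retest

    top = time1 > np.percentile(time1, 90)  # the extreme scorers at time 1
    print(f"time 1 mean of top group: {time1[top].mean():.1f}")   # well above 100
    print(f"time 2 mean of same group: {time2[top].mean():.1f}")  # back near 100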
Regression can be a threat to internal validity in a two group design. A typical example would be a drug trial where the treatment group is composed of volunteers (perhaps for ethical reasons) and the control group is randomly selected. The treatment group may then tend to consist of people whose condition has recently deteriorated owing to random influences; by the end of the experiment those influences may have reversed, giving rise to a false apparent treatment effect. Essentially, regression introduces a selection bias.
Another way in which regression to the mean can affect internal validity is if two non-equivalent groups are “matched” on some measurement in a quasi-experiment, in an attempt to compensate for inherent differences between the groups. If the groups differ on a variable, then matching on the basis of that variable is likely to contaminate the results, since the errors in measurement for the two groups will tend to be in opposite directions, and so an inherent selection bias will result.
One often cited example of this is the Head Start program for disadvantaged children. A control group of typical children (no intervention) was matched with a disadvantaged group (which received the intervention) on a measure of academic achievement. The results were disappointing, but one explanation is that the matched disadvantaged children were drawn from the top range of their group, so their scores may have been inflated by a regression artifact.
For more on this effect, see page 120 at the following link:
http://www.psych.uncc.edu/pagoolka/Ch4QuasiExperimental.pdf
One way of avoiding this threat to internal validity in such situations as Head Start is to use the regression discontinuity design. For details see:
http://www.socialresearchmethods.net/kb/quasird.php
9.
A priori tests of significance
- A priori means “from the earlier”, and it refers to a specific hypothesis which you make before you conduct your experiment (and the analysis of your data).
- If for example you start out with a hypothesis that all classical music has a generally calming effect on listeners whereas any heavy metal music generates tension, you might want to measure the effect of a selection of classical and metal extracts on the calmness/tension axis (perhaps by self-report, or by using a skin conductance meter).
- Your hypothesis is a priori, and if that is the only effect you are interested in, you could go straight to testing a contrast, formed by grouping all the classical responses in one set and all the heavy metal responses in another and comparing the two. You would not be obliged to compare every pair of cell means, though you might want to check whether there were any significant differences in responses within each category of music.
- A priori hypotheses can often be tested using some form of contrast. If you are testing more than one a priori hypothesis, however, you will need to use one of the options for safeguarding against inflation of the type-I error rate, such as a Bonferroni correction (see the sketch after this list).
- The “opposite” of an a priori test of significance is an a posteriori or post hoc test, which is when you have no particular expectations or tests that you want to do from the outset. See separate entry.
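- As an illustration of the correction, here is a minimal sketch using the multipletests helper from statsmodels (the p-values are invented):

    from statsmodels.stats.multitest import multipletests

    # Invented p-values from three a priori contrasts.
    pvals = [0.012, 0.030, 0.200]

    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method='bonferroni')
    print(p_adj)   # each p multiplied by 3 (capped at 1.0)
    print(reject)  # which contrasts survive the correction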
10.
Post hoc tests of significance
- Post hoc means “after this”, and it refers to the fact that such tests do not rely on any hypotheses specified in advance of data gathering. This is in contrast to a priori tests, which have to be specified before the data are collected. Post hoc tests often include, implicitly or explicitly, a large number of separate tests of significance, whereas a priori testing will typically be limited to a small number of tests referring to hypotheses that are well defined, and clearly justified, in advance.
- With post hoc testing, there is a danger that researchers will engage in “data dredging”, or searching for significant patterns in the data with the likelihood that such patterns will often appear by chance and be spurious. To prevent this, post hoc tests should always be carried out according to some standard system which includes protection against inflation of type-I error rates.
- Some systems attain this objective with a high level of efficiency, and are to be preferred. For example, it is possible to do post hoc tests comparing all pairs of group means in one-way ANOVA by running a series of pairwise t-tests and applying a Bonferroni correction, but it is generally more powerful to use one of the tests specifically designed for universal pairwise comparisons, such as Tukey’s “honestly significant difference”: a masterly piece of public relations labelling, but one which lives up to its name and is very useful in practice (see the Tukey’s HSD entry below).
11.
Confidence intervals
- As an alternative, or a supplement, to the standard null hypothesis significance tests, some people advocate providing (usually) 95% confidence intervals for the effect being estimated.
- Effect sizes are estimates of how much influence an IV or a group of IVs has on a DV.
- Most NHSTs test whether an effect size is zero. A significant result enables us to claim that the effect size is non-zero, and usually the result also gives a direction for the effect; but beyond that, a significant result tells you very little about how large the effect actually is.
- The idea of CIs is that they give you more information about the likely size of the effect. If you do an independent groups t-test, SPSS will give you a 95% CI for the size of the difference between the means of the two groups. This enables you to say with a degree of confidence that there is not only an effect present, but roughly how big it is, and in most practical applications this is almost as important, ultimately, as the conclusion that the effect exists.
- The logic of how 95% CIs are derived is quite complicated, but the key idea is this: if the sampling procedure were repeated many times and a CI constructed in the same way each time, 95% of those intervals would contain the true population value.
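- Concretely, here is a minimal Python sketch (invented data) of the usual 95% CI for the difference between two independent group means, built from the pooled variance and the t distribution:

    import numpy as np
    from scipy import stats

    # Invented scores for two independent groups.
    g1 = np.array([23.0, 27.5, 25.1, 26.3, 24.8, 22.9])
    g2 = np.array([20.1, 21.7, 19.5, 22.3, 20.9, 21.2])

    diff = g1.mean() - g2.mean()
    df = len(g1) + len(g2) - 2
    sp2 = (((g1 - g1.mean()) ** 2).sum() + ((g2 - g2.mean()) ** 2).sum()) / df
    se = np.sqrt(sp2 * (1 / len(g1) + 1 / len(g2)))

    t_crit = stats.t.ppf(0.975, df)  # two-tailed 95% critical value
    lo, hi = diff - t_crit * se, diff + t_crit * se
    print(f"difference = {diff:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")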
12.
t-tests
- This is a class of null hypothesis significance tests where the variance in the underlying population or populations is unknown, and where it has to be estimated from the sample itself.
- The tests all involve testing at most two separate groups; if more than two groups are involved, or more than two independent variables, a variant of ANOVA is probably going to be necessary.
- t-tests comprise the independent groups t-test, the paired samples t-test, and the one sample t-test.
- The first of these compares the means of two separate groups on some continuous DV, and tests the null hypothesis that they are sampled from populations in which the mean of the DV is identical. The test in SPSS includes a supplementary test of the assumption that the variances in the two populations are equal (the Levene test). If the Levene test is non-significant, the p-value in the first row of the output table (“Equal variances assumed”) is read; if it is significant, the p-value in the second row (“Equal variances not assumed”, which applies Welch’s correction) must be used instead. The test can also produce confidence intervals for the size of the difference between the means.
- The paired samples t-test is the repeated measures or within subjects version of the independent groups t-test. It can be used when one group of individuals is measured twice on a DV under different conditions, and tests whether there is an effect on the DV of the change in conditions. It can also be used when individuals from two groups are pairwise matched on some control variable.
- The one-sample t-test is used when we need to test whether the average value of a DV in a population differs from some specified value. The default setting for that value in SPSS is zero, but any other value can be substituted, so for example we can test whether a particular subgroup of individuals in the population differs significantly from the norm for the whole population on some psychometric variable. In this case it is appropriate to use a t-test and not a z-test, because we cannot assume that the variance in the subgroup of interest is the same as the variance in the complete population.
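- All three tests are available in scipy; a minimal sketch with invented data:

    from scipy import stats

    before = [12.0, 14.5, 11.8, 13.2, 15.0]
    after = [13.1, 15.2, 12.0, 14.8, 15.9]
    other_group = [10.2, 11.5, 9.8, 12.0, 10.9]

    # Independent groups: two separate samples.
    print(stats.ttest_ind(before, other_group))
    # Paired samples: the same individuals measured twice.
    print(stats.ttest_rel(before, after))
    # One sample: does the mean differ from a specified value (here 12)?
    print(stats.ttest_1samp(before, popmean=12))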
13.
Tukey’s HSD test
- Tukey’s “Honestly Significant Difference” post hoc test can be used to test pairwise comparisons of all the means in a one-way ANOVA simultaneously, while safeguarding against inflation of type-I error rates.
- It is often better to use this test than the obvious alternative of carrying out a set of individual pairwise comparisons using independent groups t-tests (or, better, the equivalent contrasts), because the multiple comparisons require a Bonferroni correction, which usually reduces the power of that approach below that of Tukey’s test (see the sketch below).
- The test does make the assumption of homogeneity of variance, and if this is violated (shown by a significant result for Levene’s test) an alternative post hoc test such as the Games-Howell test is recommended.
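- A minimal sketch of the test in Python using the pairwise_tukeyhsd function from statsmodels (the scores and group labels are invented):

    import numpy as np
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    # Invented scores and group labels for a one-way design.
    scores = np.array([12, 15, 11, 14, 18, 17, 20, 19, 13, 12, 16, 14])
    groups = np.repeat(['A', 'B', 'C'], 4)

    result = pairwise_tukeyhsd(scores, groups, alpha=0.05)
    print(result)  # one row per pair, with adjusted p-values and CIs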
14.
Kolmogorov-Smirnov test of normality
- This is a significance test, which examines the null hypothesis that a given set of data relating to some continuous variable could have been drawn from a population where that variable is normally distributed. A significant result means that the null hypothesis can be rejected, and hence that the distribution cannot be assumed to be normal.
- It is often used to test the assumption, made when using ANOVA (and some other parametric tests), that the distribution of the error terms is normal. In that case, it needs to be applied to the distribution of the DV within each group separately (or, if there are two or more categorical IVs, to the contents of the separate cells). As with most tests of assumptions, you are usually hoping for a non-significant result, meaning that ANOVA (for example) can be validly applied. The K-S test should be non-significant in all the groups (or cells) for the main parametric test to proceed.
- If the K-S test is significant, you will need to consider transforming the DV so that its distribution becomes normal, removing outliers, or, alternatively, using a non-parametric test of significance.
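- One caveat when running this test in software: the textbook K-S test assumes the mean and SD of the normal distribution are specified in advance. When, as usual, they are estimated from the sample, the Lilliefors variant of the test is the appropriate one; statsmodels provides it. A sketch with invented data:

    import numpy as np
    from statsmodels.stats.diagnostic import lilliefors

    rng = np.random.default_rng(2)
    sample = rng.exponential(scale=2.0, size=80)  # deliberately non-normal

    # Lilliefors = K-S test adapted for parameters estimated from the sample.
    stat, p = lilliefors(sample, dist='norm')
    print(f"D = {stat:.3f}, p = {p:.4f}")  # significant => reject normality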
15.
Kruskal-Wallis test
- This is the non-parametric equivalent of one-way ANOVA. It is used for between-group designs, and tests the null hypothesis that three or more groups are drawn from the same population. A significant result provides evidence for rejecting the null, but like the omnibus test in one-way ANOVA, it does not directly show which group(s) differ from which others.
- The test does not assume normality of the distribution of the within-groups error term, so that it can be used when the normality assumption for one-way ANOVA is violated.
- The Kruskal-Wallis test works by rank-ordering the values of the DV for the whole dataset, ignoring group membership. A statistic, usually written H, is then calculated, based on how far the average ranks for the different groups depart from the values expected under the null hypothesis. If H exceeds a critical value, which depends on alpha and the number (and size) of the groups, the test is declared significant.
- The Kruskal-Wallis test is the multi-group equivalent of the Mann-Whitney test, which is used to compare two groups when the independent groups t-test is inappropriate because the distributions within the groups are not normal.
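- Both tests are available in scipy; a minimal sketch with invented, skewed data:

    from scipy import stats

    # Invented scores for three groups with skewed distributions.
    g1 = [1.2, 0.8, 3.5, 0.9, 1.1]
    g2 = [2.4, 5.9, 3.1, 6.2, 2.8]
    g3 = [0.7, 1.0, 1.4, 0.6, 0.9]

    H, p = stats.kruskal(g1, g2, g3)
    print(f"H = {H:.2f}, p = {p:.4f}")

    # Two-group analogue: the Mann-Whitney U test.
    U, p2 = stats.mannwhitneyu(g1, g2, alternative='two-sided')
    print(f"U = {U:.1f}, p = {p2:.4f}")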