Statistical tests Flashcards
K-S test or Kolmogorov-Smirnov test
Nonparametric test for the equality of continuous, one-dimensional probability distributions for one sample or two samples.
The null distribution of this statistic is calculated under the null hypothesis that the samples are drawn from the same distribution (in the two-sample case) or that the sample is drawn from the reference distribution (in the one-sample case)
The Kolmogorov–Smirnov test can be modified to serve as a goodness of fit test. In the special case of testing for normality of the distribution, samples are standardized and compared with a standard normal distribution. This is equivalent to setting the mean and variance of the reference distribution equal to the sample estimates, and it is known that using these to define the specific reference distribution changes the null distribution of the test statistic: see below. Various studies have found that, even in this corrected form, the test is less powerful for testing normality than the Shapiro–Wilk test or Anderson–Darling test.
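Both forms are available in SciPy; a minimal sketch on synthetic data (note that, per the caveat above, estimating the reference distribution's parameters from the sample itself would invalidate the tabulated null distribution):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.normal(loc=1.0, size=200)

# One-sample: is x drawn from a standard normal reference distribution?
stat1, p1 = stats.kstest(x, "norm")

# Two-sample: are x and y drawn from the same distribution?
stat2, p2 = stats.ks_2samp(x, y)
```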
Mann-Whitney U test
Also known as the Mann–Whitney–Wilcoxon (MWW) test, the Wilcoxon rank-sum test, or the Wilcoxon–Mann–Whitney test.
This is a non-parametric statistical hypothesis test for assessing whether one of two samples of independent observations tends to have larger values than the other. It is one of the most well-known non-parametric significance tests.
A very general formulation is to assume that:
(1) All the observations from both groups are independent of each other,
(2) The responses are ordinal (i.e. one can at least say, of any two observations, which is the greater),
(3) Under the null hypothesis the distributions of both groups are equal, so that the probability of an observation from one population (X) exceeding an observation from the second population (Y) equals the probability of an observation from Y exceeding an observation from X, that is, there is a symmetry between populations with respect to probability of random drawing of a larger observation.
(4) Under the alternative hypothesis the probability of an observation from one population (X) exceeding an observation from the second population (Y) (after exclusion of ties) is not equal to 0.5. The alternative may also be stated in terms of a one-sided test, for example: P(X > Y) + 0.5 P(X = Y) > 0.5.
Additional facts:
- Related to Kendall’s tau: the two are equivalent if one of the variables is binary.
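A minimal sketch with scipy.stats.mannwhitneyu on hypothetical samples that differ by a location shift:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(loc=0.0, scale=1.0, size=100)
b = rng.normal(loc=0.8, scale=1.0, size=100)

# U counts, over all pairs, how often an observation from one sample
# exceeds one from the other (ties counted as 1/2).
u, p = stats.mannwhitneyu(a, b, alternative="two-sided")
```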
Non-parametric tests
Non-parametric (or distribution-free) inferential statistical methods are mathematical procedures for statistical hypothesis testing which, unlike parametric statistics, make no assumptions about the probability distributions of the variables being assessed.
Examples include:
Kolmogorov–Smirnov test
Mann–Whitney U or Wilcoxon rank sum test
Siegel–Tukey test
sign test
Wilcoxon signed-rank test
Anderson–Darling test
Kuiper’s test
Logrank Test
McNemar’s test
median test
Pitman’s permutation test
Wald–Wolfowitz runs test
Parametric tests
Parametric statistics is a branch of statistics that assumes that the data has come from a type of probability distribution and makes inferences about the parameters of the distribution.
Parametric methods make more assumptions than non-parametric methods. If those extra assumptions are correct, parametric methods can produce more accurate and precise estimates. They are said to have more statistical power.
Examples of parametric tests:
t-tests
What test(s) to use when you have two samples of data independently collected and you want to compare whether one has values greater than the other?
(1) Mann-Whitney U test (also called the Mann–Whitney–Wilcoxon (MWW) test, Wilcoxon rank-sum test, or Wilcoxon–Mann–Whitney test)
- More robust to outliers
- Good if data is ordinal, but not interval scaled
- In case of normality, 95% efficient compared to t-test
- Non-parametric; usually the better choice, except perhaps with small sample sizes, where the t-test’s extra power may matter
(2) independent samples Student’s t-test
- Assumes normality (parametric)
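The robustness point above can be illustrated with hypothetical data: a single extreme outlier inflates the t-test's variance estimate and washes out a real shift, while it occupies only one rank in the U test. A sketch:

```python
import numpy as np
from scipy import stats

# Two shifted samples, plus one wild outlier in the first group (hypothetical data)
a = np.arange(1.0, 21.0)          # 1..20
b = np.arange(1.0, 21.0) + 5.5    # same values shifted up by 5.5
a_out = np.append(a, 1000.0)      # a single extreme outlier

t_stat, t_p = stats.ttest_ind(a_out, b)                               # parametric
u_stat, u_p = stats.mannwhitneyu(a_out, b, alternative="two-sided")   # rank-based
```

With these numbers the t-test no longer detects the shift, while the rank-based test still does.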
Wilcoxon signed-rank test
Non-parametric statistical test to compare two related samples, matched samples, or repeated measurements on a single sample to determine if their population mean ranks differ – a paired difference test.
Assumes:
(1) Data are paired and come from the same population.
(2) Each pair is chosen randomly and independently.
(3) The data are measured on an interval scale (ordinal is not sufficient because we take differences), but need not be normal.
Related methods: sign test, t-test (paired student’s t-test, t-test for matched pairs, t-test for dependent samples), paired z-test
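A sketch with hypothetical before/after measurements on ten subjects, using scipy.stats.wilcoxon:

```python
from scipy import stats

# Hypothetical paired measurements on ten subjects
before = [125, 115, 130, 140, 140, 115, 140, 125, 140, 135]
after  = [110, 122, 125, 120, 140, 124, 123, 137, 135, 145]

# Ranks the absolute differences, then compares the rank sums of positive
# and negative differences; zero differences are dropped by default.
stat, p = stats.wilcoxon(before, after)
```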
Paired difference tests
(what it is, methods, and popular uses)
In statistics, a paired difference test is a type of location test that is used when comparing two sets of measurements to assess whether their population means differ. A paired difference test uses additional information about the sample that is not present in an ordinary unpaired testing situation, either to increase the statistical power, or to reduce the effects of confounders.
Methods include:
- t-test (when stdev. is not known)
- paired Z-test (when stdev. is known)
- Wilcoxon signed-rank test (for non-normal distributions; assumes the distribution of the differences is symmetric)
- sign test (non-parametric; does not assume symmetry, but is less powerful)
Popular uses include:
- before and after a treatment – “repeated measures” tests (increases power)
- reduce confounding by introducing artificial pairs that are matched on potential confounders
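The power gain from pairing is easy to see with hypothetical repeated-measures data, where a small consistent treatment effect rides on large between-subject variation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical data: large between-subject variation (sd 15),
# small consistent treatment effect of about +3 within each subject.
baseline = rng.normal(100.0, 15.0, size=30)
before = baseline + rng.normal(0.0, 2.0, size=30)
after = baseline + 3.0 + rng.normal(0.0, 2.0, size=30)

t_paired, p_paired = stats.ttest_rel(before, after)      # differences cancel the subject effect
t_unpaired, p_unpaired = stats.ttest_ind(before, after)  # subject effect stays in the noise
```

The paired test works on the within-subject differences, so the between-subject variance drops out and the small effect becomes clearly detectable.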
Sign test
Non parametric test to test the hypothesis that there is “no difference in medians” between the continuous distributions of two random variables X and Y, in the situation when we can draw paired samples from X and Y.
Because it is non-parametric it has very general applicability but may lack the statistical power of other tests such as the paired-samples t-test or the Wilcoxon signed-rank test.
Method:
Let p = Pr(X > Y), and then test the null hypothesis H0: p = 0.50. In other words, the null hypothesis states that given a random pair of measurements (xi, yi), then xi and yi are equally likely to be larger than the other.
Then let W be the number of pairs for which yi − xi > 0. Assuming that H0 is true, W follows a binomial distribution W ~ b(m, 0.5), where m is the number of pairs with a nonzero difference. The “W” is for Frank Wilcoxon, who developed the test and, later, the more powerful Wilcoxon signed-rank test.
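This recipe maps directly onto an exact binomial test; a sketch with hypothetical paired data, using scipy.stats.binomtest:

```python
from scipy.stats import binomtest

# Hypothetical paired measurements
x = [142, 140, 144, 144, 142, 146, 149, 150, 142, 148]
y = [138, 136, 147, 139, 143, 141, 143, 145, 136, 146]

diffs = [yi - xi for xi, yi in zip(x, y)]
w = sum(d > 0 for d in diffs)   # W: pairs with y - x > 0
m = sum(d != 0 for d in diffs)  # ties (d == 0) are dropped

# Under H0, W ~ Binomial(m, 0.5); exact two-sided test
p = binomtest(w, m, 0.5).pvalue
```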
Median test
*Mostly thought to be obsolete due to low power
Instead use: Wilcoxon–Mann–Whitney U two-sample test
It is a nonparametric test that tests the null hypothesis that the medians of the populations from which two samples are drawn are identical.
Difference between this and the Mann-Whitney U test
The relevant difference between the two tests is that the median test only considers the position of each observation relative to the overall median, whereas the Wilcoxon–Mann–Whitney test takes the ranks of each observation into account. Thus the latter test is usually the more powerful of the two.
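Despite its low power, the test is available as scipy.stats.median_test (a recent SciPy exposes the result as attributes); a sketch with hypothetical samples:

```python
from scipy import stats

# Hypothetical samples
g1 = [55, 58, 60, 62, 64, 66, 68, 70, 72, 75]
g2 = [44, 48, 50, 52, 54, 56, 57, 59, 61, 63]

# Classifies each observation as above/below the grand median,
# then runs a chi-squared test on the resulting 2x2 contingency table.
res = stats.median_test(g1, g2)
```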
Methods to compare means
See http://en.wikipedia.org/wiki/Comparing_means
Kuiper’s test
Kuiper’s test is used in statistics to test whether a given distribution, or family of distributions, is contradicted by evidence from a sample of data.
Properties: invariant to cyclic transformations, and as sensitive in tails as near median
Uses: cyclic variations by time of year, day of week, or time of day; in general, any circular probability distributions
Related to: Kolmogorov–Smirnov test, Anderson–Darling test
More
Kuiper’s test[1] is closely related to the more well-known Kolmogorov–Smirnov test (or K-S test as it is often called). As with the K-S test, the discrepancy statistics D+ and D− represent the absolute sizes of the most positive and most negative differences between the two cumulative distribution functions that are being compared. The trick with Kuiper’s test is to use the quantity D+ + D− as the test statistic. This small change makes Kuiper’s test as sensitive in the tails as at the median and also makes it invariant under cyclic transformations of the independent variable. The Anderson–Darling test is another test that provides equal sensitivity at the tails as the median, but it does not provide the cyclic invariance.
This invariance under cyclic transformations makes Kuiper’s test invaluable when testing for cyclic variations by time of year or day of the week or time of day, and more generally for testing the fit of, and differences between, circular probability distributions.
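SciPy does not ship Kuiper's test, but the statistic V = D+ + D− is a small variation on the K-S computation; a sketch on synthetic circular data (p-values would need Kuiper's asymptotic series, available in e.g. astropy.stats.kuiper):

```python
import numpy as np

def kuiper_statistic(sample, cdf):
    """Kuiper's V = D+ + D- against a fully specified continuous CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    u = cdf(x)
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - u)          # largest positive ECDF-CDF gap
    d_minus = np.max(u - (i - 1) / n)   # largest negative gap
    return d_plus + d_minus

rng = np.random.default_rng(3)
angles = rng.uniform(size=500)          # e.g. event times mapped to [0, 1)

v = kuiper_statistic(angles, lambda t: t)
# Cyclic invariance: shifting the circle's origin leaves V unchanged
v_shifted = kuiper_statistic((angles + 0.3) % 1.0, lambda t: t)
```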
Jarque–Bera test
The Jarque–Bera test is a goodness-of-fit test of whether sample data have the skewness and kurtosis matching a normal distribution.
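A sketch with scipy.stats.jarque_bera on synthetic data: a skewed, heavy-tailed sample should fail the test decisively, while a normal sample typically passes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
jb_norm = stats.jarque_bera(rng.normal(size=1000))
jb_exp = stats.jarque_bera(rng.exponential(size=1000))  # skew 2, excess kurtosis 6
```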
Cramér–von Mises criterion
Cramér–von Mises criterion is a criterion used for judging the goodness of fit of a cumulative distribution function compared to a given empirical distribution function, or for comparing two empirical distributions. It is also used as a part of other algorithms, such as minimum distance estimation.
In one-sample applications, F* is the theoretical distribution and Fn is the empirically observed distribution. Alternatively, the two distributions can both be empirically estimated ones; this is called the two-sample case.
Related alternative tests: Kolmogorov–Smirnov test, Watson test (almost same)
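SciPy covers both cases (the two-sample version requires SciPy >= 1.7); a sketch on synthetic data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(size=300)
y = rng.normal(size=300)

# One-sample: compare x's empirical CDF with the standard normal CDF
res1 = stats.cramervonmises(x, "norm")

# Two-sample: compare two empirical CDFs directly
res2 = stats.cramervonmises_2samp(x, y)
```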
Siegel–Tukey test
The Siegel–Tukey test is a non-parametric test which may be applied to data measured at least on an ordinal scale. It tests for differences in scale between two groups.
The test is used to determine if one of two groups of data tends to have more widely dispersed values than the other. In other words, the test determines whether one of the two groups tends to move, sometimes to the right, sometimes to the left, but away from the center (of the ordinal scale).
Introduced in 1960.
More:
The principle is based on the following idea:
Suppose there are two groups A and B with n observations for the first group and m observations for the second (so there are N = n + m total observations). If all N observations are arranged in ascending order, it can be expected that the values of the two groups will be mixed or sorted randomly, if there are no differences between the two groups (following the null hypothesis H0). This would mean that among the ranks of extreme (high and low) scores, there would be similar values from Group A and Group B.
If, say, Group A were more inclined to extreme values (the alternative hypothesis H1), then there will be a higher proportion of observations from group A with low or high values, and a reduced proportion of values at the center.
Hypothesis H0: σ²A = σ²B and MeA = MeB (where σ² and Me are the variance and the median, respectively)
Hypothesis H1: σ²A > σ²B
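The test is not in SciPy, but the alternating-rank scheme is easy to sketch: assign rank 1 to the lowest value, ranks 2-3 to the two highest, 4-5 to the next two lowest, and so on, then apply the rank-sum test to those ranks. A minimal sketch on hypothetical tie-free data (tie handling omitted):

```python
import numpy as np
from scipy import stats

def siegel_tukey_ranks(n):
    """Alternating ranks: 1 to the lowest value, 2-3 to the two highest,
    4-5 to the next two lowest, 6-7 to the next two highest, and so on."""
    order = []                      # sorted-position indices, in rank order
    lo, hi = 0, n - 1
    take_low, count = True, 1       # rank 1 goes to the lowest value
    while lo <= hi:
        for _ in range(count):
            if lo > hi:
                break
            if take_low:
                order.append(lo)
                lo += 1
            else:
                order.append(hi)
                hi -= 1
        take_low = not take_low
        count = 2
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)
    return ranks

def siegel_tukey(a, b):
    """Siegel-Tukey test: rank-sum test applied to the alternating ranks."""
    data = np.concatenate([a, b])
    labels = np.concatenate([np.zeros(len(a)), np.ones(len(b))])
    sorted_labels = labels[np.argsort(data, kind="stable")]
    st = siegel_tukey_ranks(len(data))
    return stats.mannwhitneyu(st[sorted_labels == 0], st[sorted_labels == 1],
                              alternative="two-sided")

# Same center, different dispersion (hypothetical data, no ties)
tight = np.linspace(9.5, 10.5, 20)
spread = np.linspace(5.0, 15.0, 20)
u, p = siegel_tukey(tight, spread)
```

The dispersed group collects the extreme (low) alternating ranks, so its rank sum is far below that of the tight group.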
Statistical hypothesis tests
Statistical hypothesis tests answer the question: “Assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?”[2] That probability is known as the P-value.
Statistical hypothesis testing is a key technique of frequentist statistical inference. The Bayesian approach to hypothesis testing is to base decisions on the posterior probability.
Wald–Wolfowitz runs test
The runs test (also called Wald–Wolfowitz test) is a non-parametric statistical test that checks a randomness hypothesis for a two-valued data sequence. More precisely, it can be used to test the hypothesis that the elements of the sequence are mutually independent.
A “run” of a sequence is a maximal non-empty segment of the sequence consisting of adjacent equal elements. For example, the sequence “++++−−−+++−−++++++−−−−” consists of six runs, three of which consist of +’s and the others of −’s. The runs test is based on the null hypothesis that the two elements + and − are independently drawn from the same distribution.
Under the null hypothesis, the number of runs in a sequence of length N is a random variable whose conditional distribution given the observation of N+ positive values and N− negative values (N = N+ + N−) is approximately normal.
The mean and variance do not depend on the “fairness” of the process generating the elements of the sequence, that is, that +’s and −’s have equal probabilities, but only on the assumption that the elements are independent and identically distributed. If the number of runs is significantly higher or lower than expected, the hypothesis of statistical independence of the elements may be rejected.
Runs tests can be used to test:
the randomness of a distribution, by taking the data in the given order and marking with + the data greater than the median, and with – the data less than the median; (Numbers equalling the median are omitted.)
whether a function fits well to a data set, by marking the data exceeding the function value with + and the other data with −. For this use, the runs test, which takes into account the signs but not the distances, is complementary to the chi square test, which takes into account the distances but not the signs.
The Kolmogorov–Smirnov test is more powerful, if it can be applied.
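Using the standard mean and variance of the number of runs (which depend only on the counts of +'s and −'s), the test reduces to a z-score; a sketch applied to the example sequence above (ASCII signs):

```python
import math

def runs_test(seq):
    """Wald-Wolfowitz runs test (normal approximation) for a +/- sequence."""
    n_pos = seq.count("+")
    n_neg = seq.count("-")
    n = n_pos + n_neg
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    # Conditional mean and variance of the run count under independence
    mean = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    # Two-sided p-value from the standard normal CDF
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return runs, z, p

runs, z, p = runs_test("++++---+++--++++++----")
```

Here z is negative (fewer runs than expected), suggesting the signs cluster rather than alternate independently.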
Kendall’s W
Kendall’s W (Kendall’s coefficient of concordance) is a non-parametric statistic. It is a normalization of the statistic of the Friedman test, and can be used for assessing agreement among raters. Kendall’s W ranges from 0 (no agreement) to 1 (complete agreement).
More:
Suppose, for instance, that a number of people have been asked to rank a list of political concerns, from most important to least important. Kendall’s W can be calculated from these data. If the test statistic W is 1, then all the survey respondents have been unanimous, and each respondent has assigned the same order to the list of concerns. If W is 0, then there is no overall trend of agreement among the respondents, and their responses may be regarded as essentially random. Intermediate values of W indicate a greater or lesser degree of unanimity among the various responses.
While tests using the standard Pearson correlation coefficient assume normally distributed values and compare two sequences of outcomes at a time, Kendall’s W makes no assumptions regarding the nature of the probability distribution and can handle any number of distinct outcomes.
W is linearly related to the mean value of the Spearman’s rank correlation coefficients between all pairs of the rankings over which it is calculated.
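W has a closed form in terms of S, the sum of squared deviations of the items' rank sums (the quantity underlying the Friedman statistic): W = 12S / (m²(n³ − n)) for m raters ranking n items. A sketch (no tie correction):

```python
import numpy as np

def kendalls_w(ranks):
    """Kendall's W for an (m raters) x (n items) matrix of rankings 1..n (no ties)."""
    ranks = np.asarray(ranks, dtype=float)
    m, n = ranks.shape
    rank_sums = ranks.sum(axis=0)                     # total rank per item
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()   # deviation from mean rank sum
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

unanimous = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]  # all raters agree -> W = 1
opposed = [[1, 2, 3], [3, 2, 1]]                        # exact reversal -> W = 0
```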
Friedman test
The Friedman test is a non-parametric statistical test used to detect differences in treatments across multiple test attempts. The procedure involves ranking each row (or block) together, then considering the values of ranks by columns. Applicable to complete block designs, it is thus a special case of the Durbin test.
Classic examples of use are:
n wine judges each rate k different wines. Are any wines ranked consistently higher or lower than the others?
n wines are each rated by k different judges. Are the judges’ ratings consistent with each other?
n welders each use k welding torches, and the ensuing welds were rated on quality. Do any of the torches produce consistently better or worse welds?
The Friedman test is used for one-way repeated measures analysis of variance by ranks. In its use of ranks it is similar to the Kruskal-Wallis one-way analysis of variance by ranks.
When using this kind of design for a binary response, one instead uses the Cochran’s Q test.
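The first wine example can be sketched with scipy.stats.friedmanchisquare, which takes one sequence per treatment (wine), each running over the blocks (judges); the scores below are hypothetical:

```python
from scipy import stats

# Hypothetical scores: three wines, each rated by the same seven judges (blocks)
wine_a = [8, 7, 9, 8, 8, 9, 7]
wine_b = [6, 5, 7, 6, 7, 6, 5]
wine_c = [7, 6, 8, 7, 6, 7, 6]

# Ranks the wines within each judge's row, then tests the column rank sums
stat, p = stats.friedmanchisquare(wine_a, wine_b, wine_c)
```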
Durbin test
In the analysis of designed experiments, the Friedman test is the most common non-parametric test for complete block designs. The Durbin test is a nonparametric test for balanced incomplete designs that reduces to the Friedman test in the case of a complete block design.
More:
In a randomized block design, k treatments are applied to b blocks. For some experiments, it may not be realistic to run all treatments in all blocks, so one may need to run an incomplete block design. In this case, it is strongly recommended to run a balanced incomplete design. A balanced incomplete block design has the following properties:
Every block contains k experimental units.
Every treatment appears in r blocks.
Every treatment appears with every other treatment an equal number of times.
The Durbin test is based on the following assumptions:
The b blocks are mutually independent. That means the results within one block do not affect the results within other blocks.
The data can be meaningfully ranked (i.e., the data have at least an ordinal scale).
Cochran’s Q test is applied for the special case of a binary response variable (i.e., one that can have only one of two possible outcomes).
Cochran’s Q test
In statistics, in the analysis of two-way randomized block designs where the response variable can take only two possible outcomes (coded as 0 and 1), Cochran’s Q test is a non-parametric statistical test to verify if k treatments have identical effects.[1][2] It is named for William Gemmell Cochran. Cochran’s Q test should not be confused with Cochran’s C test, which is a variance outlier test.
Cochran’s Q test assumes that there are k > 2 experimental treatments and that the observations are arranged in b blocks
The hypotheses of Cochran’s Q test are:
H0: The treatments are equally effective.
Ha: There is a difference in effectiveness among treatments.
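The statistic has a closed form, Q = (k − 1)(k·ΣCj² − N²) / (k·N − ΣRi²), where Cj are the column (treatment) totals, Ri the row (block) totals, and N the grand total; under H0, Q is approximately chi-squared with k − 1 degrees of freedom. A sketch with hypothetical binary outcomes:

```python
import numpy as np
from scipy.stats import chi2

def cochrans_q(x):
    """Cochran's Q for a (b blocks) x (k treatments) 0/1 matrix."""
    x = np.asarray(x, dtype=float)
    b, k = x.shape
    col = x.sum(axis=0)            # successes per treatment
    row = x.sum(axis=1)            # successes per block
    total = x.sum()
    q = (k - 1) * (k * (col ** 2).sum() - total ** 2) / (k * total - (row ** 2).sum())
    return q, chi2.sf(q, k - 1)    # Q ~ chi-squared with k-1 df under H0

# Hypothetical outcomes: 8 subjects x 3 treatments; treatment 3 rarely succeeds
x = [[1, 1, 0],
     [1, 1, 0],
     [1, 1, 1],
     [1, 0, 0],
     [1, 1, 0],
     [1, 1, 0],
     [1, 1, 0],
     [0, 1, 0]]
q, p = cochrans_q(x)
```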