Testing ANOVA assumptions Flashcards

1
Q

A Type I error has been defined as the

A

probability of rejecting the null hypothesis when in fact the null hypothesis is true
•This applies to every statistical test that we perform on a set of data.

2
Q

If we perform several statistical tests on a set of data we can effectively

A

increase the chance of making a Type I error.

3
Q

If we perform two statistical tests on the same set of data then we have a…

A

range of opportunities of making a Type I error
•Type I error on the first test only
•Type I error on the second test only
•Type I error on both the first and the second test

4
Q

What are per comparison errors?

A

Type I errors involving single tests

5
Q

What are familywise errors?

A

Type I errors considered across the whole family (set) of tests performed on the data.

E.g. •Type I error on the first test only
•Type I error on the second test only
•Type I error on both the first and the second test

6
Q

The relationship between per comparison and familywise error rates is very simple:

A

αFW = c(αPC)

Where c is the number of comparisons and αPC is the per comparison error rate.

  • So if we have made three comparisons, we can expect 3 × 0.05 = 0.15 errors. If we make twenty comparisons, we will on average make one error [20 × 0.05 = 1.0].
  • Of course, if we make twenty comparisons, it is possible that we may make 0, 1, 2 or in rare cases even more errors.
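A minimal sketch in Python of the relationship on this card, using only the standard library; the function name is just illustrative, and the figures reproduce the worked numbers above.

```python
# Expected number of Type I errors over c comparisons, each made at the
# per comparison rate alpha_pc (the card's relationship: alpha_FW = c * alpha_PC).
def expected_type_i_errors(c, alpha_pc=0.05):
    return c * alpha_pc

print(f"{expected_type_i_errors(3):.2f}")   # 3 comparisons  -> 0.15
print(f"{expected_type_i_errors(20):.2f}")  # 20 comparisons -> 1.00 (one error on average)
```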
7
Q

With planned comparisons:

A

Ignore the theoretical increase in familywise type I error rates and reject the null hypothesis at the usual per comparison level.

8
Q

With post hoc or unplanned comparisons between the means:

A

we cannot afford to ignore the increase in familywise error rate

9
Q

A variety of different post hoc tests are commonly used - for example:

A
  • Scheffé
  • Tukey HSD
  • t-tests
  • These tests vary in their ability to protect against Type I errors.
  • Increasing Type I protection reduces Type II protection.
10
Q

The Scheffé Test

A
  • The Scheffé test is calculated in exactly the same way as a planned comparison.
  • The Scheffé differs only in terms of the critical F that is adopted.
  • For the one-way between-groups analysis of variance the critical value associated with FScheffé is given by:
  • FScheffé critical = (a − 1) × F(dfA, dfS/A)
  • where a is the number of treatment levels and F(dfA, dfS/A) is the critical value of F for the overall, omnibus analysis of variance.
  • For our example:
  • Omnibus ANOVA critical value F(2,12) = 3.885. There were three treatment levels, so (3 − 1) × 3.885 = 7.77.
  • Fobserved = 14.29, which exceeds 7.77, so the comparison is significant by the Scheffé criterion.
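A short sketch of the same calculation, assuming SciPy is available (scipy.stats.f supplies the critical F); the numbers match the example on this card.

```python
# Scheffe critical value: F_Scheffe = (a - 1) * F_critical(df_A, df_S/A)
from scipy.stats import f

a = 3                            # number of treatment levels
df_a, df_sa = 2, 12              # omnibus ANOVA degrees of freedom
alpha = 0.05

f_crit = f.ppf(1 - alpha, df_a, df_sa)   # ~3.885, as on the card
f_scheffe = (a - 1) * f_crit             # (3 - 1) * 3.885 = 7.77
f_observed = 14.29

print(f"F_Scheffe critical = {f_scheffe:.2f}")
print("significant" if f_observed > f_scheffe else "not significant")
```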
11
Q

Tukey HSD

A
  • The Tukey HSD (Honestly Significant Difference) test establishes a value for the smallest possible significant difference between two means.
  • Any mean difference greater than the critical difference is significant.
  • The critical difference is given by:
  • Critical difference = q(a, df, α) × √(MSerror / n)
  • where q(a, df, α) is found in tables of the studentized range and n is the number of observations per cell.
  • This particular formula only works for between-groups analysis of variance with equal cell sizes.
  • A variety of different formulae are used for different designs.
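A sketch of the critical-difference calculation, assuming SciPy 1.7+ (which provides the studentized range distribution); MSerror and the cell size n are made-up illustrative values, since the card does not give them.

```python
# Tukey HSD critical difference (between-groups design, equal cell sizes):
# CD = q(a, df_error, alpha) * sqrt(MS_error / n)
import numpy as np
from scipy.stats import studentized_range

a, df_error, alpha = 3, 12, 0.05
ms_error, n = 2.5, 5                     # hypothetical values for illustration only

q_crit = studentized_range.ppf(1 - alpha, a, df_error)
critical_difference = q_crit * np.sqrt(ms_error / n)
print(f"Means differing by more than {critical_difference:.2f} are significantly different")
```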
12
Q

What is a Bonferroni correction?

A

When comparing two means, a modified form of the t-test is available.

  • For multiple comparisons the critical value of t is found using the corrected significance level
  • p = 0.05 / c
  • where c is the number of comparisons.
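A minimal sketch of the correction, assuming SciPy; the number of comparisons and the error degrees of freedom are illustrative values, not from the card.

```python
# Bonferroni: test each comparison at p = 0.05 / c and look up the matching critical t.
from scipy.stats import t

c = 3                                        # number of comparisons (illustrative)
alpha_pc = 0.05 / c                          # corrected per comparison level
df_error = 12                                # illustrative error degrees of freedom
t_crit = t.ppf(1 - alpha_pc / 2, df_error)   # two-tailed critical value

print(f"per comparison alpha = {alpha_pc:.4f}, critical |t| = {t_crit:.2f}")
```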
13
Q

When comparing two means, a:

A

modified form of the t-test is available

14
Q

For multiple comparisons the critical value of t is found using:

A
  • p = 0.05 / c
  • where c is the number of comparisons.

15
Q

Post Hoc Tests

A
  • Post hoc tests are conservative – they reduce the chance of Type I errors by greatly increasing Type II errors.
  • Only very robust effects will be significant.
  • Null results using these tests are not easy to interpret.
  • Many different post hoc tests exist, and they have different merits and problems.
  • Many post hoc tests are available in computer-based statistical packages (e.g. SPSS or Experstat).
16
Q

Post Hoc tests are conservative, this means…

A
  • they reduce the chance of Type I errors by greatly increasing Type II errors.
  • Only very robust effects will be significant.
17
Q

Null results using post Hoc tests are not…

A

Easy to interpret.
• Many different post hoc tests exist and have different merits and problems.
• Many post hoc tests are available in computer-based statistical packages (e.g. SPSS or Experstat).

18
Q

Many post hoc tests are available on

A

computer-based statistical packages (e.g. SPSS or Experstat)

19
Q

The assumptions of the F ratio

A
  • Independence
  • The numerator and denominator of the F-ratio are independent
  • Random Sampling
  • Observations are random samples from the populations
  • Homogeneity of Variance
  • The different treatment populations have the same variance.
  • Normality
  • Observations are drawn from normally distributed populations
20
Q

The assumptions of the F ratio:

A

Independence
Random sampling
Homogeneity of variance
Normality

21
Q

The assumptions of the F ratio

Independence

A

The numerator and denominator of the F-ratio are independent

22
Q

The assumptions of the F ratio

Random Sampling

A

Observations are random samples from the populations

23
Q

The assumptions of the F ratio

Homogeneity of Variance

A

The different treatment populations have the same variance

24
Q

The assumptions of the F ratio

Normality

A

Observations are drawn from normally distributed populations

25
Q

Testing assumptions of ANOVA

Each of these assumptions should be met before…

A

progressing to the analysis.

26
Q

There are two assumptions of ANOVA that we have to assume have been met by the experimenter…

A
  • Independence and Random Sampling
  • If an experiment has been designed appropriately, both of these assumptions will be true.

27
Q

Both the homogeneity of variance and the normality assumptions of ANOVA…

A

need not necessarily be true

28
Q

Testing Homogeneity of variance

When looking at between-groups designs use

A
  • Hartley’s F-max
  • Bartlett
  • Cochran’s C
  • All these tests are sensitive to departures from normality
  • All of these tests are available in SPSS (as are a number of other tests)
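A small sketch of one of these checks: Bartlett's test is available in SciPy (Hartley's F-max and Cochran's C are not part of SciPy and are not shown). The three groups are simulated data.

```python
# Bartlett's test of homogeneity of variance across three independent groups.
import numpy as np
from scipy.stats import bartlett

rng = np.random.default_rng(0)
g1 = rng.normal(0, 1.0, size=10)
g2 = rng.normal(0, 1.0, size=10)
g3 = rng.normal(0, 1.5, size=10)          # deliberately larger spread

stat, p = bartlett(g1, g2, g3)
print(f"Bartlett chi-square = {stat:.2f}, p = {p:.3f}")   # small p suggests unequal variances
```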
29
Q

Testing homogeneity of variance

When looking at within or mixed designs use

A

Box’s M

30
Q

For hand calculations, there is a quick and dirty measure of homogeneity of variance:

A

Largest variance / smallest variance

  • If this ratio is smaller than about 4, homogeneity of variance is usually regarded as acceptable.
  • This is only a heuristic; when you have the option, use one of the specific tests (e.g. Bartlett).
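A sketch of the hand calculation, assuming NumPy; the three groups are invented numbers.

```python
# Quick-and-dirty homogeneity check: largest sample variance / smallest sample variance.
import numpy as np

groups = [np.array([2.1, 3.4, 2.8, 3.0, 2.5]),
          np.array([4.0, 5.2, 4.8, 4.4, 5.0]),
          np.array([1.0, 3.9, 2.2, 4.5, 1.8])]

variances = [g.var(ddof=1) for g in groups]       # unbiased sample variances
ratio = max(variances) / min(variances)
print(f"variance ratio = {ratio:.2f}")            # under about 4 is usually taken as acceptable
```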

31
Q

The three most commonly used tests for normality are:

A
  • Skew
  • Lilliefors
  • Shapiro-Wilk
  • These tests compare the distribution of the data to a theoretically derived normal distribution.
  • All these tests are very sensitive to departures from normality when there are large samples.
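A brief sketch, assuming SciPy (Shapiro-Wilk) and statsmodels (Lilliefors); the skew test is covered by hand on later cards. The sample here is simulated.

```python
# Shapiro-Wilk and Lilliefors tests of normality on a single sample.
import numpy as np
from scipy.stats import shapiro
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(1)
x = rng.normal(size=30)

w_stat, p_sw = shapiro(x)
ks_stat, p_lf = lilliefors(x)
print(f"Shapiro-Wilk p = {p_sw:.3f}, Lilliefors p = {p_lf:.3f}")   # small p -> non-normal
```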
32
Q

The Lilliefors and Shapiro-Wilk tests are difficult to

A

calculate by hand, but both are available in SPSS.

33
Q

Testing normality by examining skew

A
  • Since we assume
  • that the distributions of the populations from which the samples are taken are normal
  • and the skew of a normal distribution is equal to zero
  • then
  • one test of normality is to see if the skew is significantly different from zero.
  • In other words, test whether the value of skew deviates significantly from zero, the value expected under a normal distribution.
34
Q

The simplest test we can use is ____ to test for skew

A

Z-score

35
Q

Transforming data reduces the probability of making a

A

type II error

36
Q

A Type II error occurs when we fail to

A

reject the null hypothesis when it is false

37
Q

If an assumption is broken, ANOVA fails gracefully:

A

we will miss real effects (Type II errors) but we will not increase our rate of claiming effects that do not exist (Type I errors)

38
Q

Data should be transformed when either:

A

the data are not homogeneous or not normal

39
Q

Solving the homogeneity problem often solves the:

A

normality problem and vice versa

40
Q

What happens when transforming the data is impossible?

A

•In general we proceed with the analysis but advise caution to the reader when reporting the results.
•This is particularly important if the observed F value has an associated probability, p, such that 0.10 > p > 0.01.
•In these circumstances it is difficult to know whether a Type I error or a Type II error is being made, or if no error is being made at all.

41
Q

What can we do in order to meet the assumption of the analysis of variance?

A
  • In order to return our data to normality and establish homogeneity of variance we can use transformations.
  • These are simply mathematical operations that are applied to the data before we conduct an analysis of variance (a short sketch is given after this list).
  • However, there are three circumstances where no transformation to the data will work:
  • Variances are heterogeneous
  • Distributions are heterogeneous
  • Variances are heterogeneous and distributions are heterogeneous
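A minimal sketch of applying a transformation before the ANOVA, assuming NumPy; the square-root and log transforms shown here are common choices but are not named on the card, and the scores are invented.

```python
# Two typical transformations applied to positively skewed raw scores.
import numpy as np

raw = np.array([1.0, 2.0, 4.0, 9.0, 16.0, 30.0])    # illustrative skewed data
sqrt_scores = np.sqrt(raw)                            # square-root transformation
log_scores = np.log(raw + 1)                          # log transformation (+1 avoids log(0))

print(sqrt_scores.round(2))
print(log_scores.round(2))
```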
42
Q

In the case of skew the z-score is given by:

A

Z = (skew − 0) / SEskew

43
Q

The standard error of skew is given by:

A

SEskew = √(6 / N)

  • where N is the number of cases in the sample.
  • If the z-score associated with the skew is greater than 1.96 in absolute value, then the sample is significantly different from normal.
  • In other words, a value of skew which is significantly different from zero means that we do not have normally distributed data.
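A sketch pulling the preceding skew cards together, assuming SciPy for the sample skew; the data are simulated.

```python
# Skew-based normality test: z = (skew - 0) / SE_skew, with SE_skew = sqrt(6 / N).
# |z| > 1.96 -> skew differs significantly from zero, i.e. the sample is non-normal.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(2)
x = rng.exponential(size=40)            # a deliberately skewed sample

sample_skew = skew(x)
se_skew = np.sqrt(6 / len(x))
z = (sample_skew - 0) / se_skew
print(f"skew = {sample_skew:.2f}, z = {z:.2f}",
      "(non-normal)" if abs(z) > 1.96 else "(consistent with normality)")
```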