[L9] Analysis of Variance & [L10] The Kruskal-Wallis Test Flashcards

1
Q
• A very flexible and general technique, whose principles can be applied to a wide range of statistical tests.
A

ANOVA

2
Q

ANOVA is a ___ test

A

parametric

3
Q

Has a wide range of applications.

A

ANOVA

4
Q

Many of its applications make some tricky assumptions about the data.

A

ANOVA

5
Q

In ANOVA we measure an ___ variable (also called
a ___ variable).
* This outcome must be measured on a ___ scale.

A

outcome; dependent; continuous

6
Q

It is called dependent because it depends on one or more
__ variables

A

predictor

7
Q

___ variables can be manipulated (Treatment) or simply measured (Sex).

A

Predictor

8
Q

In ANOVA, predictor variables are mostly ___,
although continuous variables can also be used in the
same framework

A

categorical

9
Q

When predictor variables are categorical, they are also called “___”.

A

FACTORS or INDEPENDENT VARIABLES.

10
Q

___ – measurement of differences

A

ANOVA

11
Q

Differences happen for two reasons: ___

A

(a) because of the effect of predictor variables; (b) because of other reasons

12
Q

In ANOVA, we want to know two things:
___

A
1. How much of the variance (difference) between the groups is due to the predictor variable?
2. Whether this proportion of variance is statistically significant, that is, whether it is larger than we would expect by chance if the null hypothesis were true.
13
Q

We can divide (statisticians sometimes say partition)
variance into three different types:
___

A
1. The Total Variance
2. Variance due to treatment (differences between groups)
3. Variance due to error (differences within groups)
14
Q

In ANOVA, the variance is conceptualized as sums of ___.

A

squared deviations from the mean

15
Q

In ANOVA, the variance is conceptualized as sums of
squared deviations from the mean.
* It is usually shortened to ___ and denoted by ___.

A

sum of squares; SS.

16
Q

The 3 Sum of Squares

A
  1. Total Sum of Squares, SS total
  2. Between-groups Sum of Squares, SS between
  3. Error Sum of Squares, SS within
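A minimal Python sketch of the three sums of squares, using three small made-up groups:

    # Three hypothetical groups of scores
    groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)

    # SS total: squared deviations of every score from the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in scores)

    # SS between: squared deviations of each group mean from the grand mean,
    # weighted by group size
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    # SS within (error): squared deviations of each score from its own group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    # The partition holds: SS total = SS between + SS within
    assert abs(ss_total - (ss_between + ss_within)) < 1e-9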
17
Q

___ – this is the variance that represents the difference between the groups, and it is called ___. Sometimes it refers to the between-groups sum of squares for one predictor, in which case it is called SSpredictor. Sometimes it is called ___.

A

Between-groups Sum of Squares; SSbetween; SStreatment

18
Q

The ___-groups variance is the variance that we are
actually interested in.

A

between

19
Q

We are asking whether the difference between the groups
(or the effect of the predictor) is big enough that we could
say it is ___

A

not due to chance

20
Q

___ – also called within-groups sum of squares.

A

Error Sum of Squares

21
Q

It is “within the groups” because different people who have had the same treatment have different scores.

A

Error Sum of Squares

22
Q

They have different scores because of error. So this is
called either ___

A

SSwithin, or SSerror.

23
Q

We need to calculate the three kinds of Sum of Squares,
___

A

TOTAL, WITHIN GROUPS, and BETWEEN
GROUPS.

24
Q

___ – sum of squared differences between the mean and each score.

A

SStotal

25
Q

___ – to know how large the effect of the treatment has been.

A

Calculating the Effect Size
26
Q

This is the same as asking what ___ the treatment effect has been responsible for.

A

proportion of the Total Variance (or Total Sum of Squares)
27
Q

Effect size goes under two different names: these are ___.

A

R-squared or eta-squared
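As a sketch (the SS values are made up, taken from the sums-of-squares example above), the effect size is simply the between-groups sum of squares as a proportion of the total:

    ss_between, ss_total = 38.0, 44.0     # hypothetical values
    eta_squared = ss_between / ss_total   # about 0.86 of the variance is due to treatment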
28
Q

Mean Squares, often written as MS. These are ___.

A

MSbetween, MSwithin, MStotal
29
Q

The three sets of degrees of freedom are ___.

A

df total, df between, and df within
30
Q

Finally, we calculate the relative size of the two values by dividing MS between by MS within. This gives us the test statistic for ANOVA, which is called F, or sometimes the ___.

A

F-ratio
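A Python sketch of the step from SS to MS to F, assuming scipy; the SS and df values continue the made-up example above (3 groups of 3 scores, so k = 3 and N = 9):

    from scipy import stats

    ss_between, ss_within = 38.0, 6.0
    df_between, df_within = 3 - 1, 9 - 3   # k - 1 and N - k

    ms_between = ss_between / df_between   # 19.0
    ms_within = ss_within / df_within      # 1.0
    F = ms_between / ms_within             # 19.0

    # Probability of an F this large or larger if the null hypothesis is true
    p = stats.f.sf(F, df_between, df_within)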
31
Q

To find the probability value associated with F, we need two sets of degrees of freedom: the ___.

A

between and within
32
Q

___ are exactly the same test; it is just a different way of thinking about the result (when we have two groups).

A

ANOVA and the t-test
33
Q

In fact, if we take the ___ and square it, we get the value of F.

A

value of t
34
Q

This is a general rule when there are 2 groups: ___.

A

F = t-squared
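A quick check of the rule in Python, assuming scipy and two made-up groups:

    from scipy import stats

    a = [4, 5, 6, 7, 8]
    b = [6, 7, 8, 9, 10]

    t, p_t = stats.ttest_ind(a, b)   # independent-samples t-test
    F, p_f = stats.f_oneway(a, b)    # one-way ANOVA on the same two groups

    assert abs(t ** 2 - F) < 1e-8    # F = t-squared
    assert abs(p_t - p_f) < 1e-8     # and the p-values agree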
35
Q

If we covered t-tests, why are we doing this again?

A

The t-test is restricted to comparing 2 groups; ANOVA extends in a number of useful directions.
36
Q

___ extends in a number of useful directions: it can be used to compare 3 or more groups, to calculate the p-value associated with a regression line, and in a wide range of ___ situations.

A

ANOVA; other
37
Q

When there are 2 groups, ANOVA is equivalent to a ___, and it therefore makes the same assumptions as the t-test.

A

t-test
38
Q

ANOVA makes the same assumptions as the t-test, and it makes them regardless of the number of ___ being compared.

A

groups
39
Q

Assumptions in ANOVA: ___

A

1. Normal distribution within each group
2. Homogeneity of variance
40
Q

We do not assume that the outcome variable is normally distributed. What we do assume is that the ___ each group are normally distributed.

A

data within (normal distribution within each group)
41
Q

___ – as with the t-test, we assume that the standard deviation within each group is approximately equal.

A

Homogeneity of Variance
42
Q

The variance is the square of the ___.

A

SD
43
Q

However, as with the t-test, we don't need to worry about this assumption if we have approximately equal ___ in each group.

A

numbers of people
44
Q

ANOVA comparing three groups: ___

A

The formulae are all the same.
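For example, assuming scipy and three made-up groups, the same one-way test runs unchanged:

    from scipy import stats

    g1 = [4, 5, 6, 7]
    g2 = [6, 7, 8, 9]
    g3 = [9, 10, 11, 12]

    F, p = stats.f_oneway(g1, g2, g3)   # one-way ANOVA across three groups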
45
Q

___ – the most elementary analysis of variance.

A

One-way ANOVA
46
Q

One-way ANOVA is also called the ___.

A

simple randomized-groups design, independent-groups design, or single-factor experiment
47
Q

In a ___, there is ___ being investigated; there are two or more levels or conditions of the IV, and subjects are randomly assigned to each condition.

A

one-way ANOVA; only one IV (one factor)
48
Q

ANOVA is not limited to ___ experiments.

A

single-factor
49
Q

In ANOVA, the effects of many different ___ may be investigated at the same time in one experiment.

A

factors
50
Q

___ – one in which the effects of two or more factors or IVs are assessed in one experiment.

A

Factorial experiment
51
Q

In a factorial experiment, the conditions or treatments used are combinations of the ___.

A

levels of the factors
52
Q

___ – more complicated; however, we get a lot more information.

A

Two-way ANOVA
53
Q

Two-way ANOVA allows us, in one experiment, to evaluate the ___ of two IVs and the ___ between them.

A

effect; interaction
54
Q

___ – the levels of each factor are systematically chosen by the experimenter rather than being randomly chosen.

A

Fixed-effects design
55
Q

We want to determine whether factor A has a significant effect, disregarding the effect of factor B. This is called the ___.

A

main effect of factor A
56
Q

We want to determine whether factor B has a significant effect, without considering the effect of factor A. This is called the ___.

A

main effect of factor B
57
Q

Finally, we want to determine whether there is an interaction between factors A and B. This is called the ___.

A

interaction effect of factors A and B
58
Q

The three analyses in a fixed-effects design: ___

A

1. Main effect of factor A
2. Main effect of factor B
3. Interaction effect of factors A and B
59
Q

In analyzing data from a two-way ANOVA, we determine four variance estimates: ___

A

1. MS within-cells
2. MS rows
3. MS columns
4. MS interaction
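A sketch of where these four estimates appear in practice, assuming the pandas and statsmodels libraries; the factors A and B and the scores are made up:

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # A 2 x 2 fixed-effects design with two observations per cell
    df = pd.DataFrame({
        "A": ["a1"] * 4 + ["a2"] * 4,
        "B": ["b1", "b1", "b2", "b2"] * 2,
        "y": [3, 4, 6, 7, 5, 6, 10, 11],
    })

    model = ols("y ~ C(A) * C(B)", data=df).fit()

    # Rows of the table: C(A) ~ MS rows, C(B) ~ MS columns,
    # C(A):C(B) ~ MS interaction, Residual ~ MS within-cells
    table = sm.stats.anova_lm(model, typ=2)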
60
Q

The estimate ___ is the within-cells variance estimate, and corresponds to the within-groups variance estimate used in the one-way ANOVA.

A

MS within-cells
61
Q

It becomes the ___ against which the other estimates, MS rows, MS columns, and MS interaction, are compared.

A

standard
62
Q

The other estimates are sensitive to the ___.

A

effects of the IVs
63
Q

The estimate MS rows is called the ___.

A

row variance estimate
64
Q

The row variance estimate is based on the variability of the row means and, hence, is sensitive to the ___.

A

effects of variable A
65
Q

The estimate MS columns is called the ___.

A

column variance estimate
66
Q

The column variance estimate is based on the variability of the column means and, hence, is sensitive to the ___.

A

effects of variable B
67
Q

The estimate MS interaction is the ___.

A

interaction variance estimate
68
Q

The interaction variance estimate is based on the variability of the cell means and, hence, is sensitive to the ___.

A

interaction effects of variables A and B
69
Q

If variable A has no effect, MS rows is an ___ of ___.

A

independent estimate; σ-squared
70
Q

Finally, if there is no interaction between variables A and B, MS interaction is also an ___.

A

independent estimate of σ-squared
71
Q

Thus, the estimates MS rows, MS columns, and MS interaction are analogous to the ___ of the one-way ANOVA design.

A

between-groups variance estimate
72
Q

Each F (or F obtained) value is evaluated against ___ (the critical value), as in the one-way analysis.

A

F crit
73
Q

In a two-way ANOVA, we essentially conduct two one-way experiments, plus we are able to evaluate the interaction between the ___.

A

two independent variables
74
Q

In a two-way ANOVA, we partition the total sum of squares (SS total) into four components: ___

A

1. the within-cells sum of squares
2. the row sum of squares
3. the column sum of squares
4. the interaction sum of squares
75
Q

When these Sums of Squares (SS) are divided by the appropriate degrees of freedom, they form four variance estimates: ___

A

MS within-cells, MS rows, MS columns, and MS interaction
76
Q

The only difference is that with the row sum of squares we use the ___ means, whereas the between-groups sum of squares used the ___ means.

A

row; group
77
Q

In ANOVA we aim to find out whether there are differences between the groups, but not ___.

A

what those differences are
78
Q

Usually, we test the hypothesis that μ1 = μ2 = μ3. In the case of two groups this is not a problem, because if the mean of group 1 is different from the mean of group 2, that can only happen in ___.

A

one way
79
Q

However, when we reject the null hypothesis with three or more groups, we aren't really saying enough. We are just saying that group 1, group 2, and group 3 (and so on, up to group k) are ___.

A

not the same
80
Q

Unlike the two-group case, this can happen in ___.

A

lots of ways
81
Q

___ – used to answer the question of where the differences come from.

A

Post hoc tests
82
Q

“Post hoc” is Latin, and means “___”.

A

after this
83
Q

Post hoc tests are tests done after ___. They are based on ___.

A

ANOVA; t-tests
84
Q

It is possible to just do t-tests to compare the groups, but this would cause a problem called ___.

A

alpha inflation
85
Q

Alpha is the Type ___ error rate.

A

I
86
Q

A ___ is where we reject a null hypothesis that is true.

A

Type I error
87
Q

The probability value that we choose as a cut-off (usually 0.05) is the ___.

A

Type I error rate
88
Q

That is, it is the probability of getting a ___ result in our test if the population effect is actually zero.

A

significant
89
Q

When 3 tests are done and a cut-off of 0.05 is used, most people think that the probability of a Type I error is ___.

A

still 0.05
90
Q

We call 0.05 our ___ error rate, because that is the Type I error rate we have named.

A

nominal Type I
91
Q

The problem is that the Type I error rate has ___, so it is no longer our true Type I error rate.

A

risen above 0.05
92
Q

When we do multiple t-tests following an ANOVA, we are at risk of ___.

A

capitalizing on chance
93
Q

The probability that at least one of those tests will be statistically significant is not 0.05, but is actually closer to ___.

A

three times 0.05, or 0.15 (about 1 in 7)
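As a check on the arithmetic (a sketch that treats the three tests as independent), "three times 0.05" is the simple upper bound; the exact rate is slightly lower:

    alpha, m = 0.05, 3
    exact = 1 - (1 - alpha) ** m   # 0.142625, about 1 in 7
    bound = m * alpha              # 0.15, the simpler upper bound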
94
Q

So our actual Type I error rate is much ___ than our nominal rate.

A

higher
95
Q

We need to perform some sort of ___; we can't use our plain, ordinary t-test.

A

modified test
96
Q

___ – makes the assumption of homogeneity of variance.

A

Bonferroni correction
97
Q

Bonferroni correction: what is done here is to calculate the pooled standard error, and then calculate three t-tests using this ___.

A

pooled standard error
98
Q

However, there are 2 reasons why we are not going to do the Bonferroni correction this way: ___

A

1st: it is tricky; 2nd: it is very unintuitive.
99
Q

___ – when there are two groups, we calculate the standard error and then the confidence interval, based on multiplying the SE by the critical value for t at the 0.05 level.

A

Bonferroni-corrected confidence intervals
100
Q

Bonferroni-corrected confidence intervals: we carry out the same procedure, except we are no longer using the ___.

A

95% level
101
Q

We have to adjust alpha by dividing it by ___, to give 0.0166.

A

3
102
Q

We then calculate the critical value for ___, using the new value of alpha.

A

t
103
Q

To calculate the confidence intervals, we need to know the ___.

A

critical value of t
104
Q

Since we are now using the value of alpha corrected for the number of tests (say 3), we are going to be doing ___, so we need to use 0.05/3 = 0.0166.

A

three tests
105
Q

Before we can determine the critical value, we need to know the ___. These are calculated in the same way as for the t-test: df = N - 2, where N is the total sample size for the two groups we are comparing.

A

degrees of freedom (df)
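A sketch of finding the corrected critical value, assuming scipy; the group sizes are made up:

    from scipy import stats

    alpha = 0.05 / 3    # corrected for three tests: about 0.0166
    N = 20              # hypothetical total size of the two groups compared
    df = N - 2          # df = N - 2, as for the t-test

    # Two-tailed critical value of t at the corrected alpha
    t_crit = stats.t.ppf(1 - alpha / 2, df)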
106
Q

___ – calculation of statistical significance is also straightforward once we have the standard errors of the differences.

A

Bonferroni-corrected statistical significance
107
Q

The value of t is equal to the ___.

A

difference divided by the standard error of the difference
108
Q

Bonferroni correction: we use a ___ to find the probability value.

A

computer
109
Q

Two advantages of the Bonferroni correction: ___

A

1. It controls our Type I error rate, which means we don't go making spurious statements.
2. It is easy to understand.
110
Q

Whenever we do ___, we can Bonferroni-correct by multiplying each probability value by the number of tests, and treating the result as the probability value.

A

multiple tests
111
Q

Or, equivalently, by dividing our cut-off by the ___ and rejecting only null hypotheses that have significance values lower than that cut-off.

A

number of tests
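Both equivalent forms of the correction in Python, with three made-up p-values:

    p_values = [0.004, 0.020, 0.300]   # hypothetical results of three t-tests
    m = len(p_values)

    # Form 1: multiply each p-value by the number of tests, compare to 0.05
    adjusted = [min(p * m, 1.0) for p in p_values]
    reject_1 = [p_adj < 0.05 for p_adj in adjusted]

    # Form 2 (equivalent): compare each raw p-value to 0.05 / m
    reject_2 = [p < 0.05 / m for p in p_values]

    assert reject_1 == reject_2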
112
Q

Problems with the Bonferroni correction: ___

A

It is a very unwieldy and very blunt tool; it is not that precise; and the p-values required for statistical significance rapidly become very small.
113
Q

___ – a non-parametric test used with the independent-groups design.

A

The Kruskal-Wallis test
114
Q

___ – a substitute for one-way ANOVA when its assumptions are violated.

A

The Kruskal-Wallis test
115
Q

The Kruskal-Wallis test does not assume population ___.

A

normality or homogeneity of variance
116
Q

The Kruskal-Wallis test requires only ___ scaling of the ___ variable.

A

ordinal; dependent
117
Q

Kruskal-Wallis test: the statistic we compute is ___.

A

H
118
Q

Kruskal-Wallis test, step 1: ___

A

All of the scores are grouped together and rank-ordered, assigning rank 1 to the lowest score.
119
Q

Kruskal-Wallis test, step 2: ___

A

When this is done, the ranks for each condition or sample are summed, and the statistic is then evaluated.
120
Q

To use the Kruskal-Wallis test, the data must be of at least ___ scaling.

A

ordinal
121
Q

There must be at least ___ scores in each sample to use the probabilities given in the chi-square table.

A

five
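A sketch of the whole test, assuming scipy; the three samples of five scores each (the minimum for the chi-square table) are made up:

    from scipy import stats

    g1 = [11, 12, 13, 14, 15]
    g2 = [16, 17, 18, 19, 20]
    g3 = [21, 22, 23, 24, 25]

    # H is evaluated against the chi-square distribution with k - 1 df
    H, p = stats.kruskal(g1, g2, g3)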