[L9] Analysis of Variance & [L10] The Kruskal-Wallis Test Flashcards by Arellano, Miella Janica

Very flexible and general technique, and the principles
can be applied to a wide range of statistical tests.

ANOVA

ANOVA is a ___ test

parametric

Has a wide range of applications.

ANOVA

Many of its applications make some tricky assumptions
about the data.

ANOVA

In ANOVA we measure an ___ variable (also called
a ___ variable).
* This outcome must be measured on a ___ scale.

outcome; dependent; continuous

It is called dependent because it depends on one or more
__ variables

predictor

___ variables can be variables we manipulate (Treatment) or
variables we simply measure (Sex).

Predictor

In ANOVA, predictor variables are mostly ___,
although continuous variables can also be used in the
same framework

categorical

When predictor variables are categorical, they are also
called “___”

FACTORS or INDEPENDENT VARIABLES.

___ – measurement of differences

ANOVA

Differences happen for two reasons: ___

(a) because of the
effect of predictor variables (b) because of other reasons

In ANOVA, we want to know two things:
___

1. How much of the variance (difference) between the
  two groups is due to the predictor variable?
2. Whether this proportion of variance is statistically
  significant, that is, whether it is larger than we would expect by
  chance if the null hypothesis were true.

We can divide (statisticians sometimes say partition)
variance into three different types:
___

The Total Variance
Variance due to treatment, (Differences between Group)
Variance due to Error (Differences within Group)

In ANOVA, the variance is conceptualized as sums of
___

squared deviations from the mean

In ANOVA, the variance is conceptualized as sums of
squared deviations from the mean.
* It is usually shortened to ___ and denoted by ___

sum of squares; SS.

The 3 Sum of Squares

Total Sum of Squares, SS total
Between-groups Sum of Squares, SS between
Error Sum of Squares, SS within
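
A minimal sketch of the three sums of squares, assuming three made-up groups of scores (the data and variable names are illustrative and not part of the deck). It also checks that SS total splits into SS between plus SS within:

```python
# Hypothetical scores for three treatment groups (made-up numbers).
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [8.0, 9.0, 10.0],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# SS total: squared deviation of every score from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

# SS between: squared deviation of each group mean from the grand mean,
# weighted by the number of scores in the group.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())

# SS within (error): squared deviation of each score from its own group mean.
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

print(ss_total, ss_between, ss_within)
```

For these numbers the between-groups and within-groups parts add up to the total, which is the partition the cards describe.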

___ – this is the variance
that represents the difference between the groups, and this
is called ___. Sometimes it refers to the between-groups
sum of squares for one predictor, in which case it
is called SS predictor. Sometimes it is called ___.

Between-groups Sum of Squares; SSbetween; SStreatment

The ___-groups variance is the variance that we are
actually interested in.

between

We are asking whether the difference between the groups
(or the effect of the predictor) is big enough that we could
say it is ___

not due to chance

___ – also called within-groups sum
of squares.

Error Sum of Squares

It’s within the groups, because different people, who
have had the same treatment, have different scores.

Error Sum of Squares

They have different scores because of error. So this is
called either ___

SSwithin, or SSerror.

We need to calculate the three kinds of Sum of Squares,
___

TOTAL, WITHIN GROUPS, and BETWEEN
GROUPS.

___ – sum of squared differences between the mean
and each score.

SS total

___ * To know how large the effect of the treatment has been

Calculating the Effect Size

The same as asking what ___ the treatment effect has been responsible for.

proportion of the Total Variance (or Total Sum of Squares)

Effect Size goes under two different names: these are ___.

R-squared or eta-squared
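
A short sketch of the effect-size calculation, assuming hypothetical sums of squares (SS between = 24, SS total = 30; these values are made up for illustration). The effect size is the proportion of the total sum of squares that the treatment accounts for:

```python
# Hypothetical sums of squares (illustrative values only).
ss_between = 24.0
ss_total = 30.0

# Effect size: share of the total sum of squares due to the treatment.
eta_squared = ss_between / ss_total

print(round(eta_squared, 2))
```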

Mean Squares. Often written as MS. * These are ___

MSbetween, MSwithin, MStotal

Three sets of degrees of freedom

* df total, df between, and df within

Finally, we calculate the relative size of the two values by dividing MS between by MS within. * This gives us the statistic for ANOVA, which is called F, or sometimes the ___

F-ratio.
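
A minimal sketch of how the mean squares and the F-ratio follow from the sums of squares, assuming a one-way design with k groups and N scores in total, so that df between = k − 1 and df within = N − k. The function name `f_ratio` and the input values are hypothetical:

```python
def f_ratio(ss_between, ss_within, n_groups, n_total):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    df_between = n_groups - 1        # k - 1
    df_within = n_total - n_groups   # N - k
    ms_between = ss_between / df_between  # mean square between groups
    ms_within = ss_within / df_within     # mean square within groups (error)
    return ms_between / ms_within, df_between, df_within


# Hypothetical values: three groups of three scores each.
f, df_between, df_within = f_ratio(24.0, 6.0, n_groups=3, n_total=9)
print(f, df_between, df_within)
```

The two degrees of freedom returned here are the pair needed to look up the probability value for F.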

To find the probability value associated with F we need to have two sets of degrees of freedom, the ___

between and within.

___ are exactly the same test. * It is just a different way of thinking about the result (when we have two groups).

ANOVA and t-test

In fact, if we take the ___ and square it, we get the value of F.

value of t

This is a general rule when there are 2 groups: ___

F = t-squared.
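
A small numerical check of this rule, assuming two made-up groups and the pooled-variance (equal variances) independent-samples t-test. The data and helper names are illustrative only:

```python
import math

# Hypothetical scores for two groups (made-up numbers).
a = [4.0, 5.0, 6.0]
b = [6.0, 7.0, 8.0]


def mean(values):
    return sum(values) / len(values)


def sum_sq(values):
    """Sum of squared deviations from the sample's own mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)


# Independent-samples t with a pooled variance estimate.
df = len(a) + len(b) - 2
pooled_var = (sum_sq(a) + sum_sq(b)) / df
se_diff = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
t = (mean(a) - mean(b)) / se_diff

# One-way ANOVA F on the same two groups.
grand_mean = mean(a + b)
ss_between = (len(a) * (mean(a) - grand_mean) ** 2
              + len(b) * (mean(b) - grand_mean) ** 2)
ms_between = ss_between / 1               # df between = 2 groups - 1
ms_within = (sum_sq(a) + sum_sq(b)) / df  # df within = N - 2
f = ms_between / ms_within

print(t ** 2, f)
```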

Question: If we covered t-tests, why are we doing it again? _

* t-test – restricted to comparing 2 groups. * ANOVA extends in a number of useful directions.

___ extends in a number of useful directions. * Can be used to compare 3 groups or more, to calculate the p-value associated with the regression line, and in a wide range of ___ situations

ANOVA; other

When there are 2 groups, ANOVA is equivalent to a ___ and it therefore makes the same assumptions as the t-test.

t-test

ANOVA makes the same assumptions as the t-test, and it makes these assumptions regardless of the number of ___ that are being compared

groups

Assumptions in ANOVA

1. Normal distribution within each group 2. Homogeneity of Variance

We do not assume that the outcome variable is normally distributed. * What we do assume is that the ___ each group are normally distributed.

data within (normal distribution within each group)

___ * As with the t-test, we assume that the standard deviation within each group is approximately equal.

Homogeneity of Variance

The variance is the square of the ___.

standard deviation

* However, as with the t-test, we don’t need to worry about this assumption if we have approximately equal ___ in each group.

numbers of people

ANOVA comparing Three Groups

* Formulae are all the same.

___ – most elementary analysis of variance

One-way ANOVA

One-way ANOVA – Also called ___

simple randomized-groups design, independent groups design, or single factor experiment.

___ that is being investigated; there are two or more levels or conditions of the IV, and subjects are randomly assigned to each condition.

Only one IV (one factor); one-way ANOVA

ANOVA – not limited to ___ experiments.

single factor

In ___, the effect of many different ___ may be investigated at the same time in one experiment

ANOVA; factors

___ – one in which the effects of two or more factors or IVs are assessed in one experiment

Factorial experiment

Conditions or treatments used are combinations of the __

levels of factors.

___ – more complicated; however, we get a lot more information

Two-way ANOVA

Two-way ANOVA – It allows us, in one experiment, to evaluate the ___ of two IVs and the ___ between them.

effect; interaction

___ – the levels of each factor were systematically chosen by the experimenter rather than being randomly chosen

Fixed effects design

We want to determine whether factor A has a significant effect, disregarding the effect of factor B. This is called the __

main effect of factor A.

We want to determine whether factor B has a significant effect, without considering the effect of factor A. This is called the ___

main effect of factor B.

Finally, we want to determine whether there is an interaction between factors A and B. This is called the ___

interaction effect of factors A and B.

Three analyses in fixed effects design:

1. main effect of factor A. 2. main effect of factor B. 3. interaction effect of factors A and B.

In analyzing data from a two-way ANOVA, we determine four variance estimates:

1. MS within cells 2. MS rows 3. MS columns 4. MS interaction

The estimate ___ is the within cells variance estimate and corresponds to the within groups variance estimate used in the one-way ANOVA

MS within cells

It becomes the ___ against which the other estimates, MS rows, MS columns, and MS interaction, are compared.

standard

The other estimates are sensitive to the ___

effects of the IVs.

The estimate MS rows is called the ___

row variance estimate.

The row variance estimate is based on the variability of the row means and, hence, is sensitive to the ___

effects of variable A.

The estimate MS columns is called the ___

column variance estimate.

The column variance estimate is based on the variability of the column means and, hence, is sensitive to the ___

effects of variable B.

The estimate MS interaction is the ___

interaction variance estimate

The interaction variance estimate is based on the variability of the cell means and, hence, is sensitive to the ___

interaction effects of variables A and B.

If variable A has no effect, MS rows is an ___ of ___

independent estimate; σ-squared.

Finally, if there is no interaction between variables A and B, MS interaction is also an ___

independent estimate of σ-squared.

Thus, the estimates MS rows, MS columns, and MS interaction are analogous to the ___ of the one-way ANOVA design

between-groups variance estimate

Each F (or F obtained) value is evaluated against __ (critical value) as in the one way analysis

F crit

In a two-way ANOVA, we essentially have two one-way experiments, plus we are able to evaluate the interaction between the ___

two independent variables.

In a 2-way ANOVA, we partition the total sum of squares (SS total), into four components:

1. the within-cells sum of squares, 2. the row sum of squares, 3. the column sum of squares, and 4. the interaction sum of squares.

When these Sums of Squares (SS) are divided by the appropriate degrees of freedom, they form four variance estimates: ___

MS within-cells, MS rows, MS columns, and MS interaction
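
The partition and the resulting ratios can be written compactly as follows. This is a sketch in notation chosen here for illustration (the subscripts are not from the deck), with MS within-cells as the standard each of the other estimates is compared against:

```latex
\begin{align*}
SS_{\text{total}} &= SS_{\text{within-cells}} + SS_{\text{rows}}
                   + SS_{\text{columns}} + SS_{\text{interaction}} \\[4pt]
MS_{\text{source}} &= \frac{SS_{\text{source}}}{df_{\text{source}}}
  \qquad \text{for each of the four sources} \\[4pt]
F_{\text{rows}} &= \frac{MS_{\text{rows}}}{MS_{\text{within-cells}}}, \qquad
F_{\text{columns}} = \frac{MS_{\text{columns}}}{MS_{\text{within-cells}}}, \qquad
F_{\text{interaction}} = \frac{MS_{\text{interaction}}}{MS_{\text{within-cells}}}
\end{align*}
```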

Only difference is that with the row sum of squares we use the __ means, whereas the between-groups sum of squares used the __ means.

row; group

In ANOVA we aim to find out if there are differences between the groups, but not ___

what those differences are.

Usually, we test the hypothesis that: μ1 = μ2 = μ3 * In the case of two groups, this is not a problem, because if the mean of group 1 is different from the mean of group 2, that can only happen in __

one way.

However, when we reject a null hypothesis when there are three or more groups, we aren’t really saying enough. * We are just saying that group 1, group 2, and group 3 (and so on, up to group k) are ___

not the same.

Unlike the two-group solution, this can happen in __

lots of ways.

___ – used to answer the question of where the differences come from.

Post hoc tests

“Post hoc” is Latin, and means “___”

“after this”.

Post hoc tests are tests done after ___. * They are based on ___

ANOVA; t-tests.

It is possible to just do t-tests to compare groups, but this would cause a problem called ___

alpha inflation

Alpha is the Type ___ error rate.

I

A ___ is where we reject a null hypothesis that is true.

Type I error

The probability value that we choose as a cut-off (usually 0.05) is the __.

Type I error rate

That is, it is the probability of getting a __ result in our test, if the population effect is actually zero.

significant

When 3 tests are done, a cut-off of 0.05 is used, and most think that the probability of a Type I error is __.

still 0.05

We call 0.05 our ___ error rate, because that is the Type I error rate we have named.

nominal Type I

The problem is that the Type I error rate has __, and it is no longer our true type I error rate.

risen above 0.05

When we do multiple t-tests, following an ANOVA, we are at risk of __.

capitalizing on chance

The probability that one of those tests will be statistically significant is not 0.05, but is actually closer to __

three times 0.05 or 0.15, about 1 in 7.
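
A quick check of this figure, under the simplifying assumption that the three tests are independent (pairwise t-tests on the same groups are not strictly independent, so treat this as an approximation). The variable names are illustrative:

```python
alpha = 0.05   # nominal Type I error rate per test
n_tests = 3    # number of pairwise comparisons

# Probability of at least one false positive across independent tests.
familywise_rate = 1 - (1 - alpha) ** n_tests

# The simple "three times 0.05" figure is an upper bound on that rate.
upper_bound = n_tests * alpha

print(round(familywise_rate, 4), round(upper_bound, 2))
```

Under this assumption the rate comes out slightly below 0.15, which is consistent with "about 1 in 7".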

So our actual Type I error rate is much ___ than our nominal rate.

higher

We need to perform some sort of ___ and we can’t use our plain ordinary t-test.

modified test

___ * Assumption of homogeneity of variance

Bonferroni Correction

Bonferroni Correction: What is done here is to calculate the pooled standard error, and then calculate three t-tests using this ___

pooled standard error.

However, there are 2 reasons, why we are not going to do this. (Bonferroni Correction)

1. It is tricky. 2. It is unintuitive.

___ * When there are two groups, we calculate the standard error, and then calculate the confidence interval by multiplying the SE by the critical value for t at the 0.05 level.

Bonferroni Corrected Confidence Intervals

Bonferroni Corrected Confidence Intervals: We carry out the same procedure, except we are no longer using the __

95% level.

We have to adjust alpha by dividing by ___, to give approximately 0.0167.

3

We then calculate the critical value for ___, using the new value for alpha.

t

To calculate the confidence intervals, we need to know the _

critical value of t.

Since we are now using the value of alpha corrected for the number of tests (say 3), we are now going to be doing ___, so we need to use 0.05/3 ≈ 0.0167.

three tests

Before we can determine the critical value, we need to know the __ * The df are calculated in the same way as the t-test. That is, df = N-2, where N is the total sample size for the two groups we are comparing.

degrees of freedom (df).

___ * Calculation of statistical significance is also straightforward once we have the standard errors of the differences.

Bonferroni Corrected Statistical Significance

The value for t is equal to the __

difference divided by the standard error of difference.

Bonferroni Correction: use a ___ to find the probability value.

Computer

Bonferroni Correction Two advantages:

1. it controls our type I error rates, which means that we don’t go making spurious statements. 2. it is easy to understand.

Whenever we do ___, we can Bonferroni correct by multiplying the probability value by the number of tests, and treating this as the probability value.

multiple tests

Or equivalently, dividing our cut-off by the ___, and rejecting only null hypotheses that have significance values lower than that cut-off.

number of tests
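
A minimal sketch of both forms of the correction, assuming three made-up p-values. The function name `bonferroni` is hypothetical, and capping the adjusted p-value at 1.0 is an added convention not stated in the deck:

```python
def bonferroni(p_values, alpha=0.05):
    """Apply the Bonferroni correction in its two equivalent forms."""
    m = len(p_values)  # number of tests
    # Form 1: multiply each p-value by the number of tests
    # (capped at 1.0, since a probability cannot exceed 1).
    adjusted = [min(1.0, p * m) for p in p_values]
    reject_adjusted = [p_adj < alpha for p_adj in adjusted]
    # Form 2: divide the cut-off by the number of tests.
    cutoff = alpha / m
    reject_cutoff = [p < cutoff for p in p_values]
    return adjusted, cutoff, reject_adjusted, reject_cutoff


# Hypothetical p-values from three pairwise comparisons.
adjusted, cutoff, reject_adjusted, reject_cutoff = bonferroni([0.010, 0.030, 0.400])
print(adjusted, round(cutoff, 4), reject_adjusted, reject_cutoff)
```

Both forms lead to the same decisions here: only the first comparison stays significant after correction.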

Problem: Bonferroni Correction

It is a very unwieldy and very blunt tool. * Not that precise. * The p-values required for statistical significance rapidly become very small.

___ * Non-parametric test used with an independent groups design.

The Kruskal-Wallis Test

Substitute for one-Way ANOVA if assumptions are violated.

The Kruskal-Wallis Test

The Kruskal-Wallis Test: Does not assume population ___

normality or homogeneity of variance.

The Kruskal-Wallis Test: Requires only __ scaling of __ variable

ordinal; dependent

Kruskal-Wallis Test: The statistic we compute is ___.

H

Kruskal-Wallis Test Step 1:

All of the scores are grouped together and rank-ordered, assigning the rank

Kruskal-Wallis Test Step 2:

When this is done, the ranks for each condition or sample are summed, and the statistic is evaluated.
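
The two steps can be sketched as below, assuming the usual form of the statistic, H = [12 / (N(N + 1))] Σ (R_i² / n_i) − 3(N + 1), where R_i is the rank sum and n_i the size of sample i. The function name and data are made up, tied scores get the average of their ranks, and no further tie correction is applied:

```python
def kruskal_wallis_h(samples):
    """Return (H, rank_sums) for a list of samples (no tie correction)."""
    # Step 1: pool all scores and rank-order them (rank 1 = lowest score).
    pooled = sorted(x for s in samples for x in s)
    n_total = len(pooled)
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        # Tied scores occupy ranks i+1 .. j; give each their average.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    # Step 2: sum the ranks within each sample.
    rank_sums = [sum(rank_of[x] for x in s) for s in samples]
    # Evaluate the H statistic from the rank sums.
    weighted = sum(r ** 2 / len(s) for r, s in zip(rank_sums, samples))
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    return h, rank_sums


# Hypothetical scores for three independent samples of five.
samples = [
    [12, 15, 11, 18, 14],
    [22, 19, 25, 17, 21],
    [30, 27, 24, 29, 26],
]
h, rank_sums = kruskal_wallis_h(samples)
print(round(h, 2), rank_sums)
```

H is then compared against the chi-square table with k − 1 degrees of freedom, where k is the number of samples.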

To use the Kruskal-Wallis test, the data must be of at least __ scaling.

ordinal

There must be at least ___ scores in each sample to use the probabilities given in the table for Chi-square.

five