Week 7 - planned comparisons and post hoc tests Flashcards
Why does the F ratio not paint the whole picture?
It only tells us that there is a difference somewhere between the means. We need an analysis that helps to determine where the difference(s) are
What are the two basic approaches to comparisons?
- A priori (or planned) comparisons
- Post hoc comparisons
What are a priori (or planned) comparisons?
- If we have a strong theoretical interest in certain groups and an evidence-based specific hypothesis regarding these groups, then we can test these differences up front
- Come up with these before you do your study
- Seek to compare only groups of interest
- No real need to do the overall ANOVA; we do it because of tradition. Hence, reports often start with the F test and progress to planned comparisons
- It is better to have an a priori hypothesis than to rely on post hoc comparisons
What are post hoc comparisons?
- If you cannot predict exactly which means will differ then you should do the overall ANOVA first to see if the IV has an effect, then
- Post hoc comparisons (post hoc = after the fact/ANOVA)
- seek to compare all groups to each other to explore differences.
- Less refined – more exploratory.
What are the two types of a priori/planned comparisons?
Simple
Complex
What is a simple a priori comparison?
comparing one group to just one other group
What is a complex a priori comparison?
comparing a set of groups to another set of groups
*In SPSS we create complex comparisons by assigning weights to different groups
How to conduct an a priori comparison (how do you weight it?)
Create 2 sets of weights
- 1 for the first set of means
- 1 for the second set of means
- Assign a weight of zero to any remaining groups
- Set 1 gets positive weights
- Set 2 gets negative weights
- They must sum to 0
A simple rule that always works -> the weight for each group is equal to the number of groups in the other set
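A minimal sketch of that weighting rule, using hypothetical groups A, B and C (the group names and numbers are made up for illustration):

```python
# Hypothetical complex contrast: groups A and B vs group C.
set1 = ["A", "B"]   # first set of means  -> positive weights
set2 = ["C"]        # second set of means -> negative weights

# Rule: each group's weight equals the number of groups in the *other* set.
weights = {g: len(set2) for g in set1}          # A: +1, B: +1
weights.update({g: -len(set1) for g in set2})   # C: -2

print(weights)                 # {'A': 1, 'B': 1, 'C': -2}
print(sum(weights.values()))   # 0 -> the weights sum to zero, as required
```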
What are the assumptions of a priori/planned comparisons?
- Planned comparisons are subject to the same assumptions as the overall ANOVA, particularly homogeneity of variance, as we use a pooled error term.
- Fortunately, when SPSS runs the t-tests for our contrasts it gives us the output for both homogeneity assumed and homogeneity not assumed
- If homogeneity is not assumed, SPSS adjusts the df of our critical F to control for any inflation of Type I error
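Outside SPSS, the homogeneity-of-variance assumption can also be checked with Levene's test, e.g. via scipy; a sketch with made-up scores for three groups:

```python
from scipy.stats import levene

# Made-up scores for three groups (for illustration only).
group1 = [4, 5, 6, 5, 4]
group2 = [7, 8, 6, 7, 9]
group3 = [3, 9, 2, 10, 1]

stat, p = levene(group1, group2, group3)
# A small p-value suggests the homogeneity-of-variance assumption is violated,
# in which case the "homogeneity not assumed" (adjusted-df) output is the one to read.
print(round(stat, 2), round(p, 3))
```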
What are orthogonal contrasts?
- One particularly useful kind of contrast analysis is where each of the contrasts tests something completely different from the other contrasts
Principle:
Once you have compared one group (e.g., A) with another (e.g., B), you don't compare them again.
Example
Groups 1,2,3,4
Contrast 1 = 1,2 vs 3,4
Contrast 2 = 1 vs 2
Contrast 3 = 3 vs 4
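A quick check (assuming equal group sizes) that the three contrasts above are orthogonal: the sum of the products of their weights is zero for every pair.

```python
import numpy as np

# Contrast weights over groups 1-4 (order: group 1, 2, 3, 4).
c1 = np.array([1, 1, -1, -1])   # contrast 1: groups 1,2 vs 3,4
c2 = np.array([1, -1, 0, 0])    # contrast 2: group 1 vs 2
c3 = np.array([0, 0, 1, -1])    # contrast 3: group 3 vs 4

# With equal n per group, two contrasts are orthogonal when their dot product is 0.
print(np.dot(c1, c2), np.dot(c1, c3), np.dot(c2, c3))   # 0 0 0
```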
Cool things about orthogonal contrasts
- A set of k-1 orthogonal contrasts (where k is the number of groups) accounts for all of the differences between groups
- According to some authors, a set of k-1 planned contrasts can be performed without adjusting the Type I error rate
Post-Hoc comparisons
- Let’s say we had good reason to believe that sleep deprivation would impact performance but did not know at exactly what level of sleep deprivation this would occur. So, we had no specific hypothesis about what difference would emerge between which conditions.
- In this case, planned comparisons would not be appropriate
- Here you would perform the overall F analysis first
- If overall F is significant, we need to perform post-hoc tests to determine where the differences actually are
What do post hoc comparisons seek to compare?
Post-hoc tests seek to compare all possible combinations of means
* This will lead to many pair-wise comparisons
* e.g., With 4 groups, 6 comparisons
* 1v2, 1v3, 1v4, 2v3, 2v4, 3v4
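The number of pair-wise comparisons is k(k-1)/2 for k groups; a quick check:

```python
from math import comb

# Number of pair-wise comparisons among k groups: k(k-1)/2
for k in (3, 4, 5):
    print(f"{k} groups -> {comb(k, 2)} comparisons")
# 3 groups -> 3, 4 groups -> 6, 5 groups -> 10
```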
How do post hoc comparisons increase the risk of Type I errors?
- So, as we know, when we find a significant difference there is an alpha chance that we have made a Type I error.
- The more tests we conduct the greater the Type I error rate
What is the error rate per experiment (PE)?
the total number of Type 1 errors we are likely to make in conducting all the tests required in our experiment.
* The PE error rate <= alpha x number of tests
* <= means it could be as high as that value
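For example, running all 6 pair-wise tests among 4 groups at alpha = .05 (a sketch; the exact figure assumes the tests are independent):

```python
alpha, n_tests = 0.05, 6   # e.g., all pair-wise comparisons among 4 groups

# Upper bound from the card: PE <= alpha x number of tests
bound = alpha * n_tests                      # 0.30
# Familywise error rate if the tests were independent: 1 - (1 - alpha)^n
familywise = 1 - (1 - alpha) ** n_tests      # ~0.26
print(round(bound, 2), round(familywise, 2))
```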
How do you restore the Type I error rate back to .05 (5%) when conducting multiple tests?
So when we need to conduct several tests, what should we do about the rising Type I error rate?
* If many tests are required, then a Bonferroni-adjusted alpha level may be used
What is a Bonferroni adjustment?
- Divide alpha by the number of tests to be conducted (e.g., .05/2 = .025 if 2 tests are to be conducted).
- Assess each follow up test using this new level (i.e. .025)
- Maintains PE error at .05. But this will reduce the power of your comparisons a lot!
Remember as we decrease alpha (by making our test more conservative) we also decrease power (chances of detecting a true effect)
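A minimal sketch of applying the adjustment, with hypothetical p-values from three follow-up tests:

```python
alpha = 0.05
p_values = [0.012, 0.030, 0.004]          # hypothetical follow-up test results

adjusted_alpha = alpha / len(p_values)    # .05 / 3 ~= .0167
for p in p_values:
    verdict = "significant" if p < adjusted_alpha else "not significant"
    print(p, verdict)
# .012 and .004 survive the adjustment; .030 does not,
# even though it would have been significant at the unadjusted .05 level.
```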
What are the alternatives to the Bonferroni test (alternative ways of controlling the Type I error rate)?
- There are several statistical tests that systematically compare all means whilst controlling for Type I error
- LSD (Least Significant Difference): actually no adjustment, you just ignore the problem, so not recommended
- Tukey's HSD (Honestly Significant Difference): popular as the best balance between control of EW error rate and power (i.e. Type I vs Type II error); see the sketch below
- Newman-Keuls: gives more power but less stringent control of EW error rate
- Scheffé test: most stringent control of EW error rate as it controls for all possible simple and complex contrasts
- And many others you can find out about at your leisure
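As an illustration, Tukey's HSD can also be run outside SPSS, e.g. with statsmodels; a sketch with made-up data (check the statsmodels documentation for the exact interface):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Made-up error scores and group labels for three conditions.
scores = np.array([2, 3, 4, 5, 6, 7, 9, 10, 11], dtype=float)
groups = np.array(["A"] * 3 + ["B"] * 3 + ["C"] * 3)

# Tukey's HSD compares every pair of groups while controlling the
# experiment-wise (familywise) Type I error rate.
result = pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05)
print(result.summary())
```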
What is the best one of these tests to use?
Tukey’s test is very common and recommended.
What to do with post hoc tests (when do you use them and how)
- If your hypothesis predicts specific differences between means:
- Assess assumptions
- Perform ANOVA
- Consider what comparisons will test your specific hypotheses
- Perform planned comparisons needed to test these predictions
- If your hypothesis does not predict specific differences between means:
- Assess assumptions
- Perform ANOVA
- If ANOVA is significant then perform post-hoc tests
- If ANOVA is not significant then don’t do post-hoc tests
What is a meta-analysis?
When a researcher finds many papers in the literature on a specific topic, they take the individual statistics from each paper, put them in a spreadsheet, aggregate these statistics, and then run a statistical test on this aggregated data
Effect size philosophy
A significant F simply tells us that there is a difference between means. I.e., that the IV has had some effect on the DV
* It does not tell us how big this difference is.
- It does not tell us how important this effect is.
- An F significant at .01 does not necessarily imply a bigger or more important effect than an F significant at .05.
- The significance of F is dependent on the sample size and the number of conditions, which determine the F comparison distribution
What does effect size tell us?
If I took the overall variability in my criterion variable (e.g., target accuracy), how much of that variability could I explain on the basis of how much sleep deprivation you've had?
Effect size summarizes the strength of the treatment effect:
- Eta squared (η²)
- Indicates the proportion of the total variability in the data accounted for by the effect of the IV.
What does η² (eta squared) tell us?
- This result says that __% of the variability in errors is due to the effect of manipulating whatever our IV is
For example, one could say that 65% of the variability in errors is due to the effect of manipulating sleep deprivation.
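Eta squared is just SS_between divided by SS_total; a sketch with made-up error scores for three groups (the 65% figure above is the lecture example, not this data):

```python
import numpy as np

# Made-up error scores for three sleep-deprivation groups.
groups = [np.array([2., 3., 4.]),
          np.array([5., 6., 7.]),
          np.array([9., 10., 11.])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

ss_total = ((all_scores - grand_mean) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

eta_squared = ss_between / ss_total   # proportion of variability due to the IV
print(round(eta_squared, 2))          # ~.93 for this made-up data
```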
What are the limitations of η² (eta squared) given by SPSS?
- It is a descriptive statistic not an inferential statistic so not the best indicator of the effect size in population
- It tends to be an overestimate of the effect size in the population
Criteria for assessing eta squared
Cohen (1977) proposed the following scale for effect size:
* .01 = small effect (1%)
* .06 = medium effect (6%)
* >.14 = large effect (14%)
Interpreting effect size
- The effect sizes typically observed in psychology may vary from area to area.
- The levels of the IV used are important in determining the observed effect size.
- A theoretically important IV may still only account for a small proportion of the variability in the data.
- A theoretically unimportant IV may account for a large proportion of variability in the data
What is power?
- Sensitivity is the ability of an experiment to detect a treatment effect when one actually exists.
- Power is a quantitative index of sensitivity which tells us the probability that our experiment will detect this effect.
What is the ideal power?
*Keppel (1992) argues that ideally power should be > .80 to ensure an experiment can pick up a moderate effect.
- Ensuring adequate power is a research design issue
Power is a function of…?
(what are the things you can tweak/change that will change your experiments overall power?)
- The size of the treatment effect (we have better power to detect stronger effects)
- The size of the error variance (the more noise in the data, the harder to detect an effect)
- The alpha level (the more conservative the test, the greater the chance you will miss a true effect, i.e., retain H0 even when H1 is true)
- Sample size
* Greater sample size – greater power
How do you know what the power of your experiment is?
The power of your experiment is 1 minus the Type II error rate (power = 1 - beta)
Why does sample size affect the amount of power you have?
When the df is larger (i.e., larger N, more participants), there is less overall error, so F becomes larger
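A rough sketch of how power rises with sample size, using the noncentral F distribution and assuming a medium effect (Cohen's f = .25, roughly η² = .06), 4 groups, equal group sizes and alpha = .05:

```python
from scipy.stats import f, ncf

def anova_power(f_effect, n_per_group, k_groups, alpha=0.05):
    """Approximate power of a one-way ANOVA for a given Cohen's f."""
    n_total = n_per_group * k_groups
    df1, df2 = k_groups - 1, n_total - k_groups
    nc = f_effect ** 2 * n_total               # noncentrality parameter
    f_crit = f.ppf(1 - alpha, df1, df2)        # critical F under H0
    return 1 - ncf.cdf(f_crit, df1, df2, nc)   # P(F > F_crit | effect exists)

# Power climbs as the per-group n grows, everything else held constant.
for n in (10, 20, 40, 80):
    print(n, round(anova_power(0.25, n, 4), 2))
```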
Why not use the largest n possible?
- Not always cheap or easy to use large samples,
- We need to know what is the acceptable minimum sample size to pick up a specific effect
What are the two main situations we are concerned about power in?
- When we do not find a significant effect but there is evidence that we may have made a Type II error.
- When we are planning a new experiment and wish to ensure that we have adequate power to pick up the effect of our IV
Power and sample size
- Ideally, we should determine the sample size that will give our experiment adequate power (> .8 ) before we run it.
- Conducting a study without an indication of the sample required to achieve desired power may be an expensive waste of time
What are the ways to estimate the required sample size?
To do this you need an estimate of the magnitude of the treatment effect
- You can get this from:
- past research
- a pilot study
- an estimate of the minimum difference between means that you consider relevant or important (often used in clinical experiments).
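Once you have an effect-size estimate, the required N can be solved for directly; a sketch assuming statsmodels' FTestAnovaPower, a medium effect (Cohen's f = .25) and 4 groups:

```python
from statsmodels.stats.power import FTestAnovaPower

# Solve for the total sample size that gives power = .80 at alpha = .05.
n_total = FTestAnovaPower().solve_power(effect_size=0.25, k_groups=4,
                                        alpha=0.05, power=0.80)
print(round(n_total))   # total N; divide by 4 for the per-group n
```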