Week 7 - planned comparisons and post hoc tests Flashcards
Why does the F ratio not paint the whole picture?
It only tells us that there is a difference somewhere between the means. We need an analysis that helps to determine where the difference(s) are
What are the two basic approaches to comparisons?
- A priori (or planned) comparisons
- Post hoc comparisons
What are a priori (or planned) comparisons?
- If we have a strong theoretical interest in certain groups and an evidence-based specific hypothesis regarding these groups, then we can test these differences up front
- Come up with these before you do your study
- Seek to compare only groups of interest
- No real need to do the overall ANOVA; we do it because of tradition. Hence, reports often start with the F test and progress to planned comparisons
- It is better to have an a priori hypothesis than to rely on post hoc comparisons
What are post hoc comparisons?
- If you cannot predict exactly which means will differ then you should do the overall ANOVA first to see if the IV has an effect, then
- Post hoc comparisons (post hoc = after the fact/ANOVA)
- seek to compare all groups to each other to explore differences.
- Less refined – more exploratory.
What are the two types of a priori/planned comparisons?
Simple
Complex
What is a simple a priori comparison?
comparing one group to just one other group
What is a complex a priori comparison?
comparing a set of groups to another set of groups
*In SPSS we create complex comparisons by assigning weights to different groups
How to conduct an a priori comparison (how do you weight it?)
Create 2 sets of weights
- 1 for the first set of means
- 1 for the second set of means
- Assign a weight of zero to any remaining groups
- Set 1 gets positive weights
- Set 2 gets negative weights
- They must sum to 0
A simple rule that always works -> the weight for each group is equal to the number of groups in the other set
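A minimal sketch of that weighting rule, using hypothetical groups A, B and C (the group names and numbers are made up for illustration):

```python
# Hypothetical complex contrast: groups A and B vs group C.
set1 = ["A", "B"]   # first set of means  -> positive weights
set2 = ["C"]        # second set of means -> negative weights

# Rule: each group's weight equals the number of groups in the *other* set.
weights = {g: len(set2) for g in set1}          # A: +1, B: +1
weights.update({g: -len(set1) for g in set2})   # C: -2

print(weights)                 # {'A': 1, 'B': 1, 'C': -2}
print(sum(weights.values()))   # 0 -> the weights sum to zero, as required
```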
What are the assumptions of a priori/planned comparisons?
- Planned comparisons are subject to the same assumptions as the overall ANOVA, particularly homogeneity of variance, as we use a pooled error term.
- Fortunately, when SPSS runs the t-tests for our contrasts it gives us the output for both homogeneity assumed and homogeneity not assumed
- If homogeneity is not assumed, SPSS adjusts the df of our critical F to control for any inflation of Type I error
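Outside SPSS, the homogeneity-of-variance assumption can also be checked with Levene's test, e.g. via scipy; a sketch with made-up scores for three groups:

```python
from scipy.stats import levene

# Made-up scores for three groups (for illustration only).
group1 = [4, 5, 6, 5, 4]
group2 = [7, 8, 6, 7, 9]
group3 = [3, 9, 2, 10, 1]

stat, p = levene(group1, group2, group3)
# A small p-value suggests the homogeneity-of-variance assumption is violated,
# in which case the "homogeneity not assumed" (adjusted-df) output is the one to read.
print(round(stat, 2), round(p, 3))
```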
What are orthogonal contrasts?
- One particularly useful kind of contrast analysis is where each of the contrasts tests something completely different from the other contrasts
Principle:
Once you have compared one group (e.g., A) with another (e.g., B), you don't compare them again.
Example
Groups 1,2,3,4
Contrast 1 = 1,2 vs 3,4
Contrast 2 = 1 vs 2
Contrast 3 = 3 vs 4
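A quick check (assuming equal group sizes) that the three contrasts above are orthogonal: the sum of the products of their weights is zero for every pair.

```python
import numpy as np

# Contrast weights over groups 1-4 (order: group 1, 2, 3, 4).
c1 = np.array([1, 1, -1, -1])   # contrast 1: groups 1,2 vs 3,4
c2 = np.array([1, -1, 0, 0])    # contrast 2: group 1 vs 2
c3 = np.array([0, 0, 1, -1])    # contrast 3: group 3 vs 4

# With equal n per group, two contrasts are orthogonal when their dot product is 0.
print(np.dot(c1, c2), np.dot(c1, c3), np.dot(c2, c3))   # 0 0 0
```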
Cool things about orthogonal contrasts
- A set of k-1 orthogonal contrasts (where k is the number of groups) accounts for all of the differences between groups
- According to some authors, a set of k-1 planned contrasts can be performed without adjusting the Type I error rate
Post-Hoc comparisons
- Let’s say we had good reason to believe that sleep deprivation would impact performance but did not know at exactly what level of sleep deprivation this would occur. So, we had no specific hypothesis about what difference would emerge between which conditions.
- In this case, planned comparisons would not be appropriate
- Here you would perform the overall F analysis first
- If overall F is significant, we need to perform post-hoc tests to determine where the differences actually are
What do post hoc comparisons seek to compare?
Post-hoc tests seek to compare all possible combinations of means
* This will lead to many pair-wise comparisons
* e.g., With 4 groups, 6 comparisons
* 1v2, 1v3, 1v4, 2v3, 2v4, 3v4
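The number of pair-wise comparisons is k(k-1)/2 for k groups; a quick check:

```python
from math import comb

# Number of pair-wise comparisons among k groups: k(k-1)/2
for k in (3, 4, 5):
    print(f"{k} groups -> {comb(k, 2)} comparisons")
# 3 groups -> 3, 4 groups -> 6, 5 groups -> 10
```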
How do post hoc comparisons increase the risk of Type I errors?
- So, as we know, when we find a significant difference there is an alpha chance that we have made a Type I error.
- The more tests we conduct the greater the Type I error rate
What is the error rate per experiment (PE)?
the total number of Type 1 errors we are likely to make in conducting all the tests required in our experiment.
* The PE error rate <= alpha x number of tests
* <= means it could be as high as that value
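For example, running all 6 pair-wise tests among 4 groups at alpha = .05 (a sketch; the exact figure assumes the tests are independent):

```python
alpha, n_tests = 0.05, 6   # e.g., all pair-wise comparisons among 4 groups

# Upper bound from the card: PE <= alpha x number of tests
bound = alpha * n_tests                      # 0.30
# Familywise error rate if the tests were independent: 1 - (1 - alpha)^n
familywise = 1 - (1 - alpha) ** n_tests      # ~0.26
print(round(bound, 2), round(familywise, 2))
```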
How do you restore the Type I error rate back to .05 (5%) when conducting multiple tests?
So when we need to conduct several tests, what should we do about the rising Type I error rate?
* If many tests are required, then a Bonferroni-adjusted alpha level may be used
What is a Bonferroni adjustment?
- Divide alpha by the number of tests to be conducted (e.g., .05/2 = .025 if 2 tests are to be conducted).
- Assess each follow up test using this new level (i.e. .025)
- Maintains PE error at .05. But this will reduce the power of your comparisons a lot!
Remember as we decrease alpha (by making our test more conservative) we also decrease power (chances of detecting a true effect)
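A minimal sketch of applying the adjustment, with hypothetical p-values from three follow-up tests:

```python
alpha = 0.05
p_values = [0.012, 0.030, 0.004]          # hypothetical follow-up test results

adjusted_alpha = alpha / len(p_values)    # .05 / 3 ~= .0167
for p in p_values:
    verdict = "significant" if p < adjusted_alpha else "not significant"
    print(p, verdict)
# .012 and .004 survive the adjustment; .030 does not,
# even though it would have been significant at the unadjusted .05 level.
```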
What are the alternatives to the Bonferroni test (alternative ways of controlling the Type I error rate)?
- There are several statistical tests that systematically compare all means whilst controlling for Type I error
- LSD (Least Significant Difference): actually no adjustment, you just ignore the problem, so not recommended
- Tukey's HSD (Honestly Significant Difference): popular as the best balance between control of EW error rate and power (i.e. Type I vs Type II error); see the sketch below
- Newman-Keuls: gives more power but less stringent control of EW error rate
- Scheffé test: most stringent control of EW error rate as it controls for all possible simple and complex contrasts
- And many others you can find out about at your leisure
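As an illustration, Tukey's HSD can also be run outside SPSS, e.g. with statsmodels; a sketch with made-up data (check the statsmodels documentation for the exact interface):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Made-up error scores and group labels for three conditions.
scores = np.array([2, 3, 4, 5, 6, 7, 9, 10, 11], dtype=float)
groups = np.array(["A"] * 3 + ["B"] * 3 + ["C"] * 3)

# Tukey's HSD compares every pair of groups while controlling the
# experiment-wise (familywise) Type I error rate.
result = pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05)
print(result.summary())
```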
What is the best one of these tests to use?
Tukey’s test is very common and recommended.
What to do with post hoc tests (when do you use them and how)
- If your hypothesis predicts specific differences between means:
- Assess assumptions
- Perform ANOVA
- Consider what comparisons will test your specific hypotheses
- Perform planned comparisons needed to test these predictions
- If your hypothesis does not predict specific differences between means:
- Assess assumptions
- Perform ANOVA
- If ANOVA is significant then perform post-hoc tests
- If ANOVA is not significant then don’t do post-hoc tests
What is a meta-analysis?
When a researcher finds many papers in the literature on a specific topic, they take the individual statistics from each paper, put them in a spreadsheet, aggregate these statistics, and then run a statistical test on this aggregated data
Effect size philosophy
A significant F simply tells us that there is a difference between means. I.e., that the IV has had some effect on the DV
* It does not tell us how big this difference is.
- It does not tell us how important this effect is.
- An F significant at .01 does not necessarily imply a bigger or more important effect than an F significant at .05.
- The significance of F is dependent on the sample size and the number of conditions, which determine the F comparison distribution
What does effect size tell us?
If I took the overall variability in my criterion variable (e.g., target accuracy), how much of that variability could I explain on the basis of how much sleep deprivation you've had?
Effect size summarizes the strength of the treatment effect:
- Eta squared (η²)
- Indicates the proportion of the total variability in the data accounted for by the effect of the IV.
What does η² (eta squared) tell us?
- This result says that __% of the variability in errors is due to the effect of manipulating whatever our IV is
For example, one could say that 65% of the variability in errors is due to the effect of manipulating sleep deprivation.
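Eta squared is just SS_between divided by SS_total; a sketch with made-up error scores for three groups (the 65% figure above is the lecture example, not this data):

```python
import numpy as np

# Made-up error scores for three sleep-deprivation groups.
groups = [np.array([2., 3., 4.]),
          np.array([5., 6., 7.]),
          np.array([9., 10., 11.])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

ss_total = ((all_scores - grand_mean) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

eta_squared = ss_between / ss_total   # proportion of variability due to the IV
print(round(eta_squared, 2))          # ~.93 for this made-up data
```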
What are the limitations of η² (eta squared) given by SPSS?
- It is a descriptive statistic not an inferential statistic so not the best indicator of the effect size in population
- It tends to be an overestimate of the effect size in the population
Criteria for assessing eta squared
Cohen (1977) proposed the following scale for effect size:
* .01 = small effect (1%)
* .06 = medium effect (6%)
* >.14 = large effect (14%)
Interpreting effect size
- The effect sizes typically observed in psychology may vary from area to area.
- The levels of the IV used are important in determining the observed effect size.
- A theoretically important IV may still only account for a small proportion of the variability in the data.
- A theoretically unimportant IV may account for a large proportion of variability in the data
What is power?
- Sensitivity is the ability of an experiment to detect a treatment effect when one actually exists.
- Power is a quantitative index of sensitivity which tells us the probability that our experiment will detect this effect.
What is the ideal power?
*Keppel (1992) argues that ideally power should be > .80 to ensure an experiment can pick up a moderate effect.
- Ensuring adequate power is a research design issue
Power is a function of…?
(what are the things you can tweak/change that will change your experiments overall power?)
- The size of the treatment effect (we have better power to detect stronger effects)
- The size of the error variance (the more noise in the data, the harder to detect an effect)
- The alpha level (the more conservative the test, the greater the chance you will miss a true effect, i.e., retain H0 even when H1 is true)
- Sample size
* Greater sample size – greater power
How do you know what the power of your experiment is?
The power of your experiment is 1 minus the Type II error rate (power = 1 - beta)
Why does sample size affect the amount of power you have?
When the df is larger (i.e., larger N, more participants), there is less overall error, so F becomes larger
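A rough sketch of how power rises with sample size, using the noncentral F distribution and assuming a medium effect (Cohen's f = .25, roughly η² = .06), 4 groups, equal group sizes and alpha = .05:

```python
from scipy.stats import f, ncf

def anova_power(f_effect, n_per_group, k_groups, alpha=0.05):
    """Approximate power of a one-way ANOVA for a given Cohen's f."""
    n_total = n_per_group * k_groups
    df1, df2 = k_groups - 1, n_total - k_groups
    nc = f_effect ** 2 * n_total               # noncentrality parameter
    f_crit = f.ppf(1 - alpha, df1, df2)        # critical F under H0
    return 1 - ncf.cdf(f_crit, df1, df2, nc)   # P(F > F_crit | effect exists)

# Power climbs as the per-group n grows, everything else held constant.
for n in (10, 20, 40, 80):
    print(n, round(anova_power(0.25, n, 4), 2))
```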
Why not use the largest n possible?
- Not always cheap or easy to use large samples,
- We need to know what is the acceptable minimum sample size to pick up a specific effect
What are the two main situations we are concerned about power in?
- When we do not find a significant effect but there is evidence that we may have made a Type II error.
- When we are planning a new experiment and wish to ensure that we have adequate power to pick up the effect of our IV
Power and sample size
- Ideally, we should determine the sample size that will give our experiment adequate power (> .8 ) before we run it.
- Conducting a study without an indication of the sample required to achieve desired power may be an expensive waste of time
What are the ways to estimate the required sample size?
To do this you need an estimate of the magnitude of the treatment effect
- You can get this from:
- past research
- a pilot study
- an estimate of the minimum difference between means that you consider relevant or important (often used in clinical experiments).
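Once you have an effect-size estimate, the required N can be solved for directly; a sketch assuming statsmodels' FTestAnovaPower, a medium effect (Cohen's f = .25) and 4 groups:

```python
from statsmodels.stats.power import FTestAnovaPower

# Solve for the total sample size that gives power = .80 at alpha = .05.
n_total = FTestAnovaPower().solve_power(effect_size=0.25, k_groups=4,
                                        alpha=0.05, power=0.80)
print(round(n_total))   # total N; divide by 4 for the per-group n
```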