Week 5: Comparing Means - One-way ANOVA Flashcards
What does ANOVA stand for?
Analysis of Variance
What is the decision tree for choosing a one-way ANOVA? - (5)
Q: What sort of measurement? A: Continuous
Q: How many predictor variables? A: One
Q: What type of predictor variable? A: Categorical
Q: How many levels of the categorical predictor? A: More than two
Q: Same or Different participants for each predictor level? A: Different
When is ANOVA used?
When you are comparing the means of more than 2 groups (levels of the IV)
Example of ANOVA RQ
Which is the fastest animal in a maze experiment - cats, dogs or rats?
We can’t do three separate t-tests to answer a question such as which is the fastest animal in a maze experiment (cats, dogs or rats) because - (2)
Doing separate t-tests inflates the type I error (false positive - e.g., pregnant man)
Repeating tests adds multiple chances of error, which may result in a larger overall α error level than the pre-set α level (familywise error)
What is familywise or experimentwise error rate?
The error rate across all statistical tests conducted on the same experimental data
Family wise error is related to
type 1 error
What is the alpha level probability?
probability of making a wrong decision in accepting the alternative hypothesis (i.e., rejecting a true H0) = type 1 error
If we conduct 3 separate t-tests, with an alpha level of 0.05, to compare which is the fastest animal in the experiment (cats, dogs or rats) - (4)
- 5% chance of a type 1 error (falsely rejecting H0) on each test
- Probability of no Type 1 error is 95% for a single test
- However, across multiple tests the probability of no type 1 error decreases; for 3 tests together => 0.95 × 0.95 × 0.95 = 0.857
- This means the probability of a type 1 error increases: 1 - 0.857 = 0.143 (14.3% chance of making at least one type 1 error)
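The arithmetic above can be checked with a short sketch. It assumes the tests are independent, and the function name `familywise_error` is made up for illustration.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n independent tests."""
    # (1 - alpha) is the chance of no Type I error on one test;
    # raising it to n_tests gives the chance of no error on any test.
    return 1 - (1 - alpha) ** n_tests


for n in (1, 3, 10):
    print(n, round(familywise_error(0.05, n), 3))
```

With alpha = 0.05 and 3 tests this gives roughly 0.143, matching the card above; the rate keeps growing as more tests are added.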
Much like model for t-tests we can write a general linear model for
ANOVA - 3 levels of categorical variable with dummy variables
When we perform a t-test, we test the hypothesis that the two samples have the same
mean
ANOVA tells us whether three or more means are the same so tests H0 that
all group means are equal
An ANOVA produces an
F statistic or F ratio
The F ratio produced in ANOVA is similar to t-statistic in a way that it compares the
amount of systematic variance in data to the amount of unsystematic variance i.e., ratio of model to its error
ANOVA is an omnibus test which means it tests for and tells us - (2)
overall experimental effect
tells whether experimental manipulation was successful
An ANOVA is an omnibus test and its F ratio does not provide specific information about which
groups were affected due to experimental manipulation
Just like a t-test can be represented by a linear regression equation, ANOVA can be represented by a
multiple regression equation for three means, where the model accounts for the 3 levels of the categorical variable with dummy variables
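A minimal sketch of what the dummy-variable model amounts to: with dummy coding, b0 is the mean of the baseline group and each remaining b is the difference between one group's mean and the baseline mean. The scores, group names and function names below are illustrative assumptions, not taken from the flashcards.

```python
from statistics import mean

# Illustrative scores for three groups (assumed data).
groups = {
    "control": [3, 2, 1, 1, 4],
    "low": [5, 2, 4, 2, 3],
    "high": [7, 4, 5, 3, 6],
}


def dummy_coded_coefficients(groups, baseline):
    """Return b0 (baseline mean) and one b per other group (mean difference)."""
    b0 = mean(groups[baseline])
    slopes = {
        name: mean(scores) - b0
        for name, scores in groups.items()
        if name != baseline
    }
    return b0, slopes


def predict(b0, slopes, group):
    """Predicted score: b0 plus the coefficient of the group's dummy (0 for baseline)."""
    return b0 + slopes.get(group, 0.0)


b0, slopes = dummy_coded_coefficients(groups, "control")
print(b0, slopes)
```

The model's prediction for every participant is simply their group mean, which is why testing the b coefficients is the same as testing differences between means.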
As compared to independent samples t-test that compares means of two groups, one-way ANOVA compares means of
3 or more independent groups
In one-way ANOVA we use … … to test assumption of equal variances across groups
Levene’s test
What does this one-way ANOVA output show?
Levene’s test is non-significant so equal variances are assumed
What does this SPSS output show in one-way ANOVA?
F(2,42) = 5.94, p = 0.005, eta-squared = 0.22
How is effect size (eta-squared) calculated in one-way ANOVA?
Between groups sum of squares divided by total sum of squares
What is the eta-squared/effect size for this SPSS output and what does this value mean? - (2)
830.207/3763.632 = 0.22
22% of the variance in exam scores is accounted for by the model
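The same calculation as a small sketch, using the two sums of squares quoted in the card above; the function name `eta_squared` is an illustrative choice.

```python
def eta_squared(ss_between: float, ss_total: float) -> float:
    """Effect size: between-groups sum of squares divided by total sum of squares."""
    return ss_between / ss_total


effect = eta_squared(830.207, 3763.632)
print(round(effect, 2))  # proportion of variance accounted for by the model
```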
When interpreting eta-squared in one-way ANOVA, what do values of 0.01, 0.06 and 0.14 mean? - (3)
- 0.01 = small effect
- 0.06 = medium effect
- 0.14 = large effect
What happens if the Levene’s test is significant in the one-way ANOVA?
then use the Welch or Brown-Forsythe F statistics instead
If Levene’s test is significant in one-way ANOVA, the Welch or Brown-Forsythe tests make adjustments to the DF, which affects
the statistics you get and whether the p value is significant or not
What does this post-hoc table of Bonferroni tests show in one-way ANOVA? - (3)
- Full sleep vs partial sleep, p = 1.00, not sig
- Full sleep vs no sleep, p = 0.007, so sig
- Partial sleep vs no sleep, p = 0.032, so sig
Diagram of example of grand mean
Mean of all scores regardless of participants’ condition
What are the total sum of squares (SST) in one-way ANOVA?
difference of the participant’s score from the grand mean squared and summed over all participants
What is model sum of squares (SSM) in one-way ANOVA?
difference of the model score from the grand mean squared and summed over all participants
What is residual sum of squares (SSR) in one-way ANOVA?
difference of the participant’s score from the model score squared and summed over all participants
The residuals sum of squares (SSR) tells us how much of the variation cannot be
explained by the model and amount of variation caused by extraneous factors
We divide each sum of squares by its
DF to calculate the mean squares
In one-way ANOVA, the DF we divide SST by is
N-1
In one-way ANOVA, the DF we divide SSM by is
the number of groups (parameters), k, minus 1
If we have three groups, then the DF for SSM in one-way ANOVA will be
3-1 = 2
In one-way ANOVA, the DF we divide SSR by is
the total sample size, N, minus the number of groups, k
Formulas for dividing each sum of squares by its DF in one-way ANOVA - (3)
- MST = SST/(N-1)
- MSR = SSR/(N-k)
- MSM = SSM/(k-1)
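A rough sketch of how the sums of squares and mean squares fit together, using k - 1 degrees of freedom for the model and N - k for the residual, with F taken as MSM divided by MSR. The scores are assumed for illustration and `one_way_anova` is a hypothetical helper, not SPSS output.

```python
from statistics import mean


def one_way_anova(groups):
    """Return sums of squares, mean squares and F for a list of groups of scores."""
    scores = [x for g in groups for x in g]
    n_total, k = len(scores), len(groups)
    grand_mean = mean(scores)
    # SST: each score versus the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in scores)
    # SSM: each participant's model score (their group mean) versus the grand mean.
    ssm = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # SSR: each score versus its own group mean.
    ssr = sum((x - mean(g)) ** 2 for g in groups for x in g)
    msm = ssm / (k - 1)
    msr = ssr / (n_total - k)
    return {"SST": sst, "SSM": ssm, "SSR": ssr,
            "MSM": msm, "MSR": msr, "F": msm / msr}


# Assumed example data: three groups of five participants.
result = one_way_anova([[3, 2, 1, 1, 4], [5, 2, 4, 2, 3], [7, 4, 5, 3, 6]])
for name, value in result.items():
    print(name, round(value, 3))
```

Note that SST equals SSM plus SSR, which is the partition of total variation into systematic and unsystematic parts.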
SSM tells us the total variation that the
exp manipulation explains
What does MSM represent?
average amount of variation explained by the model (i.e., the systematic variation)
What does MSR represent?
average amount of variation explained by extraneous variables (the unsystematic variation).
The F ratio in one-way ANOVA can be calculated by
dividing MSM by MSR (F = MSM / MSR)
If F ratio in one-way ANOVA is less than 1 then it represents a
non-significant effect
Why does an F of less than 1 in one-way ANOVA represent a non-significant effect?
An F ratio of less than 1 means that MSR is greater than MSM, i.e., there is more unsystematic than systematic variance
If F is greater than 1 in one-way ANOVA then it indicates … but doesn’t tell us … - (2)
indicates that experimental manipulation had some effect above and beyond effect of individual differences in performance
Does not tell us whether F-ratio is large enough to not be a chance result
When F statistic is large in one-way ANOVA then it tells us that the
MSM is greater than MSR
To discover if the F statistic is large enough not to be a chance result in one-way ANOVA then
compare the obtained value of F against the maximum value we would expect to get by chance if the group means were equal in an F-distribution with the same degrees of freedom
In one-way ANOVA, high values of F are rare - (3)
by chance
Low degrees of freedom result in long tails of the distribution, so, much like other statistics,
large values of F are more likely to crop up by chance in studies with low numbers of participants.
The F-ratio in one-way ANOVA tells us whether the model fitted to the data accounts for more variation than extraneous factors, and does not tell us where
differences between groups lie
If F-ratio in one-way ANOVA is large enough to be statistically significant then we know
that one or more of the differences between means is statistically significant (e.g. either b2 or b1 is statistically significant)
It is necessary after conducting a one-way ANOVA to carry out further analysis to find out
which groups differ
The power of F statistic is relatively unaffected by
non-normality
When group sizes are not equal the accuracy of F is
affected by skew, and non-normality also affects the power of F in quite unpredictable ways
When group sizes are equal, the F statistic can be quite robust to
violations of normality
What tests do you do after performing a one-way ANOVA and finding significant F test? - (2)
- Planned contrasts
- Post-hoc tests
What do post-hoc tests do? - (2)
- compare all pairwise differences in mean
- Used if no specific hypotheses concerning differences has been made
What is the issue with post-hoc tests?
Because every pairwise combination is considered, the type 1 error rate increases, so normally the type 1 error rate is reduced by modifying the critical value of p
Post-hoc tests are like two or one tailed hypothesis?
two-tailed
Planned contrasts are like a one- or two-tailed hypothesis?
One-tailed hypothesis
What is the most common modification of the critical value for p in post-hoc in one-way ANOVA?
Bonferroni correction, which divides the standard critical value of p=0.05 by the number of pairwise comparisons performed
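As a quick sketch of the correction: with k groups there are k(k - 1)/2 pairwise comparisons, and the corrected critical value is the original alpha divided by that count. The function name is an illustrative assumption.

```python
from math import comb


def bonferroni_alpha(alpha: float, n_groups: int) -> float:
    """Critical p value after dividing alpha by the number of pairwise comparisons."""
    n_comparisons = comb(n_groups, 2)  # every pair of groups is compared once
    return alpha / n_comparisons


for k in (3, 4, 5):
    print(k, comb(k, 2), round(bonferroni_alpha(0.05, k), 4))
```

With three groups there are three comparisons, so each test is judged against 0.05/3, roughly 0.0167.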
Planned contrasts are used to investigate a specific
hypothesis
Planned contrasts do not test for every
pairwise difference so are not penalized as heavily as post hoc tests that do test for every difference
With planned contrasts you derive the hypotheses before the
data are collected
Diagram of planned contrasts
Contrast 1 = Treatment vs control
Contrast 2 = Treatment 1 vs Treatment 2
In planned contrasts, once a condition has been singled out in a contrast, it is
never used again in a later contrast
In one-way ANOVA, the number of independent planned contrasts you can make is
k (number of groups) minus 1
How do planned contrasts work in SPSS?
Coefficients add to 0 for each contrast (e.g., -2 + 1 + 1), and once a group has been used alone in a contrast, its coefficient is set to 0 in the next contrasts (e.g., -2 becomes 0)
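A small sketch for checking a set of contrast weights. The sum-to-zero rule comes from the card above; the orthogonality check (products of corresponding weights summing to zero) is an added assumption about what makes two contrasts independent. The weights and function names are illustrative only.

```python
def sums_to_zero(weights):
    """A valid contrast has weights that add up to 0."""
    return sum(weights) == 0


def are_orthogonal(w1, w2):
    """Two contrasts are independent if the products of their weights sum to 0."""
    return sum(a * b for a, b in zip(w1, w2)) == 0


# Illustrative weights for three groups: control, treatment 1, treatment 2.
contrast_1 = [-2, 1, 1]   # control vs both treatments
contrast_2 = [0, -1, 1]   # treatment 1 vs treatment 2 (control set to 0)

print(sums_to_zero(contrast_1), sums_to_zero(contrast_2))
print(are_orthogonal(contrast_1, contrast_2))
```

Setting the control group's weight to 0 in the second contrast is what keeps the two contrasts independent of each other.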
SPSS has a lot of contrasts that are inbuilt, but it is helpful if
you know what these contrasts do before entering the data, as they depend on the order in which you coded your variables
What are weights?
Values we assign to the dummy variables e.g., -2 in the box
One type of planned contrast is a polynomial contrast, which in one-way ANOVA
tests for trends in the data and in its most basic form looks for a linear trend (i.e., group means increase proportionately)
Polynomial contrasts can also look at more complex trends than linear in one-way ANOVA, such as
quadratic, cubic and quartic
What does a linear trend represent?
simply proportionate change in the value of the dependent variable across ordered categories
What is a quadratic trend?
one change in the direction of the line (e.g. the line is curved in one place)
What is a cubic trend?
two changes in the direction of the trend
What is a quartic trend?
has three changes of direction
In one-way ANOVA, the Bonferroni post-hoc test ensures that the familywise type 1 error rate stays below
0.05
As well as reducing type 1 errors (being conservative about the type 1 error for each comparison), the Bonferroni correction in one-way ANOVA also
lacks statistical power (the probability of a type II error [false negative] will be high), so increasing the chance of missing a genuine difference in the data
What post-hoc tests to use in one-way ANOVA if you have equal sample sizes and are confident that your group variances are similar?
Use REGWQ or Tukey, as both have good power and tight control over the Type 1 error rate
What post-hoc test to use in one-way ANOVA if your sample sizes are slightly different?
Gabriel’s procedure because it has greater power
What post-hoc test to use in one-way ANOVA if your sample sizes are very different?
Hochberg’s GT2
What post-hoc test to run in one-way ANOVA if Levene’s test of homogeneity of variance is significant?
Games-Howell
What post-hoc test to use in one-way ANOVA if you want guaranteed control over the type 1 error rate?
Bonferroni
What does this ANOVA error line graph show? - (2)
- Linear trend as dose of Viagra increases so does mean level of libido
- Error bars overlap, suggesting at face value that there may be no between-group differences
What does the within-groups row give details of in the ANOVA table?
SSR (unsystematic variation)
The between groups label in ANOVA table tells us
SSM (systematic variation)
What does this ANOVA table demonstrate? - (2)
- Linear trend is significant (p = 0.008)
- Quadratic trend is not significant (p = 0.612)
When we do planned contrasts we arrange the weights such that we compare any group with a positive weight
against any group with a negative weight
What does this output show if we conduct two planned comparisons: one to test whether the control group was different from the two groups which received Viagra, and one to see whether the two doses of Viagra made a difference to libido? - (2)
the table of weights shows that contrast 1 compares the placebo group against the two experimental groups,
contrast 2 compares the low-dose group to the high-dose group
What does this table show in one-way ANOVA if Levene’s test is non-significant (equal variances assumed)? - (3)
Hypothesis 1 (one-tailed): the experimental groups would increase libido above the levels seen in the placebo group
Hypothesis 2 (one-tailed): a high dose of Viagra would increase libido significantly more than a low dose
- The significance value given in the table is two-tailed, and since the hypotheses are one-tailed we divide it by 2
- For contrast 1, we can say that taking Viagra significantly increased libido compared to the control group (p(one-tailed) = .029/2 = .0145)
- The significance of contrast 2 tells us that a high dose of Viagra increased libido significantly more than a low dose (p(one-tailed) = .065/2 = .0325)
If making a few pairwise comparisons with an equal number of participants in each condition, use …; if making a lot, use … in one-way ANOVA - (2)
Bonferroni
Tukey
Assumptions of ANOVA - (5)
- Independence of data
- DV is continuous; IV is categorical (3 or more groups)
- No significant outliers
- DV approximately normally distributed for each category of the IV
- Homogeneity of variance = Levene’s test not significant
ANOVA compares many means without increasing the chance of
type 1 error
In one-way ANOVA, we partition the total variance into
variance explained by the model (between groups, SSM) and residual variance (within groups, SSR)
Formula of effect size for one-way ANOVA
eta-squared = SSM / SST (between-groups sum of squares divided by total sum of squares)
Formula for effect size of contrasts for one-way ANOVA - (4)
Less commonly, but no less importantly, we can report effect sizes for contrasts
It follows the same logic as r², but in this case we can use a formula that uses the value of t, which is given when contrasts are tested
r² = t² / (t² + df)
Whether we are computing the effect size for the model as a whole or for contrasts, the same intuitive feature of the r² statistic exists
- it shows what proportion of the variance is explained by the model
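The contrast effect-size formula as a sketch. The t value and degrees of freedom below are made-up numbers for illustration, not values from any output in these cards.

```python
def contrast_r_squared(t: float, df: int) -> float:
    """Effect size for a contrast: t squared over (t squared plus df)."""
    return t ** 2 / (t ** 2 + df)


# Hypothetical contrast result: t = 2.0 on 12 degrees of freedom.
r_squared = contrast_r_squared(2.0, 12)
print(r_squared)           # proportion of variance explained by the contrast
print(r_squared ** 0.5)    # r, if the unsquared effect size is wanted
```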
What happens if Levene’s test is significant (no homogeneity of variance)?
If it is significant, there are ways to modify the F test to account for it (e.g., the Welch or Brown-Forsythe F)
An independent t-test is used to test for:
A Differences between means of groups containing different participants when the sampling distribution is normal, the groups have equal variances and data are at least interval.
B Differences between means of groups containing different participants when the data are not normally distributed or have unequal variances.
C Differences between means of groups containing the same participants when the data are normally distributed, have equal variances and data are at least interval.
D Differences between means of groups containing the same participants when the sampling distribution is not normally distributed and the data do not have unequal variances.
A Differences between means of groups containing different participants when the sampling distribution is normal, the groups have equal variances and data are at least interval.
If you use a paired-samples t-test
A The same participants take part in both experimental conditions.
B There ought to be less unsystematic variance compared to the independent t-test.
C Other things being equal, you do not need as many participants as you would for an independent samples design.
D All of these are correct.
D All of these are correct
Which of the following statements about the t distribution is correct?
A It is skewed
B In small samples it is narrower than the normal distribution
C As the degrees of freedom increase, the distribution becomes closer to normal
D It follows an exponential curve
C As the degrees of freedom increase, the distribution becomes closer to normal
Which of the following sentences is an accurate description of the standard error?
A It is the same as the standard deviation
B It is the observed difference between sample means minus the expected difference between population means (if the null hypothesis is true)
C It is the standard deviation of the sampling distribution of a statistic
D It is the standard deviation squared
C It is the standard deviation of the sampling distribution of a statistic
A psychologist was interested in whether there was a gender difference in the use of email. She hypothesized that because women are generally better communicators than men, they would spend longer using email than their male counterparts. To test this hypothesis, the researcher sat by the computers in her research methods laboratory and when someone started using email, she noted whether they were male or female and then timed how long they spent using email (in minutes). Based on the output, what should she report?
(NOTE: Check for the assumption of equality of variances).
A Females spent significantly longer using email than males, t(14) = –1.90, p = .079
B Females and males did not significantly differ in the time spent using email, t(7.18) = –1.90, p = .099
C Females and males did not significantly differ in the time spent using email, t(7.18) = –1.90, p < .003
D Females and males did not significantly differ in the time spent using email, t(14) = –1.90, p = .079
B Females and males did not significantly differ in the time spent using email, t(7.18) = –1.90, p = .099
Other things being equal, compared to the paired-samples (or dependent) t-test, the independent t-test:
A Has more power to find an effect.
B Has the same amount of power, the data are just collected differently.
C Has less power to find an effect.
D Is less robust.
C Has less power to find an effect.
Differences between group means can be characterized as a regression (linear) model if:
A The outcome variable is categorical.
B The groups have equal sample size.
C The experimental groups are represented by a binary variable (i.e. code 1 and 0).
D The difference between group means cannot be characterized as a linear model, they must be analyzed as an independent t-test.
C The experimental groups are represented by a binary variable (i.e. code 1 and 0).
An experiment was done to look at whether different relaxation techniques could predict sleep quality better than nothing. A sample of 400 participants were randomly allocated to one of four groups: massage, hot bath, reading or nothing. For one month each participant received one of these relaxation techniques for 30 minutes before going to bed each night. A special device was attached to the participant’s wrist that recorded their quality of sleep, providing them with a score out of 100. The outcome was the average quality of sleep score over the course of the month.
Which test could we use to analyse these data?
A Regression only
B ANOVA only
C Regression or ANOVA
D Chi-square
C (multiple) Regression or ANOVA (independent), as regression and ANOVA are the same underlying linear model
Did not mention the hypothesis of prediction or it would be regression
Chi-square only used when you have one categorical predictor and outcome is categorical
A researcher testing the effects of two treatments for anxiety computed a 95% confidence interval for the difference between the mean of treatment 1 and the mean of treatment 2. If this confidence interval includes the value of zero, then she cannot conclude that there is a significant difference in the treatment means: true or false.
TRUE OR FALSE
TRUE
The student welfare office was interested in trying to enhance students’ exam performance by investigating the effects of various interventions. They took five groups of students before their statistics exams and gave them one of five interventions: (1) a control group just sat in a room contemplating the task ahead; (2) the second group had a yoga class to relax them; (3) the third group were told they would get monetary rewards contingent upon the grade they received in the exam; (4) the fourth group were given beta-blockers to calm their nerves; and (5) the fifth group were encouraged to sit around winding each other up about how much revision they had/hadn’t done (a bit like what usually happens). The final percentage obtained in the exam was the dependent variable. Using the critical values for F, how would you report the result in the table below?
A Type of intervention did not have a significant effect on levels of exam performance, F(4, 29) = 12.43, p > .05.
B Type of intervention had a significant effect on levels of exam performance, F(4, 29) = 12.43, p < .01.
C Type of intervention did not have a significant effect on levels of exam performance, F(4, 33) = 12.43, p > .01.
D Type of intervention had a significant effect on levels of exam performance, F(4, 33) = 12.43, p < .01.
B Type of intervention had a significant effect on levels of exam performance, F(4, 29) = 12.43, p < .01.
Imagine you compare the effectiveness of four different types of stimulant to keep you awake while revising statistics using a one-way ANOVA. The null hypothesis would be that all four treatments have the same effect on the mean time kept awake. How would you interpret the alternative hypothesis?
A. All four stimulants have different effects on the mean time spent awake
B. All stimulants will increase mean time spent awake compared to taking nothing
C. At least two of the stimulants will have different effects on the mean time spent awake
D. None of the above
C. At least two of the stimulants will have different effects on the mean time spent awake
When the between-groups variance is a lot larger than the within-groups variance, the F-value is ____ and the likelihood of such a result occurring because of sampling error is _____
A. small; high
B. small; low
C. large; high
D. large; low
D. large; low
Subsequent to obtaining a significant result from an exploratory one-way independent ANOVA, a researcher decided to conduct three post hoc t-tests to investigate where the differences between groups lie.
Which of the following statements is correct?
A. The researcher should accept as statistically significant tests with a probability value of less than 0.016 to avoid making a Type I error
B. The researcher should have conducted orthogonal contrasts instead of t-tests to avoid making a Type I error
C. This is the wrong method to use. The researcher did not make any predictions about which groups will differ before running the experiment, therefore contrasts and post hoc tests cannot be used
D. None of these options are correct
A. The researcher should accept as statistically significant tests with a probability value of less than 0.016 to avoid making a Type I error
A psychologist was looking at the effects of an intervention on depression levels. Three groups were used: waiting list control, treatment and post-treatment (a group who had had the treatment 6 months before). The SPSS output is below. Based on this output, what should the researcher report?
A. The treatment groups had a significant effect on depression levels, F(2, 45) = 5.11.
B. The treatment groups did not have a significant effect on the change in depression levels, F(2, 35.10) = 5.11.
C. The treatment groups did not have a significant effect on depression levels, F(2, 26.44) = 4.35.
D. The treatment groups had a significant effect on the depression levels, F(2, 26.44) = 4.35.
D. The treatment groups had a significant effect on the depression levels, F(2, 26.44) = 4.35.
Imagine we conduct a one-way independent ANOVA with four levels on our independent variable and obtain a significant result. Given that we had equal sample sizes, we did not make any predictions about which groups would differ before the experiment and we want guaranteed control over the Type I error rate, which would be the best test to investigate which groups differ?
A. Orthogonal contrasts
B. Helmert
C. Bonferroni
D. Hochberg’s GT2
C. Bonferroni
The student welfare office was interested in trying to enhance students’ exam performance by investigating the effects of various interventions.
They took five groups of students before their statistics exams and gave them one of five interventions: (1) a control group just sat in a room contemplating the task ahead (Control); (2) the second group had a yoga class to relax them (Yoga); (3) the third group were told they would get monetary rewards contingent upon the grade they received in the exam (Bribes); (4) the fourth group were given beta-blockers to calm their nerves (Beta-Blockers); and (5) the fifth group were encouraged to sit around winding each other up about how much revision they had/hadn’t done (You’re all going to fail).
The student welfare office made four predictions: (1) all interventions should be different from the control; (2) yoga, bribery and beta-blockers should lead to higher exam scores than panic; (3) yoga and bribery should have different effects than the beta-blocker drugs; and (4) yoga and bribery should also differ.
Which of the following planned contrasts (with the appropriate group codings) are correct to test these hypotheses?
ANSWER 1
ANSWER 2
ANSWER 3
ANSWER 4
ANSWER 1 - sum of all weights should be 0
Deciding what post hoc tests to run
Example of RQ for one way ANOVA - (3)
Is there a statistically significant difference in Frisbee throwing distance with respect to education status
IV = Education with 3 levels = high school, graduate, postgrad
DV = Frisbee throwing distance
What does this one-way ANOVA output show?
Research question: Is there a statistically significant difference in Frisbee throwing distance with respect to education status?
Variables:
IV - Education, which has three levels:
High School, Graduate and PostGrad;
DV - Frisbee Throwing Distance
There was homogeneity of variance as assessed by Levene’s Test for Equality of Variances (F (2,47) = 1.94, p = .155)
What does the results of one-way ANOVA show?
Research question: Is there a statistically significant difference in Frisbee throwing distance with respect to education status?
Variables:
IV - Education, which has three levels:
High School, Graduate and PostGrad;
DV - Frisbee Throwing Distance
There was a statistically significant difference between groups as demonstrated by one-way ANOVA (F(2, 47) = 3.50, p = .038).
What does the results of one-way ANOVA show? –> post hoc
Research question: Is there a statistically significant difference in Frisbee throwing distance with respect to education status?
Variables:
IV - Education, which has three levels:
High School, Graduate and PostGrad;
DV - Frisbee Throwing Distance
A Tukey post hoc test shows that the PostGrad group was able to throw the frisbee statistically significantly further than the High School group (p = .034). There was no statistically significant difference between the Graduate and High School groups (p = .691) nor between the Graduate and PostGrad groups (p = .099).
What are the IV and DV of a one-way ANOVA?
IV = 1 categorical predictor with more than 2 levels
DV = 1 continuous outcome
One-way ANOVA is also called
between-subjects (independent) ANOVA