ANOVA Flashcards
what does ANOVA stand for?
analysis of variance
what is the purpose of ANOVA?
to compare the means of several groups using the variability within and between groups
what are the hypotheses in a one-way ANOVA?
H0 : µ1 = µ2 = … = µg
Ha : at least two population means are unequal
what are the assumptions for ANOVA?
- normal population distributions
- equal standard deviations across groups
- randomisation in sampling or assignment
how is variability partitioned in ANOVA?
between-groups variability: differences between group means
within-groups variability: differences within each group around its mean
what is the test statistic for ANOVA?
F = between-groups estimate of variance / within-groups estimate of variance
how do you calculate the degrees of freedom in ANOVA?
between-groups: df1 = g - 1 (g = number of groups)
within-groups: df2 = N - g (N = total sample size)
what does the F distribution signify in ANOVA?
the F distribution has a mean of approximately 1 when H0 is true
larger F value indicates stronger evidence against H0
how are the mean squares calculated in ANOVA?
mean square between (MSB): sum of squares between (SSB)/df1
mean square within (MSW): sum of squares within (SSW)/df2
how is total variability partitioned in ANOVA?
total SS = between-groups SS + within-groups SS
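A minimal sketch of this partition and of the F statistic built from it. The three groups and their scores are made up for illustration, and the variable names are assumptions rather than part of the original cards.

```python
# Hypothetical data: three groups of four observations each.
groups = [
    [4.0, 5.0, 6.0, 5.0],
    [7.0, 8.0, 6.0, 7.0],
    [9.0, 10.0, 8.0, 9.0],
]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


all_values = [y for group in groups for y in group]
grand_mean = mean(all_values)
g = len(groups)      # number of groups
N = len(all_values)  # total sample size

# Between-groups SS: each group mean's squared distance from the grand mean,
# weighted by the group's sample size.
ssb = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)

# Within-groups SS: squared distance of each observation from its own group mean.
ssw = sum((y - mean(group)) ** 2 for group in groups for y in group)

# Total SS: squared distance of each observation from the grand mean.
total_ss = sum((y - grand_mean) ** 2 for y in all_values)

df1 = g - 1  # between-groups degrees of freedom
df2 = N - g  # within-groups degrees of freedom

msb = ssb / df1
msw = ssw / df2
f_stat = msb / msw

print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, total SS = {total_ss:.2f}")
print(f"df1 = {df1}, df2 = {df2}, F = {f_stat:.2f}")
```

For these made-up numbers the group means are 5, 7 and 9 and the grand mean is 7, so SSB = 32, SSW = 6 and total SS = 38 = 32 + 6. With df1 = 2 and df2 = 9, F = 16 / (6/9) = 24.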
when is the F test robust to violations of assumptions?
- if sample sizes are equal or approximately equal
- when population distributions are approximately normal or have similar standard deviations
what should you check for extreme violations of assumptions?
box plots or dot plots for skewness or large differences in standard deviations
what does the residual standard deviation s represent in ANOVA?
it is the square root of the within-groups variance estimate or mean square error
how do you calculate the degrees of freedom for the error in ANOVA?
df2 = N - g
what does it mean if a confidence interval comparing two means does not include 0?
it indicates that the two population means differ, i.e. the difference is statistically significant at the corresponding level
what should you do if the largest standard deviation is more than twice the smallest?
use a confidence interval formula with separate variances instead of pooled variance
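The rule of thumb above can be checked directly from the sample standard deviations. This is only a sketch: the two groups and their values are hypothetical, and the threshold of 2 is the one stated on the card.

```python
from statistics import stdev

# Hypothetical samples; names and values are illustrative only.
samples = {
    "A": [4.0, 5.0, 6.0, 5.0],
    "B": [2.0, 8.0, 5.0, 9.0],
}

# Sample standard deviation of each group.
sds = {name: stdev(values) for name, values in samples.items()}

# Ratio of the largest to the smallest standard deviation.
ratio = max(sds.values()) / min(sds.values())

# Card's rule of thumb: a ratio above 2 suggests separate variances.
use_separate_variances = ratio > 2

print(f"largest/smallest sd ratio = {ratio:.2f}")
print("use separate variances" if use_separate_variances else "pooled variance is reasonable")
```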
how many pairwise comparisons are there for g groups in ANOVA?
g(g - 1)/2 comparisons
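A quick check of the formula by listing every pair explicitly. The four group labels are hypothetical.

```python
from itertools import combinations

# Hypothetical group labels.
group_names = ["A", "B", "C", "D"]
g = len(group_names)

# Every unordered pair of distinct groups.
pairs = list(combinations(group_names, 2))

# The count from the formula g(g - 1)/2.
formula_count = g * (g - 1) // 2

print(pairs)
print(len(pairs), formula_count)
```

With g = 4 groups, both counts are 6.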
what is the main limitation of constructing multiple confidence intervals for mean differences?
the overall confidence level decreases as the number of comparisons increases
what is the Bonferroni method?
a method that adjusts error probability for each comparison to ensure a high overall confidence level
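As a sketch of that adjustment: the Bonferroni method divides the desired overall error probability equally among the comparisons. The number of groups and the overall error probability of 0.05 below are assumed values for illustration.

```python
# Assumed values for illustration.
g = 4                 # number of groups
overall_error = 0.05  # desired overall error probability (95% overall confidence)

# Number of pairwise comparisons among g groups.
num_comparisons = g * (g - 1) // 2

# Bonferroni: split the overall error probability equally across comparisons.
per_comparison_error = overall_error / num_comparisons

# Confidence level to use for each individual interval.
per_comparison_confidence = 1 - per_comparison_error

print(f"comparisons: {num_comparisons}")
print(f"error probability per comparison: {per_comparison_error:.4f}")
print(f"confidence level per interval: {per_comparison_confidence:.4f}")
```

With 6 comparisons, each interval uses an error probability of 0.05/6, about 0.0083, so each is built at roughly the 99.17% confidence level.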
how does the Tukey method improve on the Bonferroni method?
it provides narrower confidence intervals and maintains the desired overall confidence level
when is ANOVA robust to violations of normality?
when sample sizes are large, the normality assumption is less critical