ANOVA Flashcards
What does ANOVA stand for, and what is its purpose?
ANalysis Of VAriance (ANOVA) tests for differences in means across three or more groups.
Why shouldn't we use multiple t-tests for comparing more than two groups?
Multiple t-tests increase the Type I error rate, raising the probability of falsely rejecting the null hypothesis.
If there are 5 groups, how many pairwise comparisons would there be?
10
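The count comes from choosing 2 groups out of k, which is k(k - 1)/2. A minimal Python sketch (the function name n_pairwise is only illustrative):

```python
from math import comb

def n_pairwise(k: int) -> int:
    """Number of distinct pairs that can be formed from k groups (k choose 2)."""
    return comb(k, 2)

print(n_pairwise(5))  # 10
```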
What is a Type I error?
Rejecting the null hypothesis (H0) when it is actually true.
Probability = α (significance level, e.g., 0.05).
What is a Type II error?
Failing to reject the null hypothesis (H0) when it is actually false.
Probability = β.
How does performing multiple t-tests increase the Type I error rate?
For k independent comparisons, each at level α, the probability of at least one Type I error is: 1 - (1 - α)^k
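As a rough illustration, the formula can be evaluated for the 10 pairwise comparisons among 5 groups. This sketch assumes the tests are independent, which pairwise t-tests sharing groups are not exactly, so the result is best read as an approximation; the function name is hypothetical.

```python
def familywise_error(alpha: float, k: int) -> float:
    """P(at least one Type I error) across k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k

# 10 comparisons at alpha = 0.05
print(round(familywise_error(0.05, 10), 3))  # 0.401
```

With a single test the rate is just α = 0.05; with 10 tests it rises to roughly 40%.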
How does the Bonferroni correction control the Type I error rate?
Divide the significance level by the number of comparisons:
α_k = α / k
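A small sketch of the correction, again using 10 comparisons at α = 0.05 as an example (the function name is illustrative, not from any library):

```python
def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / k

alpha_k = bonferroni_alpha(0.05, 10)
print(f"{alpha_k:.4f}")  # 0.0050

# Chance of at least one false rejection if each of the 10 tests uses alpha_k
# (independence assumed for this back-of-the-envelope check)
familywise = 1 - (1 - alpha_k) ** 10
print(familywise < 0.05)  # True
```

Each individual comparison must now reach p < 0.005 to count as significant, which keeps the overall error rate at or below 0.05.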
What are the null and alternative hypotheses for ANOVA?
H0: μA = μB = μC (all group means are equal)
H1: at least one mean is different.
What does the ANOVA test statistic (F) compare?
F = variability between groups / variability within groups
What is the formula for the between-group variance (s_G^2)?
s_G^2 = sum of squares between groups (SSG) / (k - 1)
What is the formula for the within-group variance (s_W^2)?
s_W^2 = sum of squares within groups (SSW) / (n_tot - k)
n_tot = total sample size,
k = number of groups.
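To make the two formulas concrete, here is a minimal sketch that computes SSG, SSW, both mean squares, and their ratio F by hand. The data are made up for illustration, and the function name anova_f is hypothetical.

```python
from statistics import mean

def anova_f(groups):
    """Return (s_G^2, s_W^2, F) for a list of groups, each a list of numbers."""
    k = len(groups)
    n_tot = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # SSG: squared distance of each group mean from the grand mean, weighted by group size
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # SSW: squared distance of each observation from its own group mean
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    s_g2 = ssg / (k - 1)      # between-group variance
    s_w2 = ssw / (n_tot - k)  # within-group variance
    return s_g2, s_w2, s_g2 / s_w2

# Made-up data: three groups of three observations
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
s_g2, s_w2, f_stat = anova_f(groups)
print(round(s_g2, 3), round(s_w2, 3), round(f_stat, 3))  # 13.0 1.0 13.0
```

Here SSG = 26 on k - 1 = 2 degrees of freedom and SSW = 6 on n_tot - k = 6 degrees of freedom.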
What does a large F-statistic indicate in ANOVA?
A large F-statistic means between-group variability is large compared to within-group variability, suggesting significant differences between group means.
How do you interpret the ANOVA table?
Between groups: Large Mean Square indicates group differences.
Residuals (within groups): Represents random variability.
Large F-value + small p-value = reject H0
What degrees of freedom are used in ANOVA?
df1 = k - 1 (between groups)
df2 = n_tot - k (within groups)
What does a small p-value in ANOVA indicate?
It indicates strong evidence against H0, i.e., that at least one group mean differs from the others.
What is the test statistic in ANOVA?
The F-statistic: F = s_G^2 / s_W^2
If the ANOVA test is significant, what should you do next?
Perform a post-hoc test (like Tukey's test) to identify which groups differ.
What assumptions does ANOVA rely on?
Normality of data within groups.
Homogeneity of variances across groups.
Independence of observations.
What should you do if ANOVA assumptions are violated?
Use a non-parametric alternative like the Kruskal-Wallis test.
Given F0 = 5.263, df1 = 2, df2 = 31499, and a critical value of 2.996, should you reject H0?
Yes, reject H0 because:
F0 = 5.263 > 2.996 (critical value)
p-value = 0.00518 < 0.01.
There is strong evidence that at least one mean is different.
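The two decision criteria on this card can be written out as a short check. The significance level alpha = 0.05 is an assumption made for this sketch; the card itself only notes that the p-value is below 0.01.

```python
f0, f_crit = 5.263, 2.996
p_value = 0.00518
alpha = 0.05  # assumed significance level for illustration

# Criterion 1: the test statistic exceeds the critical value
reject_by_critical_value = f0 > f_crit
# Criterion 2: the p-value is below the significance level
reject_by_p_value = p_value < alpha

print(reject_by_critical_value, reject_by_p_value)  # True True
```

When the critical value and α refer to the same significance level, the two criteria always agree, so checking either one is enough.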
What does the p-value of 0.00518 mean in this context?
If H0 (all means are equal) were true, an F-statistic at least as extreme as F0 = 5.263 would occur with a probability of only about 0.5%.
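For the special case df1 = 2, the upper-tail probability of the F-distribution has a simple closed form, P(F >= f) = (1 + 2f/df2)^(-df2/2), so the quoted p-value can be reproduced without a statistics library. This is only a sketch for that special case; the function name is hypothetical, and for other df1 values a library routine would be needed.

```python
def f_tail_df1_2(f_stat: float, df2: int) -> float:
    """P(F >= f_stat) for an F-distribution with df1 = 2 and df2 degrees of freedom."""
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)

p = f_tail_df1_2(5.263, 31499)
print(f"{p:.5f}")  # approximately 0.00518
```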
After rejecting H0, what should you do next?
Perform a post-hoc test like Tukey's HSD to determine which groups are significantly different.
Why is the adjusted p-value (p adj) used in Tukey's test?
It controls the family-wise error rate by adjusting each p-value for the number of pairwise comparisons, so that it can be compared directly with α.
What is the family-wise error rate, and why is it important?
It is the overall probability of making at least one Type I error when conducting multiple comparisons. It is controlled by adjusting α.
What are the key assumptions for a valid ANOVA test?
Independence of data within and between groups.
Equal variances across groups (homogeneity of variance).
Normal distribution of data within groups.
How can you check the assumption of equal variances?
Compare the standard deviations (s) of the groups.
Rule of thumb: the largest s should be no more than twice the smallest.
Alternatively, use Levene's test in R.
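The rule of thumb is easy to check directly. The sketch below uses made-up data and a hypothetical helper name; it only implements the informal ratio check, not Levene's test.

```python
from statistics import stdev

def sd_ratio_ok(groups, max_ratio: float = 2.0) -> bool:
    """True if the largest group SD is no more than max_ratio times the smallest."""
    sds = [stdev(g) for g in groups]
    return max(sds) / min(sds) <= max_ratio

# Made-up data: SDs of 1 and 2 pass the check, adding a group with SD 4 fails it
print(sd_ratio_ok([[1, 2, 3], [2, 4, 6]]))             # True
print(sd_ratio_ok([[1, 2, 3], [2, 4, 6], [1, 5, 9]]))  # False
```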
How do you test for normality in ANOVA?
Plot histograms or Q-Q plots.
Perform a Shapiro-Wilk test in R.
What non-parametric test can be used if the ANOVA assumptions are violated?
Kruskal-Wallis test (for non-normal data).
It assumes:
Independent data.
Similar shape of distributions across groups.
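For intuition, the Kruskal-Wallis statistic replaces the observations with their ranks in the pooled sample and compares the rank sums of the groups: H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1). The sketch below computes H by hand on made-up data; it omits the tie correction and the p-value, and the function name is hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    # Assign 1-based ranks; tied values share the mean of their positions
    rank = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # mean of positions i+1 .. j
        i = j
    rank_sums = [sum(rank[x] for x in g) for g in groups]
    weighted = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)

# Made-up data: three completely separated groups
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3))  # 7.2
```

Under H0, H is compared against a chi-squared distribution with k - 1 degrees of freedom (an approximation that works best with larger groups).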
Why is practical significance different from statistical significance?
Statistical significance means a result is unlikely to be due to chance alone, while practical significance considers whether the difference is large enough to matter in real-world terms.
What are the steps in hypothesis testing?
State H0 and H1.
Collect data.
Select a test (e.g., ANOVA, t-test, etc.).
Set α (e.g., 0.05).
Find the reference distribution (e.g., F-distribution).
Calculate test-statistic & p-value.
Make a decision (reject/fail to reject H0).
Interpret the result.
Why is the F-distribution used in ANOVA?
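The F-statistic is the ratio of the variance between groups to the variance within groups. When H0 is true and the assumptions hold, this ratio follows an F-distribution, which serves as the reference distribution for testing whether the means differ.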
What does a large F-statistic imply?
Large differences between group means relative to the within-group variability, suggesting significant group differences.
What does failing to reject H0 mean?
There isn't enough evidence to conclude that any group means are different.