Week 7 - linear models ANOVA Flashcards
what is it about?
comparing means of more than two groups
null
all group means are equal (variance among all groups = 0)
alternative
not all group means are equal. At least one mean differs from the others.
Rejecting the null in ANOVA is evidence that the mean of at least one group is different from the others.
If null is false we expect the group mean square to exceed the error mean square
If null is true, the group and error mean squares should be about equal, so their ratio should be close to 1.
test statistic
f ratio
F = MSgroup/MSerror
Variance among groups divided by variance within groups
The larger F is, the more the group means differ relative to the spread within groups, and the more likely we are to reject the null hypothesis
if the null is true, F is ~1 (and only differs from 1 by chance)
One sided: F is always greater than or equal to zero, and only large values of F count as evidence against the null
Shape depends on degrees of freedom for groups and degrees of freedom for error
Partition the sum of squares to separate variance in the response variable into its different sources (groups and within groups)
Calculate an ANOVA table - divide each sum of squares by its degrees of freedom to get the mean squares, then divide the group mean square by the error mean square to get F
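The partitioning and ANOVA table steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name one_way_anova and the three small groups are made up for the example, and no P-value is computed because that needs the F distribution.

```python
from statistics import mean

def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table (hypothetical helper)."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # Variation among groups: group size times squared deviation
    # of each group mean from the grand mean.
    ss_groups = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation within groups: squared deviations from each group's own mean.
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)

    df_groups = k - 1
    df_error = n_total - k
    ms_groups = ss_groups / df_groups
    ms_error = ss_error / df_error
    return {
        "SS_groups": ss_groups, "SS_error": ss_error,
        "df_groups": df_groups, "df_error": df_error,
        "MS_groups": ms_groups, "MS_error": ms_error,
        "F": ms_groups / ms_error,
    }

# Made-up data, three groups of three observations each.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in table.items():
    print(name, round(value, 3))
```

For this made-up data the group mean square is much larger than the error mean square, so F is well above 1.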
group mean square
proportional to the observed amount of variation among the group sample means. Represents the variation among subjects that belong to different groups. It will on average be similar to the error mean square if population means are equal
error mean square
estimates the variance among subjects that belong to the same group. The pooled sample variance, a measure of the variation among individuals within the same groups.
R squared
the fraction of the variation in the response variable (Y) that is explained by differences among groups. R squared = SSgroups / SStotal
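As a rough sketch, R squared can be computed as the group sum of squares divided by the total sum of squares. The function name r_squared and the data are assumptions made for illustration only.

```python
from statistics import mean

def r_squared(groups):
    """Fraction of total variation in Y explained by group membership."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    # Sum of squares among groups.
    ss_groups = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Total sum of squares: every observation against the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    return ss_groups / ss_total

# Made-up data for illustration.
example_r2 = r_squared([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(example_r2, 4))
```

A value near 1 means most of the variation lies among groups; a value near 0 means most of it lies within groups.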
statistical model
Response variable = model + error
Y = µ + A + ε
Yij = µ + Ai + εij
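One way to read Yij = µ + Ai + εij is as a recipe for generating data: each observation j in group i is the grand mean µ, plus the effect Ai of its group, plus random error εij. The sketch below assumes normally distributed errors with the same standard deviation in every group; the function name simulate_groups and all parameter values are hypothetical.

```python
import random

def simulate_groups(mu, effects, n_per_group, sigma, seed=1):
    """Generate Y_ij = mu + A_i + e_ij with e_ij ~ Normal(0, sigma)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    data = []
    for a_i in effects:        # one group per effect A_i
        group = [mu + a_i + rng.gauss(0, sigma) for _ in range(n_per_group)]
        data.append(group)
    return data

# Hypothetical values: grand mean 10, group effects -2, 0 and +2.
simulated = simulate_groups(mu=10, effects=[-2, 0, 2], n_per_group=5, sigma=1.0)
for i, group in enumerate(simulated):
    print("group", i, [round(y, 2) for y in group])
```

Under the null hypothesis every Ai would be 0, so all groups would share the same mean µ.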
linear models
A linear model is linear in its parameters (no parameter is an exponent or is multiplied or divided by another)
Usually use them to analyse numerical response variables that are assumed to have a normal distribution
Explanatory variables can be categorical (factors) or numerical
presenting results
ANOVA table
F = 3.89, df = (2, 58), P < 0.05
assumptions
The measurements in every group represent a random sample from the corresponding population
The variable is normally distributed in each of the k populations
The variance is the same in all k populations
robustness
If sample size is large, ANOVA is robust even if the variable does not have a normal distribution. It is also robust when the variance in the k populations is not equal, provided the samples are large and around the same size
If the data do not fit the assumptions, they can be transformed
nonparametric alternatives
If not normal
Kruskal-Wallis test
All group samples are random samples from corresponding populations
The distribution of the variable must have the same shape in every population to compare means or medians
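The Kruskal-Wallis test works on ranks rather than raw values. A minimal sketch of its test statistic H is given below, using the standard formula H = 12 / (N(N+1)) * sum(Ri^2 / ni) - 3(N+1), where Ri is the rank sum of group i. Tied values get their average rank, but the usual correction of H for ties is left out to keep the example short. The function names and data are assumptions for illustration.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction applied)."""
    pooled = [y for g in groups for y in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Made-up data with no overlap between groups.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_stat, 3))
```

H is then compared with a chi-squared distribution with k - 1 degrees of freedom, which is an approximation that works best when the groups are not very small.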
planned comparisons
Planned comparison: a comparison between means planned during the design of the study, identified before the data are examined.
Uses pooled sample variance (error mean square) based on all k groups
Increases precision and power
Assumes that the variance is the same within all groups
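A planned comparison between two of the k groups can be sketched as follows. The standard error uses the error mean square pooled over all k groups, not just the two groups being compared, which is where the extra precision comes from. The function name planned_comparison and the data are hypothetical, and the sketch stops at the t statistic and its degrees of freedom without computing a P-value.

```python
from math import sqrt
from statistics import mean

def planned_comparison(groups, i, j):
    """Compare the means of groups i and j using the pooled error mean square."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    # Error mean square from ALL k groups (pooled sample variance).
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_error = n_total - k
    ms_error = ss_error / df_error

    difference = mean(groups[i]) - mean(groups[j])
    se = sqrt(ms_error * (1 / len(groups[i]) + 1 / len(groups[j])))
    return {"difference": difference, "se": se,
            "t": difference / se, "df": df_error}

# Made-up data: compare the third group with the first.
comparison = planned_comparison([[1, 2, 3], [2, 3, 4], [5, 6, 7]], 2, 0)
print({name: round(value, 3) for name, value in comparison.items()})
```

Note that the degrees of freedom are N - k (the error degrees of freedom), not the n1 + n2 - 2 of an ordinary two-sample t-test.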
unplanned comparisons
Unplanned comparison: one of multiple comparisons, such as between all pairs of means, carried out to help determine where differences between means lie.
More likely to get Type I errors
Data dredging is involved
Tukey-Kramer test
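To see why unplanned comparisons raise the chance of a Type I error, it can help to count the pairs. With k groups there are k(k-1)/2 pairwise comparisons. The sketch below assumes, as a simplification, that the tests are independent and each uses the same significance level; pairwise comparisons that share groups are not truly independent, so the second number is only a rough guide. Function names are made up for the example.

```python
def number_of_pairs(k):
    """Number of pairwise comparisons among k group means."""
    return k * (k - 1) // 2

def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one Type I error across n_tests,
    ASSUMING independent tests each run at level alpha."""
    return 1 - (1 - alpha) ** n_tests

for k in (3, 5, 10):
    m = number_of_pairs(k)
    print(k, "groups:", m, "pairs, familywise error about",
          round(familywise_error(m), 2))
```

Methods such as the Tukey-Kramer test are designed to keep this overall error rate at the chosen significance level.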