Chapter 12: General Linear Model, Comparing Several Different Means Flashcards
What does an ANOVA measure?
analysis of variance
when you are interested in comparing differences between the means in 3 or more independent groups
are population means the same or different?
want to know what's generally true about the populations (μ), not just the means of the sample
Why wouldn’t we use multiple independent samples t-tests instead?
increased chance of type 1 error (declaring there is an effect when there really is not)
familywise alpha increases above .05
FW alpha is the probability of making one or more type 1 errors across a set (family) of tests, given that the nulls are true
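The growth of familywise alpha can be seen directly from the formula 1 − (1 − α)^c for c independent tests (the per-test α of .05 and independence are assumptions of this sketch):

```python
# Familywise alpha across c independent tests, each at alpha = .05:
# P(at least one Type I error) = 1 - (1 - alpha)^c
alpha = 0.05
for c in (1, 3, 6):  # e.g., comparing 4 groups pairwise takes 6 t-tests
    fw = 1 - (1 - alpha) ** c
    print(f"{c} tests: familywise alpha = {fw:.3f}")
```

With just 3 tests FW alpha is already about .14, and with 6 it passes .26 — far above the nominal .05.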
Are an ANOVA and regressions separate things?
No. Regression is a more general form of an ANOVA. Anything you can do with an ANOVA you can do with a regression.
Regression can handle both categorical and continuous predictors, while ANOVA can only handle categorical
the idea they are different is partially historical (regression = applied research; ANOVA = experimental)
What are the methods for comparing independent means?
goal: keep FW alpha < .05 while maintaining solid power
the more liberal the alpha, the greater the power
- Confidence intervals
- standard linear model w/dummy coding (comparing groups to a base condition)
- One way ANOVA
- Welch or Brown-Forsythe F (Fbf)
- Planned Contrasts
Confidence Intervals
Positives:
-simple/straightforward
-makes you think about magnitude/focuses on estimation and avoids black-and-white thinking
Negatives:
- risk finding a difference when one doesn’t exist (type 1 error), increased with sample size
- with a small number of groups, the CIs are not going to be as sensitive to differences that do exist
- not as sensitive/powerful as other analyses
- if CIs overlap too much, you can't really draw conclusions
- error bars represent an estimate of where the population mean is
- larger bars = more uncertainty in the data. smaller number of cases means bigger error bars
interpretation: we are 95% confident that the population mean for X is between A & B
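The interpretation above comes from the usual mean ± t·SE construction. A minimal sketch (the scores are hypothetical and the helper name `mean_ci` is ours):

```python
import numpy as np
from scipy import stats

def mean_ci(scores, conf=0.95):
    """CI for a population mean from sample scores: mean +/- t_crit * SE."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    m = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(n)              # standard error of the mean
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    return m - t_crit * se, m + t_crit * se

lo, hi = mean_ci([4, 5, 6, 7, 8])  # hypothetical group scores
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")
```

Note how the width depends on n through the standard error: fewer cases means bigger error bars.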
Standard linear model with dummy coding
comparing groups to base condition
involve the use of a regression model
useful when you want to compare groups to base/control group
use k-1 dummy variables (number of groups minus 1); the left-out group is the base
check the magnitude of R^2 change and whether that change is significant
negatives:
- compare groups only with the control group
- since they are both comparing against some standard (control), they are not independent tests
we are interested in whether the population means (μ) differ and, if so, by how much
R^2 change tells us group membership accounts for about X% of the variance in the outcome (DV)
a significant F change means the means are significantly different from each other
In standard linear model with dummy coding on SPSS…
Start by looking at the F test. If it is significant, then look at the regression coefficients to see where those differences are. If it is not significant, you do not have to look at the coefficients.
allows you to see which means are different
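The dummy-coding idea above can be sketched as a small regression (the three groups and their scores are hypothetical):

```python
import numpy as np

# Hypothetical scores for 3 groups; "control" is the base (reference) group.
control = np.array([3., 4., 5.])
treat_a = np.array([6., 7., 8.])
treat_b = np.array([5., 6., 7.])
y = np.concatenate([control, treat_a, treat_b])

# k - 1 = 2 dummy variables: each compares one treatment with the base group.
d1 = np.concatenate([np.zeros(3), np.ones(3), np.zeros(3)])  # treat_a vs control
d2 = np.concatenate([np.zeros(3), np.zeros(3), np.ones(3)])  # treat_b vs control
X = np.column_stack([np.ones(9), d1, d2])                    # intercept + dummies

b, *_ = np.linalg.lstsq(X, y, rcond=None)
# b[0] = control mean; b[1], b[2] = mean differences from control
print(b)  # [4. 3. 2.]
```

The intercept recovers the base-group mean, and each dummy coefficient is exactly that group's mean difference from control — which is why the coefficients tell you where the differences are.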
Logic of F statistic
- tests overall fit of a linear model to a set of data
- when model is based on group means, our predictions from the model are group means
- different group means? good prediction, F will be high
- similar group means? not a good prediction, F will be low (~1), fail to reject null
compares the improvement in fit from using the model rather than the grand mean
if the differences between group means are large enough, then the resulting model will be a better fit to the data than the grand mean (null)
Linear Model: Overview/Summary
- model of "no effect" or "no relationship between the predictor and outcome" is one where the predicted value of the outcome is always the grand mean
- we can fit a different model to the data that represents an alternative hypothesis. we compare the fit of this model to the fit of the null (i.e., using the grand mean)
- the intercept and one or more parameters (b) describe the model
- the parameters determine the shape of the model that we have fitted. therefore, the bigger the coefficients, the greater the deviation between the model and the null model (grand mean)
Group means: Overview/Summary
- in experimental research the parameters (b) represent the differences between group means
- if the differences between group means are large enough, then the resulting model will be a better fit to the data than the null model (grand mean)
- if this is the case: predicting scores from group membership is better than simply using the grand mean. in other words, the group means are not all the same
calculating an f statistic
- quantify the amount of variability in the scores
- separate variability into 2 parts: the part that can be accounted for by group membership and the part that cannot be accounted for by group membership
more people = larger residual sum of squares
more groups = larger model sum of squares
SStotal equation
total amount of variability in the scores
SSt = s^2grand(N-1)
or, equivalently: SSt = sum of (Xi - Xgrand)^2
square each score's difference from the grand mean, then add them all together
SSmodel equation
how much variability accounted for by the model/group membership
SSm = n(Xgroupmean-Xgrandmean)^2
do for each group and add them all together
SSresidual equation
how much variability isn’t accounted for by the model/group membership
SSr = s^2group(n-1)
or
SSr = SSt - SSm
do for each group then add them together
n is number of people in each group
MSm equation
remove the effect of the # of groups/people
important because the SS get bigger the more groups/people you have
MSm = SSm/DFm
DFm = number of groups -1
k-1
MSr equation
remove the effect of the # of groups/people
important because the SS get bigger the more groups/people you have
MSr = SSr/DFr
DFr = total number of people - number of groups
N-k
F statistic equation
signal to noise
F = MSm/MSr
look at the associated p and make a conclusion: "Given the null, the probability of obtaining an F of 5.12 or higher is .025"
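The SS, MS, and F steps above can be worked through end to end (the three groups of scores are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical scores for k = 3 groups of n = 5 each.
groups = [np.array([2., 3., 4., 5., 6.]),
          np.array([4., 5., 6., 7., 8.]),
          np.array([6., 7., 8., 9., 10.])]
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
N, k = all_scores.size, len(groups)

# SSt: squared deviation of every score from the grand mean
ss_t = np.sum((all_scores - grand_mean) ** 2)
# SSm: for each group, n * (group mean - grand mean)^2, summed
ss_m = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# SSr: for each group, s^2 * (n - 1), summed (equivalently SSt - SSm)
ss_r = sum(g.var(ddof=1) * (g.size - 1) for g in groups)

ms_m = ss_m / (k - 1)   # MSm = SSm / dfM, with dfM = k - 1
ms_r = ss_r / (N - k)   # MSr = SSr / dfR, with dfR = N - k
F = ms_m / ms_r         # signal-to-noise ratio
p = stats.f.sf(F, k - 1, N - k)
print(F, p)

# sanity check against scipy's built-in one-way ANOVA
F_sp, p_sp = stats.f_oneway(*groups)
```

For these scores SSt = 70 splits into SSm = 40 and SSr = 30, giving F(2, 12) = 8 — the group means predict noticeably better than the grand mean.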
Assumptions when comparing means
1) normality: assessed within individual group, not set of scores as a whole
2) independence: errors from each individual case are unrelated to each other
***3) homogeneity: all comparison groups have the same variance
Homogeneity of variance importance/tests
- if group sizes are unequal, violations of the assumption can have serious consequences (affects alpha and power) - the F test is not robust under these conditions
- Levene’s test or Brown-Forsythe F/Welch’s F, robust version of F that doesn’t assume homogeneity
Levene’s Test
not recommended to use
if Levene’s test is significant, then we conclude that the variances are significantly different and try to rectify the situation
if the sample size is small, this isn’t really powerful
if the sample size is large, even small violations will be statistically significant
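A sketch of running Levene's test (the groups are hypothetical; `center="median"` gives the median-centered, Brown-Forsythe-style variant, which is the default in scipy):

```python
import numpy as np
from scipy import stats

g1 = np.linspace(-1, 1, 30)   # small spread
g2 = np.linspace(-1, 1, 30)   # small spread
g3 = np.linspace(-3, 3, 30)   # three times the spread: clearly unequal variance

W, p = stats.levene(g1, g2, g3, center="median")
print(f"W = {W:.2f}, p = {p:.4f}")  # small p -> variances significantly differ
```

Keep the sample-size caveats above in mind when reading the p-value: with small n this test misses real violations, and with large n it flags trivial ones.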
Brown-Forsythe F and Welch’s F
control type 1 error rate, Welch’s has more power
not sure if the assumption of homogeneity is met
versions of the F statistic designed to be accurate when the assumption of homogeneity is violated
always use the adjusted F; if the assumption isn't violated, it'll be the same as the unadjusted F
estimate the amount/degree to which homogeneity is violated, then adjusts F in accordance/proportionally (F is lower)
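A sketch of that adjustment for Welch's F specifically, assuming the standard Welch (1951) formulas (the function name `welch_f` is ours; scipy has no direct one-way Welch ANOVA):

```python
import numpy as np
from scipy import stats

def welch_f(*groups):
    """Welch's F: one-way comparison of means that does not assume equal variances."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v                                   # weight each group by its precision
    mw = np.sum(w * m) / np.sum(w)              # weighted grand mean
    a = np.sum(w * (m - mw) ** 2) / (k - 1)     # between-groups "signal"
    lam = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam    # adjustment for heterogeneity
    F = a / b
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * lam)  # df2 shrinks as violation grows
    p = stats.f.sf(F, df1, df2)
    return F, df1, df2, p
```

One way to check the sketch: with exactly two groups, Welch's F equals the square of Welch's t (scipy's `ttest_ind` with `equal_var=False`).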
Is ANOVA robust to violations of assumptions?
robust: if you violate assumptions, it doesn’t matter bc alpha and power only change a little
- yes, when group sizes are equal (based on old research)
- more recent research shows that it is more complicated; these studies look at a wider range of conditions
- skewness, non-normality (heavy-tailed distributions), and heteroscedasticity (unequal variance across groups) interact in complex ways, for ex:
- with non-normality plus violations of homoscedasticity, alpha can rise from .05 to .18 (not good)
- with power set at .9, contaminating 10% of scores with draws from a normal distribution with greater variance drops power to .28
- when scores are made to correlate moderately (r = .50) with n = 10 per group, alpha rises from .05 to .74
Alternatives when assumptions aren’t met include:
1) Welch’s F: if homogeneity has been violated; adjusts for the amount that the assumption has been violated (always use it)
2) robust tests: comparing different means. if assumptions are met, ANOVA has more power than robust tests
3) Kruskal-Wallis: nonparametric test that doesn’t make any assumption about distribution
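Option 3 above can be sketched with scipy's built-in Kruskal-Wallis test (the three groups of scores are hypothetical):

```python
from scipy import stats

# Kruskal-Wallis works on ranks, so it makes no assumption about the
# shape of the score distributions.
g1 = [2.1, 2.3, 2.5, 2.8]
g2 = [3.5, 3.9, 4.1, 4.4]
g3 = [2.9, 3.0, 3.1, 3.3]
H, p = stats.kruskal(g1, g2, g3)
print(f"H = {H:.2f}, p = {p:.4f}")  # small p -> at least one group differs
```

The trade-off stated above applies: when the parametric assumptions actually hold, the ANOVA on the raw scores has more power than this rank-based test.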
Post Hoc Tests
used when we determine the means are different (sig ANOVA), but there is not a hypothesis on which groups are different
- follow up tests to an ANOVA to determine which groups are different
- kinda like p hacking, but not really because you control familywise alpha
- many post hoc tests, best one depends on the situation (keep FW alpha .05, allow power to be as high as possible) - dependent on assumptions and whether it is more important to not make Type 1 or Type 2 error.
- common ones: Least significant difference (LSD), Bonferroni, Tukey, REGWQ
perform badly when groups/variances are unequal
consist of pairwise comparisons designed to compare all different combinations of the treatment groups (takes every pair of groups and performs a separate test on each)
Types of post hoc tests, explained
1) LSD: no attempt to control type 1 error rate, like multiple t tests, but better because you start with an ANOVA
2) Bonferroni: when you absolutely have to control FW alpha, conservative, use when you really don’t want to make Type 1 error
3) Tukey: conservative like Bonferroni but more powerful when testing large number of means (large number of groups)
4) REGWQ: good power and control over type 1 errors, probably best when n’s (sample sizes) are equal
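The Bonferroni approach (option 2) is simple enough to sketch by hand: run every pairwise t-test and multiply each p by the number of comparisons (the group scores are hypothetical):

```python
from itertools import combinations
from scipy import stats

groups = {"ctrl": [3., 4., 5., 4., 3.],
          "low":  [5., 6., 7., 6., 5.],
          "high": [8., 9., 10., 9., 8.]}

pairs = list(combinations(groups, 2))  # every pair of the 3 groups -> 3 tests
for a, b in pairs:
    t, p = stats.ttest_ind(groups[a], groups[b])
    p_adj = min(p * len(pairs), 1.0)   # Bonferroni: multiply p by # of tests
    print(f"{a} vs {b}: p = {p:.4f}, Bonferroni-adjusted p = {p_adj:.4f}")
```

Multiplying p by the number of tests is what makes Bonferroni conservative: each adjusted p must clear .05 on its own, which keeps FW alpha at .05 but costs power.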
how do we know if results have scientific and/or practical significance?
quantify the effect size
significant results do not mean they are important. it just means we are able to rule out sampling error as the sole cause of the observed effect in the sample
- R^2/eta^2: quantifies how big the overall effect is across all the group means (proportion of variance accounted for)
- omega^2: R^2 of the population’s overall effect. (Guidelines: .01 = S, .06 = M, .14 = L)
- rcontrast^2: standardized, specific to two groups (Guidelines: .1 = S, .3 = M, .5 = L)
- cohen’s d: difference between 2 groups in SD units (Guidelines: .2 = S, .5 = M, .8 = L)
- mean differences
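The first, second, and fourth measures above can be computed from the same SS quantities as the ANOVA itself (the groups of scores are hypothetical; `eta_sq`, `omega_sq`, and `d` are our names):

```python
import numpy as np

groups = [np.array([2., 3., 4., 5., 6.]),
          np.array([4., 5., 6., 7., 8.]),
          np.array([6., 7., 8., 9., 10.])]
scores = np.concatenate(groups)
N, k = scores.size, len(groups)

ss_t = np.sum((scores - scores.mean()) ** 2)
ss_m = sum(g.size * (g.mean() - scores.mean()) ** 2 for g in groups)
ss_r = ss_t - ss_m
ms_r = ss_r / (N - k)

eta_sq = ss_m / ss_t                                 # R^2 / eta^2 (sample estimate)
omega_sq = (ss_m - (k - 1) * ms_r) / (ss_t + ms_r)   # omega^2 (population estimate)

# Cohen's d for two specific groups: mean difference in pooled-SD units
m1, m2 = groups[0].mean(), groups[2].mean()
sp = np.sqrt((groups[0].var(ddof=1) + groups[2].var(ddof=1)) / 2)
d = (m2 - m1) / sp
print(eta_sq, omega_sq, d)
```

Note omega^2 comes out smaller than eta^2: it corrects for the fact that the sample R^2 overestimates the population effect.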
planned contrasts
used when you think you know which groups are different
enter variables into the regression equation; if a regression coefficient is significant, then there’s a significant difference between the 2 groups in that contrast
involve use of a regression model
fewer tests keeps FW alpha at .05, which means more power
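A sketch of one planned contrast using the standard t = L/SE construction, where L is the weighted combination of group means (the groups, the weights, and the hypothesis — control vs the average of two treatments — are hypothetical):

```python
import numpy as np
from scipy import stats

groups = [np.array([3., 4., 5., 4., 3.]),    # control
          np.array([5., 6., 7., 6., 5.]),    # treatment A
          np.array([6., 7., 8., 7., 6.])]    # treatment B
c = np.array([-2., 1., 1.])                  # contrast weights (must sum to zero)

n = np.array([g.size for g in groups], dtype=float)
m = np.array([g.mean() for g in groups])
N, k = int(n.sum()), len(groups)

# MSr from the overall ANOVA supplies the error term for the contrast
ms_r = sum(g.var(ddof=1) * (g.size - 1) for g in groups) / (N - k)
L = np.sum(c * m)                            # contrast value
se = np.sqrt(ms_r * np.sum(c ** 2 / n))      # its standard error
t = L / se
p = 2 * stats.t.sf(abs(t), df=N - k)
print(f"t({N - k}) = {t:.2f}, p = {p:.4f}")
```

Because this is one focused test instead of every pairwise comparison, no familywise correction is needed — which is exactly where the power advantage comes from.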