Lecture 20 - The "General Linear Model" (GLM) Flashcards
Multiple appearances of F…
- The key test statistic for ANOVA was the F-ratio.
- F = MSM/MSR (Mean Square for the “model”, aka the factor under examination, divided by the Mean Square “residual”, aka the variance estimate that is not explained by the factor or factors being examined).
- BUT – we also saw this in the lecture on regression.
- F = MSM/MSR (Mean Square for the “model”, aka the regression line, divided by the Mean Square “residual”, aka the variance estimate that is not explained by the regression line).
- Is this like the way the z-distribution shows up in many places (e.g. as an approximation to Wilcoxon tests)?
- NO – the F-ratio appears in both linear regression and ANOVA because these are “the same” (i.e. ANOVA IS a linear model).
That might seem odd to say – but we can explore this idea by taking a single set of data and analysing it different ways.
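As a worked reminder in standard notation (not taken from the lecture's SPSS output), the F-ratio is built the same way in both analyses: each mean square is a sum of squares divided by its degrees of freedom. The df lines assume a one-way between-subject ANOVA with k groups and N scores, and a regression with a single predictor.

```latex
\begin{gathered}
F = \frac{MS_M}{MS_R}, \qquad MS_M = \frac{SS_M}{df_M}, \qquad MS_R = \frac{SS_R}{df_R} \\
\text{one-way ANOVA with } k \text{ groups: } df_M = k - 1, \quad df_R = N - k \\
\text{regression with one predictor: } df_M = 1, \quad df_R = N - 2
\end{gathered}
```

With k = 2 groups the two sets of df coincide (1 and N - 2), a first hint that the two analyses can be fitting the same model.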
Analysing the same data several ways
- Imagine I have tested memory in two groups of people – one who used a visualisation technique, the other who did not.
- Group coded as 0 = no visualisation, 1 = visualisation.
- Memory scored on an interval scale.
- “Obvious” analysis is a between-subject t-test.
- Assuming, of course, that the assumptions of the t-test are met; let us say that they are.
- We also noted previously that a 2-group study can be analysed with ANOVA, and that this should give the same result.
Here “same” means the same p-value: the t-test uses the t-statistic, while ANOVA uses the F-ratio.
Memory & visualisation as t-test
So, let us ask SPSS to do a between-subject t-test.
- The SPSS output shows a clearly significant difference (p = 0.037, so the probability of observing a value of t at least this extreme if H0 were true is less than 0.05).
– As an aside, I deliberately set up the data to have identical variance in each group, simply to remove the “issue” of whether we should assume equal variances between groups or not.
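The lecture's data are not reproduced here, so the sketch below uses a small set of made-up scores (hypothetical, not the lecture's data) with identical variance in each group, as described above. It computes the between-subject t-statistic with a pooled variance, i.e. assuming equal variances. The p-value would additionally need the t-distribution's CDF, which Python's standard library does not provide, so that step is left out.

```python
import math
from statistics import mean, variance

# Hypothetical memory scores (NOT the lecture's data), equal variance in each group
no_vis = [4, 5, 6, 7, 8]   # group coded 0: no visualisation
vis = [6, 7, 8, 9, 10]     # group coded 1: visualisation

def pooled_t(a, b):
    """Between-subject t-statistic assuming equal variances; returns (t, df)."""
    n_a, n_b = len(a), len(b)
    # Pooled variance: the two sample variances weighted by their df
    sp2 = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two group means
    se = math.sqrt(sp2 * (1 / n_a + 1 / n_b))
    return (mean(b) - mean(a)) / se, n_a + n_b - 2

t, df = pooled_t(no_vis, vis)
print(f"t({df}) = {t:.3f}")  # t(8) = 2.000
```

If SciPy happens to be installed, `scipy.stats.ttest_ind(vis, no_vis, equal_var=True)` could serve as a cross-check that also reports the p-value.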
Memory & visualisation as ANOVA
Now, ask SPSS to do a between-subject ANOVA. Again, there is a significant difference (p = 0.037, so the probability of observing a value of F at least this large if H0 were true is less than 0.05).
– Note: the p-value is the same as for the t-test, and F = t².
– Nothing “special” here; this just demonstrates what I said before about the relationship between ANOVA and t-tests.
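Here is a sketch of the ANOVA done by hand on made-up scores (hypothetical, not the lecture's data). It then rebuilds t using the ANOVA's residual mean square. With only two groups, MSR is exactly the pooled variance a t-test uses, which is why F comes out equal to t².

```python
import math
from statistics import mean

# Hypothetical scores (NOT the lecture's data); keys are the 0/1 group codes
groups = {
    0: [4, 5, 6, 7, 8],    # no visualisation
    1: [6, 7, 8, 9, 10],   # visualisation
}

scores = [y for ys in groups.values() for y in ys]
grand_mean = mean(scores)
group_means = {g: mean(ys) for g, ys in groups.items()}

# Model SS: distance of each group mean from the grand mean, weighted by group size
ss_model = sum(len(ys) * (group_means[g] - grand_mean) ** 2 for g, ys in groups.items())
# Residual SS: spread of each score around its own group mean
ss_resid = sum((y - group_means[g]) ** 2 for g, ys in groups.items() for y in ys)

df_model = len(groups) - 1             # k - 1
df_resid = len(scores) - len(groups)   # N - k
ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
f_ratio = ms_model / ms_resid

# With two groups MSR is the pooled variance, so t can be rebuilt from it
n0, n1 = len(groups[0]), len(groups[1])
t = (group_means[1] - group_means[0]) / math.sqrt(ms_resid * (1 / n0 + 1 / n1))

print(f"F({df_model}, {df_resid}) = {f_ratio:.3f}; t^2 = {t ** 2:.3f}")
# F(1, 8) = 4.000; t^2 = 4.000
```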
Memory & visualisation as regression?
- I started the lecture by noting that ANOVA is an example of a linear model, and that it is no accident that F-ratios appear in both ANOVA and linear regression.
- So, could we analyse these data with regression?
- We certainly seem to have the data needed.
- Memory score = outcome variable.
- Visualisation group = predictor variable.
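To see what the regression does with a 0/1 predictor, here is a sketch of ordinary least-squares regression with one predictor, run on made-up scores (hypothetical, not the lecture's data) where x is the group code. With this coding the intercept is the mean of the group coded 0 and the slope is the difference between the two group means, which is exactly the difference the t-test and ANOVA were testing.

```python
from statistics import mean

# Hypothetical data (NOT the lecture's): x = group code, y = memory score
x = [0] * 5 + [1] * 5
y = [4, 5, 6, 7, 8, 6, 7, 8, 9, 10]

def simple_regression(x, y):
    """Least-squares intercept (b0) and slope (b1) for a single predictor."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

b0, b1 = simple_regression(x, y)
print(b0, b1)  # 6.0 2.0: mean of group 0, and group 1 mean minus group 0 mean
```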
Memory & visualisation as regression #2
- Not only does the model test part of the regression output show that the regression model is significant, but the p-value (and F-ratio) are the same as when analysing these data with ANOVA.
- Not just the same p-value, check the Sum of Squares, DFs, and Mean Squares – they are all the same…
That is, the “model” of the data in ANOVA IS the linear model discovered by regression!
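The matching table can be sketched the same way. Assuming made-up 0/1-coded data (hypothetical, not the lecture's), the code compares the fitted regression line with a null model that predicts the grand mean for everyone, and builds the SS, df, MS and F values a regression model test reports. For these scores it gives SS model = 10, SS residual = 20 and F(1, 8) = 4, the same values a one-way ANOVA on the same scores produces.

```python
from statistics import mean

# Hypothetical data (NOT the lecture's): x = group code (0/1), y = memory score
x = [0] * 5 + [1] * 5
y = [4, 5, 6, 7, 8, 6, 7, 8, 9, 10]

# Fit the least-squares line y = b0 + b1 * x
mx, my = mean(x), mean(y)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx
predicted = [b0 + b1 * xi for xi in x]  # with 0/1 coding: each group's mean

# Null model: predict the grand mean (my) for everyone
ss_total = sum((yi - my) ** 2 for yi in y)                      # error of the null model
ss_model = sum((pi - my) ** 2 for pi in predicted)              # improvement due to the line
ss_resid = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # error left over

df_model = 1            # one predictor
df_resid = len(y) - 2   # N - 2 (intercept and slope estimated)
ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
f_ratio = ms_model / ms_resid

for name, ss, df in [("Regression", ss_model, df_model), ("Residual", ss_resid, df_resid)]:
    print(f"{name:<10} SS = {ss:6.2f}  df = {df}  MS = {ss / df:6.2f}")
print(f"F = {f_ratio:.2f}")
```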
Regression vs ANOVA
- The previous demonstration was meant to reinforce the idea that ANOVA is an example of regression. That is, the “model” of the data in ANOVA is a linear model.
- But not all regression models “map” onto an ANOVA – this only works when the regression uses categorical predictors (i.e. the “groups” or “categories” in the ANOVA).
- This was why I coded the visualisation groups as 0 vs 1 in the previous example.
- I used a 2-group example here for simplicity, but the ANOVA/regression relationship is maintained for other types of ANOVA.
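If there are more than 2 groups, the equivalent regression model will have multiple predictors (one fewer than the number of groups, e.g. two 0/1 predictors for three groups, with one group acting as the reference that is coded 0 on all of them). But remember we don’t cover regression with multiple predictors on this course.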
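A minimal sketch of how those extra predictors could be built (dummy coding), assuming a hypothetical third condition called "verbal" that is not part of the lecture's example. Each non-reference group gets its own 0/1 column and the reference group is 0 on all of them. A column for every group would add up to 1 on every row, duplicating the regression's intercept, which is why one group is left out.

```python
def dummy_code(labels, reference):
    """Return (column names, rows) of 0/1 predictors, one per non-reference group."""
    levels = sorted(set(labels))
    others = [lvl for lvl in levels if lvl != reference]
    # Each row gets a 1 in the column for its own group (none for the reference)
    rows = [[1 if lab == lvl else 0 for lvl in others] for lab in labels]
    return others, rows

# Hypothetical 3-group design; "verbal" is an invented condition
conditions = ["none", "visual", "verbal", "none", "visual", "verbal"]
names, rows = dummy_code(conditions, reference="none")

print(names)  # ['verbal', 'visual']
for label, row in zip(conditions, rows):
    print(label, row)
```

With only the two groups from the lecture, the same function returns a single column, matching the 0 vs 1 coding used there.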
The GLM (General Linear Model)
- So – ANOVA can be described (and calculated) as linear regression with categorical predictors.
- Here “categorical predictors” are the groups/conditions in the ANOVA, as opposed to the continuous predictors that are perhaps most typically associated with regression.
- This idea that linear regression unifies multiple statistical approaches is called the GLM or General Linear Model.
- That is, the GLM is an overarching framework that brings multiple different approaches together.
- Note – it is not actually “fully general” – for example, it is explicitly about linear relationships (and as we have seen – sometimes the world throws up non-linear things).
The “Generalized” Linear Model addresses a lot of this, but like with non-linear regression, we won’t be covering it on this course.
What is the point of the GLM?
“Theory” answer(s)
- But why bother about the GLM?
- Some answers are “theoretical”, in the sense that they do not (yet) affect how you do stats.
- Exploring the GLM helps illustrate the overall idea of null hypothesis testing as a process of comparing a “model” of the data to some null model (i.e. a model according to the null hypothesis).
So, the GLM helps unify what otherwise seem like very different approaches. Indeed, Andy Field (and many other textbook authors) use the GLM approach to explain statistical inference in a broad sense.
What is the point of the GLM?
Practical answer(s)
- But the GLM idea is not just “theoretical” – it is deeply practical. For example:
- (A) ANOVA is restricted in terms of data requirements
- E.g. for within-subject ANOVA need data from every subject in every condition; for between-subject ANOVA need data from separate subjects in every condition.
- But a regression-based approach can get around those requirements and allow analysis when we have only partial within-subject data and/or “missing” data.
- (B) The GLM naturally extends ANOVA in novel directions (e.g. ANCOVA – Analysis of Covariance – which mixes categorical and continuous predictors).
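As a rough illustration of (B), an ANCOVA-style model places a dummy-coded group predictor and a continuous covariate in the same design matrix and fits them together by least squares. The sketch assumes a hypothetical covariate (age) and generates noise-free scores from an assumed model, 2 + 3 × group + 0.5 × age, so the fit should recover exactly those coefficients. The small solver is only a stand-in for the fitting routine a statistics package would use.

```python
def solve(a, b):
    """Solve the small linear system a @ beta = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Partial pivoting: move the row with the largest entry in this column up
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients via the normal equations X'X beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: group (0/1, categorical) and age (continuous covariate)
group = [0, 0, 0, 1, 1, 1]
age = [20, 30, 40, 25, 35, 45]
# Noise-free scores from the assumed model 2 + 3*group + 0.5*age
y = [2 + 3 * g + 0.5 * a for g, a in zip(group, age)]

# Design matrix columns: intercept, group dummy, covariate
X = [[1, g, a] for g, a in zip(group, age)]
b0, b_group, b_age = ols(X, y)
print(f"intercept = {b0:.3f}, group effect = {b_group:.3f}, age slope = {b_age:.3f}")
# intercept = 2.000, group effect = 3.000, age slope = 0.500
```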
History
- The idea that ANOVA and regression are “the same” often comes as a surprise to people working in psychology.
- Perhaps because ANOVA and regression are often taught as separate things (I was certainly taught that way).
- Perhaps because ANOVA and regression were associated with different approaches to psychology.
- ANOVA with “experimental” approaches and regression with correlational or individual difference approaches.
See the following for some historical discussions of these issues if you like: Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin, 70(6P1), 426-443; Cronbach, L. J. (1957). The two disciplines of scientific psychology. American Psychologist, 12(5), 671-684.