General linear model Flashcards
General linear model
all about expressing the relationship between variables
For example…
What is the relation between a test score and the grouping variable
What is the relation between pre- and post-test measures
GLM Stat tests
T-tests
ANOVA, ANCOVA, MANOVA, MANCOVA
Correlations (pearson and spearman)
Linear regressions and multiple regressions
Goodness-of-fit tests / chi-square tests
Machine learning and prediction models
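One illustration of why these all count as GLM: a two-group comparison (the t-test situation) can be written as a regression on a 0/1 grouping variable, where the slope equals the difference between the group means. A minimal sketch with made-up scores:

```python
def ols_fit(x, y):
    """Closed-form simple OLS: slope = covariance of x and y over variance of x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

# Made-up test scores: group 0 (control) and group 1 (treatment)
group = [0, 0, 0, 0, 1, 1, 1, 1]          # dummy-coded grouping variable
score = [10, 12, 11, 13, 15, 17, 16, 18]
b0, b1 = ols_fit(group, score)
# b0 is the control-group mean (11.5); b1 is the mean difference (5.0)
```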
GLM equation
using this equation, we can predict the outcome variable Ŷi for participant i, as long as we have the X value for participant i
Yi = b0 + b1xi + ei
(the prediction itself drops the residual, since it is unknown in advance: Ŷi = b0 + b1xi)
Ŷ
Ŷ is the estimate of the observed outcome (Y), i.e. it represents the estimated DV
b0
b0 is the intercept of the regression line (where it crosses the y-axis)
b1
b1 is the slope of the regression line
xi
xi is the observation of the predictor (X) ie represents the IV
ei
ei is the residual error term, which is the difference between observed and predicted Y
i
i stands for the participant whose data is being used
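The prediction step above can be sketched directly in Python; the coefficient values and participant scores here are made up for illustration:

```python
# Hypothetical fitted coefficients (illustrative values, not from real data)
b0 = 2.0   # intercept: predicted Y when X = 0
b1 = 0.5   # slope: change in predicted Y per one-unit change in X

def predict(x_i):
    """Return the predicted outcome Y-hat for one participant's X value."""
    return b0 + b1 * x_i

x_i = 10               # participant i's score on the predictor (IV)
y_i = 7.4              # participant i's observed outcome (DV)
y_hat = predict(x_i)   # predicted outcome: 2.0 + 0.5 * 10 = 7.0
e_i = y_i - y_hat      # residual: observed minus predicted, about 0.4
```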
Correlation
a standardized measure of the linear relation between two variables
X and Y are interchangeable
r-value
The correlation is represented by an r-value that can take any value between -1.00 to +1.00
The absolute value represents the strength of the correlation; the sign (positive or negative) represents the direction
An r of ±1 means the points fall exactly on a line, while a smaller value like 0.2 means they scatter loosely around a line
Positive means that when the IV increases so does the DV (and the opposite for negative)
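The Pearson r can be computed from its standard formula (covariance divided by the product of the standard deviations); this hand-rolled sketch also shows that swapping X and Y gives the same value:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: standardized linear relation between x and y."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]      # perfectly linear in x
r_xy = pearson_r(x, y)    # ≈ 1.0: perfect positive correlation
r_yx = pearson_r(y, x)    # same value: X and Y are interchangeable
```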
Common correlation interpretations
For absolute correlation values (positive or negative), common interpretation:
= 0.00 no relation, entirely random
0.01 to 0.29 weak
0.30 to 0.49 moderate
0.50 to 0.99 strong
= 1.00 perfect, identical
But these rules are arbitrary and should be based on the context of the study
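These labels can be written as a small lookup function; the cutoffs (and which side of a boundary a value like 0.30 falls on) are arbitrary, as the note above says:

```python
def interpret_r(r):
    """Map an r-value to a rough verbal label (cutoffs are arbitrary conventions)."""
    a = abs(r)             # sign only gives direction, so interpret the magnitude
    if a == 0.0:
        return "no relation"
    if a == 1.0:
        return "perfect"
    if a < 0.30:
        return "weak"
    if a < 0.50:
        return "moderate"
    return "strong"
```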
Anscombe quartet
Anscombe's demonstration that four datasets can have nearly identical summary statistics (means, variances, correlation, regression line) yet look totally different when graphed
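The first two of Anscombe's four published datasets illustrate this: dataset I is roughly linear while dataset II follows a curve, yet both have the same mean of Y (about 7.50) and the same correlation with X (about 0.816). A sketch, using the published values:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation via covariance over the product of deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Anscombe datasets I (roughly linear) and II (a smooth curve), same x values
x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

mean1, mean2 = sum(y1) / 11, sum(y2) / 11   # both ≈ 7.50
r1, r2 = pearson_r(x, y1), pearson_r(x, y2) # both ≈ 0.816
```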
Ordinary least squares regression
The general linear model fits a line (the line of best fit) through the datapoints that is, overall, as close as possible to every datapoint
This is done by minimizing the sum of the squared vertical distances between each point and the line, which is why it is called “Ordinary least squares regression”
By default its estimates come out unstandardized ie. using the units of the original variables
Coefficients for ordinary least squares regression
For ordinary least squares regression: the estimated regression coefficients (b0 and b1) are those that minimize the sum of the squared residuals.
Take the vertical distance between a datapoint and the fitted line
Square that distance
Repeat for all datapoints, and sum up the squared distances
Find the line for which this sum is the smallest.
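For simple regression, the minimizing line has a closed-form solution (slope = covariance of X and Y divided by the variance of X; intercept = mean of Y minus slope times mean of X). A minimal sketch, using made-up data that lies exactly on a line so the squared residuals sum to zero:

```python
def ols_fit(x, y):
    """Ordinary least squares: the (b0, b1) minimizing the sum of squared residuals.
    Closed form: b1 = sum((x - mx)(y - my)) / sum((x - mx)^2), b0 = my - b1 * mx."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    return b0, b1

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]              # exactly y = 1 + 2x (illustrative values)
b0, b1 = ols_fit(x, y)        # recovers intercept 1 and slope 2
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # 0 for a perfect fit
```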