Intro to linear models Flashcards
what is a model?
a formal representation of a system - basically, all of statistics is about models.
we represent mathematical models as functions, giving them arguments and operations that allow us to make predictions (which can be tested)
what is a good model?
- model is represented as a function
- the function is represented as a line
- the line yields predictions we would expect if the model were true
- when we collect more data to test the predictions, the data match the model well
what is a linear model?
model of a linear relationship
we use linear models to try and explain variation in an outcome (DV) using one or more predictors (IVs)
what is the intercept?
the point where our model line crosses the y axis, i.e. the value of y when x = 0
what is the slope?
the gradient of the model line or rate of change
linear model equation:
yi = β0 + β1xi + ϵi
y = intercept + (slope × x) + residual
what is the residual?
a measure of how well the model fits each data point.
it is the distance (on the y axis) between the model line and each data point
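a minimal sketch of the model line and its residuals. the data and the coefficients (b0 = 45, b1 = 5) are hypothetical, chosen only for illustration:

```python
# Hypothetical data: hours studied (x) and test score (y)
x = [1, 2, 3, 4, 5]
y = [50.0, 54.0, 62.0, 64.0, 70.0]

# Hypothetical coefficients, assumed for illustration
b0 = 45.0  # intercept: predicted y when x = 0
b1 = 5.0   # slope: change in predicted y per unit increase in x

# Model-predicted value (y-hat) for each observation
y_hat = [b0 + b1 * xi for xi in x]

# Residual = observed y - predicted y (vertical distance to the line)
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

print(y_hat)      # [50.0, 55.0, 60.0, 65.0, 70.0]
print(residuals)  # [0.0, -1.0, 2.0, -1.0, 0.0]
```

a positive residual means the data point sits above the model line, a negative one means it sits below.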
residuals should be:
- normally distributed
- mean of 0
- sd of σ (meaning the spread of the errors should be constant across values of x)
what is least squares?
a method of estimating the model: choose the intercept and slope that minimise the residual sum of squares
meaning it minimises the squared distances between the observed values of y and the model-predicted values (ŷ)
residual sum of squares (SSresidual)
equation:
SS residual = Σ (yi - ŷi)², i.e. (observed y value - model estimated y value) squared and then added up for all observations
minimising the SS residual means that our predicted values are as close as they can get to each of our observed values
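a small sketch of the residual sum of squares, comparing three candidate lines on hypothetical data. the function name `ss_residual` and all numbers are assumptions for illustration; for this particular data the line with intercept 45 and slope 5 is the least squares line, so it should give the smallest value:

```python
def ss_residual(x, y, intercept, slope):
    """Sum of squared differences between observed y and predicted y."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data: hours studied (x) and test score (y)
x = [1, 2, 3, 4, 5]
y = [50.0, 54.0, 62.0, 64.0, 70.0]

ss_best = ss_residual(x, y, 45.0, 5.0)         # least squares line for this data
ss_shifted = ss_residual(x, y, 44.0, 5.0)      # same slope, lower intercept
ss_steeper = ss_residual(x, y, 45.0, 5.5)      # same intercept, steeper slope

print(ss_best, ss_shifted, ss_steeper)  # 6.0 11.0 19.75
```

moving the line away from the least squares estimates in either way makes the SS residual larger.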
calculating intercept
intercept = mean of y - (slope estimate × mean of x), i.e. β0 = ȳ - β1x̄
(see slope calculations)
calculating slope
β1 = SPxy / SSx
slope = sum of cross products / sum of squared deviations of x
SPxy = sum of cross products = (observed x value - sample mean of x) * (observed y value - sample mean of y) added up for all observations
SSx = sum of squared deviations of x = (observed x value - sample mean of x) squared, then added up for all observations
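the slope and intercept calculations can be sketched as follows. the data are hypothetical and the variable names (`sp_xy`, `ss_x`) are just labels for SPxy and SSx:

```python
# Hypothetical data: hours studied (x) and test score (y)
x = [1, 2, 3, 4, 5]
y = [50.0, 54.0, 62.0, 64.0, 70.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# SPxy: sum of cross products of the deviations from each mean
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# SSx: sum of squared deviations of x from its mean
ss_x = sum((xi - mean_x) ** 2 for xi in x)

slope = sp_xy / ss_x                  # b1 = SPxy / SSx
intercept = mean_y - slope * mean_x   # b0 = mean of y - b1 * mean of x

print(sp_xy, ss_x)       # 50.0 10.0
print(slope, intercept)  # 5.0 45.0
```

the slope has to be calculated first because the intercept formula uses it.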
example interpretation of a linear model for how hours of study affects test score:
intercept = value of y when x is 0 e.g. expected test score for a student who studied 0 hours
slope = change in y for a unit increase in x = expected increase (or decrease) in test score for every additional hour studied.
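estimated sd of the errors (residuals) equation
σ̂ = square root of (SS residual / (n - k - 1))
where n = number of observations and k = number of predictors (so with one predictor the divisor is n - 2)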
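a short sketch of the estimated sd of the errors for a one-predictor model. the data and coefficients are hypothetical (intercept 45 and slope 5 are the least squares estimates for this data):

```python
import math

# Hypothetical data and fitted coefficients
x = [1, 2, 3, 4, 5]
y = [50.0, 54.0, 62.0, 64.0, 70.0]
b0, b1 = 45.0, 5.0

n = len(y)  # number of observations
k = 1       # number of predictors

# SS residual: squared (observed - predicted), added up over observations
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# sigma-hat = sqrt(SS residual / (n - k - 1))
sigma_hat = math.sqrt(ss_res / (n - k - 1))

print(ss_res)               # 6.0
print(round(sigma_hat, 3))  # 1.414
```

σ̂ is in the units of y, so here it says the observed scores sit roughly 1.4 points away from the model line on average.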
what is multiple regression?
a linear model with multiple predictors. the model finds the optimal prediction of the outcome from the set of predictors, taking into account their redundancy (correlation) with one another
uses of multiple regression:
- prediction
- theory testing
- covariate control (assessing the effect of one predictor, controlling for the influence of the others)
multiple regression linear model equation:
yi = β0 + β1x1i + β2x2i + ϵi
for each additional predictor (x) we have an additional β coefficient
interpretation:
- β0 = the predicted y value when all Xs are 0
- β1 = partial regression coefficient = change in y for one unit change in x1 when all other Xs are held constant
what is meant by ‘holding constant’?
refers to the effect of the predictor when the values of all other predictors are fixed
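a sketch of a two-predictor model and what 'holding constant' looks like in numbers. everything here is an assumption for illustration: the data are hypothetical and built without noise from y = 40 + 5*x1 + 2*x2, and the coefficients are found with the closed-form least squares solution for two predictors, written with the same SP and SS quantities used for the simple slope:

```python
def mean(v):
    return sum(v) / len(v)

def sp(a, b):
    """Sum of cross products of deviations; sp(a, a) is the sum of squares."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

# Hypothetical, noise-free data with correlated predictors
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [49.0, 52.0, 63.0, 66.0, 75.0]

# Least squares solution for two predictors
det = sp(x1, x1) * sp(x2, x2) - sp(x1, x2) ** 2
b1 = (sp(x2, x2) * sp(x1, y) - sp(x1, x2) * sp(x2, y)) / det
b2 = (sp(x1, x1) * sp(x2, y) - sp(x1, x2) * sp(x1, y)) / det
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

def predict(v1, v2):
    return b0 + b1 * v1 + b2 * v2

# Holding x2 constant at 3: one extra unit of x1 changes the prediction by b1
change = predict(4, 3) - predict(3, 3)

# Slope of x1 when x2 is left out of the model (simple regression)
simple_slope = sp(x1, y) / sp(x1, x1)

print(b0, b1, b2)    # 40.0 5.0 2.0
print(change)        # 5.0
```

in this data the simple slope for x1 (about 6.6) is larger than the partial coefficient (5), because x1 and x2 are correlated and the simple model credits x1 with part of x2's effect.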