Intro to linear models Flashcards
what is a model?
a formal representation of a system - basically, all statistics is about models.
we represent mathematical models as functions, giving them arguments and operations that allow us to make predictions (which can be tested)
what is a good model?
- model is represented as a function
- the function is represented as a line
- the line yields predictions we would expect if the model were true
- when we collect more data to test the predictions, the new data match the model well
what is a linear model?
model of a linear relationship
we use linear models to try and explain variation in an outcome (DV) using one or more predictors (IVs)
what is the intercept?
the point where our model line crosses the y-axis, i.e. the value of y when x = 0
what is the slope?
the gradient of the model line or rate of change
linear model equation:
yi = β0 + β1xi + ϵi
y = intercept + slope × x + residual
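as a quick illustration, this model can be simulated in a few lines of numpy (the parameter values below are made up, just to show the structure of yi = β0 + β1xi + ϵi):

import numpy as np

rng = np.random.default_rng(1)
n = 100
beta0, beta1, sigma = 2.0, 0.5, 1.0       # assumed (illustrative) true parameters
x = rng.uniform(0, 10, n)                 # predictor values
eps = rng.normal(0, sigma, n)             # residuals: mean 0, sd sigma
y = beta0 + beta1 * x + eps               # yi = β0 + β1*xi + ϵi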
what is the residual?
a measure of how well the model fits each data point.
it is the vertical distance (on the y-axis) between each data point and the model line
residuals should be:
- normally distributed
- mean of 0
- an sd of σ (meaning the spread of the errors should be constant)
what is least squares?
the least squares estimates minimise the residual sum of squares
meaning they minimise the (squared) distances between the observed values of y and the model-predicted values (^y)
residual sum of squares (SSresidual)
equation:
SS residual = (observed y value - model estimated y value) squared and then added up for all observations
minimising the SS residual means that our predicted values are as close as they can get to each of our observed values
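a minimal numpy sketch of this calculation (the observed and predicted values are made up for illustration):

import numpy as np

y    = np.array([3.0, 5.0, 4.0, 7.0, 8.0])    # observed y values (made up)
yhat = np.array([3.5, 4.5, 5.0, 6.5, 8.0])    # model-predicted y values (made up)
resid = y - yhat                              # residual for each observation
ss_residual = np.sum(resid ** 2)              # residual sum of squares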
calculating intercept
intercept = mean of y - (slope estimate * mean of x)
(see slope calculations)
calculating slope
β1 = SPxy / SSx
slope = sum of cross products / sum of squared deviations of x
SPxy = sum of cross products = (observed x value - sample mean of x) * (observed y value - sample mean of y) added up for all observations
SSx = sum of squared deviations of x = (observed x value - sample mean of x) squared, then added up for all observations
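putting the slope and intercept formulas together, a small numpy sketch (the x and y values are made up; x could be hours studied, y test score):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # e.g. hours studied (made up)
y = np.array([52.0, 55.0, 61.0, 64.0, 70.0])   # e.g. test scores (made up)

sp_xy = np.sum((x - x.mean()) * (y - y.mean()))   # SPxy: sum of cross products
ss_x  = np.sum((x - x.mean()) ** 2)               # SSx: squared deviations of x
b1 = sp_xy / ss_x                                 # slope estimate
b0 = y.mean() - b1 * x.mean()                     # intercept estimate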
example interpretation of a linear model for how hours of study affects test score:
intercept = value of y when x is 0 e.g. expected test score for a student who studied 0 hours
slope = change in y for a unit increase in x = expected increase (or decrease) in test score for every additional hour studied.
estimated sd of the error (residual) equation
^σ = square root of (SSresidual / (n - k - 1))
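continuing the same made-up example (k = 1 predictor), the estimate can be computed as:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 64.0, 70.0])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x                                # model-predicted values

n, k = len(y), 1
ss_residual = np.sum((y - yhat) ** 2)
sigma_hat = np.sqrt(ss_residual / (n - k - 1))    # estimated sd of the residuals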
what is multiple regression?
when a linear model has multiple predictors, the model finds the optimal prediction of the outcome for the multiple predictors, taking into account their redundancy (correlation) with one another
uses of multiple regression:
- prediction
- theory testing
- covariate control (assessing the effect of one predictor, controlling for the influence of the others)
multiple regression linear model equation:
yi = β0 + β1x1i + β2x2i + ϵi
for each additional predictor (x) we have an additional β coefficient
interpretation:
- β0 = the predicted y value when all Xs are 0
- β1 = partial regression coefficient = change in y for one unit change in x1 when all other Xs are held constant
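a sketch of a two-predictor fit using numpy's least-squares solver (simulated data with assumed coefficients, just to show the setup):

import numpy as np

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)                        # first predictor (simulated)
x2 = 0.3 * x1 + rng.normal(size=n)             # second predictor, correlated with x1
y  = 10 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])      # design matrix: intercept, x1, x2
betas, *_ = np.linalg.lstsq(X, y, rcond=None)  # [b0, b1, b2], least-squares estimates
# b1 is the change in y per unit change in x1, holding x2 constant (and vice versa)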
what is meant by ‘holding constant’?
refers to the effect of the predictor when the values of all other predictors are fixed
3 ways to evaluate our linear model:
- evaluating the significance of individual effects
- evaluate the overall quality of the model
- evaluating model assumptions
evaluating the significance of individual effects
this is basically hypothesis testing. the steps are:
1. good research question
2. create hypothesis from the question (remember we only ever test our null hypothesis)
3. define the null (usually β1 = 0 as that means x has no effect on y)
4. choose significance level
5. calculate test statistic for β coefficient: t = ^β / SE(^β)
6. evaluate the t-statistic against the null (using p-values or critical values)
standard error of the slope - SE(^β) equation
SE(^βj) = square root of [ (SSresidual / (n - k - 1)) / (sum of (xj - mean of xj)^2 * (1 - R^2xj)) ]
where R^2xj is the multiple correlation of predictor j on the other predictors
SE is smaller when:
- residual variance is smaller
- sample size is larger
- there are fewer predictors
- when predictors are not correlated with the other predictors (R^2xj is small)
confidence intervals for β coefficients (slope)
^β1 +/- critical t value * SE(^β1)
to know if our variable is statistically significant, our null hypothesis value (usually 0) must not be contained within the confidence interval
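a sketch of the whole testing procedure for a one-predictor model (made-up data; scipy is assumed to be available for the t-distribution):

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])          # made-up predictor
y = np.array([50.0, 54.0, 57.0, 61.0, 62.0, 66.0, 71.0, 73.0])  # made-up outcome

n, k = len(y), 1
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

sigma2_hat = np.sum(resid ** 2) / (n - k - 1)                 # residual variance
se_b1 = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))     # R^2xj = 0 with one predictor

t_stat = b1 / se_b1                                  # test of H0: beta1 = 0
p_val  = 2 * stats.t.sf(abs(t_stat), df=n - k - 1)   # two-sided p-value
t_crit = stats.t.ppf(0.975, df=n - k - 1)            # critical value for a 95% CI
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)      # significant if 0 is outside ci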
evaluating overall model quality
the aim of linear models is to explain y as a function of x - in reality x does not account for all the variance in y, leaving us with residual variance
sums of squares
we can breakdown variation in our data based on sums of squares:
SStotal = SSmodel + SSresidual
this means total variation in y = variation explained by our model + residual variation
coefficient of determination - R^2
R^2 quantifies the amount of variance in the outcome (y) that is accounted for by our predictors. it is presented as a decimal/percentage, and the more variability accounted for, the better
equations:
R^2 = SS model / SS total
R^2 = 1 - SSresidual/SStotal
total sum of squares (SStotal)
SStotal = (observed y value - mean of y) squared and then added up for all observations
SStotal is the sum of the squared distances of each data point from the mean of y (since the mean is our baseline/best guess of what y should be)
model sum of squares (SSmodel)
SSmodel = (model estimated y value - mean of y) squared and then added up for all observations
it measures the distance from the model predicted line to the line for mean of y
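a small numpy sketch tying the three sums of squares and R^2 together (made-up data; the fit is the simple least-squares line):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])          # made-up predictor
y = np.array([50.0, 54.0, 57.0, 61.0, 62.0, 66.0])    # made-up outcome
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

ss_total    = np.sum((y - y.mean()) ** 2)       # total variation in y
ss_model    = np.sum((yhat - y.mean()) ** 2)    # variation explained by the model
ss_residual = np.sum((y - yhat) ** 2)           # leftover (residual) variation

r2_a = ss_model / ss_total                      # R^2 = SSmodel / SStotal
r2_b = 1 - ss_residual / ss_total               # same value via SSresidual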
adjusted R^2
this is used when our linear model has two or more predictors, as it adjusts for n and k. with more predictors there is more chance of random sampling fluctuation, which inflates R^2
equation:
adjusted R^2 = 1 - (1 - R^2) * (n-1)/(n-k-1)
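as a quick sketch (n, k and R^2 below are assumed values, not from the notes):

n, k = 100, 3                                   # assumed sample size and number of predictors
r2 = 0.40                                       # assumed R^2 from the fitted model
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # adjusted R^2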
what is an F-test?
F-tests test the significance of the model as a whole by testing the significance of the F-ratio
what is the F-ratio?
the F-ratio is the ratio of explained to unexplained variance. the F-ratio tests the null hypothesis that all regression slopes are 0, as this would mean our predictors tell us nothing about the outcome.
if our predictors do explain some variance, our F-ratio will be significant - bigger F-ratios indicate better models. when the null is true, the F-ratio will be close to 1, so we want it to be >1 so there is more model variance than residual variance
equation:
F = MSmodel / MSresidual
what are mean squares?
mean squares are sums of squares calculations divided by the associated degrees of freedom
residual degrees of freedom
= n - k - 1
as these are based on our model, in which we estimate k beta terms (hence -k) and the intercept (hence -1)
total degrees of freedom
= n - 1
once we have estimated the mean of y, all but one value of y is free to vary, hence -1
model degrees of freedom
= k
as it's dependent on our beta estimates, hence k
evaluating F-ratios
an F-ratio is evaluated against an F-distribution with dfModel and dfResidual, using our alpha level and critical values
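a sketch of the F-ratio calculation from the sums of squares and degrees of freedom above (the sums of squares are made up; scipy is assumed for the F-distribution):

from scipy import stats

n, k = 100, 2                               # assumed sample size and number of predictors
ss_model, ss_residual = 240.0, 600.0        # made-up sums of squares

ms_model    = ss_model / k                  # model mean square (df = k)
ms_residual = ss_residual / (n - k - 1)     # residual mean square (df = n - k - 1)
f_ratio = ms_model / ms_residual
p_val = stats.f.sf(f_ratio, k, n - k - 1)   # evaluated against F(dfModel, dfResidual)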
what are degrees of freedom
the maximum number of logically independent values which have freedom to vary in the data sample
standardising β coefficients
standardising allows us to compare the effects of variables measured on arbitrary scales or scales with differing units - but be careful: for regression, unstandardised coefficients are often more useful
equation:
^β*i = ^βi * (Sx/Sy)
standardised β = estimated β * (sd of x / sd of y)
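a minimal sketch of this conversion, using the same made-up x and y as before:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 64.0, 70.0])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

b1_std = b1 * (np.std(x, ddof=1) / np.std(y, ddof=1))   # standardised slope: b1 * (Sx / Sy)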
z-scoring β coefficients
method of standardising for continuous variables that transforms the IV and DV into z-scores (mean = 0, sd = 1) prior to fitting the model
equation
zx = (xi - mean of x) / Sx
translation: subtract the mean from each value, then divide by the standard deviation (do this for both the x and y values)
interpreting standardised regression coefficients
R^2, F-test and t-test remain the same
our beta coefficients will change
e.g. β0 = 0 when all variables are standardised
β1 = increase in y (in sd units) for every sd increase in x
standardised slope = correlation coefficient (r), when there is a single predictor
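a quick numpy check of these points for a one-predictor model (simulated data with assumed parameters):

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(10, 2, 100)                   # simulated predictor
y = 5 + 1.5 * x + rng.normal(0, 3, 100)      # simulated outcome

zx = (x - x.mean()) / np.std(x, ddof=1)      # z-score the predictor
zy = (y - y.mean()) / np.std(y, ddof=1)      # z-score the outcome

b1_z = np.sum(zx * zy) / np.sum(zx ** 2)     # slope from the z-scored regression (intercept = 0)
r = np.corrcoef(x, y)[0, 1]                  # equals b1_z in a one-predictor model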
what are categorical variables
variables that can only take discrete values that are mutually exclusive. a binary variable is a type of categorical variable with only 2 levels.
what is dummy coding?
binary variables are coded as 0 and 1 and are often referred to as dummy variables - when we have multiple dummies we use the general procedure of dummy coding.
one level is chosen as a baseline and all other levels are compared to this baseline. for a categorical variable with k levels we create k-1 dummy variables.
interpretation of dummy coding coefficients
β0 = expected value of y when all dummy variables are 0, i.e. the mean of our baseline level
β1 = the predicted difference between the means of the two groups
this interpretation becomes more complicated as we get more predictors
steps in dummy coding:
- choose a baseline level
- assign everyone in the baseline group 0 for all k-1 dummy variables
- assign everyone in the next group a 1 for the first dummy variable and 0 in all others
- repeat step 3 until all k-1 dummy variables have a 0 and 1 assigned
- enter the dummy variables into your regression
dummy coding results interpretation example:
test score based on 3 methods of revising: re-read (baseline), summarise notes, or self-test
β0 = mean of re-reading
β1 = difference between mean of summarise and intercept
β2 = difference between mean of self test and intercept
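a sketch of this example with made-up scores, showing that the coefficients recover the group means:

import numpy as np

# made-up test scores for the three revision methods
reread    = np.array([60.0, 62.0, 58.0, 64.0])   # baseline group
summarise = np.array([66.0, 70.0, 68.0, 72.0])
selftest  = np.array([74.0, 78.0, 76.0, 80.0])

y  = np.concatenate([reread, summarise, selftest])
d1 = np.r_[np.zeros(4), np.ones(4),  np.zeros(4)]   # dummy 1: summarise vs re-read
d2 = np.r_[np.zeros(4), np.zeros(4), np.ones(4)]    # dummy 2: self-test vs re-read

X = np.column_stack([np.ones(len(y)), d1, d2])      # intercept + 2 dummies (k-1 = 2)
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
# b0 = re-read mean; b1 = summarise mean - re-read mean; b2 = self-test mean - re-read mean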