LECTURE 2 linear regression Flashcards
what is regression
way of predicting one variable from another - hypothetical model of the linear relationship between two variables
equation of a straight line
outcome (y) = [model] + error
y = b0 + b1X1 + error
b0 = intercept - value of y when x = 0 (where the line crosses the y axis)
b1 = regression coefficient for predictor - gradient and direction of relationship
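A minimal sketch of the equation above, to show how the model part and the error part fit together. The function name `predict` and the numbers used for b0, b1, x and the observed y are made up for illustration.

```python
def predict(x, b0, b1):
    """Model part of the equation only: b0 + b1 * x (no error term)."""
    return b0 + b1 * x

# Hypothetical intercept and slope, chosen only for illustration
b0, b1 = 2.0, 0.5

x = 4.0
observed_y = 5.0                    # made-up observation
predicted_y = predict(x, b0, b1)    # what the model says y should be
error = observed_y - predicted_y    # outcome = model + error, so error = outcome - model
```

When x = 0 the prediction is just b0, which is why b0 is the intercept.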
fitting the model - method of least squares
method of least squares finds the line of best fit by minimising the error in the model - the sum of the squared differences between the data points and the line
the line of best fit may still fit the data poorly, so how well it fits must be tested
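A sketch of least squares for one predictor, assuming the standard closed-form solution (slope = covariation of x and y divided by variation in x). The helper names `fit_line` and `sum_squared_error` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # gradient
    b0 = mean_y - b1 * mean_x    # intercept: line passes through the two means
    return b0, b1

def sum_squared_error(xs, ys, b0, b1):
    """Sum of squared differences between data points and the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
best_error = sum_squared_error(xs, ys, b0, b1)
# Any other line, e.g. a slightly steeper one, gives a larger squared error
other_error = sum_squared_error(xs, ys, 2.0, 0.7)
```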
define sum of squares
data points compared to their own group means
does not account for much variance as the points are not compared against the overall grand mean (null hypothesis)
define total sum of squares (SSt)
total variability in the data, measured against the grand mean - take the difference between each data point and the grand mean, square it, and sum across all points
define model sum of squares (SSm)
how much the regression model improves on the grand mean - squared deviations between the values predicted by the regression model and the grand mean
define residual sum of squares (SSr)
whatever variance is left unaccounted for - squared deviations of the data from the regression model line (SSr = SSt - SSm)
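The three sums of squares can be sketched on a small made-up data set. The helper `fit_line` and all variable names are hypothetical; the fit uses the usual closed-form least-squares solution for one predictor.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b1 * mean_x, b1

# Made-up data for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
grand_mean = sum(ys) / len(ys)
predicted = [b0 + b1 * x for x in xs]

ss_t = sum((y - grand_mean) ** 2 for y in ys)            # data vs grand mean
ss_m = sum((p - grand_mean) ** 2 for p in predicted)     # model vs grand mean
ss_r = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # data vs model
```

For a least-squares line the pieces add up: SSt = SSm + SSr.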
How do you test the regression model
is the regression a better reflection of the data than the grand mean (null)
if so - SSm is large relative to SSr
ANOVA f value
ratio of the mean squares - each mean square is a sum of squares divided by its degrees of freedom (an average of the sum of squares)
MSm/MSr = F
want model to account for more variance than error/chance
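A sketch of the F calculation. The degrees of freedom are not stated on the card; this assumes the standard ones (k for the model, n - k - 1 for the residual, where k is the number of predictors and n the number of data points). The function name `f_ratio` and the example sums of squares are hypothetical.

```python
def f_ratio(ss_m, ss_r, n, k=1):
    """F = MSm / MSr, with each mean square = sum of squares / degrees of freedom."""
    ms_m = ss_m / k              # model mean square, df = number of predictors
    ms_r = ss_r / (n - k - 1)    # residual mean square, df = n - k - 1
    return ms_m / ms_r

# Made-up sums of squares from 5 data points and one predictor
f_value = f_ratio(ss_m=3.6, ss_r=2.4, n=5)
```

An F well above 1 means the model accounts for more variance per degree of freedom than is left over as error.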
ANOVA r2 value
what proportion of the variance can be accounted for by the regression model - with one predictor this is the square of Pearson's correlation coefficient (r)
r2 = SSm(variance model accounts for)/SSt(all variance)
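A sketch checking that SSm/SSt matches the square of Pearson's r when there is a single predictor. The function names `pearson_r` and `r_squared` and the data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared(xs, ys):
    """SSm / SSt for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x
    ss_m = sum((b0 + b1 * x - mean_y) ** 2 for x in xs)
    ss_t = sum((y - mean_y) ** 2 for y in ys)
    return ss_m / ss_t

# Made-up data for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r2 = r_squared(xs, ys)
```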
why use a histogram of standardised residuals
check for outliers - standardised residuals beyond ±3 SD from the mean (large residual means mismatch between what is observed and what is predicted)
check if the residuals are normally distributed and therefore meet the assumption of regression analysis
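A sketch of the outlier check. It assumes one simple way of standardising: each residual divided by the residual standard deviation, sqrt(SSr / (n - 2)); statistics packages may use slightly different versions. The function names, the cutoff argument and the data are hypothetical.

```python
import math

def standardized_residuals(xs, ys):
    """Residuals from the least-squares line, divided by their standard deviation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    ss_r = sum(e ** 2 for e in residuals)
    sd = math.sqrt(ss_r / (n - 2))   # assumes the fit is not perfect (sd > 0)
    return [e / sd for e in residuals]

def flag_outliers(z_scores, cutoff=3.0):
    """Indices of points whose standardised residual is beyond +/- cutoff."""
    return [i for i, z in enumerate(z_scores) if abs(z) > cutoff]

# Made-up data for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

z_scores = standardized_residuals(xs, ys)
outliers = flag_outliers(z_scores)   # empty list: no residual beyond +/- 3
```

The z_scores list is what would go into the histogram; its shape is then compared by eye against a normal curve.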
problem with regression
not symmetric - regressing Y on X is not the same as regressing X on Y - CAN'T FLIP THE EQUATION
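A sketch of the asymmetry on made-up data: the line from regressing X on Y is compared with the line obtained by algebraically rearranging the Y-on-X equation. The helper `fit_line` and the variable names are hypothetical.

```python
def fit_line(predictor, outcome):
    """Least-squares (intercept, slope) for outcome regressed on predictor."""
    n = len(predictor)
    mean_p, mean_o = sum(predictor) / n, sum(outcome) / n
    slope = (sum((p - mean_p) * (o - mean_o) for p, o in zip(predictor, outcome))
             / sum((p - mean_p) ** 2 for p in predictor))
    return mean_o - slope * mean_p, slope

# Made-up data for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0_yx, b1_yx = fit_line(xs, ys)   # regress Y on X
b0_xy, b1_xy = fit_line(ys, xs)   # regress X on Y

# Flipping y = b0 + b1*x algebraically gives x = -b0/b1 + (1/b1)*y
flipped_slope = 1 / b1_yx
flipped_intercept = -b0_yx / b1_yx
```

The flipped slope and the X-on-Y slope only agree when the points lie exactly on a line, because each regression minimises error in a different variable.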