Simple Regression Flashcards
linear regression
used when the relationship between two variables can be described with a straight line
- proposes a model of the relationship
correlation vs regression
- correlation determines the strength of the relationship between X and Y
- regression allows us to estimate how much Y will change as a result of a given change in X
terminology in regression
- regression distinguishes between variable being predicted and variable(s) used to predict
variable being predicted: y
- outcome variable
- DV (only ever one)
- criterion variable
- vertical axis
variable used to predict: x
- predictor variable
- IV(s)
- explanatory variable
- horizontal axis
when might we use regression
- to investigate strength of effect x has on y
- estimate how much y will change as a result of a given change in x
- predict future value of y based on x
what does regression assume + what does it not tell us
- y is dependent (to some extent) on x
- regression doesn’t tell us if this dependency is causal
3 stages of linear regression
- analysing the relationship between variables: strength and direction (correlation)
- proposing a model to explain that relationship: model is a line of best fit
- evaluating the model: assessing goodness of fit
regression line
(step 2)
- line of best fit
- intercept: value of y (on line of best fit) when x is 0
- slope: how much y changes as a result of 1 unit increase in x
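The slope and intercept above can be sketched in Python with NumPy; the data and values below are hypothetical, invented for illustration:

```python
import numpy as np

# Hypothetical data: x = miles travelled, y = taxi fare in £
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([4.0, 6.0, 8.0, 10.0, 12.0])

# Least-squares line of best fit: slope = cov(x, y) / var(x)
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()  # value of y on the line when x = 0
```

For this made-up data the slope is 2.0 (fare rises £2 per mile) and the intercept is 2.0 (fare when distance is 0). In practice SPSS reports both in the Coefficients table.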
evaluating the model; simplest model vs best model
simplest model:
- using the average/mean value of y as the prediction for every case
- assumes no relationship between x and y
best model:
- based on relationship between x and y
- regression line
sum of squares total
the sum of the squared differences between the observed values of y and the mean of y
- variance in y not explained by simplest model
- not required to perform in exam
sum of squares residual
the sum of the squared differences between the observed values of y and those predicted by the regression line
- variance in y not explained by regression model
- not required to perform in exam
difference between SST and SSR
reflects the improvement in prediction using the regression model compared to the simplest model (SSm = SSt − SSr)
- goodness-of-fit
- sum of squares of the model
- not required to perform in exam
the larger the SSm…
… the bigger the improvement in prediction using the regression model over the simplest model
final thing in goodness-of-fit test
- use ANOVA for F-test to evaluate the improvement due to the model (SSm), relative to the variance the model does not explain (SSr)
- ANOVA uses mean square values instead of SS
- this takes d.f. into account
- provides the F-ratio
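The SST / SSR / SSm breakdown and the F-ratio from the cards above can be sketched as follows; all data here are hypothetical (and the exam never requires this by hand):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # predictor (made-up)
y = np.array([3.0, 5.0, 4.0, 8.0, 9.0])  # outcome (made-up)

# Best model: the regression line
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()
y_hat = intercept + slope * x

sst = np.sum((y - y.mean()) ** 2)  # total: error of the simplest model (mean of y)
ssr = np.sum((y - y_hat) ** 2)     # residual: error of the regression model
ssm = sst - ssr                    # model: improvement due to the regression line

df_m, df_r = 1, len(x) - 2         # one predictor; n - 2 residual d.f.
msm, msr = ssm / df_m, ssr / df_r  # mean squares take d.f. into account
f_ratio = msm / msr
```

For this invented data, SST = 26.8, SSR = 4.3, SSm = 22.5 and F ≈ 15.7; SPSS reports all of these in the ANOVA table.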
F-ratio
measure of how much the model has improved the prediction of y, relative to the level of inaccuracy of the model
interpreting F-ratio
- if the regression model is good at predicting y (relative to the simplest model), the improvement due to the model (MSm) will be large, while the inaccuracy of the model (MSr) will be small
e.g. an F value well above 1
H0 when assessing goodness of fit
regression model and simplest model are equal (in terms of predicting y)
equivalently, the model provides no improvement (population slope b = 0)
if p < .05, reject H0: the regression model fits the data better than the simplest model
note of SS
you never need to calculate it by hand
regression equation
y = bx + a
a = intercept
b = slope
y = predicted value of y
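A one-line illustration of using the equation to predict; a and b below are made-up coefficients, not from any real output:

```python
# y = bx + a with hypothetical coefficients
a, b = 2.5, 3.2         # intercept and slope (made-up values)
x_new = 4.0             # new value of the predictor
y_pred = b * x_new + a  # predicted value of y
```

With these values the prediction is 2.5 + 3.2 × 4 = 15.3.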
linear regression assumptions
- linearity: x and y must be linearly related
- absence of outliers (should be removed)
- normality, linearity and homoscedasticity, independence of residuals
- NO NON-PARAMETRIC EQUIVALENT
homoscedasticity of residuals
variance of residuals about the outcome should be the same for all predicted scores
SPSS output for regression
in model summary
- don’t need this in write-up
ANOVA SPSS output for regression
F = MSm / MSr
if p < .05 it is significant improvement when using regression model vs simplest model
SPSS Coefficient table
gives us elements for regression equation
beta: standardised coefficient, in standard deviation units (the others are in normal units, e.g. £)
SPSS coefficient table outputs: t-test
- t-test tests the null hypothesis that the value of b is 0
- provides the CIs for the slope, which we need in the write-up (simple regression)
how is r^2 calculated
= SSm/SSt
- (multiply R^2 by 100 for a percentage)
- in regression we use this to report how much of the variance in y is explained by x
e.g. distance travelled explains a significant amount of variance in taxi fare, F…P… R^2 = .814, or: distance travelled explained 81% of the variance in taxi fare
square root of r^2
= r
IF WE ONLY HAVE ONE PREDICTOR
(remember we will lose the sign)
how do we calculate variance not explained by model
1 - R^2
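R^2, r (with its sign restored from the slope, since squaring loses it) and the unexplained variance can be sketched together; the data below are hypothetical:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 8.0, 9.0, 5.0, 4.0])  # made-up negative relationship

slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()
y_hat = intercept + slope * x

sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y - y_hat) ** 2)
r2 = (sst - ssr) / sst            # R^2 = SSm / SSt: proportion of variance explained
r = np.sign(slope) * np.sqrt(r2)  # one predictor only: take the sign from the slope
unexplained = 1 - r2              # variance not explained by the model
```

For this invented data R^2 ≈ .84, r ≈ −.92 (negative, matching the downward slope), and about 16% of the variance is unexplained.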
write up
no design
- results in text
- we conducted a linear regression to examine the influence of X on Y. Mean X (SD, CIs) (from descriptive stats at top of output) and mean Y (SD, CIs).
- preliminary analysis confirmed no violation of normality, linearity or homoscedasticity assumptions
- X explained / did not explain a significant / non-significant amount of variance in Y, F(,) = __.__, p < .__, R^2 = __. (ANOVA table for F and p, R^2 in model summary table)
- for every (1 unit, e.g. mile) increase in X (e.g. journey distance), Y (taxi fare) increased by (slope) (coefficients table); 95% confidence interval limits for the slope were [,] (coefficients table)
simple regression discussion
the findings suggest that Y can be predicted by X, with longer/shorter/higher/lower X resulting in higher/lower Y