Simple Regression Flashcards
linear regression
used when the relationship between two variables can be described with a straight line
- proposes a model of the relationship
correlation vs regression
- correlation determines strength of relationship between x and y
- regression allows us to estimate how much y will change as a result of a given change in x
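A minimal sketch of the difference, assuming a small made-up data set (the x and y values and the helper names pearson_r and slope are illustrative, not from the course material). Correlation gives a unitless strength between -1 and 1; the regression slope gives the estimated change in y per 1 unit change in x.

```python
from math import sqrt

# Made-up illustrative data (an assumption, not course data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    """Correlation: strength and direction of the linear relationship."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def slope(x, y):
    """Regression slope: estimated change in y per 1 unit increase in x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

r = pearson_r(x, y)
b = slope(x, y)
print("correlation r:", round(r, 3))
print("regression slope:", round(b, 3))
```

For this data r is roughly 0.77 (a fairly strong positive relationship) and the slope is 0.6 (y rises by about 0.6 for each 1 unit increase in x).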
terminology in regression
- regression distinguishes between variable being predicted and variable(s) used to predict
variable being predicted: y
- outcome variable
- DV (only ever one)
- criterion variable
- vertical axis
variable used to predict: x
- predictor variable
- IV(s)
- explanatory variable
- horizontal axis
when might we use regression
- to investigate strength of effect x has on y
- estimate how much y will change as a result of a given change in x
- predict future value of y based on x
what does regression assume + what does it not tell us
- y is dependent (to some extent) on x
- regression doesn’t tell us if this dependency is causal
3 stages of linear regression
- analysing the relationship between variables: strength and direction (correlation)
- proposing a model to explain that relationship: model is a line of best fit
- evaluating the model: assessing goodness of fit
regression line
(step 2)
- line of best fit
- intercept: value of y (on line of best fit) when x is 0
- slope: how much y changes as a result of 1 unit increase in x
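The intercept and slope can be sketched with the usual least-squares formulas. This assumes the line of best fit is the ordinary least-squares line; the data and the function name fit_line are made up for illustration.

```python
# Made-up illustrative data (an assumption, not course data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of best fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by variation in x.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    # Intercept: value of y on the line when x is 0.
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(x, y)
print("intercept:", round(intercept, 3))
print("slope:", round(slope, 3))
```

Here the fitted line is approximately y = 2.2 + 0.6x: the line crosses the vertical axis at about 2.2, and y increases by about 0.6 per 1 unit increase in x.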
evaluating the model; simplest model vs best model
simplest model:
- using average/mean value of y (outcome) to make estimates
- assumes no relationship between x and y
best model:
- based on relationship between x and y
- regression line
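A rough sketch of the two models side by side, using made-up data and hypothetical helper names (predict_simplest, predict_best). The simplest model ignores x and always predicts the mean of y; the best model predicts from the least-squares regression line.

```python
# Made-up illustrative data (an assumption, not course data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope and intercept for the regression line.
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x

def predict_simplest(new_x):
    """Simplest model: assumes no relationship, so x is ignored."""
    return mean_y

def predict_best(new_x):
    """Best model: prediction read off the regression line."""
    return intercept + slope * new_x

for new_x in (1, 6):
    print(new_x, predict_simplest(new_x), round(predict_best(new_x), 2))
```

The simplest model gives the same estimate (the mean, 4.0) whatever the value of x, while the regression line gives about 2.8 at x = 1 and about 5.8 at x = 6.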
sum of squares total
the sum of squared differences between observed values of y and the mean of y
- variance in y not explained by simplest model
- not required to perform in exam
sum of squares residual
the sum of squared differences between the observed values of y and those predicted by the regression line
- variance in y not explained by regression model
- not required to perform in exam
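For intuition only, both sums of squares can be sketched on made-up data. The variable names ss_total and ss_residual are illustrative, and the regression line is assumed to be the least-squares line.

```python
# Made-up illustrative data (an assumption, not course data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares regression line.
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * a for a in x]

# Sum of squares total: squared distances from the mean (simplest model).
ss_total = sum((b - mean_y) ** 2 for b in y)

# Sum of squares residual: squared distances from the regression line.
ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))

print("SS total:", round(ss_total, 3))
print("SS residual:", round(ss_residual, 3))
```

With this data the total is 6.0 and the residual is about 2.4, so the regression line leaves less variation unexplained than the mean does.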
difference between SST and SSR
reflects improvement in prediction using the regression model compared to simplest model
- goodness-of-fit
- sum of squares of the model
- not required to perform in exam
the larger the SSm…
… the bigger the improvement in prediction using the regression model over the simplest model
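A small sketch of the model sum of squares on made-up data, computed two ways: as the difference between the total and residual sums of squares, and directly from how far the predicted values sit from the mean of y. The names ss_model_by_difference and ss_model_direct are illustrative, and the two agreeing relies on the line being the least-squares line with an intercept.

```python
# Made-up illustrative data (an assumption, not course data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares regression line.
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * a for a in x]

ss_total = sum((b - mean_y) ** 2 for b in y)
ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))

# Improvement over the simplest model, as a difference.
ss_model_by_difference = ss_total - ss_residual

# The same quantity computed directly from the predictions.
ss_model_direct = sum((p - mean_y) ** 2 for p in predicted)

print(round(ss_model_by_difference, 3), round(ss_model_direct, 3))
```

Both routes give about 3.6 here, out of a total of 6.0.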
final thing in goodness-of-fit test
- use ANOVA for F-test to evaluate the improvement due to the model (SSm), relative to the variance the model does not explain (SSr)
- ANOVA uses mean square values instead of SS
- this takes d.f. into account
- provides F-ratio
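A hedged sketch of how the F-ratio is built from the sums of squares, using made-up data. It assumes simple regression with one predictor, so the model d.f. is 1 and the residual d.f. is n - 2; variable names are illustrative, and no p-value is computed.

```python
# Made-up illustrative data (an assumption, not course data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares regression line.
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * a for a in x]

ss_total = sum((b - mean_y) ** 2 for b in y)
ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))
ss_model = ss_total - ss_residual

# Degrees of freedom (assumption: one predictor plus an intercept).
df_model = 1
df_residual = n - 2

# Mean squares: each sum of squares divided by its d.f.
ms_model = ss_model / df_model
ms_residual = ss_residual / df_residual

# F-ratio: improvement due to the model relative to unexplained variance.
f_ratio = ms_model / ms_residual
print("F-ratio:", round(f_ratio, 3))
```

For this data the F-ratio is about 4.5 on 1 and 3 d.f.; a larger F-ratio means the model explains more relative to what it leaves unexplained.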