Simple Regression Flashcards

1
Q

linear regression

A

used when the relationship between two variables can be described with a straight line

  • proposes a model of the relationship
2
Q

correlation vs regression

A
  • correlation measures the strength (and direction) of the relationship between X and Y
  • regression allows us to estimate how much Y will change as a result of a given change in X
3
Q

terminology in regression

A
  • regression distinguishes between variable being predicted and variable(s) used to predict
4
Q

variable being predicted: y

A
  • outcome variable
  • DV (only ever one)
  • criterion variable
  • vertical axis
5
Q

variable used to predict: x

A
  • predictor variable
  • IV(s)
  • explanatory variable
  • horizontal axis
6
Q

when might we use regression

A
  • to investigate strength of effect x has on y
  • estimate how much y will change as a result of a given change in x
  • predict future value of y based on x
7
Q

what does regression assume + what does it not tell us

A
  • y is dependent (to some extent) on x
  • regression doesn’t tell us if this dependency is causal
8
Q

3 stages of linear regression

A
  1. analysing the relationship between variables: strength and direction (correlation)
  2. proposing a model to explain that relationship: model is a line of best fit
  3. evaluating the model: assessing goodness of fit
9
Q

regression line

A

(step 2)
- line of best fit
- intercept: value of y (on line of best fit) when x is 0
- slope: how much y changes as a result of 1 unit increase in x
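
As a rough illustration of the intercept and slope, the sketch below fits a line of best fit by ordinary least squares, which is assumed here to be what "best fit" means. The data and the helper name `fit_line` are made up for the example.

```python
from statistics import mean

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def fit_line(x, y):
    """Return (intercept a, slope b) of the least-squares line y = bx + a."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products of deviations from the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sum of squared deviations of x from its mean.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # change in y per 1 unit increase in x
    a = y_bar - b * x_bar    # value of y on the line when x is 0
    return a, b


a, b = fit_line(x, y)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
```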

10
Q

evaluating the model; simplest model vs best model

A

simplest model:
- using average/mean value of y (outcome) to make estimates
- assumes no relationship between x and y

best model:
- based on relationship between x and y
- regression line

11
Q

sum of squares total

A

the sum of squared differences between the observed values of y and the mean of y

  • variance in y not explained by simplest model
  • not required to perform in exam
12
Q

sum of squares residual

A

the sum of squared differences between the observed values of y and those predicted by the regression line

  • variance in y not explained by regression model
  • not required to perform in exam
13
Q

difference between SST and SSR

A

reflects improvement in prediction using the regression model compared to the simplest model (SSm = SSt - SSr)

  • goodness-of-fit
  • sum of squares of the model
  • not required to perform in exam
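
Although the sums of squares are not calculated by hand in the exam, a small sketch may help show how the three quantities relate. The data are hypothetical, `a` and `b` are the least-squares intercept and slope for that data, and the names `ss_t`, `ss_r` and `ss_m` are simply stand-ins for SSt, SSr and SSm.

```python
from statistics import mean

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # least-squares intercept and slope for this data

y_bar = mean(y)                    # simplest model: always predict the mean
y_hat = [b * xi + a for xi in x]   # regression model predictions

# SSt: squared error of the simplest model.
ss_t = sum((yi - y_bar) ** 2 for yi in y)
# SSr: squared error left over after using the regression line.
ss_r = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
# SSm: improvement gained by using the regression line.
ss_m = ss_t - ss_r

print(f"SSt = {ss_t:.2f}, SSr = {ss_r:.2f}, SSm = {ss_m:.2f}")
```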
14
Q

the larger the SSm…

A

… the bigger the improvement in prediction using the regression model over the simplest model

15
Q

final thing in goodness-of-fit test

A
  • use ANOVA for F-test to evaluate the improvement due to the model (SSm), relative to the variance the model does not explain (SSr)
  • ANOVA uses mean square values instead of SS
  • this takes d.f. into account
  • provides F-ratio
16
Q

F-ratio

A

measure of how much the model has improved the prediction of y, relative to the level of inaccuracy of the model

17
Q

interpreting F-ratio

A
  • if regression model is good at predicting y (relative to simplest model), the improvement in prediction due to the model (MSm) will be large, while the level of inaccuracy of the model (MSr) will be small

i.e. a large F value (greater than 1)

18
Q

H0 when assessing goodness of fit

A

regression model and simplest model are equal (in terms of predicting y)

i.e. the model gives no improvement in prediction (slope b = 0)
if p < .05, reject H0: the regression model fits the data better than the simplest model

19
Q

note on SS

A

you never need to calculate it by hand

20
Q

regression equation

A

y = bx + a

a = intercept
b = slope

y = predicted value of y
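
A minimal sketch of using the regression equation to make a prediction. The coefficients are hypothetical (a £2.50 base fare and £1.20 per mile) and `predict` is just an illustrative helper name.

```python
def predict(x, a, b):
    """Predicted y from the regression equation y = bx + a."""
    return b * x + a


# Hypothetical coefficients, invented for illustration only.
a = 2.5   # intercept: predicted fare when distance is 0
b = 1.2   # slope: change in fare per extra mile

fare = predict(10, a, b)
print(f"predicted fare for 10 miles: £{fare:.2f}")
```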

21
Q

linear regression assumptions

A
  • linearity: x and y must be linearly related
  • absence of outliers (should be removed)
  • normality, linearity, homoscedasticity and independence of residuals
  • NO NON-PARAMETRIC EQUIVALENT
22
Q

homoscedasticity of residuals

A

variance of residuals about the outcome should be the same for all predicted scores

23
Q

SPSS output for regression

A

in model summary
- don’t need this in write-up

24
Q

ANOVA SPSS output for regression

A

F = MSm / MSr

if p < .05, there is a significant improvement in prediction when using the regression model vs the simplest model
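
SPSS does this for you, but the arithmetic behind the F-ratio can be sketched as follows. The sums of squares and sample size are hypothetical; the degrees of freedom use the standard formulas (number of predictors for the model, n minus predictors minus 1 for the residual). The p value is left to SPSS because it needs the F distribution.

```python
# Hypothetical values, invented for illustration only.
ss_m = 3.6   # improvement due to the model (SSm)
ss_r = 2.4   # variance the model does not explain (SSr)
n = 5        # number of observations
k = 1        # number of predictors (simple regression)

df_m = k           # model degrees of freedom
df_r = n - k - 1   # residual degrees of freedom

# Mean squares are sums of squares divided by their degrees of freedom.
ms_m = ss_m / df_m
ms_r = ss_r / df_r

f_ratio = ms_m / ms_r
print(f"F({df_m}, {df_r}) = {f_ratio:.2f}")
```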

25
Q

SPSS Coefficient table

A

gives us elements for regression equation

beta: standardised slope, in standard deviation units (B is in the original units, e.g. £)
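
To show how the standardised and unstandardised slopes relate, here is a small sketch on hypothetical data. It rescales the slope by the ratio of the standard deviations; with a single predictor the result equals Pearson's r. The variable names are illustrative only.

```python
from statistics import stdev

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b = 0.6  # unstandardised least-squares slope for this data (original units)

# Standardised slope: change in y (in SDs) per 1 SD increase in x.
beta = b * stdev(x) / stdev(y)
print(f"B = {b:.2f}, beta = {beta:.3f}")
```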

26
Q

SPSS coefficient table outputs: t-test

A
  • t-test tests the null hypothesis that the value of b is 0
  • provides CIs for the slope, which we need in the write-up (simple regression)
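
A rough sketch of where the t value and the confidence interval for the slope come from. All numbers are hypothetical, and `t_crit` is the two-tailed 5% critical value for 3 degrees of freedom read from a t table rather than computed.

```python
import math

# Hypothetical values, invented for illustration only.
b = 0.6         # slope from the coefficients table
ms_r = 0.8      # residual mean square (SSr / residual df)
sxx = 10.0      # sum of squared deviations of x from its mean
t_crit = 3.182  # critical t for df = 3, 95% CI (from a t table)

se_b = math.sqrt(ms_r / sxx)   # standard error of the slope
t_stat = b / se_b              # tests H0: b = 0

ci_low = b - t_crit * se_b
ci_high = b + t_crit * se_b
print(f"t = {t_stat:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

With one predictor, squaring this t value gives the F-ratio from the ANOVA table. In this made-up example the interval includes 0, so H0 would not be rejected.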
27
Q

how is r^2 calculated

A

= SSm/SSt
- (multiply R^2 by 100 for a percentage)
- in regression we use this to state how much of the variance in y is explained by x

e.g. distance traveled explains a significant amount of variance in taxi fare, F…P… R^2 = .814, or distance traveled explained 81% of variance in taxi fare
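
The calculation itself is a single division. The sketch below uses hypothetical sums of squares and also shows the proportion of variance left unexplained.

```python
# Hypothetical sums of squares, invented for illustration only.
ss_m = 3.6   # SSm: improvement due to the model
ss_t = 6.0   # SSt: total variance around the mean of y

r_squared = ss_m / ss_t
percent_explained = r_squared * 100   # variance in y explained by x
unexplained = 1 - r_squared           # variance not explained by the model

print(f"R^2 = {r_squared:.2f} ({percent_explained:.0f}% explained)")
```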

28
Q

square root of r^2

A

= r
IF WE ONLY HAVE ONE PREDICTOR
(remember we will lose the sign)
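
The lost sign can be seen with a small example on hypothetical data where y falls as x rises. Taking the sign back from the slope is one way to recover it, shown here as an illustration.

```python
import math
from statistics import mean

# Hypothetical data with a negative relationship, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [5, 4, 5, 4, 2]

x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)   # Pearson's r (negative here)
r_squared = r ** 2
root = math.sqrt(r_squared)      # always non-negative: the sign is lost

slope = sxy / sxx
r_recovered = math.copysign(root, slope)   # borrow the sign of the slope

print(f"r = {r:.3f}, sqrt(r^2) = {root:.3f}, recovered = {r_recovered:.3f}")
```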

29
Q

how do we calculate variance not explained by model

A

1 - R^2

30
Q

write up

A

no design
- results in text
- we conducted a linear regression to examine the influence of X on Y. Mean X (SD, CIs) (from descriptive stats at top of output) and mean Y (SD, CIs).
- preliminary analysis confirmed no violation of normality, linearity or homoscedasticity assumptions
- X explained/did not explain a significant amount of variance in Y, F(,) = __.__, p < .__, R^2 = __. (ANOVA table for F and p, R^2 in model summary table)
- for every 1 unit (e.g. mile) increase in X (e.g. journey distance), Y (taxi fare) increased by (slope) (coefficients table); 95% confidence interval limits for slope were [,] (coefficients table)

31
Q

simple regression discussion

A

the findings suggest that Y can be predicted by X, with longer/shorter/higher/lower X associated with higher/lower Y