9: REGRESSION Flashcards
linear regression
used when the relationship between variables x and y can be described with a straight line
correlation determines the strength of a relationship between x and y (doesn’t tell us how much y changes based on a given change in x)
regression allows us to estimate how much y will change as a result of a given change in x
terminology (regression): variables x and y
regression distinguishes between the variable being predicted and the variable(s) used to predict (in simple linear regression, only one predictor variable)
predicted variable : y
- the outcome variable
- the dependent variable
- the criterion variable
predictor variable : x
- the predictor variable
- the independent variable
- the explanatory variable
uses of linear regression (interpretation)
researchers might use regression to
- investigate the strength of the effect x has on y
- estimate how much y will change as a result of a given change in x
- predict a future value of y, based on a known value of x
unlike correlation, regression makes the assumption that y is (to some extent) dependent on x
- this may not reflect causal dependency
regression does NOT provide direct evidence of causality
stages of linear regression
1. analysing the relationship between variables
- determining the strength and direction of the relationship (correlation)
2. proposing a model to explain that relationship
- a line of best fit: y = a + bx
- find a (the intercept) and b (the gradient); see the sketch after this list
3. evaluating the model
- assessing the goodness of fit
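a minimal sketch of stage 2, assuming made-up example data (not from the source): fitting a and b by ordinary least squares

```python
# A minimal sketch of stage 2, assuming made-up example data:
# fit the line of best fit y = a + b*x by ordinary least squares.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical predictor (x) scores
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # hypothetical outcome (y) scores

# np.polyfit with degree 1 returns the gradient (b) first, then the intercept (a)
b, a = np.polyfit(x, y, 1)
print(f"intercept a = {a:.3f}, gradient b = {b:.3f}")

# one use of the model: predict a future value of y from a known value of x
print(f"predicted y at x = 6: {a + b * 6:.3f}")
```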
linear regression: evaluating the model (stage 3): goodness of fit
simplest model:
- no relationship between x and y (b = 0): predicts the mean of y for every value of x
best model:
- based on relationship between x and y
- the regression line
is our regression model better at predicting y than the simplest model?
linear regression: calculating goodness of fit
- SSt (total sum of squares): the sum of the squared differences between the observed values of y and the mean of y (i.e. the variance in y not explained by the simplest model, b = 0)
- SSr (residual sum of squares): the sum of the squared differences between the observed values of y and those predicted by the regression line (i.e. the variance in y not explained by the regression model)
- SSm (model sum of squares): the difference between SSt and SSr (SSm = SSt - SSr); this reflects the improvement in prediction using the regression model compared to the simplest model (i.e. the reduction in unexplained variance)
- the larger SSm, the bigger the improvement in prediction using the regression model over the simplest model (see the sketch below)
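a minimal sketch of the three sums of squares, assuming the same made-up data and fitted a, b as in the earlier fitting example

```python
# A minimal sketch of SSt, SSr and SSm, assuming the same made-up data
# and the a, b fitted in the earlier example.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b, a = np.polyfit(x, y, 1)
y_hat = a + b * x                      # values predicted by the regression line

SSt = np.sum((y - y.mean()) ** 2)      # unexplained by the simplest model (b = 0)
SSr = np.sum((y - y_hat) ** 2)         # unexplained by the regression model
SSm = SSt - SSr                        # improvement in prediction
print(f"SSt = {SSt:.3f}, SSr = {SSr:.3f}, SSm = {SSm:.3f}")
```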
linear regression: assessing the goodness of fit
- we can use an F-test (i.e. ANOVA) to evaluate the improvement due to the model (SSm) relative to variance the model does not explain (SSr)
- rather than using the Sums of Squares (SS) values, the F-test uses Mean Squares (MS) values, which take the degrees of freedom into account (MS = SS / df)
- the F ratio provides a measure of how much the model has improved the prediction of y, relative to the level of inaccuracy of the model
F = MSm / MSr
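a minimal sketch of this F-test, assuming the same made-up data as above (with one predictor, df for the model = 1 and df for the residuals = n - 2)

```python
# A minimal sketch of the F-test, assuming the same made-up data as above.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b, a = np.polyfit(x, y, 1)
SSt = np.sum((y - y.mean()) ** 2)
SSr = np.sum((y - (a + b * x)) ** 2)
SSm = SSt - SSr

# with one predictor: df for the model = 1, df for the residuals = n - 2
df_m, df_r = 1, len(y) - 2
F = (SSm / df_m) / (SSr / df_r)        # F = MSm / MSr
p = stats.f.sf(F, df_m, df_r)          # probability of an F this large (or larger) under the null
print(f"F({df_m}, {df_r}) = {F:.2f}, p = {p:.4f}")
```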
linear regression: interpreting goodness of fit
- if the regression model is good, MSm will be large while MSr will be small (i.e. a large F value)
- null hypothesis: the regression model predicts no better than the simplest model (MSm = 0)
- p expresses the probability of finding an improvement of the magnitude we have obtained (or larger), when the null is true
- a significant result suggests the regression model provides a better fit than the simplest model
linear regression: assumptions
- linearity: x and y must be linearly related
- absence of outliers
- normality of residuals: residuals should be normally distributed around the predicted outcome
- homoscedasticity: variance of residuals about the outcome should be the same for all predicted scores
Dancey and Reidy state that the outcome variable should be normally distributed, but this is a simplification: strictly, it is the residuals that should be normally distributed
there is no non-parametric equivalent of regression - if assumptions are violated, we can only attempt a 'fix'
linear regression: Assumptions: Normal P-P Plot of Regression Standardized Residual
ideally, data points will lie in a reasonably straight diagonal line, from bottom left to top right
- would suggest no major deviations from normality
linear regression: Assumptions: Scatterplot of Regression Standardized Residual
ideally, residuals will be roughly rectangularly distributed, with most scores concentrated in the centre (0)
- don't want to see a systematic pattern to the residuals
outliers: standardised residuals > 3.3 or < -3.3
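a sketch of both diagnostic checks, assuming simulated data; the variable names, the rough standardisation of the residuals, and the use of scipy's probability plot are all illustrative

```python
# A sketch of both diagnostic checks, assuming simulated data that
# meet the assumptions (everything here is illustrative).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2 + 0.5 * x + rng.normal(0, 1, 100)

b, a = np.polyfit(x, y, 1)
residuals = y - (a + b * x)
std_resid = residuals / residuals.std(ddof=2)   # roughly standardised residuals

# outlier check: standardised residuals beyond +/- 3.3
print("potential outliers at indices:", np.where(np.abs(std_resid) > 3.3)[0])

# normality check: a probability plot; r close to 1 suggests the points lie
# on a straight diagonal line, i.e. no major deviation from normality
(osm, osr), (slope, intercept, r) = stats.probplot(std_resid)
print(f"probability-plot straightness: r = {r:.3f}")
```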
linear regression: SPSS coefficients (location)
coefficients table
- intercept (a): the B value in the (Constant) row
- slope (b): the B value in the predictor variable's row
- standardised b value: the Beta value in the predictor variable's row
t statistic tests the null that the value of b is 0
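a minimal sketch (made-up data) of the quantities the coefficients table reports: a, b, and the t-test of the null hypothesis that b = 0

```python
# A minimal sketch (made-up data) of the coefficients-table quantities:
# a, b, and the t-test of the null hypothesis that b = 0.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

res = stats.linregress(x, y)
t = res.slope / res.stderr              # t statistic for the slope
print(f"intercept (a) = {res.intercept:.3f}, slope (b) = {res.slope:.3f}")
print(f"t = {t:.2f}, p = {res.pvalue:.4f}")   # p-value for the test that b = 0
```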
estimating variance explained
- R^2: the amount of variance in y explained by model relative to total variance in y (R^2 = SSm/SSt)
- can express R^2 as a percentage (multiply by 100)
- r^2 expresses the proportion of shared variance between 2 variables; in regression we assume x explains the variance in y
- though r^2 = R^2 if we only have 1 predictor
variance not explained by x = (1-R^2)
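a minimal sketch, assuming the same made-up data as earlier, showing R^2 = SSm/SSt and that r^2 = R^2 with a single predictor

```python
# A minimal sketch, assuming the same made-up data as earlier,
# showing R^2 = SSm / SSt and that r^2 = R^2 with one predictor.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b, a = np.polyfit(x, y, 1)

SSt = np.sum((y - y.mean()) ** 2)
SSm = SSt - np.sum((y - (a + b * x)) ** 2)
R2 = SSm / SSt
r = stats.pearsonr(x, y)[0]
print(f"R^2 = {R2:.4f}, r^2 = {r ** 2:.4f}")   # equal with one predictor
print(f"variance not explained: {1 - R2:.4f}")
```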
multiple regression
- allows us to assess the influence of several predictor variables (x1,x2 …) on y
- we obtain a measure of how much variance in the outcome variable (y) the predictor variables explain in combination (via a model which incorporates the slope of each predictor variable)
- we also obtain measures of how much variance in the outcome variable our predictor variables explain when considered separately
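a minimal sketch of multiple regression with two made-up predictors, fitted by least squares; R^2 here is the variance they explain in combination

```python
# A minimal sketch of multiple regression with two made-up predictors,
# fitted by least squares.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 1.0 + 0.8 * x1 - 0.5 * x2 + rng.normal(scale=0.5, size=50)

X = np.column_stack([np.ones_like(x1), x1, x2])   # design matrix with intercept column
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coefs
y_hat = X @ coefs

SSt = np.sum((y - y.mean()) ** 2)
SSr = np.sum((y - y_hat) ** 2)                    # variance the model leaves unexplained
print(f"a = {a:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}, R^2 = {1 - SSr / SSt:.3f}")
```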
multiple regression: assumptions (sample size)
sufficient sample size
advice:
- combined effect of several predictors: N ≥ 50 + 8M, where M is the number of predictors (e.g. for 3 predictors, at least 74 participants)
- separate effect of each predictor: N ≥ 104 + M (e.g. for 3 predictors, at least 107 participants)
too few participants may result in overly optimistic results (see the sketch below)
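a tiny sketch of the two rules of thumb above (M = number of predictors); the function names are illustrative

```python
# A tiny sketch of the two sample-size rules of thumb above
# (M = number of predictors); the function names are illustrative.
def n_combined(m: int) -> int:
    """Minimum N to test the combined effect of M predictors."""
    return 50 + 8 * m

def n_separate(m: int) -> int:
    """Minimum N to test the separate effect of each of M predictors."""
    return 104 + m

for m in (3, 5):
    print(f"M = {m}: combined >= {n_combined(m)}, separate >= {n_separate(m)}")
```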