W8 multiple regression assumptions Flashcards
Regression assumptions
Normality
- residuals (errors) are normally distributed with mean = 0
- often described as being centred around 0 (zero)
Homoscedasticity
- constant variance of residuals (across predicted scores)
Independence of errors
- The residuals are independent of one another (and uncorrelated with the predicted scores)
Linearity of the relationship
- The relationship between the predictors and Y is linear (a straight-line equation)
Predicted and actual score
We predict the actual score (Y) from the k predictors (X1 ... Xk)
A predicted score (Y′) is derived from the regression equation: what Y "should" be according to the model
Unless the (multiple) correlation is perfect, Y ≠ Y′
Y - Y′ = e
e = error/residual
= whatever is left over in Y that Y′ doesn't account for
Y = Y′ + e
Y = Predicted + Residual
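A minimal sketch (not from the lecture) of this decomposition in Python, using statsmodels OLS with made-up data and variable names:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # three hypothetical predictors
y = 2 + X @ np.array([0.5, -0.3, 0.8]) + rng.normal(scale=1.0, size=100)

model = sm.OLS(y, sm.add_constant(X)).fit()        # fit the multiple regression
y_pred = model.fittedvalues                        # Y' (predicted scores)
e = model.resid                                    # e (residuals)

# Unless prediction is perfect, Y != Y', but every case decomposes exactly as
# Y = Y' + e:
assert np.allclose(y, y_pred + e)
```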
Positive or negative residual
If the regression equation has underestimated the actual score (Y′ < Y), the residual will be positive
If instead the actual score was overestimated (Y′ > Y), the residual will be negative
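Continuing the same hypothetical sketch, the sign of each residual shows which cases the equation under- or over-estimated:

```python
underestimated = e > 0     # actual score above predicted: Y > Y'
overestimated = e < 0      # actual score below predicted: Y < Y'
print(f"{underestimated.sum()} cases underestimated, {overestimated.sum()} overestimated")
```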
Why residuals matter
The residuals can take on distinctive patterns if something systematic is amiss.
If there were only true score and random error in our data, and we hadn't omitted any important predictors of our outcome, we would expect nice, neat, normally distributed residuals centred around zero…
-> Anything else means that there might be a problem
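One illustrative way (reusing the hypothetical model and residuals `e` from the sketch above) to check that the residuals are roughly normal and centred on zero:

```python
from scipy import stats

print("mean of residuals:", e.mean())        # ~0 by construction for OLS with an intercept
w, p = stats.shapiro(e)                      # Shapiro-Wilk test of normality
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")
# A Q-Q plot is often more informative than a single test, e.g.:
# sm.qqplot(e, line="s")   (needs matplotlib)
```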
Homoscedasticity assumption
-> When we plot our residuals against the predicted scores
-> And the error is random (i.e., independent of the criterion)
-> And we haven't left out important predictors
-> We should see a rectangular shape (an even band of points around zero)
-> If so, we have met the assumption of homoscedasticity
*look up image
Analysing homoscedasticity
- The residuals should be evenly scattered above and below zero
- The range of residuals around zero should be narrow; the larger the range, the worse the prediction
- If, however, our residuals look like a funnel or a fan, this suggests that we have not met the assumption of homoscedasticity
-> The distribution of residuals across the range of predicted values of Y is not even (see the plot sketch below)
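An illustrative sketch of the residuals-vs-predicted plot described above (continuing the hypothetical model; matplotlib assumed available):

```python
import matplotlib.pyplot as plt

plt.scatter(y_pred, e, alpha=0.6)
plt.axhline(0, color="grey", linestyle="--")   # residuals should straddle zero evenly
plt.xlabel("Predicted score (Y')")
plt.ylabel("Residual (e)")
plt.title("Rectangular band = homoscedastic; funnel/fan = heteroscedastic")
plt.show()
```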
What not meeting homoscedasticity suggests
- Such a pattern suggests that prediction is worse at some points in the range of predicted values of Y than at others
- Can indicate skew in one or more predictors, but not always (a quick check is sketched below)
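A quick, hedged check of predictor skewness and kurtosis, reusing the hypothetical `X` from earlier (the predictor names are made up):

```python
import pandas as pd

predictors = pd.DataFrame(X, columns=["X1", "X2", "X3"])   # illustrative names
print(predictors.skew())        # values much beyond about |1| suggest marked skew
print(predictors.kurtosis())    # excess kurtosis; roughly 0 for a normal distribution
```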
How to address violations of homoscedasticity
We may find one or more outlying data points on one or more variables that are unduly influencing the regression analysis and leading to potentially erroneous conclusions
OR
One or more predictors may deviate greatly from normality (marked skewness or kurtosis). Again, this may lead to errors.
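For the first case, one possible way to screen for unduly influential cases (a sketch, not the course's prescribed method) is Cook's distance from the fitted statsmodels model above:

```python
influence = model.get_influence()              # statsmodels OLSInfluence object
cooks_d = influence.cooks_distance[0]          # first element holds the distances
suspect = np.where(cooks_d > 4 / len(y))[0]    # a common rule-of-thumb cut-off
print("Potentially influential cases:", suspect)
```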
Homoscedasticity and deviation from normality
One or more predictors may deviate greatly from normality (marked skewness or kurtosis). Again, this may lead to errors.
a) In this case, we generally apply one or more transformations of the data (in turn, not concurrently).
b) We replace the actual variable with the transformed one in the regression.
c) Certain transformations may help alleviate skewness (caution, hit and miss; see the sketch below)
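A hedged sketch of steps (a)-(c): trying transformations one at a time on a made-up, positively skewed predictor and comparing the resulting skew:

```python
from scipy import stats

x_skewed = rng.exponential(scale=2.0, size=100)     # hypothetical positively skewed predictor
print("raw skew:", stats.skew(x_skewed))

# Try transformations in turn (not concurrently) and compare the skew
for name, transform in [("sqrt", np.sqrt), ("log", np.log1p), ("reciprocal", lambda v: 1.0 / (v + 1.0))]:
    print(name, "skew:", stats.skew(transform(x_skewed)))

# Whichever works best (it can be hit and miss) replaces the raw variable in the
# regression, e.g.:
# X_new = np.column_stack([X, np.log1p(x_skewed)])
# model2 = sm.OLS(y, sm.add_constant(X_new)).fit()
```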