6. Regression assumptions, Diagnostics and Influential cases Flashcards
How many assumptions are there for multiple linear regression?
9 mathematical
+2 design
=11
What are the 2 design assumptions of multiple linear regression?
Independence (each participant only 1 score on each IV)
Interval Scale on IV and DV (or dichotomous IV)
What are the 9 mathematical assumptions of multiple linear regression?
Normality (6 sub assumptions)
No multicollinearity (3 ways to check)
Linearity
Normal distribution of residuals
Independent Residuals
Residuals unrelated to predictors
Homogeneity of Variance
What are the 6 tests of normality?
Symmetry
Modality
Skew
Kurtosis
Outliers
Shapiro-Wilk
What do you check with the assumption of symmetry?
Mean = Median = Mode
What do you check for in modality?
Only 1 most frequently occurring score (Unimodal not multi/bimodal)
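A minimal sketch of the symmetry and modality checks, using only the Python standard library. The function name `central_tendency` and the sample scores are hypothetical, chosen for illustration.

```python
from statistics import mean, median, multimode

def central_tendency(scores):
    """Return (mean, median, list of modes) so the three can be compared."""
    return mean(scores), median(scores), multimode(scores)

# Hypothetical, perfectly symmetric sample (illustration only)
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]
m, md, modes = central_tendency(scores)

# Symmetry: mean, median and mode coincide
symmetric = (m == md == modes[0])
# Modality: exactly one most frequent score
unimodal = (len(modes) == 1)
print(m, md, modes, symmetric, unimodal)
```

With real data the three values rarely match exactly, so in practice they are compared for closeness rather than strict equality.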
What do you check for in skew and kurtosis?
Skew / SE Skew
Kurtosis / SE Kurtosis
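The ratio on this card can be sketched as follows. This assumes the bias-adjusted sample skewness and the usual large-sample formula for its standard error (the versions most statistics packages report). The function name `skew_ratio` and the sample data are hypothetical. The card does not give a cut-off, so the sketch only returns the ratio.

```python
import math
from statistics import mean, stdev

def skew_ratio(scores):
    """Return (skew, SE of skew, skew / SE skew) for a list of scores."""
    n = len(scores)
    m, s = mean(scores), stdev(scores)
    # Bias-adjusted sample skewness (assumed definition)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in scores)
    # Standard error of skewness depends only on the sample size
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se_skew, skew / se_skew

# Hypothetical samples: one symmetric, one with a long right tail
sym_skew, sym_se, sym_ratio = skew_ratio([1, 2, 3, 4, 5])
pos_skew, pos_se, pos_ratio = skew_ratio([1, 1, 1, 2, 2, 3, 10])
print(sym_ratio, pos_ratio)
```

The same pattern applies to kurtosis, with the kurtosis statistic divided by its own standard error.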
What constitutes an outlier?
95% of cases should have standardized scores (z) within ±1.96
No more than 3% of cases should be >2.58
If there are… they are outliers
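A rough sketch of the outlier check, assuming the 1.96 and 2.58 values on this card refer to absolute standardized (z) scores. The function name `outlier_summary` and the sample data are hypothetical.

```python
from statistics import mean, stdev

def outlier_summary(scores):
    """Return the percentage of cases with |z| > 1.96 and with |z| > 2.58."""
    m, s = mean(scores), stdev(scores)
    z = [(x - m) / s for x in scores]
    n = len(z)
    pct_beyond_196 = 100 * sum(abs(v) > 1.96 for v in z) / n
    pct_beyond_258 = 100 * sum(abs(v) > 2.58 for v in z) / n
    return pct_beyond_196, pct_beyond_258

# Hypothetical sample: scores 1..20 plus one extreme case
scores = list(range(1, 21)) + [100]
pct_196, pct_258 = outlier_summary(scores)

# Thresholds taken from the card: about 5% beyond 1.96, no more than 3% beyond 2.58
too_many_extreme = pct_258 > 3
print(round(pct_196, 2), round(pct_258, 2), too_many_extreme)
```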
What do you check for in the Shapiro-Wilk statistic?
That it is not significant (p > .05)
What are the 3 checks for multicollinearity?
Pearson Correlations .01
VIF
What does VIF stand for?
Variance inflation factor
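A small sketch of how a VIF is computed, assuming the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the others. With only two predictors, R² is the squared Pearson correlation between them, which is the case shown here. The function names and data are hypothetical, and no cut-off is applied because the card does not state one.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictor scores (illustration only)
iv1 = [1, 2, 3, 4, 5]
iv2 = [2, 1, 4, 3, 5]
r = pearson_r(iv1, iv2)
vif = vif_two_predictors(iv1, iv2)
print(round(r, 3), round(vif, 3))
```

The more strongly the predictors correlate, the closer R² gets to 1 and the larger the VIF becomes.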
Where do you look, and what do you look for, to check whether the residuals are normally distributed?
Mean of residuals = 0
No skew (Snaking) and No Kurtosis (Sag) in the P-P plot and histogram
No outliers in the histogram
Why are the residual statistics so important?
Because if they aren’t normally distributed then we can’t say that 68% of cases will fall within ±1 RMSE of the regression line
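To make the RMSE statement concrete, here is a sketch that fits a one-predictor regression line, then reports the mean of the residuals, the RMSE, and the share of cases falling within one RMSE of the line. It assumes RMSE is computed by dividing the sum of squared residuals by n (some packages divide by the residual degrees of freedom instead). The function names and data are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def residual_summary(x, y):
    """Return (mean residual, RMSE, share of cases within +/- 1 RMSE)."""
    intercept, slope = fit_line(x, y)
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    n = len(residuals)
    rmse = math.sqrt(sum(e ** 2 for e in residuals) / n)  # assumption: divide by n
    share_within = sum(abs(e) <= rmse for e in residuals) / n
    return sum(residuals) / n, rmse, share_within

# Hypothetical IV and DV scores (illustration only)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
mean_resid, rmse, share_within = residual_summary(x, y)
print(round(rmse, 3), share_within)
```

With normally distributed residuals and a reasonably large sample, the share would be expected to sit near 0.68. A sample this small is only meant to show the mechanics.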
How do we check linearity and why do we check it?
Using Pearson correlations between each IV and the DV,
because if the IV is not related to the DV then it can’t be a good predictor
How is the Independence of Residuals tested?
Using the Durbin-Watson statistic
What does the Durbin Watson show?
The independence of residuals
When reading the Durbin-Watson, what are we looking for to meet our assumption of independent residuals?
Values between 1.5-2.5
Actual range is 0 to 4: 0 = strong positive autocorrelation, 2 = no autocorrelation, 4 = strong negative autocorrelation
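For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch, with hypothetical function names and residuals, using the 1.5-2.5 rule of thumb from the card:

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def residuals_look_independent(dw, low=1.5, high=2.5):
    """Rule of thumb from the card: values between 1.5 and 2.5 are acceptable."""
    return low <= dw <= high

# Hypothetical residuals, listed in case order (illustration only)
alternating = [1, -1, 1, -1, 1, -1]   # sign flips every case
trending = [-3, -2, -1, 1, 2, 3]      # neighbours resemble each other
dw_alt = durbin_watson(alternating)
dw_trend = durbin_watson(trending)
print(round(dw_alt, 3), round(dw_trend, 3))
```

Residuals that flip sign from case to case push the statistic above 2.5, while residuals that drift together pull it below 1.5. Both patterns fail the check.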