9. Assumptions and Diagnostics Flashcards
What are the different linear model assumptions?
Linearity
Independence of Errors
Normality of Errors
Equal Variance
What happens if assumptions are violated?
Estimates and inferences (standard errors, p-values, confidence intervals) may be inaccurate
Why is it useful to visualise assumptions?
Easier to see the nature and magnitude of an assumption violation
What are some drawbacks of using statistical methods of assessing assumptions?
Can suggest assumptions are violated even when the violations are actually small
This is due to statistical power, and the tests give no information about the actual problem
What is linearity?
Assumes the relation between X and Y is linear
What happens if we estimate a linear relation when there isn’t one present?
Can result in underestimating the strength of that relation
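A minimal sketch of this underestimation, using made-up data (the values and the helper `pearson_r` are illustrative assumptions, not part of the course material). Here y is completely determined by x, but along a curve, so a measure of linear relation reports almost nothing:

```python
# Hypothetical data: y depends perfectly on x, but the relation is curved.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]


def pearson_r(x, y):
    """Pearson correlation: the strength of the *linear* relation only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


r = pearson_r(xs, ys)
print(round(r, 3))  # close to zero despite the perfect (curved) dependence
```

The linear summary misses the relation because the rising and falling halves of the curve cancel out.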
How is linearity investigated?
Investigated via scatterplots with loess lines (single predictor)
or
Component-plus-residual plots (when we have multiple predictors) - Closer to black line = More linear
How do we test for non-linearity?
Need to check that the relation between each predictor and the outcome is linear while controlling for the other predictors
Use partial residuals for each predictor xj
e_i + B_j X_ij (the residual plus the partial linear relation between xj and y)
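The partial residual formula can be sketched in plain Python as below. The data, the two-predictor setup, and the helper names `fit_two_predictors` and `slope` are assumptions made for illustration. In practice a statistics package would fit the model and draw the plot.

```python
# Hypothetical data with two predictors and one outcome.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3.1, 6.2, 6.8, 10.1, 10.9, 14.2]


def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via centred sums."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


def slope(x, y):
    """Slope of a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


b0, b1, b2 = fit_two_predictors(x1, x2, y)

# Ordinary residuals e_i from the full model.
residuals = [c - (b0 + b1 * a + b2 * b) for a, b, c in zip(x1, x2, y)]

# Partial residuals for x1: e_i + b1 * x1_i.
partial_x1 = [e + b1 * a for e, a in zip(residuals, x1)]

# Plotting partial_x1 against x1 gives the component-plus-residual plot for x1.
for a, p in zip(x1, partial_x1):
    print(a, round(p, 3))
```

A straight line through these points has slope b1, so any curvature in the plotted points suggests a non-linear relation between x1 and y after controlling for x2.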
What are normally distributed errors?
Assumes errors are normally distributed around each predicted value
How are normally distributed errors investigated?
Investigated via QQ plots (quantile comparison plots)
- Plot standardized residuals from the model against their theoretically expected values
- If normally distributed = Points should fall neatly on the diagonal line
- If non-normally distributed = Points deviate from the diagonal, changing the shape of the plot
- Can also use histograms of the residuals to see their distribution
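The points of a QQ plot can be sketched as follows. The residual values are made up, and two simplifications are assumptions: residuals are standardized by their own mean and standard deviation (proper standardized residuals also adjust for leverage), and the plotting positions (i - 0.5) / n are one common convention among several.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical residuals from some fitted model.
residuals = [-1.8, 0.4, 1.1, -0.3, 0.2, -0.9, 2.1, -0.6, 0.7, -0.9]

# Simple standardization (assumption: ignores leverage).
m, s = mean(residuals), stdev(residuals)
observed = sorted((r - m) / s for r in residuals)

# Theoretical quantiles expected under a standard normal distribution.
n = len(observed)
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each (theoretical, observed) pair is one point on the QQ plot.
qq_points = list(zip(theoretical, observed))
for t, o in qq_points:
    print(round(t, 2), round(o, 2))
```

If the errors are roughly normal, the observed values track the theoretical ones and the points lie near the diagonal.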
What is equal variance (Homoscedasticity)?
Assumes the variance of the errors is constant across values of the predictors x1, …, xk and across the fitted values ŷ
What is it called when homoscedasticity is violated?
Heteroscedasticity
How is equal variance investigated?
Using residual plots
Residuals vs predicted values = Spread above and below the line should be the same across the plot
Categorical predictors = Each category should show a similar spread
Continuous predictors = Dots should be spread evenly around the line across the whole range
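A rough numeric companion to the residual plot, as a sketch only. The fitted values and residuals are made up, and splitting at the middle of the fitted values is an arbitrary illustrative choice, not a formal test.

```python
from statistics import stdev

# Hypothetical fitted values and residuals from some model.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.2, 0.2, -0.1, 1.0, -1.5, 1.4, -0.9]

# Order the cases by fitted value and split them into a low and a high half.
pairs = sorted(zip(fitted, residuals))
half = len(pairs) // 2
low = [r for _, r in pairs[:half]]
high = [r for _, r in pairs[half:]]

# Under equal variance the two spreads should be similar (ratio near 1).
sd_low, sd_high = stdev(low), stdev(high)
ratio = sd_high / sd_low
print(round(sd_low, 3), round(sd_high, 3), round(ratio, 2))
```

Here the residuals fan out as the fitted values grow, which is the funnel shape that signals heteroscedasticity in a residual plot.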
What is independence of errors?
Assumes errors are not correlated with one another
How do we test independence of errors?
Difficult to test, unless we know the potential source of correlation between cases
Errors are typically not correlated in a between-persons design
Can use a variant of the linear model to account for non-independence
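When a potential source of correlation is known, one informal check is to group residuals by that source. The sketch below assumes a hypothetical case where the same person contributes several observations. The data and the `person` labels are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical residuals, each tagged with the person it came from.
residuals = [0.9, 1.1, 0.8, -1.0, -0.7, -1.2, 0.1, -0.2, 0.0]
person = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

# Collect the residuals belonging to each person.
by_person = defaultdict(list)
for pid, r in zip(person, residuals):
    by_person[pid].append(r)

# If errors were independent, each person's mean residual should sit near zero.
cluster_means = {pid: mean(rs) for pid, rs in by_person.items()}
for pid, value in cluster_means.items():
    print(pid, round(value, 2))
```

Residuals that share the same sign within a person, as for "a" and "b" here, suggest the errors are correlated within persons.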