Regression Diagnostics & Assumptions Flashcards
When checking for bias in regression models, what are the three general things to check?
The question
Aspects to investigate in results
The general procedure
Which questions about the model are important to investigate in regression?
Is the model accurate for the sample and can the model be generalized?
What are the important aspects to be investigated regarding results?
Outliers and distribution of residuals
What general procedure is important for checking bias in regression?
Regression diagnostics and assumption assessment
What is an outlier?
A case that differs substantially from the main trend of the data
Why can an outlier constitute a problem?
An outlier can affect the precision of the estimates of the regression coefficients
How can we detect outliers?
By searching for large residuals and also by searching for influential cases
How can you tell if an outlier has a small or large residual?
By looking at how close the outlier is to the line of best fit. Close is a small residual and far away is a large residual.
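A residual is the observed value minus the value predicted by the line of best fit. The sketch below illustrates this for a single predictor using only the Python standard library. The data and the helper names (fit_line, residuals) are hypothetical and chosen for illustration; the case at x = 4 is deliberately placed far from the trend.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(xs, ys):
    """Observed value minus predicted value for every case."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data: y is roughly 2 * x, except for the case at x = 4.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 6, 20, 10, 12, 14, 16]

res = residuals(xs, ys)

# Index of the case that lies farthest from the line of best fit.
farthest = max(range(len(res)), key=lambda i: abs(res[i]))
print(farthest, round(res[farthest], 2))
```

The case far from the line has the largest absolute residual, while the cases near the line have small ones.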
Outliers with large residuals can be found by comparing standardized residuals with the standard normal distribution. What are the general rules?
Standardized residuals below -3 or above +3 are a cause for concern because they are unlikely to occur in a typical sample.
If more than 5% of cases have standardized residuals below -2 or above +2, we should be concerned, because that exceeds what we would normally expect.
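The two rules can be sketched in code. This is a minimal illustration, assuming that a standardized residual is the raw residual divided by the standard error of the estimate, sqrt(SSE / (n - k - 1)) with k predictors. The function name screen_residuals and the residual values are hypothetical and do not come from a fitted model.

```python
import math


def screen_residuals(raw_residuals, n_predictors=1):
    """Apply the +/-3 and +/-2 rules to standardized residuals.

    Returns (indices beyond +/-3, share of cases beyond +/-2).
    Assumption: standardized residual = residual / sqrt(SSE / (n - k - 1)).
    """
    n = len(raw_residuals)
    sse = sum(r ** 2 for r in raw_residuals)
    std_error = math.sqrt(sse / (n - n_predictors - 1))
    z = [r / std_error for r in raw_residuals]
    beyond_3 = [i for i, value in enumerate(z) if abs(value) > 3]
    share_beyond_2 = sum(1 for value in z if abs(value) > 2) / n
    return beyond_3, share_beyond_2


# Hypothetical residuals: twenty small ones and one large one.
example_residuals = [0.5, -0.5] * 10 + [6.0]

beyond_3, share_beyond_2 = screen_residuals(example_residuals)
print("Cases beyond +/-3:", beyond_3)
print("More than 5% beyond +/-2:", share_beyond_2 > 0.05)
```

Here one case breaks the ±3 rule, but the share of cases beyond ±2 stays under 5%, so only that single case would need a closer look.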
What table can you use to look at outliers with large residuals?
Casewise diagnostics
How can you detect influential cases?
By looking at Cook’s distance in the residuals statistics table.
What value of Cook’s distance should make us concerned?
A value greater than 1 is a cause for concern.
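For a regression with a single predictor, Cook’s distance can be computed by hand from each case’s residual and leverage. The sketch below assumes the standard formula D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2, with p = 2 estimated parameters (intercept and slope) and leverage h_i = 1/n + (x_i - mean_x)^2 / Sxx. The data and the function name cooks_distances are hypothetical.

```python
def cooks_distances(xs, ys):
    """Cook's distance for each case in a one-predictor least-squares fit."""
    n = len(xs)
    p = 2  # estimated parameters: intercept and slope
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in resid) / (n - p)

    distances = []
    for x, e in zip(xs, resid):
        leverage = 1 / n + (x - mean_x) ** 2 / sxx
        distances.append(e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances


# Hypothetical data: the last case sits far out on x and off the trend.
xs = [1, 2, 3, 4, 5, 6, 7, 20]
ys = [2, 4, 6, 8, 10, 12, 14, 10]

distances = cooks_distances(xs, ys)

# Rule from the card above: a value greater than 1 is a cause for concern.
influential = [i for i, d in enumerate(distances) if d > 1]
print(influential)
```

Note that the influential case here does not have the largest raw residual, because it pulls the line toward itself. That is why influential cases are checked separately from large residuals.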
What happens if we find outliers?
First check that the outlier is not due to a data entry error. If the data entry is correct, you can transform the data or consider deleting the case.
What are the assumptions about residuals in regression?
Normality
Linearity
Homoscedasticity
What does the normality assumption say about the residuals in regression?
The residuals should be normally distributed