Regression Diagnostics & Assumptions Flashcards
When checking for bias in regression models, what are the three general things to check?
The question
Aspects to investigate in results
The general procedure
What questions about the model are important to investigate in regression?
Is the model accurate for the sample and can the model be generalized?
What are the important aspects to be investigated regarding results?
Outliers and distribution of residuals
What general procedure is important to check regarding bias in regression?
Regression diagnostics and assumption assessment
What is an outlier?
A case that differs substantially from the main trend of the data
Why can an outlier constitute a problem?
An outlier can affect the precision of the estimation of the regression coefficient
How can we detect outliers?
By searching for large residuals and also by searching for influential cases
How can you tell if an outlier has a small or large residual?
By looking at how close the outlier is to the line of best fit. Close is a small residual and far away is a large residual.
Large residual outliers can be found by looking at the standard normal distribution. What are the general rules?
Standardized residuals below -3 or above +3 are a cause for concern because they are unlikely to occur in a typical sample.
If more than 5% of cases have standardized residuals below -2 or above +2, we should be concerned because that exceeds what we would normally expect.
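The two rules above can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary keys, and the example values are made up, and the input is assumed to be residuals that have already been standardized.

```python
def screen_standardized_residuals(z_resid):
    """Apply the two rules of thumb to already-standardized residuals."""
    n = len(z_resid)
    # Rule 1: any case below -3 or above +3 is a cause for concern.
    cases_beyond_3 = [i for i, z in enumerate(z_resid) if abs(z) > 3]
    # Rule 2: more than 5% of cases below -2 or above +2 is a cause for concern.
    share_beyond_2 = sum(1 for z in z_resid if abs(z) > 2) / n
    return {
        "cases_beyond_3": cases_beyond_3,
        "share_beyond_2": share_beyond_2,
        "too_many_beyond_2": share_beyond_2 > 0.05,
    }


# Hypothetical standardized residuals for ten cases.
z_example = [0.1, -0.4, 1.2, -2.3, 0.7, 3.4, -0.9, 0.3, -1.1, 0.5]
report = screen_standardized_residuals(z_example)
print(report)
```

Here the case at index 5 breaks the first rule, and two of the ten cases lie beyond -2 or +2, which breaks the second.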
What table can you use to look at outliers with large residuals?
Casewise diagnostics
How can you detect influential cases?
By looking at Cook's distance in the residual statistics table.
What is the maximum value of Cook's distance before we should be concerned?
A value greater than 1 is a cause for concern.
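To see where the number in that table comes from, here is a rough sketch of Cook's distance for a regression with one predictor, using the standard textbook formula built from each case's residual and leverage. The function name and the data are invented for illustration. Statistical software computes the same quantity for models with several predictors.

```python
def cooks_distances(x, y):
    """Cook's distance for each case in a one-predictor regression."""
    n = len(x)
    p = 2  # number of estimated coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        distances.append(e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances


# Hypothetical data: the last case sits far from the others and off the trend.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 15.0]
d = cooks_distances(x, y)
influential = [i for i, di in enumerate(d) if di > 1]  # rule of thumb: > 1
print(influential)
```

Only the last case (index 7) exceeds 1 in this example, so it is the one to inspect first.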
What happens if we find outliers?
Check that the outlier is not due to a data entry error. If the data entry is correct, you can transform the data or consider deleting the case.
What are the assumptions about residuals in regression?
Normality
Linearity
Homoscedasticity
What is the assumption about normality in assumptions about residuals in regression?
The residuals should be normally distributed
What is the assumption about linearity?
The residuals should have a straight line relationship with the predicted outcome scores
What is homoscedasticity?
The variance of the residuals about the predicted outcome scores should be approximately equal for all predicted scores.
How do you check for normality, linearity and homoscedasticity?
Look at the scatterplot of the standardized residuals against the standardized predicted values. The points should form a roughly rectangular cloud, with most of them concentrated around the center (point 0).
What is an alternative way of checking for normality of residuals?
A histogram.
What is independence of error?
For any two observations, the errors of prediction (residuals) should be uncorrelated (they should be independent of one another).
That is, the error produced by two observations should not be due to the same reason.
What do you check for independence of error?
Look at the Durbin-Watson statistic in the model summary table.
What range should the Durbin-Watson value lie within?
Between 1 and 3 is good. If it is closer to 0 or to 4, it is a cause for concern.
What does the Durbin Watson test check?
Whether adjacent residuals are correlated.
Values between 0 and 2 indicate positive correlation, and values between 2 and 4 indicate negative correlation.
A value close to 2 means a lack of correlation.
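The statistic itself is simple to compute: the sum of squared differences between adjacent residuals, divided by the sum of squared residuals. A minimal sketch, with a made-up function name and made-up residuals, assuming the residuals are listed in the order the cases were observed:

```python
def durbin_watson(residuals):
    """Sum of squared adjacent differences over sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual sequences.
alternating = [1, -1, 1, -1, 1, -1]  # neighbors have opposite signs
clustered = [1, 1, 1, -1, -1, -1]    # neighbors tend to share a sign

dw_alternating = durbin_watson(alternating)  # about 3.33, negative correlation
dw_clustered = durbin_watson(clustered)      # about 0.67, positive correlation
print(dw_alternating, dw_clustered)
```

Both example sequences fall outside the 1 to 3 range, one on each side of 2.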
What is multicollinearity?
In multiple regression, it occurs when the predictor variables in the model are highly correlated (>.80).
It means that the predictors have a lot of shared variance.
When is multicollinearity a problem?
When you are interested in the separate effects of different predictors on an outcome, but not when you are predicting the outcome from a set of predictors as a whole.
How do you check for multicollinearity?
Inspect the correlation matrix and check that no correlation between predictors is greater than .80.
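The same inspection can be sketched in code by computing Pearson's r for every pair of predictors and listing the pairs above the cutoff. The function names, variable names, and data below are hypothetical. The sketch compares the absolute value of r with .80, on the assumption that a strong negative correlation matters as much as a strong positive one.

```python
from itertools import combinations
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


def highly_correlated_pairs(predictors, threshold=0.80):
    """Return (name, name, r) for each predictor pair with |r| above threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:  # assumption: sign of r does not matter here
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical predictors measured on six cases.
predictors = {
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "hours_revised": [1.2, 1.9, 3.1, 4.2, 4.8, 6.1],
    "hours_slept": [7, 5, 8, 6, 7, 5],
}
flagged = highly_correlated_pairs(predictors)
print(flagged)
```

Only the pair hours_studied and hours_revised is flagged in this example.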
What is the solution to multicollinearity?
Delete one or more variables from the model (if the correlation is greater than .90).
Combine the collinear variables into a composite variable (if the correlation between the two variables is >.80).
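The card does not say how to build the composite. One common option, shown here purely as an assumption, is to standardize each collinear variable and average the z-scores, so that neither variable dominates because of its scale. The function names and data are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite_score(a, b):
    """Average the z-scores of two variables, case by case (assumed method)."""
    return [(za + zb) / 2 for za, zb in zip(z_scores(a), z_scores(b))]


# Hypothetical collinear predictors measured on six cases.
hours_studied = [1, 2, 3, 4, 5, 6]
hours_revised = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1]
study_effort = composite_score(hours_studied, hours_revised)
print(study_effort)
```

The composite would then replace both original variables as a single predictor in the model.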