Regression Diagnostics & Assumptions Flashcards

1
Q

When checking for bias in regression models, what are the three general things to check?

A

The questions to ask about the model
Aspects to investigate in results
The general procedure

2
Q

What questions about the model are important to investigate in regression?

A

Is the model accurate for the sample, and can it be generalized beyond the sample?

3
Q

What aspects of the results should be investigated?

A

Outliers and the distribution of the residuals

4
Q

What general procedure is used to check for bias in regression?

A

Regression diagnostics and assumption assessment

5
Q

What is an outlier?

A

A case that differs substantially from the main trend of the data

6
Q

Why can an outlier constitute a problem?

A

An outlier can affect the precision of the estimated regression coefficients.

7
Q

How can we detect outliers?

A

By searching for cases with large residuals and for influential cases.

8
Q

How can you tell if an outlier has a small or large residual?

A

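By looking at how close the outlier is to the line of best fit. The residual is the vertical distance between the case and the line (observed minus predicted score): a case close to the line has a small residual, and a case far from it has a large residual.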

9
Q

Outliers with large residuals can be found by comparing the standardized residuals to the standard normal distribution. What are the general rules?

A

Standardized residuals below -3 or above +3 are a cause for concern, because they are unlikely to occur in a typical sample.
If more than 5% of cases have standardized residuals below -2 or above +2, we should be concerned, because that exceeds what we would normally expect.
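As a rough sketch of these two rules, the Python below fits a one-predictor least-squares line, divides each residual by the standard error of the estimate (assumed here as the definition of a standardized residual), and applies the +/-3 and 5%-beyond-+/-2 cut-offs. The function names and the example data are illustrative assumptions, not part of any statistics package.

```python
import math
from statistics import fmean


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def standardized_residuals(x, y):
    """Residuals divided by the standard error of the estimate."""
    b0, b1 = fit_simple_ols(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    n, k = len(x), 1  # k = number of predictors
    see = math.sqrt(sum(e ** 2 for e in resid) / (n - k - 1))
    return [e / see for e in resid]


def residual_flags(z):
    """Return (cases beyond +/-3, share beyond +/-2, whether that share exceeds 5%)."""
    beyond_3 = [i for i, zi in enumerate(z) if abs(zi) > 3]
    share_beyond_2 = sum(abs(zi) > 2 for zi in z) / len(z)
    return beyond_3, share_beyond_2, share_beyond_2 > 0.05


# Hypothetical data: a perfect line y = 2x with one case shifted up by 30.
x = list(range(20))
y = [2 * xi for xi in x]
y[10] += 30
print(residual_flags(standardized_residuals(x, y)))  # -> ([10], 0.05, False)
```

Here the shifted case is the only one beyond +/-3, and exactly 5% of cases lie beyond +/-2, which does not exceed the 5% limit.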

10
Q

What table can you use to look at outliers with large residuals?

A

The Casewise Diagnostics table

11
Q

How can you detect influential cases?

A

By looking at Cook's distance in the Residuals Statistics table.
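A minimal sketch of where Cook's distance comes from, assuming a one-predictor least-squares model: each case's squared residual is scaled by the model's mean squared error and by its leverage h (how far its predictor value sits from the predictor mean). Values greater than 1 are then flagged, following the rule of thumb used in this deck. The function name and data are illustrative assumptions.

```python
from statistics import fmean


def cooks_distance(x, y):
    """Cook's distance for each case in a one-predictor least-squares model."""
    n, p = len(x), 2  # p = parameters estimated (intercept and slope)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    distances = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx  # leverage of this case
        distances.append(e ** 2 / (p * mse) * h / (1 - h) ** 2)
    return distances


# Hypothetical data: the same shift of 30 applied to a middle case and to an end case.
x = list(range(20))
y_mid = [2 * xi for xi in x]
y_mid[10] += 30
y_end = [2 * xi for xi in x]
y_end[19] += 30
print([i for i, d in enumerate(cooks_distance(x, y_mid)) if d > 1])  # -> []
print([i for i, d in enumerate(cooks_distance(x, y_end)) if d > 1])  # -> [19]
```

The comparison illustrates why large residuals and influential cases are checked separately: the same shift is not influential near the middle of the predictor range but is influential at the extreme, where leverage is high.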

12
Q

Above what value of Cook's distance should we be concerned?

A

A Cook's distance greater than 1 is a cause for concern.

13
Q

What should you do if you find outliers?

A

First check that the outlier is not due to a data entry error.
If the data were entered correctly, you can:
- transform the data
- consider deleting the case

14
Q

What are the assumptions about residuals in regression?

A

Normality
Linearity
Homoscedasticity

15
Q

What is the normality assumption about residuals in regression?

A

The residuals should be normally distributed

16
Q

What is the linearity assumption?

A

The residuals should have a straight-line relationship with the predicted outcome scores.

17
Q

What is homoscedasticity?

A

The variance of the residuals about the predicted outcome scores should be approximately equal for all predicted scores.
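The deck checks this assumption visually, but a rough numeric complement can make "approximately equal" concrete. The sketch below is a simple heuristic of our own, not a formal test and not something the deck prescribes: it sorts cases by predicted score and divides the residual variance of the upper half by that of the lower half. A ratio near 1 suggests similar spread; a ratio far from 1 suggests the funnel shape associated with heteroscedasticity. The function name and data are illustrative assumptions.

```python
from statistics import pvariance


def spread_ratio(predicted, residuals):
    """Residual variance for the upper half of predicted scores divided by the lower half."""
    pairs = sorted(zip(predicted, residuals))  # order cases by predicted score
    half = len(pairs) // 2  # with an odd count, the middle case is left out
    lower = [e for _, e in pairs[:half]]
    upper = [e for _, e in pairs[-half:]]
    return pvariance(upper) / pvariance(lower)


# Hypothetical residuals whose spread grows with the predicted score.
predicted = list(range(8))
print(spread_ratio(predicted, [1, -1, 1, -1, 3, -3, 3, -3]))  # -> 9.0
print(spread_ratio(predicted, [1, -1] * 4))  # -> 1.0
```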

18
Q

How do you check for normality, linearity and homoscedasticity?

A

Look at the scatterplot of the standardized residuals against the standardized predicted values. The points should be roughly rectangularly distributed, with most of them concentrated in the center (around 0).

19
Q

What is an alternative way of checking for normality of residuals?

A

A histogram of the residuals, which should be roughly bell-shaped (normal).
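Without a plotting library, a text histogram gives a quick impression of the same shape. This is a minimal sketch, assuming the standardized residuals have already been computed; the function name, the bin width of 1, and the sample values are illustrative assumptions. Roughly normal residuals should pile up around 0 and thin out symmetrically toward both tails.

```python
import math
from collections import Counter


def text_histogram(z, width=1.0):
    """Return one text line per bin of `width`, with one '*' per residual in that bin."""
    counts = Counter(math.floor(zi / width) for zi in z)  # bin index for each residual
    lines = []
    for b in range(min(counts), max(counts) + 1):
        lo = b * width
        lines.append(f"[{lo:+.1f}, {lo + width:+.1f}) {'*' * counts[b]}".rstrip())
    return lines


# Hypothetical standardized residuals.
for line in text_histogram([-1.5, -0.5, -0.2, 0.3, 0.4, 0.9, 1.2]):
    print(line)
```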

20
Q

What is independence of error?

A

For any two observations, the errors of prediction (residuals) should be uncorrelated, that is, independent of one another.
In other words, the errors for two observations should not share a common cause.

21
Q

What do you check to assess independence of errors?

A

Look at the Durbin-Watson statistic in the Model Summary table.

22
Q

Within what range should the Durbin-Watson statistic lie?

A

Between 1 and 3 is acceptable. Values closer to 0 or to 4 are a cause for concern.

23
Q

What does the Durbin-Watson test check?

A

Whether adjacent residuals are correlated.
Values between 0 and 2 indicate positive correlation, and values between 2 and 4 indicate negative correlation.
A value close to 2 indicates no correlation.
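The statistic itself is the sum of squared differences between successive residuals divided by the sum of squared residuals, which is why it ranges from 0 to 4. The sketch below computes it and applies the rules from the last two cards; it assumes the residuals are listed in the order the data were collected, and the function names are illustrative assumptions.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


def describe_dw(d):
    """Direction of correlation and whether d lies in the acceptable 1-3 range."""
    direction = "positive" if d < 2 else "negative" if d > 2 else "none"
    verdict = "acceptable" if 1 <= d <= 3 else "cause for concern"
    return direction, verdict


# Hypothetical residual sequences.
print(describe_dw(durbin_watson([1, -1, 1, -1])))  # -> ('negative', 'acceptable')
print(describe_dw(durbin_watson([1, 1, 1, -1, -1, -1])))  # -> ('positive', 'cause for concern')
```

Runs of same-signed residuals produce small successive differences and push the statistic toward 0; residuals that keep flipping sign push it toward 4.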

24
Q

What is multicollinearity?

A

In multiple regression, multicollinearity occurs when the predictor variables in the model are highly correlated (r > .80).
It means that the predictors share a lot of variance.

25
Q

When is multicollinearity a problem?

A

When you are interested in the separate effects of individual predictors on the outcome, but not when you are only interested in predicting the outcome from the set of predictors as a whole.

26
Q

How do you check for multicollinearity?

A

Inspect the correlation matrix and check that no correlation between predictors is greater than .80.

27
Q

What is the solution to multicollinearity?

A

Delete one or more variables from the model (if the correlation is greater than .90).
Combine the collinear variables into a composite variable (if the correlation between two variables is greater than .80).
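One common way to build a composite, assumed here rather than specified by the card, is to convert each collinear variable to z-scores and average them case by case so that no variable dominates because of its scale. The sketch also maps a correlation onto the card's thresholds; treating anything above .90 as a deletion candidate is our reading of the card. All names are illustrative assumptions.

```python
from statistics import fmean, pstdev


def zscores(values):
    """Standardize a variable to mean 0 and (population) standard deviation 1."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite(*variables):
    """Average the z-scores of several collinear variables for each case."""
    standardized = [zscores(v) for v in variables]
    return [fmean(case) for case in zip(*standardized)]


def suggested_action(r):
    """Map a predictor correlation onto the rules of thumb on the card."""
    if abs(r) > 0.90:
        return "consider deleting one of the variables"
    if abs(r) > 0.80:
        return "consider combining into a composite"
    return "no action needed"


# Hypothetical collinear variables measured on different scales.
print(composite([1, 2, 3], [10, 20, 30]))
print(suggested_action(0.85))  # -> consider combining into a composite
```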