Regression Diagnostics and Assumptions Flashcards

Question 1

Q

Describe the data type assumptions of regression

Answer

A

The DV must be interval/ratio.
Predictors must be interval/ratio or dichotomous (only 2 response categories).
Categorical predictors with more than 2 categories can be used, but they have to be recoded into dichotomous dummy variables first.
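
The recoding step can be sketched in a few lines of Python. This is only an illustration: the helper `dummy_code`, the variable `degree` and its categories are made up for the example, and one category is assumed to act as the reference (the case where every dummy is 0).

```python
def dummy_code(values, reference):
    """Recode a categorical variable into 0/1 dummy variables.

    The reference category gets no column of its own; it is the
    case where every dummy equals 0.
    """
    categories = sorted(set(values) - {reference})
    return {
        f"is_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }


# Hypothetical predictor with three categories
degree = ["psych", "law", "maths", "psych", "law"]
dummies = dummy_code(degree, reference="psych")
for name, column in dummies.items():
    print(name, column)
```

A predictor with k categories ends up as k - 1 dichotomous predictors, each of which meets the data type assumption.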

Question 2

Q

What do you do to assess whether the regression model is accurate for the sample?

Answer

A

Perform regression diagnostics

Look at outliers

Question 3

Q

What do you do to check if the regression model can be generalised?

Answer

A

Assumptions assessment

Look at distribution of residuals

Question 4

Q

Outliers

Answer

A

A case that differs substantially from the main trend of the data.

Problematic because it can affect the precision with which the regression coefficients are estimated.

Spotted by conducting regression diagnostics

Question 5

Q

What are the approaches to regression diagnostics?

Answer

A

Searching for large residuals. Good to use a scatter plot.

Searching for influential cases

Question 6

Q

What should you do if you find a small residual?

Answer

A

A case with a small residual can still be hugely influential, so also search for influential cases. This is a different check from searching for large residuals.

Question 7

Q

How do you detect outliers by searching for large residuals?

Answer

A

Look at the standardised residuals (residuals turned into z-scores) for individual cases, as this makes their size easier to interpret.

Standardised residuals below -3.0 and above +3.0 are a cause for concern.

Should also be concerned if more than 5% of cases have a standardised residual below -2.0 or above +2.0, because that exceeds what we would normally expect.

Look at casewise diagnostics in SPSS. Any participant with a large residual is displayed.
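
Outside SPSS, the same two checks can be sketched by hand. The example below is a minimal illustration for a regression with one predictor, using made-up data; it assumes the residuals are standardised by dividing by the standard error of the estimate, and the helper names are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def standardised_residuals(x, y):
    """Residuals divided by the standard error of the estimate."""
    intercept, slope = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # n - 2 degrees of freedom: intercept and slope were estimated
    se = math.sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))
    return [e / se for e in residuals]


# Made-up data: y follows 2x with small noise, except for one case
x = list(range(1, 21))
y = [2 * xi + (0.5 if xi % 2 else -0.5) for xi in x]
y[9] += 15  # case 10 sits far off the trend

z = standardised_residuals(x, y)
beyond_3 = [i + 1 for i, zi in enumerate(z) if abs(zi) > 3]
share_beyond_2 = sum(abs(zi) > 2 for zi in z) / len(z)
print("cases beyond +/-3:", beyond_3)
print("share beyond +/-2:", share_beyond_2)
```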

Question 8

Q

How do you detect outliers by searching for influential cases?

Answer

A

Look at Cook’s distance in residuals statistics. This concerns how much predicted scores for other cases would differ if the case in question were not included.

Cook’s distance should not exceed 1.
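
As a rough sketch of what the statistic does, Cook's distance for a regression with one predictor can be computed from each case's residual and leverage. The data and the helper `cooks_distances` are made up for illustration; with several predictors the leverage would come from the full hat matrix instead.

```python
def cooks_distances(x, y):
    """Cook's distance for each case in a one-predictor regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    p = 2  # parameters estimated: intercept and slope
    mse = sum(e ** 2 for e in residuals) / (n - p)

    distances = []
    for xi, e in zip(x, residuals):
        # Leverage: how unusual the case is on the predictor
        leverage = 1 / n + (xi - mean_x) ** 2 / sxx
        distances.append(e ** 2 / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances


# Made-up data: ten cases on the line y = 2x, plus one extreme case
x = list(range(1, 11)) + [30]
y = [2 * xi for xi in range(1, 11)] + [20]

d = cooks_distances(x, y)
flagged = [i + 1 for i, di in enumerate(d) if di > 1]
print("cases with Cook's distance above 1:", flagged)
```

In this made-up data the flagged case lies far from the others on the predictor and pulls the line towards itself, so its own residual is not the largest in the sample. That is why the search for influential cases is run separately from the search for large residuals.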

Question 9

Q

What should you do if you find outliers?

Answer

A

Ensure that outliers aren’t due to data entry error.
Transform the data (usually not a good idea).
Consider deleting the case responsible for the outlier, but only if it produces a very large distortion. If in doubt, report results for the sample with and without the outlier.
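
Reporting results with and without the outlier amounts to fitting the model twice. A minimal sketch with one predictor and made-up data (the helper `fit_line` and the values are hypothetical):

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data: roughly y = 2x, but the last case is far off the trend
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 17.9, 5.0]

outlier_index = 9  # position of the suspect case
x_trimmed = x[:outlier_index] + x[outlier_index + 1:]
y_trimmed = y[:outlier_index] + y[outlier_index + 1:]

_, slope_with = fit_line(x, y)
_, slope_without = fit_line(x_trimmed, y_trimmed)
print("slope with outlier:   ", round(slope_with, 2))
print("slope without outlier:", round(slope_without, 2))
```

If the two sets of coefficients differ a lot, the reader can see how much the conclusion rests on a single case.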

Question 10

Q

Explain the assumptions about residuals in regression

Answer

A

Normality - Residuals should be normally distributed. Check with a histogram or scatterplot of the residuals.

Linearity - The relationship between the predictors and the outcome should be a straight line, so a scatterplot of residuals against predicted outcome scores should show no curved pattern.

Homoscedasticity - Residuals should have the same spread (variance) at every point along the regression line.

Question 11

Q

Explain the assumption of independence of error

Answer

A

For any two observations, the errors of prediction (residuals) should be uncorrelated. The errors for two observations should not share a common cause.

Look at the Durbin-Watson statistic, which tests whether adjacent residuals are correlated. It varies between 0 and 4, with 2 meaning a lack of correlation.
- Above 2 = negative correlation
- Below 2 = positive correlation
How far from 2 counts as a problem depends on the number of predictors and the number of observations.
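
The statistic itself is the sum of squared differences between adjacent residuals divided by the sum of squared residuals. A short sketch with two made-up residual sequences shows which way it moves (the function name and data are for illustration only):

```python
def durbin_watson(residuals):
    """Squared successive differences over squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals, listed in the order the cases were collected
drifting = [1.0, 0.8, 0.9, 0.5, 0.2, -0.3, -0.6, -0.9, -1.0, -0.7]
alternating = [1.0, -0.9, 0.8, -1.0, 0.9, -0.8, 1.0, -0.9, 0.8, -1.0]

dw_drifting = durbin_watson(drifting)        # neighbours are alike
dw_alternating = durbin_watson(alternating)  # neighbours are opposite
print("drifting residuals:   ", round(dw_drifting, 2))
print("alternating residuals:", round(dw_alternating, 2))
```

Residuals that resemble their neighbours (positive correlation) push the value towards 0, and residuals that flip sign from case to case (negative correlation) push it towards 4.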

Question 12

Q

What should you do if you violate a regression assumption?

Answer

A

Re-run regression using bootstrapping.
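
One common form of bootstrapping resamples whole cases with replacement, refits the model each time, and reads a confidence interval off the percentiles of the resampled coefficients. The sketch below assumes that approach for a single slope; the data, the helper names, the number of resamples and the seed are all illustrative choices, not something the card specifies.

```python
import random


def slope(x, y):
    """Least-squares slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx


def bootstrap_slope_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile confidence interval from resampled cases."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample cases with replacement
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # slope is undefined if every x is identical; redraw
        ys = [y[i] for i in idx]
        slopes.append(slope(xs, ys))
    slopes.sort()
    tail = (1 - level) / 2
    lower = slopes[int(tail * n_boot)]
    upper = slopes[int((1 - tail) * n_boot) - 1]
    return lower, upper


# Made-up data: roughly y = 2x with small noise
x = list(range(1, 16))
noise = [0.4, -0.3, 0.1, -0.6, 0.5, 0.2, -0.4, 0.3, -0.1, 0.6,
         -0.5, -0.2, 0.4, -0.3, 0.1]
y = [2 * xi + e for xi, e in zip(x, noise)]

lower, upper = bootstrap_slope_ci(x, y)
print("bootstrap 95% CI for the slope:", round(lower, 3), "to", round(upper, 3))
```

Because the interval comes from the data's own resampled distribution, it does not lean on the normality or homoscedasticity of the residuals in the way the standard formulas do.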

Question 13

Q

Multicollinearity

Answer

A

Occurs when two predictors are very strongly correlated.

Check the correlation output to make sure that no correlations between predictors are too high (.80 or above).

High multicollinearity suggests that the variables are measuring the same thing. Best to remove a predictor if correlations are high. Always a good idea to inspect.

Not a problem when the purpose of the research is to predict the outcome variable from a set of predictors. A problem when you are interested in the separate effects of different predictors on an outcome.
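
The correlation check can be sketched as a loop over every pair of predictors, flagging any pair at or above the .80 mark. The predictor names and values below are made up for illustration.

```python
import math
from itertools import combinations


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    ss_a = sum((ai - mean_a) ** 2 for ai in a)
    ss_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(ss_a * ss_b)


# Hypothetical predictors for eight participants
predictors = {
    "hours_revised": [2, 4, 5, 7, 8, 10, 11, 13],
    "hours_in_library": [1, 3, 5, 6, 8, 9, 12, 12],
    "hours_slept": [7, 6, 8, 5, 7, 6, 8, 7],
}

high_pairs = []
for name_a, name_b in combinations(predictors, 2):
    r = pearson_r(predictors[name_a], predictors[name_b])
    if abs(r) >= 0.80:
        high_pairs.append((name_a, name_b, round(r, 2)))

print("pairs at or above .80:", high_pairs)
```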

Question 14

Q

Tipping Effect

Answer

A

Concerns multicollinearity.

When two predictors differ slightly in their bivariate relationship with the outcome, they may end up differing greatly in their regression coefficients.

Question 15

Q

What are the solutions to multicollinearity?

Answer

A

Delete one or more variables from the model. Can be done if the correlation between two predictors is very high, e.g. .90.
Combine the collinear variables into a composite variable. Good when the correlation is around .80.
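
The card does not say how to build the composite. One simple option, assumed here purely for illustration, is to convert each collinear predictor to z-scores and average them for each case, so that neither variable dominates because of its scale. The data and helper names are made up.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise a variable to mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite(*variables):
    """Average the z-scores of several variables, case by case."""
    standardised = [z_scores(v) for v in variables]
    return [mean(case) for case in zip(*standardised)]


# Hypothetical collinear predictors measured on different scales
hours_revised = [2, 4, 5, 7, 8, 10, 11, 13]
pages_read = [30, 55, 80, 90, 120, 140, 170, 185]

study_effort = composite(hours_revised, pages_read)
print([round(score, 2) for score in study_effort])
```

The composite then enters the regression as a single predictor in place of the two original variables.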

(15 cards)