case diagnostics Flashcards

Question 1

Q

what do regression/model outliers have?

Answer

A

a large residual (ε̂ᵢ)

Question 2

Q

ε̂ᵢ (the residual)

Answer

A

the discrepancy between the observed value (yᵢ) and the value predicted by the model (ŷᵢ): ε̂ᵢ = yᵢ − ŷᵢ

Question 3

Q

how to calculate standardised residuals

Answer

A

divide the residual (ε̂ᵢ) by an estimate of the standard deviation of the residuals, which converts it to z-score units (this estimate is calculated with the potential outlier included)
computed by the rstandard() function in R
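
A minimal sketch of the calculation for a model with one predictor, written in plain Python rather than R so every step is visible. The data are made up for illustration, and `fit_line` and `standardised_residuals` are hypothetical helper names. The sketch also divides by sqrt(1 - hat value), the usual leverage adjustment, which should be what rstandard() applies, though that is an assumption here.

```python
import math

# Made-up illustrative data: case 5 sits well above the trend of the others.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 4.1, 6.0, 8.2, 15.0, 12.1, 13.9, 16.0]

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def standardised_residuals(x, y):
    n = len(x)
    intercept, slope = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation from ALL cases (n - 2: two coefficients).
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    # Hat values, used for the leverage adjustment.
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    hat = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, hat)]

std_resid = standardised_residuals(x, y)
for case, r in enumerate(std_resid, start=1):
    print(f"case {case}: {r:+.2f}")
```

Case 5 gets the largest standardised residual, but its size is held down because the case itself inflates the standard deviation estimate.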

Question 4

Q

how to calculate studentised residuals

Answer

A

divide the residual (ε̂ᵢ) by an estimate of the standard deviation of the residuals calculated with case i excluded
this gives a version of the standardised residual in which the potential outlier does not inflate the standard deviation estimate
computed by the rstudent() function in R
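
A minimal sketch of the same idea for one predictor, in plain Python with made-up data. For each case the model is refitted without that case to get the residual standard deviation. The residual and hat value still come from the full model, which is assumed to be the convention rstudent() follows. `fit_line` and `studentised_residuals` are hypothetical helper names.

```python
import math

# Made-up illustrative data: case 5 sits well above the trend of the others.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 4.1, 6.0, 8.2, 15.0, 12.1, 13.9, 16.0]

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def studentised_residuals(x, y):
    n = len(x)
    intercept, slope = fit_line(x, y)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    result = []
    for i in range(n):
        # Residual and hat value come from the model fitted to all cases.
        e_i = y[i] - (intercept + slope * x[i])
        h_i = 1 / n + (x[i] - mean_x) ** 2 / sxx
        # Residual standard deviation comes from a refit WITHOUT case i.
        x_rest, y_rest = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        a0, a1 = fit_line(x_rest, y_rest)
        rss = sum((yj - (a0 + a1 * xj)) ** 2 for xj, yj in zip(x_rest, y_rest))
        s_without_i = math.sqrt(rss / (len(x_rest) - 2))
        result.append(e_i / (s_without_i * math.sqrt(1 - h_i)))
    return result

stud_resid = studentised_residuals(x, y)
for case, t in enumerate(stud_resid, start=1):
    print(f"case {case}: {t:+.2f}")
```

With case 5 left out of the standard deviation estimate, its studentised residual becomes far more extreme than its standardised one, while the other cases barely change.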

Question 5

Q

high leverage cases

Answer

A

cases with an unusual value of a predictor (xᵢ) or an unusual combination of predictor values
have the potential to influence the estimated intercept (β̂₀) or slope (β̂₁) of the regression model
can increase the variance of x

Question 6

Q

what values are used to assess leverage

Answer

A

hat values
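
A minimal sketch of hat values for a model with an intercept and one predictor, where each hat value is 1/n plus the case's squared distance from the mean of x divided by the total sum of squares of x. The data are made up, `hat_values` is a hypothetical helper name, and the cut-off of twice the mean hat value is a common rule of thumb, not something stated on these cards.

```python
# Made-up predictor values: case 8 lies far from the other x values.
x = [1, 2, 3, 4, 5, 6, 7, 20]

def hat_values(x):
    """Hat values for a model with an intercept and one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]

hat = hat_values(x)
n_coefficients = 2  # intercept + slope

# Rule-of-thumb cut-off (an assumption): twice the mean hat value.
# The hat values sum to the number of coefficients, so their mean is 2 / n.
cutoff = 2 * n_coefficients / len(x)
flagged = [case for case, h in enumerate(hat, start=1) if h > cutoff]

for case, h in enumerate(hat, start=1):
    print(f"case {case}: hat value {h:.3f}")
print("flagged as high leverage:", flagged)
```

Hat values depend only on the predictor, so a case can have high leverage without being an outlier on y.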

Question 7

Q

high influence cases

Answer

A

a case that has high leverage and is also an outlier - it has a large influence on the estimated regression model
it can have a strong effect on the β coefficients, so they would change if it were deleted
-> the degree of change is a way to judge the magnitude of influence

Question 8

Q

what does Cook's distance use to assess influence

Answer

A

combines leverage (hat values) with outlying-ness to capture influence
(Di = outlying-ness × leverage)
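
One common way to write this product out, as a sketch. The symbols are assumptions not defined on the card: r_i is the standardised residual, h_i the hat value, and p the number of estimated coefficients (predictors plus the intercept).

```latex
D_i \;=\; \underbrace{\frac{r_i^{2}}{p}}_{\text{outlying-ness}} \times \underbrace{\frac{h_i}{1 - h_i}}_{\text{leverage}}
```

Either factor being close to zero keeps D_i small, so a case needs both a sizeable residual and sizeable leverage to score highly.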

Question 9

Q

Cook’s distance refers to…

Answer

A

the average distance the predicted (ŷ) values will move if a given case is removed
- if removing the case changes the predicted values a lot (moves the regression line), then that case is influencing our results
a single value which summarises the total influence of a case
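
A minimal sketch of this "remove the case and see how far the predictions move" reading, for one predictor and made-up data. The squared movement of all predicted values is scaled by the number of coefficients times the residual variance, the usual scaling for Cook's distance. The second function is the residual-times-leverage shortcut, included only to show that the two routes agree. All function names are hypothetical.

```python
# Made-up data: case 8 has an unusual x AND falls well below the trend.
x = [1, 2, 3, 4, 5, 6, 7, 14]
y = [2.0, 4.1, 6.0, 8.2, 9.9, 12.1, 13.9, 18.0]

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def cooks_by_deletion(x, y, p=2):
    n = len(x)
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    s2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    distances = []
    for i in range(n):
        a0, a1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        # Predictions for EVERY case from the model fitted without case i.
        refitted = [a0 + a1 * xi for xi in x]
        moved = sum((f - g) ** 2 for f, g in zip(fitted, refitted))
        distances.append(moved / (p * s2))
    return distances

def cooks_by_formula(x, y, p=2):
    n = len(x)
    b0, b1 = fit_line(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    hat = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    # Squared standardised residual / p, multiplied by h / (1 - h).
    return [(e ** 2 / (s2 * (1 - h))) / p * h / (1 - h)
            for e, h in zip(residuals, hat)]

cooks = cooks_by_deletion(x, y)
cooks_check = cooks_by_formula(x, y)
for case, d in enumerate(cooks, start=1):
    print(f"case {case}: Cook's distance {d:.3f}")
```

Case 8 dominates because removing it shifts the whole line, not just its own prediction.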

Question 10

Q

DFFit

Answer

A

difference between the predicted outcome value for a case when the model is fitted with and without that case included
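
A minimal sketch of this difference for one predictor and made-up data: each case's prediction from the full model minus its prediction from a model refitted without it. This is the raw, unscaled difference. Scaled versions exist but are not shown here. `fit_line` and `dffit` are hypothetical helper names.

```python
# Made-up data: case 8 has an unusual x AND falls well below the trend.
x = [1, 2, 3, 4, 5, 6, 7, 14]
y = [2.0, 4.1, 6.0, 8.2, 9.9, 12.1, 13.9, 18.0]

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def dffit(x, y):
    b0, b1 = fit_line(x, y)
    changes = []
    for i in range(len(x)):
        # Refit the model with case i left out.
        a0, a1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        with_case = b0 + b1 * x[i]
        without_case = a0 + a1 * x[i]
        changes.append(with_case - without_case)
    return changes

dffit_values = dffit(x, y)
for case, d in enumerate(dffit_values, start=1):
    print(f"case {case}: DFFit {d:+.3f}")
```

A negative value means that including the case pulls its own prediction down.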

Question 11

Q

DFbeta

Answer

A

difference between the estimate of a coefficient when the model is fitted with and without a given case included

Question 12

Q

DFbetas

Answer

A

a standardised version of DFbeta
obtained by dividing by an estimate of the standard error of the regression coefficients with the case removed
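
A minimal sketch of DFbeta and DFbetas for one predictor and made-up data. The scaling used for DFbetas here is the residual standard deviation from the refit without the case, multiplied by the square root of the matching diagonal entry of (X'X)^-1 from the full data. That is assumed to match R's dfbetas() and should be checked before relying on it. The helper names are hypothetical.

```python
import math

# Made-up data: case 8 has an unusual x AND falls well below the trend.
x = [1, 2, 3, 4, 5, 6, 7, 14]
y = [2.0, 4.1, 6.0, 8.2, 9.9, 12.1, 13.9, 18.0]

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def dfbeta_and_dfbetas(x, y):
    n = len(x)
    b0, b1 = fit_line(x, y)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    # Diagonal of (X'X)^-1 for the full data: (intercept, slope).
    xtx_inv_diag = (1 / n + mean_x ** 2 / sxx, 1 / sxx)
    dfbeta, dfbetas = [], []
    for i in range(n):
        x_rest, y_rest = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        a0, a1 = fit_line(x_rest, y_rest)
        rss = sum((yj - (a0 + a1 * xj)) ** 2 for xj, yj in zip(x_rest, y_rest))
        s_without_i = math.sqrt(rss / (len(x_rest) - 2))
        # DFbeta: coefficient with the case minus coefficient without it.
        change = (b0 - a0, b1 - a1)
        dfbeta.append(change)
        # DFbetas: the same change in standard-error units.
        dfbetas.append(tuple(c / (s_without_i * math.sqrt(d))
                             for c, d in zip(change, xtx_inv_diag)))
    return dfbeta, dfbetas

dfbeta, dfbetas = dfbeta_and_dfbetas(x, y)
for case, (raw, scaled) in enumerate(zip(dfbeta, dfbetas), start=1):
    print(f"case {case}: slope DFbeta {raw[1]:+.3f}, slope DFbetas {scaled[1]:+.2f}")
```

Because DFbetas is in standard-error units, it can be compared across coefficients measured on different scales, which raw DFbeta cannot.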

Question 13

Q

which diagnostics are used to look at linear models with 2+ predictors in more detail

Answer

A

DFFit, DFbeta, DFbetas

Question 14

Q

what measures the influence of a case on standard errors

Answer

A

COVRATIO

Question 15

Q

COVRATIO

Answer

A

measures the effect of an observation on the covariance matrix of the parameter estimates
- i.e. an observation’s influence on the standard errors

Question 16

Q

COVRATIO values and meanings

Answer

A

more than 1 = precision increased, standard error decreased by a case
less than 1 = precision decreased, standard error increased by a case
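
A minimal sketch for one predictor and made-up data, assuming COVRATIO is the determinant of the coefficient covariance matrix with the case deleted divided by the determinant with all cases. This is the usual textbook definition and should match R's covratio(), though that is not checked here. `fit_line`, `generalised_variance` and `covratio` are hypothetical helper names.

```python
# Made-up illustrative data: case 5 sits well above the trend of the others.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 4.1, 6.0, 8.2, 15.0, 12.1, 13.9, 16.0]

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def generalised_variance(x, y):
    """Determinant of the estimated covariance matrix of (intercept, slope)."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    s2 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    # det(s2 * (X'X)^-1) = s2^2 / det(X'X), and det(X'X) = n * Sxx here.
    return s2 ** 2 / (n * sxx)

def covratio(x, y):
    full = generalised_variance(x, y)
    # Ratio of "case deleted" to "all cases": below 1 means the estimates
    # are MORE precise without the case, so the case reduces precision.
    return [generalised_variance(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) / full
            for i in range(len(x))]

ratios = covratio(x, y)
for case, r in enumerate(ratios, start=1):
    print(f"case {case}: COVRATIO {r:.4f}")
```

The outlying case 5 has a ratio far below 1, while the well-fitting case 8 at the edge of the x range has a ratio above 1 because it helps pin down the slope.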

Question 17

Q

what to do if identify an unusual case

Answer

A

try to find out why it is unusual; it is not a good idea to delete it if it is not showing large influence

Question 18

Q

what to do if you find an error in data entry

Answer

A

an error could be a value outwith the plausible range
delete if it cannot be corrected

Question 19

Q

what to do if the data is extreme but still legit

Answer

A

consider ways to reduce its influence before deleting it, such as winsorising
the extreme value could also be due to model specification problems
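
Winsorising can be sketched as replacing the most extreme values with the nearest value that is kept. The version below is a minimal count-based variant (replace the k smallest and k largest values) with made-up scores. Percentile-based variants are also common, and `winsorise` is a hypothetical helper name.

```python
def winsorise(values, k=1):
    """Replace the k smallest and k largest values with the nearest kept value."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must be non-negative and leave at least one value untouched")
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    # Clip every value into the range [low, high]; the original order is kept.
    return [min(max(v, low), high) for v in values]

# Made-up scores with one extreme but legitimate value.
scores = [12, 15, 14, 13, 16, 15, 14, 60]
winsorised = winsorise(scores, k=1)
print(winsorised)
```

The case stays in the data set, so the sample size is unchanged, but its pull on the estimates is reduced.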

Question 20

Q

sensitivity analyses

Answer

A

used to check whether the results, and the overall pattern of results, are similar irrespective of the methodological decisions made

Question 21

Q

what does it mean if results in sensitivity analyses are similar

Answer

A

the results are not dependent on the methodological decisions
if they differ a lot, the dependence of the results on those decisions is a limitation of your research :(
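
A minimal sketch of a sensitivity check for one methodological decision, keeping or removing a flagged case, using made-up data and one predictor. The same model is fitted under each decision and the slopes are compared. The labels and the `fit_line` helper are hypothetical, and what counts as "a lot" of change is left to the analyst.

```python
# Made-up data: case 8 has an unusual x AND falls well below the trend.
x = [1, 2, 3, 4, 5, 6, 7, 14]
y = [2.0, 4.1, 6.0, 8.2, 9.9, 12.1, 13.9, 18.0]

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Each entry is one methodological decision about the flagged case.
analyses = {
    "all cases": (x, y),
    "case 8 removed": (x[:7], y[:7]),
}
slopes = {label: fit_line(xs, ys)[1] for label, (xs, ys) in analyses.items()}

baseline = slopes["all cases"]
relative_change = abs(slopes["case 8 removed"] - baseline) / abs(baseline)

for label, slope in slopes.items():
    print(f"{label}: slope {slope:.3f}")
print(f"relative change in slope: {relative_change:.0%}")
```

Here the sign of the slope is the same under both decisions but its size changes substantially, which is the kind of dependence worth reporting.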

(21 cards)