Case Diagnostics Flashcards

1
Q

what do regression/model outliers have?

A

a large residual (ε̂_i)

2
Q

ε̂_i

A

the residual: the discrepancy between the observed value (y_i) and the predicted value (ŷ_i), i.e. ε̂_i = y_i - ŷ_i

3
Q

how to calculate standardised residuals

A

divide each residual by an estimate of its standard deviation, s × sqrt(1 - h_i), which converts the residuals to z-score units (the estimate s is computed from all cases, so it includes the potential outlier)
computed by the rstandard() function in R
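
A minimal sketch of this calculation for a one-predictor model, in plain Python rather than R. The data are made up for illustration, and the variable names are not taken from the cards.

```python
import math

# Hypothetical data: the last case has an unusually large y value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 12.0]
n, p = len(x), 2  # p = number of coefficients (intercept and slope)

# Ordinary least squares fit of y = b0 + b1 * x.
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
b0 = mean_y - b1 * mean_x

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
hat = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]  # hat value (leverage) of each case

# Residual standard deviation estimated from ALL cases, outlier included.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - p))

standardised = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, hat)]
for i, r in enumerate(standardised, start=1):
    print(f"case {i}: standardised residual = {r:.2f}")
```

Because the outlier is part of the estimate s, it inflates its own denominator, which is the motivation for the studentised version.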

4
Q

how to calculate studentised residuals

A

divide each residual by an estimate of its standard deviation that is computed with case i excluded
this gives a version of the standardised residual whose scale estimate is not inflated by the potential outlier itself
computed by the rstudent() function in R
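
A rough sketch of the leave-one-out idea, assuming a one-predictor model and made-up data. The helper fit_line is hypothetical, and the model is literally refitted without each case instead of using a shortcut formula.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - b1 * mean_x, b1

# Hypothetical data: the last case has an unusually large y value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 12.0]
n, p = len(x), 2  # p = number of coefficients (intercept and slope)

b0, b1 = fit_line(x, y)
mean_x = sum(x) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)

studentised = []
for i in range(n):
    # Refit without case i so that the case cannot inflate the residual SD.
    x_rest, y_rest = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    a0, a1 = fit_line(x_rest, y_rest)
    rss_rest = sum((yj - (a0 + a1 * xj)) ** 2 for xj, yj in zip(x_rest, y_rest))
    s_without_i = math.sqrt(rss_rest / (n - 1 - p))

    h_i = 1 / n + (x[i] - mean_x) ** 2 / sxx  # hat value from the full fit
    e_i = y[i] - (b0 + b1 * x[i])             # residual from the full fit
    studentised.append(e_i / (s_without_i * math.sqrt(1 - h_i)))

for i, t in enumerate(studentised, start=1):
    print(f"case {i}: studentised residual = {t:.2f}")
```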

5
Q

high leverage cases

A

cases with an unusual value of a predictor (x_i), or an unusual combination of predictor values
they have the potential to influence the estimated intercept (β̂_0) or slope (β̂_1) of the regression model
they can increase the variance of x

6
Q

what values are used to assess leverage

A

hat values
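
As an illustration, hat values for a one-predictor model depend only on the x values. The sketch below uses made-up predictor values and the simple-regression formula 1/n + (x_i - mean of x)^2 / Sxx.

```python
# Hypothetical predictor values: the last case sits far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
n, p = len(x), 2  # p = number of coefficients (intercept and slope)

mean_x = sum(x) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)

# Hat value of case i in simple regression: 1/n + (x_i - mean_x)^2 / Sxx.
hat = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]

for xi, h in zip(x, hat):
    print(f"x = {xi:5.1f}, hat value = {h:.3f}")
print(f"mean hat value = {sum(hat) / n:.3f} (always p / n = {p / n:.3f})")
```

The y values never enter the calculation, so a case can have high leverage without being an outlier.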

7
Q

high influence cases

A

a case that has high leverage and is also an outlier has a large influence on the estimated regression model
it can have a strong effect on the β coefficients, so they would change if the case were deleted
-> the degree of change is a way to judge the magnitude of influence

8
Q

what does Cook's distance use to assess influence

A

it combines leverage (hat values) with outlyingness to capture influence
D_i = outlyingness × leverage
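
One standard way to write this product is D_i = (r_i^2 / p) × h_i / (1 - h_i), where r_i is the standardised residual, h_i the hat value and p the number of coefficients. A small sketch with made-up data for a one-predictor model:

```python
# Hypothetical data: the last case has an unusual x value AND sits off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 10.0]
n, p = len(x), 2  # p = number of coefficients (intercept and slope)

# Ordinary least squares fit of y = b0 + b1 * x.
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
b0 = mean_y - b1 * mean_x

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
hat = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual variance

cooks = []
for e, h in zip(residuals, hat):
    outlyingness = e ** 2 / (s2 * (1 - h)) / p  # squared standardised residual / p
    leverage = h / (1 - h)
    cooks.append(outlyingness * leverage)

for i, d in enumerate(cooks, start=1):
    print(f"case {i}: Cook's distance = {d:.3f}")
```

With these numbers the last case dominates because both factors are large at once.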

9
Q

Cook’s distance refers to…

A

the average distance the ŷ values will move if a given case is removed
- if removing the case changes the predicted values a lot (moves the regression line), then that case is influencing our results
a single value which summarises the total influence of a case

10
Q

DFFit

A

the difference between the predicted outcome value for a case when the model is fitted with and without that case included
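
A sketch of this difference computed the slow way, by refitting a one-predictor model without each case in turn. The data and the helper fit_line are hypothetical, and this is the raw (unscaled) difference.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - b1 * mean_x, b1

# Hypothetical data: the last case has an unusual x value and sits off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 10.0]
n = len(x)

b0, b1 = fit_line(x, y)

dffit = []
for i in range(n):
    # Prediction for case i from the model fitted WITHOUT case i.
    a0, a1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    with_case = b0 + b1 * x[i]
    without_case = a0 + a1 * x[i]
    dffit.append(with_case - without_case)

for i, d in enumerate(dffit, start=1):
    print(f"case {i}: DFFit = {d:.3f}")
```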

11
Q

DFbeta

A

the difference between the estimated value of a coefficient when the model is fitted with and without a case included

12
Q

DFbetas

A

a standardised version of DFbeta
obtained by dividing DFbeta by an estimate of the standard error of the regression coefficient, computed with the case removed
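
A sketch of DFbeta and DFbetas for a one-predictor model with made-up data. Two choices here are assumptions rather than something the cards specify: the sign convention (full fit minus fit without the case), and the standard error built from the residual SD with the case removed combined with the full-data (X'X)^-1 diagonal.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - b1 * mean_x, b1

# Hypothetical data: the last case has an unusual x value and sits off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 10.0]
n, p = len(x), 2  # p = number of coefficients (intercept and slope)

b0, b1 = fit_line(x, y)
mean_x = sum(x) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
# Square roots of the diagonal of (X'X)^-1 for the full data: (intercept, slope).
scale = (math.sqrt(1 / n + mean_x ** 2 / sxx), math.sqrt(1 / sxx))

dfbeta, dfbetas = [], []
for i in range(n):
    x_rest, y_rest = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    a0, a1 = fit_line(x_rest, y_rest)
    rss_rest = sum((yj - (a0 + a1 * xj)) ** 2 for xj, yj in zip(x_rest, y_rest))
    s_without_i = math.sqrt(rss_rest / (n - 1 - p))  # residual SD with case i removed

    change = (b0 - a0, b1 - a1)  # DFbeta: full fit minus fit without case i
    dfbeta.append(change)
    dfbetas.append(tuple(c / (s_without_i * k) for c, k in zip(change, scale)))

for i in range(n):
    print(f"case {i + 1}: DFbeta slope = {dfbeta[i][1]:+.3f}, DFbetas slope = {dfbetas[i][1]:+.2f}")
```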

13
Q

which diagnostics are used to look at linear models with 2+ predictors in more detail

A

DFFit, DFbeta, DFbetas

14
Q

what measures the influence of a case on standard errors

A

COVRATIO

15
Q

COVRATIO

A

measures the effect of an observation on the covariance matrix of the parameter estimates
- i.e. an observation’s influence on the standard errors

16
Q

COVRATIO values and meanings

A

more than 1 = precision increased, standard error decreased by a case
less than 1 = precision decreased, standard error increased by a case
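
A sketch of COVRATIO computed directly from its usual definition: the determinant of the coefficient covariance matrix with the case removed, divided by the determinant with all cases. Under that definition a value above 1 means the case makes the estimates more precise. The data and the helper det_cov are made up for illustration.

```python
def det_cov(x, y):
    """Determinant of s^2 (X'X)^-1 for the line y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    s2 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    # With two coefficients: det(s^2 A^-1) = s^4 / det(A), and det(X'X) = n * Sxx.
    return s2 ** 2 / (n * sxx)

# Hypothetical data: case 4 is an outlier in y; case 8 has an unusual x but follows the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
y = [1.2, 1.9, 3.1, 8.0, 4.8, 6.1, 7.0, 20.1]

full = det_cov(x, y)
covratio = [det_cov(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) / full for i in range(len(x))]

for i, c in enumerate(covratio, start=1):
    effect = "case increases precision" if c > 1 else "case decreases precision"
    print(f"case {i}: COVRATIO = {c:.4f} ({effect})")
```

Here the on-trend, high-leverage case 8 comes out above 1, while the outlying case 4 comes out below 1.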

17
Q

what to do if you identify an unusual case

A

try to find out why it is unusual; it is not a good idea to delete a case that is not showing large influence

18
Q

what to do if you find an error in data entry

A

an error could be a value outside the plausible range
correct it if you can; delete it only if it cannot be corrected

19
Q

what to do if the data is extreme but still legitimate

A

before deleting, consider ways to reduce its influence, such as winsorising
the extreme value could also be due to model specification problems
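
A minimal sketch of one simple form of winsorising, in which the k most extreme values at each end are pulled in to the nearest value that is kept. The function name, the choice of k and the scores are all hypothetical.

```python
def winsorise(values, k=1):
    """Replace the k smallest and k largest values with their nearest retained neighbours."""
    if len(values) <= 2 * k:
        raise ValueError("need more than 2 * k values")
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    # Clamp every value into the range [low, high]; cases are kept, not deleted.
    return [min(max(v, low), high) for v in values]

scores = [2.1, 2.4, 2.2, 2.8, 2.5, 9.7, 2.3, 0.1]
print(winsorise(scores, k=1))
```

The extreme cases stay in the data set, but their values are moved to 2.8 and 2.1, so they pull less on the fitted model.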

20
Q

sensitivity analyses

A

used to check whether the results, and the overall pattern of results, stay similar irrespective of the methodological decisions made

21
Q

what does it mean if results in sensitivity analyses are similar

A

the results are not dependent on the methodological decisions
if they differ a lot, the results do depend on those decisions, and this is a limitation of your research
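
A sketch of what such a check might look like for one unusual case: the same slope is estimated under three hypothetical decisions (keep all cases, delete the case, winsorise the outcome) and the estimates are compared. The data, helper names and choice of decisions are assumptions for illustration.

```python
def slope(x, y):
    """Least-squares slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx

def winsorise(values, k=1):
    """Replace the k smallest and k largest values with their nearest retained neighbours."""
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(v, low), high) for v in values]

# Hypothetical data: the last case has an unusually large y value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 12.0]

slopes = {
    "all cases": slope(x, y),
    "case 8 deleted": slope(x[:-1], y[:-1]),
    "y winsorised": slope(x, winsorise(y, k=1)),
}
for decision, b1 in slopes.items():
    print(f"{decision:>15}: slope = {b1:.3f}")
```

If the estimates are close, the conclusion does not hinge on how the case was handled; if they differ a lot, that dependence is worth reporting.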