case diagnostics Flashcards
what do regression/model outliers have?
a large residual E^i
E^i
discrepancy between predicted y value (y^i) and observed value (yi)
how to calculate standardised residuals
divide E^i by the estimate of the standard deviation of the residuals, which converts the residuals to z-score units (this calculation includes the potential outlier)
computed by the rstandard() function in R
how to calculate studentised residuals
divide E^i by the estimate of the standard deviation of the residuals excluding the case i
provides a version of standardised residuals excluding the outlier case
computed by the rstudent() function in R
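A minimal sketch of the two residual types, assuming a simple model fitted to R's built-in mtcars data (the model and data are illustrative choices, not from these notes). Note that R's rstandard() also scales each residual by sqrt(1 - h_i), where h_i is the case's hat value, so it does slightly more than a plain division by the residual SD.

```r
# Illustrative model on built-in data (an assumption, not from the notes)
fit <- lm(mpg ~ wt, data = mtcars)

raw <- resid(fit)       # raw residuals: observed y minus predicted y
std <- rstandard(fit)   # standardised: residual SD estimated from all cases
stu <- rstudent(fit)    # studentised: residual SD estimated without case i

# Manual version of the standardised residuals for comparison
s <- summary(fit)$sigma               # residual standard error (all cases)
h <- hatvalues(fit)                   # leverage of each case
manual_std <- raw / (s * sqrt(1 - h))
all.equal(std, manual_std)

# Side-by-side view of the first few cases
head(cbind(raw, std, stu))
```

Cases where the studentised value is much larger in magnitude than the standardised one are cases that inflate the residual SD when they are included.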
high leverage cases
cases with an unusual value of predictor (xi) or a combination of predictor values
have the potential to influence the B^0 (intercept) or B^1 (slope) of the regression model
can increase x variance
what values are used to assess leverage
hat values
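A short sketch of pulling out hat values in R, again assuming an illustrative model on the built-in mtcars data. The "twice the average hat value" cut-off used here is a common rule of thumb and is an assumption, not something stated in these notes.

```r
# Illustrative model on built-in data (an assumption, not from the notes)
fit <- lm(mpg ~ wt, data = mtcars)

h <- hatvalues(fit)          # one hat value per case
k <- length(coef(fit)) - 1   # number of predictors
n <- nrow(mtcars)            # number of cases

mean_h <- (k + 1) / n        # the average hat value is always (k + 1) / n

# Cases above 2 x the average hat value (rule-of-thumb cut-off, an assumption)
h[h > 2 * mean_h]
```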
high influence cases
when a case has high leverage and is an outlier - this has a large influence on the estimation of regression models
can have a strong effect on B coefficients - so if we deleted it they would change
-> The degree of change is a way to judge the magnitude of influence
what does Cook's distance use for considering influence
combines leverage (hat values) with the outlying-ness to capture the influence
Di = outlying-ness * leverage
cook’s distance refers to…
the average distance the y^ values will move if a given case is removed
- if removing the case changes the predicted values a lot (moves the regression line), then that case is influencing our results
a single value which summarises the total influence of a case
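The "outlying-ness * leverage" idea can be sketched in R as below, assuming an illustrative model on the built-in mtcars data. The manual formula uses the squared standardised residual (scaled by the number of coefficients p) as the outlying-ness part and h / (1 - h) as the leverage part, which is how cooks.distance() is computed for a linear model.

```r
# Illustrative model on built-in data (an assumption, not from the notes)
fit <- lm(mpg ~ wt, data = mtcars)

d <- cooks.distance(fit)   # one Cook's distance per case

# Manual version: outlying-ness * leverage
p <- length(coef(fit))     # number of coefficients, including the intercept
r <- rstandard(fit)        # standardised residuals
h <- hatvalues(fit)        # hat values
manual_d <- (r^2 / p) * (h / (1 - h))
all.equal(d, manual_d)

# The three cases with the largest influence
head(sort(d, decreasing = TRUE), 3)
```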
DFFit
difference between the predicted outcome value for a case when the model is fitted with and without that case included
DFbeta
difference between the value for a coefficient with and without a case included
DFbetas
a standardised version of DFbeta
obtained by dividing by an estimate of the standard error of the regression coefficients with the case removed
which diagnostics are used to look at linear models with 2+ predictors in more detail
DFFit, DFbeta, DFbetas
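A sketch of these "with and without the case" differences, computed by actually refitting the model with one case left out. The two-predictor model, the mtcars data and the choice of case 1 are all illustrative assumptions. Note that R's dffits() returns a standardised version of DFFit rather than the raw difference.

```r
# Illustrative two-predictor model on built-in data (an assumption)
fit <- lm(mpg ~ wt + hp, data = mtcars)

i <- 1                                                  # case to examine (arbitrary)
fit_minus_i <- lm(mpg ~ wt + hp, data = mtcars[-i, ])   # refit without case i

# DFFit: predicted value for case i with the case minus without the case
dffit_i <- predict(fit, newdata = mtcars[i, ]) -
  predict(fit_minus_i, newdata = mtcars[i, ])

# DFbeta: each coefficient with the case minus without the case
dfbeta_i <- coef(fit) - coef(fit_minus_i)

# Built-in equivalents, which avoid refitting by hand
all.equal(dfbeta_i, dfbeta(fit)[i, ])
dfbetas(fit)[i, ]   # standardised DFbeta
dffits(fit)[i]      # standardised DFFit
```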
what measures a case's influence on standard errors
COVRATIO
COVRATIO
measures the effect of an observation on the covariance matrix of the parameter estimates
- an observation’s influence on standard error
COVRATIO values and meanings
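more than 1 = precision increased, standard error decreased by a case (removing the case would make the estimates less precise)
less than 1 = precision decreased, standard error increased by a case (removing the case would make the estimates more precise)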
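A sketch of where the COVRATIO number comes from, assuming an illustrative model on the built-in mtcars data and an arbitrary choice of case 1. For case i it is the determinant of the coefficient covariance matrix from the model fitted without the case, divided by the determinant from the model fitted with all cases.

```r
# Illustrative two-predictor model on built-in data (an assumption)
fit <- lm(mpg ~ wt + hp, data = mtcars)

cr <- covratio(fit)   # one COVRATIO per case

# Manual version for a single case: refit without it and compare determinants
i <- 1                                                  # arbitrary case
fit_minus_i <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
manual_cr <- det(vcov(fit_minus_i)) / det(vcov(fit))
all.equal(unname(cr[i]), manual_cr)

# Cases furthest from 1 in either direction
head(sort(abs(cr - 1), decreasing = TRUE), 3)
```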
what to do if identify an unusual case
try to find out why it is unusual, as it is not a good idea to delete a case that is not showing large influence
what to do if you find an error in data entry
an error could be a value outwith the plausible range
delete if it cannot be corrected
what to do if the data is extreme but still legit
consider ways to reduce influence before deleting, like winsorising
this could be due to model specification problems
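One way winsorising might look in R: values beyond chosen percentiles are pulled in to the percentile value rather than deleted. The helper name winsorise() and the 5th/95th percentile limits are hypothetical choices for illustration, not part of these notes or of base R.

```r
# Hypothetical helper (not a base R function): clamp values to chosen percentiles
winsorise <- function(x, probs = c(0.05, 0.95)) {
  limits <- unname(quantile(x, probs = probs, na.rm = TRUE))
  # Values below the lower limit become the lower limit,
  # values above the upper limit become the upper limit
  pmin(pmax(x, limits[1]), limits[2])
}

# Made-up example values with one extreme score
x <- c(1:19, 100)
w <- winsorise(x)
range(x)   # range before winsorising
range(w)   # range after winsorising
```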
sensitivity analyses
used to check whether results, and the overall pattern of results, are similar irrespective of the methodological decisions made
what does it mean if results in sensitivity analyses are similar
results are not dependent on the methodological decisions
if they differ a lot, the results depend on those decisions, which is a limitation of your research :(
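A simple sensitivity check could be sketched as below: fit the model with all cases, refit without the most influential case, and compare the coefficients side by side. The model, the mtcars data and the choice to drop the single case with the largest Cook's distance are illustrative assumptions.

```r
# Illustrative two-predictor model on built-in data (an assumption)
fit_all <- lm(mpg ~ wt + hp, data = mtcars)

# Identify the case with the largest Cook's distance
worst <- which.max(cooks.distance(fit_all))

# Refit without that case
fit_sens <- lm(mpg ~ wt + hp, data = mtcars[-worst, ])

# Compare coefficients from the two fits
round(cbind(all_cases = coef(fit_all), without_case = coef(fit_sens)), 3)
```

If the two columns tell the same story (same signs, similar sizes), the conclusions do not hinge on that one case.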