Lecture 5 - Bias Flashcards
Methods of detecting outliers and influential cases
- Graphs
- Standardised residual
- Cook’s distance
- DFBeta statistics (unstandardised and standardised)
How to detect outliers and influential cases using standardised residual
- 95% of standardised residuals SHOULD lie between +/- 2
- 99% of standardised residuals SHOULD lie between +/- 2.5
- A case whose standardised residual has an absolute value greater than 3 is likely to be an outlier
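The cut-offs above can be tried out on a small example. This is a minimal sketch, assuming a single predictor and assuming a standardised residual is the raw residual divided by the residual standard deviation, sqrt(SSE / (n - 2)); statistics packages may define it slightly differently. The data and the function names (fit_line, standardised_residuals) are made up for illustration.

```python
import math

def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def standardised_residuals(x, y):
    """Residuals divided by the residual standard deviation (assumed definition)."""
    b0, b1 = fit_line(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (len(x) - 2))
    return [e / s for e in resid]

# Hypothetical data: 20 cases close to the line y = 2x, with one extreme case.
x = [float(i) for i in range(1, 21)]
y = [2 * xi + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x)]
y[9] += 15  # made-up extreme observation

z = standardised_residuals(x, y)
share_beyond_2 = sum(abs(v) > 2 for v in z) / len(z)        # should be about 5% or less
likely_outliers = [i for i, v in enumerate(z) if abs(v) > 3]  # indices of suspect cases
print(share_beyond_2, likely_outliers)
```

Here only the altered case exceeds the +/- 3 cut-off; every other case sits well inside +/- 2.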
How to detect outliers and influential cases using Cook’s distance
- Measure the influence of a single case on the model as a whole
- Values greater than 1 may be cause for concern (Cook’s distance is never negative)
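As a rough illustration of the rule above, Cook’s distance can be computed by hand for a one-predictor model. This sketch assumes the textbook formula D = e^2 / (p * MSE) * h / (1 - h)^2, where e is the residual, h the leverage and p the number of estimated parameters (2 here). The data and the function name cooks_distances are hypothetical.

```python
def cooks_distances(x, y):
    """Cook's distance for each case in a simple (one-predictor) regression."""
    n, p = len(x), 2  # p = intercept + slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    # Leverage of each case in a one-predictor model.
    leverage = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    return [e ** 2 / (p * mse) * h / (1 - h) ** 2
            for e, h in zip(resid, leverage)]

# Hypothetical data: nine cases on the line y = x, plus one case far out on x
# whose outcome does not follow that line.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 5.0]

d = cooks_distances(x, y)
concerning = [i for i, v in enumerate(d) if v > 1]
print(concerning)
```

The last case combines a large leverage with a sizeable residual, so it is the only one whose distance exceeds 1.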
How to detect outliers and influential cases using DFBeta statistics
- DFBeta is the change in b when a case is removed
- Be wary of standardised DFBeta with absolute values > 1
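The idea of refitting without each case can be sketched directly. This example assumes one predictor and looks only at the slope. It also assumes a common way of standardising: divide the change in b by the standard error of b computed with the case-deleted residual standard deviation. Software may differ in detail, and the data and the name slope_dfbetas are invented for illustration.

```python
import math

def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def slope_dfbetas(x, y):
    """For each case: (unstandardised, standardised) DFBeta of the slope."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    _, b_all = fit_line(x, y)
    out = []
    for i in range(n):
        x_del, y_del = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        a_del, b_del = fit_line(x_del, y_del)  # model refitted without case i
        sse_del = sum((yj - (a_del + b_del * xj)) ** 2
                      for xj, yj in zip(x_del, y_del))
        s_del = math.sqrt(sse_del / (n - 1 - 2))  # case-deleted residual SD
        change = b_all - b_del
        out.append((change, change / (s_del / math.sqrt(sxx))))
    return out

# Hypothetical data: nine cases near y = x and one case that drags the slope down.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0]
y = [1.2, 1.9, 3.1, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2, 5.0]

results = slope_dfbetas(x, y)
wary = [i for i, (_, std) in enumerate(results) if abs(std) > 1]
print(wary)
```

Removing the last case raises the slope sharply, so its unstandardised DFBeta is negative and its standardised value is far beyond 1 in absolute terms.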
The population model should have
- Homoscedastic errors
- Independent errors
Key assumptions of the general linear model
- Linearity and additivity
- Spherical errors (homoscedastic and independent)
- Normality of something or other (the model errors or, strictly, the sampling distribution of the bs)
A model’s errors refer to the
Differences between predicted values and observed values of the outcome variable in the population model
These values cannot be observed
A model’s residuals refer to the
Differences between predicted values and observed values of the outcome variable in the sample model
These values can be observed and are used as estimates of the population model errors
The population error in prediction for one case should
Not be related to the error in prediction for another case (when errors are related, this is called autocorrelation)
Because population errors can’t be observed
Sample residuals are inspected
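One simple way to inspect residuals for independence is to correlate each residual with the one before it. This is a minimal sketch, assuming the cases have a meaningful order (for example, the order of data collection). The residual values and the name lag1_autocorrelation are made up for illustration.

```python
def lag1_autocorrelation(resid):
    """Correlation between each residual and the residual of the previous case."""
    n = len(resid)
    mean = sum(resid) / n
    num = sum((resid[i] - mean) * (resid[i - 1] - mean) for i in range(1, n))
    den = sum((e - mean) ** 2 for e in resid)
    return num / den

# Hypothetical residuals, listed in the order the cases were collected.
drifting = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]   # neighbours look alike
unrelated = [1.0, -0.5, -1.5, 1.5, 0.5, -1.0, -0.5, 0.5]  # no obvious pattern

r_drifting = lag1_autocorrelation(drifting)
r_unrelated = lag1_autocorrelation(unrelated)
print(r_drifting, r_unrelated)
```

A value well away from 0, as for the drifting residuals, suggests the errors are not independent.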
Variance of population errors should be
Consistent at different values of the predictor variable (homoscedasticity)
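A crude numeric stand-in for eyeballing a residual plot is to compare the spread of the residuals at low and high values of the predictor. This sketch assumes one predictor and already-computed residuals; the function name spread_ratio and the data are hypothetical, and this is not a formal test.

```python
def spread_ratio(x, resid):
    """Residual variance in the upper half of x divided by that in the lower half."""
    pairs = sorted(zip(x, resid))  # order cases by the predictor
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[-half:]]

    def variance(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    return variance(high) / variance(low)

# Hypothetical predictor values and two made-up sets of residuals.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
funnel = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]  # spread grows with x
even = [0.5, -0.5, 0.4, -0.4, 0.5, -0.5, 0.4, -0.4]    # spread stays the same

ratio_funnel = spread_ratio(x, funnel)
ratio_even = spread_ratio(x, even)
print(ratio_funnel, ratio_even)
```

A ratio near 1 is what consistent variance looks like; a ratio far from 1, as in the funnel-shaped residuals, points to heteroscedasticity.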
Consequences of violating the assumption of spherical errors
- bs are unbiased but not optimal
- Standard error is incorrect, therefore t-tests, p-values and confidence intervals will also be incorrect
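Does normality of model errors matter?
Not really, at least in large samples, where the central limit theorem makes the sampling distribution of b approximately normal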
When errors are not normally distributed, b will be
Still unbiased and optimal among linear unbiased estimators, but there may be other classes of estimator that are more accurate
p-values associated with the bs of the model assume that
The sampling distribution of b is normal, so that the test statistic associated with each b follows its reference distribution (e.g. t)