GLM 3- Assumptions Flashcards
What is the first assumption of GLM?
Response-predictor Linearity
How can the first assumption of GLM be diagnosed?
Residual plots make it possible to identify non-linearity.
They plot the residuals, e_i, against the fitted values, ŷ_i.
If the relationship is linear, the plot shows no systematic pattern: the residuals scatter evenly around a horizontal line at zero. A curved pattern, such as a U-shape, indicates non-linearity.
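As a rough numerical stand-in for reading a residual plot, the sketch below fits a straight line to deliberately curved data and averages the residuals over the low, middle, and high thirds of the fitted values. The data, seed, and variable names are illustrative assumptions, not part of the cards.

```python
import numpy as np

# Hypothetical data: the true relationship is quadratic, but we fit a line.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 1.0 + 0.5 * x**2 + rng.normal(scale=1.0, size=x.size)

# Ordinary least squares fit of y = b0 + b1 * x.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# Sort residuals by fitted value and average them within three bins.
# Under linearity all three means should be close to zero.
order = np.argsort(fitted)
low, mid, high = np.array_split(residuals[order], 3)
print(f"mean residual (low, mid, high): "
      f"{low.mean():.2f}, {mid.mean():.2f}, {high.mean():.2f}")
```

Here the outer bins average above zero and the middle bin below zero, which is the U-shape a residual plot of the same fit would show.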
How can the first assumption of GLM be remedied?
If the residual plot indicates that there is a non-linear relationship, one can either:
Transform the predictors, for example by using log X or the square root of X.
Use polynomial regression, for instance by including X^2 and X^3 as additional predictors.
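A minimal sketch of the polynomial remedy, assuming simulated quadratic data: it compares the residual sum of squares of a straight-line fit with that of a fit that also includes X^2. The helper name `rss` and the data are hypothetical.

```python
import numpy as np

# Hypothetical data with a quadratic relationship between x and y.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y = 1.0 + 0.5 * x**2 + rng.normal(scale=1.0, size=x.size)

def rss(design, response):
    """Residual sum of squares of an OLS fit of response on design."""
    beta, *_ = np.linalg.lstsq(design, response, rcond=None)
    return float(np.sum((response - design @ beta) ** 2))

ones = np.ones_like(x)
rss_linear = rss(np.column_stack([ones, x]), y)            # intercept + X
rss_quadratic = rss(np.column_stack([ones, x, x**2]), y)   # adds X^2

print(f"RSS linear: {rss_linear:.1f}, RSS quadratic: {rss_quadratic:.1f}")
```

A transformed predictor such as `np.log(x)` (for strictly positive x) could be added as a column of the design matrix in the same way.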
What is assumption 2 of GLM?
Constant Variance of Errors
How can the second assumption of GLM be diagnosed?
Residual plots can, again, enable us to assess whether the variances of the error terms are constant.
The error terms are assumed to be homoscedastic, that is, to have the same variance at every level of the fitted values, ŷ_i.
Consequently, the spread of the residuals should be roughly constant across the range of fitted values. A funnel shape, where the spread grows or shrinks with the fitted values, indicates heteroscedasticity.
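The check for constant spread can be sketched numerically by comparing the standard deviation of the residuals in the lower and upper halves of the fitted values. The data below are simulated with noise that grows with x, purely as an assumption for illustration.

```python
import numpy as np

# Hypothetical heteroscedastic data: the noise standard deviation grows with x.
rng = np.random.default_rng(2)
x = np.linspace(1, 10, 300)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5 * x)

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# Compare residual spread for small versus large fitted values.
order = np.argsort(fitted)
lower, upper = np.array_split(residuals[order], 2)
sd_lower = lower.std(ddof=1)
sd_upper = upper.std(ddof=1)
print(f"residual SD, lower half: {sd_lower:.2f}, upper half: {sd_upper:.2f}")
```

Under homoscedasticity the two standard deviations would be similar; here the upper half is clearly more spread out.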
How can the second assumption of GLM be remedied?
We can transform the response, Y, by taking log(Y) or the square root of Y. If such transformations do not work or are not possible, report the non-constant variance in your analysis.
If the source of the variability in the responses is known, we can exploit it. The y_i may be aggregates (for example, averages) with associated variances, σ_i^2. In such cases, we can use weighted least squares (WLS), with weights inversely proportional to those variances.
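A minimal WLS sketch, assuming each response is an average of n_i raw observations so that its variance is σ^2 / n_i and the weight can be taken as n_i. The group sizes, the value of σ, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 100)
n_i = rng.integers(1, 50, size=x.size)   # hypothetical group sizes
sigma = 2.0                              # assumed SD of one raw observation

# Each y_i is an average of n_i observations, so Var(y_i) = sigma^2 / n_i.
y = 1.0 + 2.0 * x + rng.normal(scale=sigma / np.sqrt(n_i))

X = np.column_stack([np.ones_like(x), x])
w = n_i.astype(float)                    # weights proportional to 1 / variance

# WLS is OLS after scaling each row by the square root of its weight.
sqrt_w = np.sqrt(w)
beta_wls, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)

# Cross-check against the closed form (X'WX)^{-1} X'Wy.
beta_closed = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
print("WLS estimates:", beta_wls)
```

Observations with larger n_i have smaller variance and therefore pull the fit more strongly.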
What is the third assumption of GLM?
Non-correlation of Errors
How can the third assumption of GLM be diagnosed?
Serial residual plots make it possible to identify correlation among the errors.
They plot the residuals, e_i, against the observation IDs.
The plot should not show runs of adjacent residuals with similar values. Such a pattern would indicate dependence between the errors and would violate the assumption of independent observations.
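As a numerical companion to the serial residual plot, the sketch below computes the lag-1 autocorrelation of the residuals in observation order, along with the Durbin-Watson statistic. The Durbin-Watson statistic is not mentioned in the cards and is included here only as a common summary. The data are simulated with autocorrelated errors as an assumption.

```python
import numpy as np

# Hypothetical data whose errors follow an AR(1) process with coefficient 0.8.
rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)
errors = np.zeros(n)
for t in range(1, n):
    errors[t] = 0.8 * errors[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + errors

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta   # kept in observation order

# Correlation between each residual and the one just before it.
lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

# Durbin-Watson: values near 2 suggest no first-order autocorrelation,
# values well below 2 suggest positive autocorrelation.
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print(f"lag-1 autocorrelation: {lag1:.2f}, Durbin-Watson: {dw:.2f}")
```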
How can the third assumption of GLM be remedied?
Non-independence typically arises from some structure in your data, such as groups or time.
Model the group structure in your data, using a mixed-effects model, or hierarchical model.
Model the time-lag structure in your data, again using a mixed-effects model, or hierarchical model.
What is assumption 4 of GLM?
Detecting Outliers
How can assumption 4 of GLM be diagnosed?
An outlier is a point whose observed response is far from the value predicted by the model.
Outliers increase the residual sum of squares (RSS), which is used to compute R^2 and the confidence intervals for each parameter.
The studentized residual plot shows the residuals, e_i, divided by their estimated standard errors.
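The studentized residuals can be sketched as e_i / (s * sqrt(1 - h_i)), where s^2 = RSS / (n - number of coefficients) and h_i is the leverage of observation i. This is the internally studentized form, which is an assumption about which variant is meant. The data and the injected outlier are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
x = np.linspace(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)
y[25] += 10.0   # hypothetical faulty observation

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Leverages are the diagonal of the hat matrix H = X (X'X)^{-1} X'.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

# Residual standard error, then internally studentized residuals.
p = X.shape[1]
s = np.sqrt(np.sum(residuals**2) / (n - p))
studentized = residuals / (s * np.sqrt(1.0 - leverage))

# Indices of observations whose studentized residual exceeds 3 in absolute value.
flagged = np.flatnonzero(np.abs(studentized) > 3)
print("flagged observations:", flagged.tolist())
```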
How can assumption 4 be remedied if violated?
Typically, the studentized residuals should not exceed 3 in absolute value.
If a data point has a studentized residual of 3 or more in absolute value, you may consider removing it, especially if you suspect that this observation is faulty in some way.
However, care should be taken as the presence of an outlier may also indicate a deficiency in your model.
What is assumption 5 of GLM?
High-leverage Points
How can assumption 5 of GLM be diagnosed?
A high-leverage point is a data point with unusual predictor values. Removing it can produce a substantially different set of parameter estimates.
The leverage, or hat-value, is a quantity that measures how unusual that data point's predictor values are with respect to all the others.
We can plot the individual leverages against the studentized residuals to identify points that are outliers and also have high leverage.
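The leverages can be sketched as the diagonal of the hat matrix, h_i = x_i' (X'X)^{-1} x_i. The example below plants one observation with an unusual predictor value and lists the leverage and studentized residual of the three highest-leverage points. The data and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 40
x = rng.uniform(0, 10, size=n)
x[-1] = 30.0   # hypothetical observation with an unusual predictor value
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

# h_i = x_i' (X'X)^{-1} x_i, computed row by row without forming the full hat matrix.
leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

beta = XtX_inv @ X.T @ y
residuals = y - X @ beta
s = np.sqrt(np.sum(residuals**2) / (n - X.shape[1]))
studentized = residuals / (s * np.sqrt(1.0 - leverage))

# Show the three highest-leverage points with their studentized residuals.
top = np.argsort(leverage)[::-1][:3]
for i in top:
    print(f"obs {i}: leverage={leverage[i]:.3f}, studentized={studentized[i]:.2f}")
```

The leverages always sum to the number of fitted coefficients, so their average is that number divided by n. In this sketch the planted point follows the same line as the rest of the data, so it has high leverage without necessarily being an outlier.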
How can assumption 5 of GLM be remedied?
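There is no strict cutoff for detecting high leverage. However, the average leverage is always (p + 1)/n for p predictors and n observations, so a leverage far above this value is worth inspecting.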
As with outliers, however, you may consider removing a high-leverage point, especially if you suspect that the observation is faulty in some way.
But, again, care should be taken, as the presence of such a point may also indicate a deficiency in your model.