Chapter 18 - GLM Flashcards
What are explanatory variables
They are inputs into a model that are expected to influence the response variable.
What are response variables
They are outputs from the model that are likely to be affected by the explanatory variables.
What are categorical variables and non-categorical variables
Categorical variables are explanatory variables whose values fall into distinct levels that often cannot be given any natural ordering or score, e.g. gender, which takes the value male or female.
Non-categorical variables take numerical values, e.g. age.
What are the drawbacks of using a normal model for linear regression
CLAN
- The normal model assumes a CONSTANT variance, which may not be appropriate for the variable being modelled
- With more than 2 explanatory variables, the model takes a LONG time to compute
- The normal model ADDS together the effects of the different explanatory variables, but this is seldom what is observed in practice
- It assumes that the response variable, Y, has a NORMAL distribution, which may not be appropriate for the variable being modelled
How do GLMs address the problems of the normal model for linear regression
- The response variable can take any distribution from the exponential family - it no longer has to take the normal distribution
- A link function is introduced - this acts to remove the assumption that the effects of different variables must simply be added together
What is the purpose of the link function
- The link function relates the mean of the response variable to the linear predictor, which removes the assumption that the effects of different variables must simply be added together
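A minimal sketch of the idea, assuming a log link and two explanatory variables. The coefficient values and function names are made up for illustration. The linear predictor is still a sum, but inverting the link turns the effects on the fitted mean into a product.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2 = 0.5, 0.2, -0.1

def linear_predictor(x1, x2):
    # The linear predictor is still additive in the covariates.
    return b0 + b1 * x1 + b2 * x2

def fitted_mean_log_link(x1, x2):
    # Log link: log(mu) = eta, so mu = exp(eta).
    return math.exp(linear_predictor(x1, x2))

mu = fitted_mean_log_link(3.0, 2.0)

# The same mean written as a product of separate factors,
# one per term in the linear predictor.
product = math.exp(b0) * math.exp(b1 * 3.0) * math.exp(b2 * 2.0)
```

With an identity link instead, mu would equal the linear predictor and the effects would combine additively, as in the normal model.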
What does the deviance of a model compare
It compares the observed value Yi to the fitted value μi
In essence, the deviance is a measure of how much the fitted values differ from the observations
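In symbols, one common way to write this is through the scaled deviance, which compares the fitted model with the saturated model that reproduces every observation exactly. The notation is introduced here rather than taken from the cards: \ell is the log-likelihood, y_i the observations, \hat{\mu}_i the fitted values and \phi the scale parameter. The second and third lines give the resulting deviance for the normal and Poisson cases.

```latex
D^{*} = \frac{D}{\phi} = 2\left[\ell(y; y) - \ell(\hat{\mu}; y)\right]

D_{\text{normal}} = \sum_{i} \left(y_i - \hat{\mu}_i\right)^{2}

D_{\text{Poisson}} = 2 \sum_{i} \left[\, y_i \log\frac{y_i}{\hat{\mu}_i} - \left(y_i - \hat{\mu}_i\right) \right]
```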
What does the chi-squared statistic measure
It tests whether the inclusion of one or more additional explanatory variables in a model improves the fit significantly
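A rough sketch of such a test, assuming two nested models that differ by one parameter, so that the drop in scaled deviance is compared with a chi-squared distribution on 1 degree of freedom. The deviance figures are invented, and the helper name chi2_sf_1df is hypothetical.

```python
import math

def chi2_sf_1df(x):
    # Upper tail probability of a chi-squared variable with 1 degree of freedom.
    # Uses P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for a standard normal Z.
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical scaled deviances for two nested models.
deviance_small = 112.4   # model without the extra variable
deviance_large = 105.1   # model with one extra parameter

# A large drop in deviance suggests the extra variable improves the fit.
drop = deviance_small - deviance_large
p_value = chi2_sf_1df(drop)
significant = p_value < 0.05
```

With more than one additional parameter, the degrees of freedom equal the number of parameters added, and a general chi-squared tail function would be needed in place of this 1 degree of freedom shortcut.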
What can be used to measure the uncertainty in the parameter estimators used in a GLM
The Cramér-Rao lower bound
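For a single parameter, the bound can be written as below. The notation is assumed here rather than taken from the cards: \hat{\beta} is an unbiased estimator of \beta and \ell is the log-likelihood. Maximum likelihood estimators attain the bound asymptotically, so its square root is commonly used as an approximate standard error.

```latex
\operatorname{Var}(\hat{\beta}) \ge \frac{1}{I(\beta)}, \qquad
I(\beta) = -\,\mathbb{E}\!\left[\frac{\partial^{2} \ell(\beta)}{\partial \beta^{2}}\right]
```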
What are deviance residuals
They are a measure of the distance between the actual observations and the fitted values
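A small sketch for the Poisson case, where each deviance residual is the signed square root of that observation's contribution to the deviance. The counts, fitted means and function names are made up for illustration.

```python
import math

def poisson_unit_deviance(y, mu):
    # Contribution of one observation to the Poisson deviance.
    # The term y * log(y / mu) is taken as 0 when y == 0.
    log_term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (log_term - (y - mu))

def poisson_deviance_residual(y, mu):
    # Signed square root of the unit deviance: negative when the
    # observation lies below the fitted value, positive otherwise.
    sign = 1.0 if y >= mu else -1.0
    return sign * math.sqrt(max(poisson_unit_deviance(y, mu), 0.0))

observed = [0, 2, 5, 1]          # made-up counts
fitted = [0.8, 1.9, 3.6, 1.7]    # made-up fitted means

residuals = [poisson_deviance_residual(y, m) for y, m in zip(observed, fitted)]
deviance = sum(poisson_unit_deviance(y, m) for y, m in zip(observed, fitted))
```

The squared residuals add up to the deviance, which is why large residuals point to the observations that contribute most to the lack of fit.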
What are standardised Pearson residuals
They are the differences between the observed responses and the predicted values, adjusted for the standard deviation of the predicted value and the leverage of the observed response
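As a sketch, one standard form divides the raw residual by the estimated standard deviation and by the square root of one minus the leverage. The function name, argument names and numbers below are assumptions for illustration, using a Poisson response where the variance function is V(mu) = mu.

```python
import math

def standardised_pearson_residual(y, mu, variance, leverage, dispersion=1.0):
    # (y - mu) scaled by the estimated standard deviation and by sqrt(1 - leverage).
    # `variance` is the variance function V(mu) evaluated at the fitted value;
    # `dispersion` is the scale parameter (1 for a Poisson response).
    return (y - mu) / math.sqrt(dispersion * variance * (1.0 - leverage))

# Poisson example with made-up numbers: V(mu) = mu.
y, mu, h = 5, 3.6, 0.25
r = standardised_pearson_residual(y, mu, variance=mu, leverage=h)
```

A higher leverage shrinks the denominator, so the same raw difference gives a larger standardised residual for a high-leverage observation.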
What is Cook’s distance used for
It is used to estimate the influence of a data point on the model results
A Cook’s distance of 1 or more is generally considered to merit closer examination in the analysis
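A minimal sketch of the calculation, assuming the simplest case of a straight-line least squares fit with normal errors rather than a general GLM. The data and the function name cooks_distances are made up for illustration.

```python
def cooks_distances(x, y):
    # Cook's distance for a straight-line least squares fit (p = 2 parameters).
    n = len(x)
    p = 2
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance estimate on n - p degrees of freedom.
    s2 = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage of each point in simple linear regression.
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2
    return [
        (e ** 2 / (p * s2)) * (h / (1.0 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]

x = [1, 2, 3, 4, 5, 10]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 4.0]   # the last point sits far from the trend of the rest
d = cooks_distances(x, y)

# Indices of points meeting the "1 or more" rule of thumb.
flagged = [i for i, di in enumerate(d) if di >= 1.0]
```

Here only the last point is flagged: it combines a large leverage with a sizeable residual, which is the combination Cook's distance is designed to pick up.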
What is aliasing
Aliasing occurs when there is a dependency among the observed covariates, i.e. one covariate may be identical to some linear combination of other covariates
What is intrinsic aliasing
This occurs because of dependencies inherent in the definition of the covariates.
These intrinsic dependencies arise most commonly whenever categorical variables are included in the model.
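A small sketch of why a categorical variable causes this, assuming the model has an intercept and one indicator column per level. The data are made up.

```python
levels = ["male", "female"]
gender = ["male", "female", "female", "male"]   # made-up data

# One indicator column per level, plus an intercept column of ones.
intercept = [1] * len(gender)
indicators = {lvl: [1 if g == lvl else 0 for g in gender] for lvl in levels}

# The indicator columns always sum to the intercept column, whatever the data,
# so one column is a linear combination of the others.
row_sums = [sum(col[i] for col in indicators.values()) for i in range(len(gender))]
aliased = row_sums == intercept

# Usual remedy: drop one level (the base level) so the columns are independent.
design = list(zip(intercept, indicators["female"]))
```

The dropped level becomes the base level, and the remaining coefficient measures the difference from it.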
What is extrinsic aliasing
It arises when the dependency among the covariates results from the nature of the data itself, rather than from the inherent properties of the covariates.
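As an illustration with invented data, suppose every sports car in a data set happens to be red and no other car is. Nothing in the definitions of the two factors forces this, but the two indicator columns end up identical.

```python
# Made-up policy data: (vehicle_type, colour)
records = [
    ("saloon", "blue"),
    ("sports", "red"),
    ("saloon", "black"),
    ("sports", "red"),
    ("estate", "blue"),
]

is_sports = [1 if vehicle == "sports" else 0 for vehicle, _ in records]
is_red = [1 if colour == "red" else 0 for _, colour in records]

# The columns coincide only because of how this particular data set turned out,
# so the separate effects of "sports" and "red" cannot be told apart.
extrinsically_aliased = is_sports == is_red
```

A different sample containing a red saloon or a blue sports car would break the dependency, which is what distinguishes this from intrinsic aliasing.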