Chapter 18 - GLM Flashcards
Explanatory variables
- inputs into the model that are expected to influence the response variable
- choice of explanatory variables depends on the purpose of the model
Response variables
- outputs from the model that are likely to be affected by explanatory variables
Categorical variables
- explanatory variables
- aka factors
- values of each level are distinct
- often cannot be given natural ordering or score
- continuous numerical variables (e.g. age) are often grouped into bands and treated as categorical
Non-categorical variables
- can take numerical values
Interaction terms
- included where the pattern in the response variable is better modelled by including parameters for each combination of two or more factors
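A minimal sketch of what "parameters for each combination" means in a design matrix. The factor names, levels and helper functions below are hypothetical, chosen only for illustration; the first level of each factor is assumed to be the base level.

```python
from itertools import product

# Hypothetical factors and levels, for illustration only.
factors = {"sex": ["F", "M"], "area": ["rural", "urban"]}

def main_effect_columns(factors):
    """Intercept plus one indicator per non-base level of each factor."""
    cols = ["intercept"]
    for name, levels in factors.items():
        cols += [f"{name}={lvl}" for lvl in levels[1:]]  # first level is the base
    return cols

def interaction_columns(factors, a, b):
    """One indicator per combination of non-base levels of factors a and b."""
    return [f"{a}={la}:{b}={lb}"
            for la, lb in product(factors[a][1:], factors[b][1:])]

def design_row(record, columns):
    """0/1 row of the design matrix for one record."""
    row = []
    for col in columns:
        if col == "intercept":
            row.append(1)
        else:
            # An interaction column is 1 only if every component level matches.
            terms = [t.split("=") for t in col.split(":")]
            row.append(int(all(record[name] == lvl for name, lvl in terms)))
    return row

columns = main_effect_columns(factors) + interaction_columns(factors, "sex", "area")
row = design_row({"sex": "M", "area": "urban"}, columns)
```

With two two-level factors the interaction adds one extra parameter, so the model can fit all four combinations separately.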
What does a GLM do?
A GLM unpicks relationships and produces estimates of the true values of the relativities. It does this by taking account of correlations between variables and by allowing investigation of any interactions between variables in the model.
Assumptions of classic linear model
- the error terms are independent and come from a normal distribution
- the mean is a linear combination of the explanatory variables
- the error terms have constant variance
The parameters β0, β1, β2 can be estimated using the method of maximum likelihood
Pg. 635
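As a worked illustration (a standard result, not necessarily the presentation on the page cited), assume two explanatory variables x_{i1}, x_{i2} and n observations. Under the three assumptions above the log-likelihood is:

```latex
Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independently}

\ell(\beta_0, \beta_1, \beta_2, \sigma^2)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}
      \left(y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2}\right)^2
```

Only the final sum depends on the β parameters, so maximising the likelihood over them is the same as minimising the sum of squared errors.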
Drawbacks of the normal model for multiple linear regression
- assumes that the response variable has a normal distribution which may not be appropriate for the variable being modelled
- the normal distribution has a constant variance which may not be appropriate for the variable being modelled
- adds together the effects of different explanatory variables, but this is seldom what is observed in practice
- with more than two explanatory variables, a manual solution becomes increasingly long-winded
How do GLMs address these problems?
- the response variable can take any distribution from the exponential family
- a link function is introduced which acts to remove the assumption that the effects of different variables must simply be added together
- an offset term can be included within the linear predictor
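A small sketch of the last two points, assuming a log link and an offset of log(exposure). The parameter values and the function name are made up for illustration: with a log link, effects that are added on the linear predictor scale multiply on the response scale.

```python
import math

# Hypothetical fitted parameters on the log scale (illustrative values only).
beta = {
    "intercept": math.log(0.10),   # base claim frequency per unit of exposure
    "age=young": math.log(1.5),    # relativity of 1.5
    "area=urban": math.log(1.2),   # relativity of 1.2
}

def predicted_claims(active_levels, exposure):
    """Expected claim count under a log link with log(exposure) as the offset."""
    linear_predictor = beta["intercept"] + sum(beta[lvl] for lvl in active_levels)
    offset = math.log(exposure)
    # The inverse of the log link is exp, so the additive terms become factors.
    return math.exp(linear_predictor + offset)

mu = predicted_claims(["age=young", "area=urban"], exposure=2.0)
multiplicative = 0.10 * 1.5 * 1.2 * 2.0  # same answer as a product of relativities
```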
GLM form
Pg. 639
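The page cited is not reproduced here. As a reminder, a commonly quoted general form is sketched below; the symbols (g for the link function, ξ_i for the offset, φ for the scale parameter, V for the variance function, ω_i for the prior weights) are assumptions of this sketch and may differ from the notation on that page.

```latex
\mu_i = E[Y_i] = g^{-1}\!\left(\sum_{j} X_{ij}\,\beta_j + \xi_i\right),
\qquad
\operatorname{Var}[Y_i] = \frac{\phi\, V(\mu_i)}{\omega_i}
```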
Properties of members of the exponential family
- the distribution is completely specified in terms of its mean and variance
- the variance of Yi is a function of its mean
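For illustration, writing the variance as Var[Y_i] = φ V(μ_i), where φ is a scale parameter and V is the variance function (notation assumed here), some standard members of the family have:

```latex
\begin{array}{ll}
\text{Normal:}   & V(\mu) = 1 \\
\text{Poisson:}  & V(\mu) = \mu \\
\text{Gamma:}    & V(\mu) = \mu^2 \\
\text{Binomial (proportion):} & V(\mu) = \mu(1 - \mu)
\end{array}
```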
Requirements for link function
- differentiable
- monotonic
Obtaining the predicted values from a simple GLM
- specify the design matrix X and the vector of parameters β
- choose the distribution for the response variable and the link function
- identify the likelihood function
- take the log to convert the product into a sum
- maximise the log-likelihood by taking partial derivatives with respect to each parameter, setting them to zero and solving the resulting system of equations
- compute the predicted values
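The steps above can be sketched for one simple case: a Poisson response with a log link and a single explanatory variable. The data are made up, and the score equations are solved numerically with Newton-Raphson, which is one possible method and an assumption of this sketch rather than something the steps prescribe.

```python
import math

# Illustrative data only: one explanatory variable x and observed counts y.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 3, 4, 9, 18]

def fit_poisson_log_link(x, y, iterations=25):
    """Maximise the Poisson log-likelihood for mu_i = exp(b0 + b1 * x_i)
    by Newton-Raphson steps on the two score equations."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: partial derivatives of the log-likelihood w.r.t. b0 and b1.
        s0 = sum(yi - mi for yi, mi in zip(y, mu))
        s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian): [[h00, h01], [h01, h11]].
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = score explicitly.
        b0 += (h11 * s0 - h01 * s1) / det
        b1 += (h00 * s1 - h01 * s0) / det
    return b0, b1

b0, b1 = fit_poisson_log_link(x, y)
# Predicted values: apply the inverse link to the fitted linear predictor.
predicted = [math.exp(b0 + b1 * xi) for xi in x]
```

At the solution both partial derivatives are zero, so the fitted values sum to the observed total, which is a quick check that the maximisation has converged.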
Degrees of freedom
number of observations less the number of parameters