Chapter 18 Generalised Linear Modelling Flashcards
Describe the principal modelling techniques appropriate to health and care insurance:
What is an explanatory variable?
- input into a model that is expected to influence the response variable.
- i.e. rating factor
- It is important that explanatory variables make intuitive sense.
What is a response variable?
- The output variable of a model, which is likely to be influenced by the explanatory variables.
- e.g. price
What is a categorical variable?
- Explanatory variables whose values are discrete and distinct, and which often cannot be given any natural ordering or score.
- e.g. gender
What is a non-categorical variable?
- An explanatory variable that can take numerical values, e.g. age.
What is an interaction term?
- Used where the pattern in the response variable is better modelled by including an extra parameter for each combination of two or more factors.
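A minimal sketch of how an interaction term adds a parameter for a combination of factor levels. The factors (sex, smoker), their levels and the dummy coding are illustrative assumptions, not taken from the flashcards.

```python
from itertools import product

# Hypothetical rating factors and their levels (illustrative only).
SEX_LEVELS = ["F", "M"]
SMOKER_LEVELS = ["N", "Y"]


def design_row(sex, smoker, with_interaction):
    """Return one row of a dummy-coded design matrix.

    Base levels are sex="F" and smoker="N", so they get no column.
    """
    is_male = 1 if sex == "M" else 0
    is_smoker = 1 if smoker == "Y" else 0
    row = [1, is_male, is_smoker]  # intercept + two main effects
    if with_interaction:
        # Extra parameter that applies only to the male-smoker combination.
        row.append(is_male * is_smoker)
    return row


rows = {
    (sex, smoker): design_row(sex, smoker, with_interaction=True)
    for sex, smoker in product(SEX_LEVELS, SMOKER_LEVELS)
}
```

With two two-level factors the interaction needs only one extra column, because the base levels are absorbed into the intercept. Factors with more levels would need one extra column per non-base combination.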
One-way analysis merits
- Prior to the use of GLMs, the effect of each rating factor on frequency and severity was considered separately.
- This one-way analysis ignores correlations and interaction effects between variables and so may underestimate or double count the effects of variables.
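A small sketch of the double counting that one-way analysis can produce. The portfolio below is invented for illustration: claim frequency depends only on smoker status, but sex is correlated with smoker status.

```python
# Hypothetical portfolio: (sex, smoker) -> (exposure in policy-years, claims).
# By construction the true frequency is 0.10 for non-smokers and 0.20 for
# smokers, regardless of sex.
cells = {
    ("F", "N"): (800, 80),
    ("F", "Y"): (200, 40),
    ("M", "N"): (200, 20),
    ("M", "Y"): (800, 160),
}


def one_way_frequency(factor_index, level):
    """Claim frequency for one level of one factor, ignoring the other factor."""
    exposure = sum(e for key, (e, _) in cells.items() if key[factor_index] == level)
    claims = sum(c for key, (_, c) in cells.items() if key[factor_index] == level)
    return claims / exposure


sex_relativity = one_way_frequency(0, "M") / one_way_frequency(0, "F")
smoker_relativity = one_way_frequency(1, "Y") / one_way_frequency(1, "N")

# Relativity of a male smoker to a female non-smoker.
one_way_combined = sex_relativity * smoker_relativity
true_combined = (160 / 800) / (80 / 800)
```

Here the one-way view assigns males a relativity of about 1.5 purely because more of them smoke, so multiplying the one-way relativities gives about 3 for a male smoker against a true value of 2.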
Uses of GLMs
- A GLM can be used to model the behaviour of a random variable that is believed to depend on the values of several other characteristics, e.g. age, sex, chronic condition.
- It is a generalisation of the normal model for multiple linear regression.
What are the drawbacks of the normal model for multiple linear regression?
- it assumes the response variable has a normal distribution
- it assumes a constant variance, which may not be appropriate
- it adds together the effects of different explanatory variables, but this is often not what is observed
- it becomes long-winded with more than two explanatory variables.
Assumptions of classical linear models
- the error terms are independent and come from a normal distribution
- the mean is a linear combination of the explanatory variables
- the error terms have constant variance (or homoscedasticity)
What are the two properties of any member of the exponential family?
- the distribution is completely specified in terms of its mean and variance.
- the variance is a function of its mean
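As an illustration of the variance being a function of the mean, the sketch below lists the standard variance functions for a few common exponential family members. The dictionary, the function name and the `dispersion` argument (scale parameter) are illustrative choices, not part of the flashcards.

```python
# Variance function V(mu) for some common exponential family members.
# The "binomial" entry is for a single trial (a proportion between 0 and 1).
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,
    "poisson": lambda mu: mu,
    "gamma": lambda mu: mu ** 2,
    "binomial": lambda mu: mu * (1.0 - mu),
}


def variance(family, mu, dispersion=1.0):
    """Variance of the response: scale parameter times V(mu)."""
    return dispersion * VARIANCE_FUNCTIONS[family](mu)


poisson_var = variance("poisson", 3.0)
gamma_var = variance("gamma", 2.0, dispersion=0.5)
```

Only the normal case has a variance that does not change with the mean, which is why it is restrictive for claim counts and claim amounts.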
What is the link function?
- the link function acts to remove the assumption that the effects of different variables must simply be added together.
- it must be both differentiable and monotonic.
- examples include the log, logit and identity functions.
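A minimal sketch of the three link functions named above and their inverses. The function names and the example effect sizes are assumptions made for illustration.

```python
import math


# Link functions map the mean mu to the linear predictor eta.
def identity_link(mu):
    return mu


def log_link(mu):
    return math.log(mu)  # requires mu > 0


def logit_link(mu):
    return math.log(mu / (1.0 - mu))  # requires 0 < mu < 1


# Inverse links map the linear predictor back to the mean.
def inverse_identity(eta):
    return eta


def inverse_log(eta):
    return math.exp(eta)


def inverse_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


# With a log link, effects that add on the linear predictor scale
# multiply on the scale of the mean.
age_effect, smoker_effect = 0.1, 0.2  # hypothetical parameter values
combined_mean = inverse_log(age_effect + smoker_effect)
product_of_factors = inverse_log(age_effect) * inverse_log(smoker_effect)
```

The last two lines give the same value, which is how a log link turns an additive linear predictor into multiplicative rating factors.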
Steps for obtaining predicted values from a single GLM
- Specify design matrix X and the vector of parameters Beta
- Choose a distribution for the response variable and the link function.
- Identify the likelihood function
- Take the logarithm to convert the product of many terms into a sum
- Maximise the log-likelihood by taking partial derivatives with respect to each parameter, setting them to zero and solving for the parameters.
- Compute predicted values.
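The steps above can be sketched for a Poisson response with a log link. This is a minimal illustration under stated assumptions: the data are invented, the model has only an intercept and one indicator, and Newton-Raphson is used as one possible way of solving the equations obtained from the partial derivatives.

```python
import math

# Hypothetical data: x = 1 flags a chronic condition, y = claim counts.
x = [0, 0, 0, 1, 1, 1]
y = [1, 2, 3, 4, 6, 8]

# Design matrix X (intercept column plus one indicator) and starting beta.
X = [[1.0, float(xi)] for xi in x]
start = [math.log(sum(y) / len(y)), 0.0]  # start from the overall mean
beta = list(start)


def linear_predictor(row, params):
    return sum(r * b for r, b in zip(row, params))


def log_likelihood(params):
    """Poisson log-likelihood with a log link, dropping the -log(y!) constant."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = linear_predictor(row, params)
        total += yi * eta - math.exp(eta)
    return total


# Maximise the log-likelihood: Newton-Raphson on the partial derivatives.
for _ in range(50):
    mu = [math.exp(linear_predictor(row, beta)) for row in X]
    # Partial derivatives of the log-likelihood: X^T (y - mu).
    score = [
        sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu))
        for j in range(2)
    ]
    # Information matrix: X^T diag(mu) X.
    info = [
        [sum(row[j] * row[k] * mi for row, mi in zip(X, mu)) for k in range(2)]
        for j in range(2)
    ]
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    # Solve the 2x2 system info * step = score explicitly.
    step = [
        (info[1][1] * score[0] - info[0][1] * score[1]) / det,
        (info[0][0] * score[1] - info[1][0] * score[0]) / det,
    ]
    beta = [b + s for b, s in zip(beta, step)]
    if max(abs(s) for s in step) < 1e-12:
        break

# Predicted values via the inverse link (exp for the log link).
predicted = [math.exp(linear_predictor(row, beta)) for row in X]
```

Because this model has one parameter per group, the fitted means should reproduce the two group averages of the observed counts, which gives a simple check on the fit.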
What techniques are used to analyse significance of explanatory variables?
- chi-squared test
- the F-statistic - the models need to be nested for this to work.
- Akaike Information Criterion (AIC) - appropriate where the models are not nested.
- other methods
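A sketch of a chi-squared test for two nested models, based on twice the difference in log-likelihood. It assumes the scale parameter is known (as for a Poisson model) and that the larger model has exactly one extra parameter, so the statistic is compared with a chi-squared distribution with one degree of freedom. The log-likelihood values are invented.

```python
import math


def likelihood_ratio_test_1df(loglik_reduced, loglik_full):
    """Return (statistic, p-value) for nested models differing by one parameter.

    For one degree of freedom the chi-squared tail probability is
    P(Z^2 > s) = erfc(sqrt(s / 2)), where Z is standard normal.
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    statistic = max(statistic, 0.0)  # guard against tiny negative rounding
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical maximised log-likelihoods for the two fitted models.
statistic, p_value = likelihood_ratio_test_1df(
    loglik_reduced=-105.2, loglik_full=-102.1
)
significant_at_5_percent = p_value < 0.05
```

A small p-value suggests that the extra explanatory variable improves the fit by more than chance alone would explain. More than one extra parameter would need a general chi-squared distribution function, which is outside this sketch.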
Define degrees of freedom
- Number of observations minus number of parameters.
AIC formula
AIC = -2 * log-likelihood + 2 * number of parameters
- the lower the AIC, the better the model.
- for a similar fit, fewer parameters is better, i.e. a more parsimonious model is preferred.
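A short sketch of the AIC formula used to compare candidate models. The model names, log-likelihoods and parameter counts are invented for illustration.

```python
def aic(log_likelihood, n_params):
    """AIC = -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical candidates: name -> (maximised log-likelihood, parameter count).
candidates = {
    "age only": (-520.4, 2),
    "age + sex": (-515.9, 3),
    "age + sex + region": (-515.6, 8),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best_model = min(scores, key=scores.get)  # lowest AIC is preferred
```

In this made-up comparison, adding region raises the log-likelihood slightly but costs five extra parameters, so its AIC is higher than that of the simpler "age + sex" model.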