Chapter 18 Generalised Linear Modelling Flashcards
Describe the principal modelling techniques appropriate to health and care insurance:
1
Q
Uses of GLMs
A
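- A GLM can be used to model the behaviour of a random variable that is believed to depend on the values of several other characteristics, eg age, sex and chronic condition.
- It is a generalisation of the Normal multiple linear regression model: the response may follow any member of the exponential family, and its mean is related to the explanatory variables through a link function.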
2
Q
What are the two properties of any member of the exponential family?
A
- The distribution is completely specified in terms of its mean and variance.
- The variance is a function of its mean, ie Var(Y) = φV(μ) for a variance function V and a scale parameter φ.
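As a small illustration of the second property, the sketch below lists the variance function V(μ) for a few common exponential family members, using the form Var(Y) = φV(μ). The dictionary and the `variance` helper are hypothetical names chosen for this example. For the Normal distribution V(μ) is constant, which is still a (trivial) function of the mean.

```python
from typing import Callable, Dict

# Variance function V(mu) for some common exponential family members,
# where Var(Y) = phi * V(mu) and phi is the scale (dispersion) parameter.
VARIANCE_FUNCTIONS: Dict[str, Callable[[float], float]] = {
    "normal": lambda mu: 1.0,              # constant: Var(Y) = phi = sigma^2
    "poisson": lambda mu: mu,              # variance equals the mean (phi = 1)
    "gamma": lambda mu: mu ** 2,           # variance proportional to mean squared
    "binomial": lambda mu: mu * (1 - mu),  # mu is a probability; phi = 1/n for a proportion
}


def variance(family: str, mu: float, phi: float = 1.0) -> float:
    """Return Var(Y) = phi * V(mu) for the named family."""
    return phi * VARIANCE_FUNCTIONS[family](mu)
```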
3
Q
What is the link function?
A
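- The link function g relates the mean of the response to the linear predictor, g(μ) = Xβ. It removes the assumption that the effects of different variables must simply be added together; eg with a log link the effects combine multiplicatively.
- It must be both differentiable and monotonic.
- Common examples include the log, logit and identity functions.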
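A minimal sketch of the three links listed above, each paired with its inverse, assuming the linear predictor is η = Σ βⱼxⱼ. The names `LINKS` and `predict_mean` are illustrative only.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Each entry maps a link name to (g, g_inverse):
#   g(mu) gives the linear predictor eta; g_inverse(eta) recovers the mean mu.
LINKS: Dict[str, Tuple[Callable[[float], float], Callable[[float], float]]] = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}


def predict_mean(link: str, betas: Sequence[float], x: Sequence[float]) -> float:
    """Mean of the response: apply g_inverse to eta = sum(beta_j * x_j)."""
    eta = sum(b * xj for b, xj in zip(betas, x))
    _, g_inverse = LINKS[link]
    return g_inverse(eta)
```

For example, with a log link and parameters log 0.1, log 1.5 and log 2.0 (all x equal to 1), `predict_mean` returns 0.1 × 1.5 × 2.0 = 0.3, so the effects multiply; the identity link would simply add the parameters.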
4
Q
Steps for obtaining predicted values from a single GLM
A
- Specify the design matrix X and the vector of parameters β.
- Choose a distribution for the response variable and a link function.
- Write down the likelihood function.
- Take logarithms to convert the product of many terms into a sum (the log-likelihood).
- Maximise the log-likelihood by setting its partial derivatives with respect to each parameter equal to zero and solving, usually numerically.
- Compute predicted values by applying the inverse link function to the fitted linear predictor, ie μ̂ = g⁻¹(Xβ̂).
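The steps above can be sketched in Python for one assumed case: a Poisson response (eg claim counts) with a log link, fitted by Newton-Raphson. This is a simplified illustration, not how any particular package works; it ignores exposure offsets and prior weights, and the function names are hypothetical.

```python
import numpy as np


def fit_poisson_glm(X, y, max_iter=50, tol=1e-10):
    """Maximum likelihood fit of a Poisson GLM with a log link.

    X: (n, p) design matrix whose first column is all ones (intercept).
    y: (n,) observed counts, not all zero.
    The log-likelihood is sum(y * eta - exp(eta)) up to a constant,
    with eta = X @ beta; it is maximised by Newton-Raphson.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())              # start from the overall mean
    for _ in range(max_iter):
        mu = np.exp(X @ beta)               # inverse of the log link
        score = X.T @ (y - mu)              # first partial derivatives
        hessian = -(X.T * mu) @ X           # second partial derivatives
        step = np.linalg.solve(hessian, score)
        beta = beta - step                  # Newton-Raphson update
        if np.max(np.abs(step)) < tol:      # score is (numerically) zero
            break
    return beta


def predict(X, beta):
    """Predicted means: apply the inverse link to the linear predictor."""
    return np.exp(np.asarray(X, dtype=float) @ beta)
```

For a design matrix with an intercept and one two-level factor, X = [[1, 0], [1, 0], [1, 1], [1, 1]], and counts y = [2, 4, 9, 11], the fit gives β̂ = (log 3, log(10/3)) and predicted values 3, 3, 10 and 10, ie the mean count at each level.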
5
Q
Measuring uncertainty in the estimators of the model parameters
A
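- The Cramér-Rao lower bound (CRLB) is used.
- Asymptotically, the maximum likelihood estimator θ̂ is approximately distributed N(θ, CRLB).
- Standard errors in a GLM are found using the Hessian matrix.
- This is the matrix of second partial derivatives of the log-likelihood with respect to the parameters; the inverse of minus the Hessian estimates the variance matrix of the parameter estimates, and the standard errors are the square roots of its diagonal entries.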
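Assuming again a Poisson GLM with a log link, a minimal sketch of computing standard errors from the Hessian evaluated at the fitted parameters. The function name is hypothetical.

```python
import numpy as np


def poisson_standard_errors(X, beta_hat):
    """Standard errors for a Poisson log-link GLM, from the Hessian at beta_hat.

    For this model the Hessian of the log-likelihood is -X^T diag(mu) X.
    The inverse of minus the Hessian estimates the variance matrix of
    beta_hat (the CRLB); standard errors are the square roots of its diagonal.
    """
    X = np.asarray(X, dtype=float)
    mu = np.exp(X @ np.asarray(beta_hat, dtype=float))
    hessian = -(X.T * mu) @ X
    cov = np.linalg.inv(-hessian)
    return np.sqrt(np.diag(cov))
```

For an intercept-only model fitted to counts 2, 3, 5 and 6 (mean 4), β̂ = log 4 and the standard error is 1/√16 = 0.25. An approximate 95% confidence interval for each parameter is then β̂ ± 1.96 × standard error.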
6
Q
What other ways can be used to test significance?
A
- Comparisons with time
- Consistency checks with other factors
7
Q
Comparisons with time
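A
- A factor that genuinely affects the response should show a consistent effect over time.
- One check is to fit an interaction between the factor and a time variable (eg calendar year) and confirm that the pattern of parameter estimates is similar in each period; if the pattern changes erratically from year to year, the factor may not be a reliable predictor.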
8
Q
Consistency checks with other factors
A
- Time is not the only factor that can be used as a consistency check.
- eg an explanatory variable such as age would be expected to show the same pattern regardless of geographical region.
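A simplified sketch of a consistency check, assuming hypothetical records of claims and exposure by age band and region. It compares one-way claim frequency relativities within each region rather than fitting a GLM interaction, so it is only a first look; the same helper could take calendar year as the grouping variable for a check against time.

```python
from collections import defaultdict


def relativities_by_group(records, factor, group, base_level):
    """Claim frequency relativities of `factor` levels within each `group`.

    records: dicts with keys factor, group, "claims" and "exposure".
    Returns {group value: {factor level: frequency / base level frequency}}.
    Assumes every group has the base level and non-zero exposure in each cell.
    """
    claims = defaultdict(float)
    exposure = defaultdict(float)
    for r in records:
        key = (r[group], r[factor])
        claims[key] += r["claims"]
        exposure[key] += r["exposure"]
    freqs = defaultdict(dict)
    for (g, level), e in exposure.items():
        freqs[g][level] = claims[(g, level)] / e
    return {
        g: {level: f / fr[base_level] for level, f in fr.items()}
        for g, fr in freqs.items()
    }


def same_ordering(rels):
    """True if the factor levels rank in the same order in every group."""
    orders = {tuple(sorted(r, key=r.get)) for r in rels.values()}
    return len(orders) == 1
```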
9
Q
Testing the appropriateness of models
A
- The hat matrix is one of the outputs of the model-fitting process.
- It is the matrix H such that ŷ = Hy, ie it maps the observed values to the fitted values.
- For the Normal multiple linear regression model, H = X(XᵀX)⁻¹Xᵀ.
- The diagonal entries h(i,i) of H are called leverages, and each lies between 0 and 1.
- Leverages measure the influence that each observed value has on the fitted value for that observation.
- Data points with high leverage or large residuals may distort the fitted model and reduce its accuracy.
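A minimal sketch of the hat matrix for the Normal multiple linear regression model, using NumPy; `hat_matrix` and `leverages` are illustrative names.

```python
import numpy as np


def hat_matrix(X):
    """H = X (X^T X)^(-1) X^T for the Normal multiple linear regression model."""
    X = np.asarray(X, dtype=float)
    # solve(X^T X, X^T) computes (X^T X)^(-1) X^T without an explicit inverse
    return X @ np.linalg.solve(X.T @ X, X.T)


def leverages(X):
    """Diagonal entries h(i,i) of the hat matrix."""
    return np.diag(hat_matrix(X))
```

For X = [[1, 1], [1, 2], [1, 3], [1, 10]] (an intercept and one covariate) the leverages are 0.43, 0.33, 0.27 and 0.97, so the isolated point at 10 has by far the highest leverage. The leverages sum to the number of parameters (here 2), which is the trace of H.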
10
Q
What four techniques may be used to refine models?
A
- Interactions
- Aliasing
- Restrictions
- Smoothing
11
Q
Model refinement: Interactions
A
- After choosing the structure of the model and checking that it is appropriate for the chosen factors, the model can be refined further by including interactions, where the effect of one factor depends on the level of another.
- They may be complete or marginal.
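To illustrate what an interaction adds, the sketch below assumes a hypothetical log-link model with two binary factors, sex and age band, and made-up parameter values. Without an interaction term the ratio of the older band's mean to the younger band's mean is the same for both sexes; the interaction parameter lets it differ.

```python
import math


def cell_means(b0, b_male, b_old, b_male_old=0.0):
    """Mean for each (male, old) cell under a log-link model.

    b_male and b_old are main effects; b_male_old is the interaction
    parameter (0 gives a main-effects-only model).
    """
    means = {}
    for male in (0, 1):
        for old in (0, 1):
            eta = b0 + b_male * male + b_old * old + b_male_old * male * old
            means[(male, old)] = math.exp(eta)
    return means


def age_ratio(means, male):
    """Ratio of the older band's mean to the younger band's mean for one sex."""
    return means[(male, 1)] / means[(male, 0)]
```

With `cell_means(-2.0, 0.3, 0.5)` the age ratio is exp(0.5) for both sexes; adding an interaction of 0.4 multiplies the male ratio by a further exp(0.4).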
12
Q
Extrinsic aliasing
A
- Occurs when two or more factors contain levels that are perfectly correlated, so the model cannot separate their effects and the design matrix is not of full rank.
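A small sketch of why extrinsic aliasing is a problem, using hypothetical indicator columns `urban` and `zone_a` that are identical. The design matrix then loses rank, so the parameters for the two factors cannot be estimated separately. The rank check detects any exact linear dependency between columns, not only extrinsic aliasing.

```python
import numpy as np

# Hypothetical data: every urban policy is also in zone A, so the two
# indicator columns are identical (the levels are perfectly correlated).
urban = np.array([1, 1, 0, 0, 1, 0])
zone_a = np.array([1, 1, 0, 0, 1, 0])
age = np.array([25, 40, 33, 51, 62, 45])

X = np.column_stack([np.ones(6), urban, zone_a, age])


def is_aliased(X):
    """True if some column of X is a linear combination of the others."""
    return np.linalg.matrix_rank(X) < X.shape[1]
```

Dropping either aliased column (eg `np.delete(X, 2, axis=1)`) restores full rank.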
13
Q
Near aliasing
A
- When modelling in practice, a common problem occurs when two or more factors contain levels that are almost, but not quite, perfectly correlated.
- Near aliasing often shows up as very large positive or negative parameter estimates. To investigate, produce two-way tables of exposure and claim counts for the factors concerned.
- From these tables it should be possible to identify the combinations of levels that cause the near aliasing.
- The issue can then be resolved either by deleting or excluding those rogue records, or by reclassifying them into another, more appropriate, factor level.
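A sketch of the two-way table check, assuming hypothetical health insurance records with a `plan` factor and a `channel` factor. Cells with a small but non-zero exposure point to the rogue records behind near aliasing; the threshold is an arbitrary illustrative choice.

```python
from collections import Counter


def two_way_table(records, factor_a, factor_b, value="exposure"):
    """Total `value` (eg exposure or claim count) for each pair of levels."""
    table = Counter()
    for r in records:
        table[(r[factor_a], r[factor_b])] += r[value]
    return table


def sparse_cells(table, threshold):
    """Level combinations with a small but non-zero total: candidate rogue records."""
    return sorted(key for key, total in table.items() if 0 < total < threshold)
```

In the test data, corporate plans are almost always sold through brokers; the handful of direct corporate records would be the ones to exclude or reclassify.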
14
Q
Parameter smoothing
A
- A GLM can be improved by smoothing the parameter values. This can be achieved by grouping the levels of factors.
- The full granularity of the data can be retained for modelling, since modelling software can use this granularity and the patterns in the data to group the levels of variables more effectively.
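A sketch of smoothing by grouping levels, under a simplifying assumption: a Poisson model with a log link, exposure as an offset and only this one factor. In that case the fitted log frequency for each level is log(claims / exposure), so grouping levels amounts to pooling their claims and exposure. The age levels and figures are hypothetical.

```python
import math
from collections import defaultdict


def grouped_log_frequencies(claims, exposure, grouping):
    """Fitted log claim frequency for each group of levels.

    Valid for a single-factor Poisson model with a log link and exposure offset.
    claims, exposure: dicts keyed by original factor level.
    grouping: dict mapping each original level to its new group label.
    """
    c = defaultdict(float)
    e = defaultdict(float)
    for level, group in grouping.items():
        c[group] += claims[level]      # pool claims across grouped levels
        e[group] += exposure[level]    # pool exposure across grouped levels
    return {g: math.log(c[g] / e[g]) for g in c}
```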
15
Q
How can factors be simplified?
A
- Grouping and summarising the data before loading it into the modelling package. This requires knowledge of the expected patterns.
- Grouping levels within the modelling package, eg grouping age into two age bands.
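A minimal sketch of grouping and summarising before loading, assuming hypothetical records with `age`, `claims` and `exposure` fields and a single cut point at 40 to form two age bands.

```python
import bisect


def band(value, cut_points, labels):
    """Return the label of the band containing value.

    cut_points are the sorted lower bounds of every band after the first,
    so len(labels) == len(cut_points) + 1.
    """
    return labels[bisect.bisect_right(cut_points, value)]


def summarise_by_band(records, cut_points, labels):
    """Total claims and exposure per age band, ready to load for modelling."""
    totals = {label: {"claims": 0.0, "exposure": 0.0} for label in labels}
    for r in records:
        t = totals[band(r["age"], cut_points, labels)]
        t["claims"] += r["claims"]
        t["exposure"] += r["exposure"]
    return totals
```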