Chapter 18 Generalised Linear Modelling Flashcards

Describe the principal modelling techniques appropriate to health and care insurance:

1
Q

Uses of GLMs

A
  • A GLM can be used to model the behaviour of a random variable that is believed to depend on the values of several other characteristics, e.g. age, sex, chronic condition.
  • It is a generalisation of the normal model for multiple linear regression.
2
Q

What are the two properties of any member of the exponential family?

A
  • The distribution is completely specified in terms of its mean and variance.
  • The variance is a function of the mean.
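As a rough illustration of the second property, the sketch below lists the usual variance functions V(mu) for three common exponential-family members. The dictionary name and the example mean of 3 are illustrative choices, and the scale parameter is assumed to be 1.

```python
# Variance functions V(mu): the variance is the scale parameter times V(mu).
# The scale parameter is assumed to be 1 here for simplicity.
variance_functions = {
    "normal": lambda mu: 1.0,       # constant variance
    "poisson": lambda mu: mu,       # variance equals the mean
    "gamma": lambda mu: mu ** 2,    # variance grows with the square of the mean
}

mean = 3.0  # hypothetical mean
variances = {name: v(mean) for name, v in variance_functions.items()}
print(variances)
```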
3
Q

What is the link function?

A
  • The link function removes the assumption that the effects of different variables must simply be added together.
  • It must be both differentiable and monotonic.
  • Examples include the log, logit and identity functions.
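A minimal sketch of the three link functions named above, with the inverses of the log and logit links. The function names and the effect sizes are hypothetical. It also shows that effects which add on the linear-predictor scale multiply on the mean scale under a log link.

```python
import numpy as np

# Link functions map the mean mu to the linear predictor eta.
def log_link(mu):
    return np.log(mu)

def logit_link(mu):
    return np.log(mu / (1.0 - mu))

def identity_link(mu):
    return mu

# Inverse links map the linear predictor back to the mean.
def inv_log(eta):
    return np.exp(eta)

def inv_logit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

# Hypothetical effects on the linear-predictor scale.
base, age_effect, sex_effect = 0.5, 0.2, -0.1
eta = base + age_effect + sex_effect

# Under the log link the additive effects become multiplicative factors.
mu = inv_log(eta)
print(mu, np.exp(base) * np.exp(age_effect) * np.exp(sex_effect))
```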
4
Q

Steps for obtaining predicted values from a single GLM

A
  • Specify the design matrix X and the vector of parameters beta.
  • Choose a distribution for the response variable and a link function.
  • Identify the likelihood function.
  • Take the logarithm to convert the product of many terms into a sum.
  • Maximise the log-likelihood by taking partial derivatives with respect to each parameter and setting them to zero.
  • Compute predicted values.
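The steps above can be sketched for one particular case: a Poisson response with a log link, fitted by Newton-Raphson. The data, the function name fit_poisson_log and the iteration count are all assumptions made for illustration, not part of the course material.

```python
import numpy as np

# Step 1: design matrix X (intercept plus one hypothetical indicator,
# e.g. chronic condition) and observed responses y.
X = np.array([[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]], dtype=float)
y = np.array([1, 2, 3, 4, 6, 8], dtype=float)

def fit_poisson_log(X, y, n_iter=25):
    """Maximise the Poisson log-likelihood under a log link by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)        # inverse log link
        score = X.T @ (y - mu)       # first partial derivatives of the log-likelihood
        hessian = -(X.T * mu) @ X    # second partial derivatives
        beta = beta - np.linalg.solve(hessian, score)
    return beta

beta_hat = fit_poisson_log(X, y)

# Final step: predicted values from the fitted parameters.
predicted = np.exp(X @ beta_hat)
print(beta_hat, predicted)
```

With a single indicator the fitted means are just the group averages of y, which gives a quick sanity check on the fit.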
5
Q

Measuring uncertainty in the estimators of the model parameters

A
  • The Cramér-Rao lower bound (CRLB) is used.
  • The maximum likelihood estimator theta-hat is asymptotically distributed N(theta, CRLB).
  • Standard errors in a GLM are found using the Hessian matrix.
  • This is the matrix of second derivatives of the log-likelihood.
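A minimal sketch of how standard errors can be read off the Hessian, assuming a Poisson response with a log link and hypothetical data. The approximate covariance matrix of the estimators is taken as the inverse of the negative Hessian evaluated at the fitted parameters.

```python
import numpy as np

# Hypothetical data: an intercept plus one indicator.
X = np.array([[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]], dtype=float)
y = np.array([1, 2, 3, 4, 6, 8], dtype=float)

# Fit by Newton-Raphson (Poisson response, log link).
beta = np.zeros(X.shape[1])
for _ in range(25):
    mu = np.exp(X @ beta)
    hessian = -(X.T * mu) @ X
    beta = beta - np.linalg.solve(hessian, X.T @ (y - mu))

# Hessian of the log-likelihood at the fitted parameters.
mu = np.exp(X @ beta)
hessian = -(X.T * mu) @ X

# Approximate covariance of the estimators and their standard errors.
cov = np.linalg.inv(-hessian)
std_errors = np.sqrt(np.diag(cov))
print(std_errors)
```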
6
Q

What other ways can be used to test significance?

A
  • Comparisons with time
  • Consistency checks with other factors

7
Q

Comparisons with time

A
8
Q

Consistency checks with other factors

A
  • Time is not the only factor that can be used as a consistency check.
  • e.g. an explanatory variable like age would be expected to show the same pattern regardless of geographical region.
9
Q

Testing the appropriateness of models

A
  • The hat matrix is one of the outputs of the model-fitting process.
  • For the Normal multiple linear regression model, it is the matrix H such that y-hat = Hy.
  • The diagonal entries h(i,i) of the matrix are called leverages. Each h(i,i) lies between 0 and 1.
  • Leverages measure the influence that each observed value has on the fitted value for that observation.
  • Data points with high leverages or residuals may distort the outcome and accuracy of a model.
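A short sketch of the hat matrix for the Normal linear regression case, using the standard result H = X (X'X)^-1 X'. The data are hypothetical; the last observation sits far from the others so that its leverage stands out.

```python
import numpy as np

# Hypothetical design matrix: intercept plus one covariate.
# The last row has an extreme covariate value.
X = np.array([[1, 1], [1, 2], [1, 3], [1, 10]], dtype=float)
y = np.array([1.1, 1.9, 3.2, 9.8])

# Hat matrix for Normal multiple linear regression.
H = X @ np.linalg.inv(X.T @ X) @ X.T

y_hat = H @ y            # fitted values, y-hat = Hy
leverages = np.diag(H)   # h(i,i), one per observation

print(leverages)
```

The leverages sum to the number of fitted parameters (two here), so a value close to 1 for a single point means the fitted value there is driven almost entirely by that point's own observation.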
10
Q

Models may be refined using (4)?

A
  • Interactions
  • Aliasing
  • Restrictions
  • Smoothing
11
Q

Model refinement: Interactions

A
  • After choosing the structure of the model and checking that it is appropriate for the factors chosen, the model can be refined further.
  • Interactions may be complete or marginal.
12
Q

Extrinsic aliasing

A

  • Occurs when two or more factors contain levels that are perfectly correlated.
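One way to see the effect of perfectly correlated levels is through the rank of the design matrix. In the hypothetical sketch below, two indicator columns from different factors are identical for every record, so the matrix has fewer independent columns than parameters and the two parameters cannot be estimated separately.

```python
import numpy as np

# Hypothetical design matrix: intercept, a level of factor A, a level of factor B.
# The two level indicators coincide on every record (perfect correlation).
X = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 1, 1],
], dtype=float)

rank = np.linalg.matrix_rank(X)
n_params = X.shape[1]

# A rank below the number of columns signals aliasing.
aliased = rank < n_params
print(rank, n_params, aliased)
```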

13
Q

Near aliasing

A
  • When modelling in practice, a common problem occurs when two or more factors contain levels that are almost, but not quite, perfectly correlated.
  • To understand cases where the model suggests very large negative or positive parameters, examine two-way tables of exposure and claim counts.
  • From these it should be possible to identify the combinations that cause near aliasing.
  • The issue can then be resolved either by deleting or excluding the rogue records, or by reclassifying them into another, more appropriate, factor level.
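A minimal sketch of the two-way exposure table described above. The records, the level codes and the 1% threshold for flagging a cell are all hypothetical; in practice the cut-off is a matter of judgement.

```python
import numpy as np

# Hypothetical records: level of factor A, level of factor B, exposure.
a = np.array([0, 0, 0, 1, 1, 1, 1])
b = np.array([0, 0, 0, 1, 1, 1, 0])
exposure = np.array([100.0, 120.0, 90.0, 110.0, 95.0, 105.0, 0.5])

# Two-way table of exposure by (A level, B level).
table = np.zeros((2, 2))
np.add.at(table, (a, b), exposure)

# Flag non-empty cells holding a tiny share of total exposure (assumed 1% cut-off).
share = table / table.sum()
rogue = np.argwhere((table > 0) & (share < 0.01))

print(table)
print(rogue)
```

Here almost all exposure lies where the two levels agree, and the single small cell identifies the rogue record to exclude or reclassify.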
14
Q

Parameter smoothing

A
  • A GLM can be improved by smoothing the parameter values. This can be achieved by grouping factor levels.
  • The granularity of the data can be kept when modelling, since software can use this granularity and its patterns to better group the different levels of variables.
15
Q

Factors can be simplified by?

A
  • Grouping and summarising data prior to loading. This requires knowledge of expected patterns.
  • Grouping in the modelling package, e.g. grouping age into two age bands.
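The age-band example can be sketched as follows. The ages and the boundary at 40 are hypothetical; the point is only that many individual ages collapse into two factor levels.

```python
import numpy as np

ages = np.array([23, 37, 45, 61, 70])  # hypothetical policyholder ages
boundaries = [40]                       # assumed split point between the two bands

# Band 0 holds ages below 40, band 1 holds ages 40 and over.
age_band = np.digitize(ages, boundaries)
print(age_band)
```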
16
Q

Restrictions when pricing: GLM restrictions

A
  • Legal or commercial considerations may impose rigid restrictions on the way particular factors are used in practice.
  • e.g. legal: restricting the use of age and gender in pricing for a medical scheme.
  • When the use of certain factors is restricted, the model can compensate to some extent for the artificial restriction by adjusting the fitted relativities for correlated factors.
  • This is achieved using the offset term.
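A rough sketch of how an offset can impose a restriction, assuming a Poisson response with a log link. The data, the restricted factor and the imposed relativity of 1.2 are all hypothetical. The restricted factor's effect enters the linear predictor as a fixed offset, and only the parameters of the unrestricted factors are estimated.

```python
import numpy as np

# Unrestricted part of the design matrix: intercept plus a hypothetical region indicator.
X = np.array([[1, 0], [1, 0], [1, 1], [1, 1]], dtype=float)
y = np.array([2, 4, 5, 9], dtype=float)

# Restricted factor: its relativity is fixed at an assumed 1.2, not estimated.
restricted = np.array([0, 1, 0, 1], dtype=float)
offset = np.log(1.2) * restricted

# Newton-Raphson on the free parameters only; the offset stays fixed throughout.
beta = np.zeros(X.shape[1])
for _ in range(25):
    mu = np.exp(X @ beta + offset)
    hessian = -(X.T * mu) @ X
    beta = beta - np.linalg.solve(hessian, X.T @ (y - mu))

mu = np.exp(X @ beta + offset)
print(beta, mu)
```

Within each region the fitted values differ by exactly the imposed factor of 1.2, while the region parameters absorb as much of the remaining pattern as they can.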
17
Q

Restrictions when pricing: Further restrictions

A
  • Theoretical risk premium results from a GLM claims analysis will differ from the rates implemented in practice, since consideration needs to be given to the price elasticity of demand and the competitive situation.
  • Price elasticity of demand is a measure of how demand responds to a change in price.
  • The competitive situation is relevant because there may be pockets of business where theoretical risk premium rates would produce market premiums that are much higher or lower than those of your competitors.
  • e.g. if your premiums are much lower than those of your competitors, there may be an opportunity to increase your rates and still be the cheapest in the market, thereby increasing profits without decreasing volumes.
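As a small worked example, price elasticity of demand is usually measured as the percentage change in demand divided by the percentage change in price. The premium and volume figures below are hypothetical.

```python
# Hypothetical figures before and after a rate change.
old_price, new_price = 100.0, 110.0      # premium per policy
old_demand, new_demand = 1000.0, 900.0   # policies sold

pct_change_demand = (new_demand - old_demand) / old_demand
pct_change_price = (new_price - old_price) / old_price

# Elasticity: proportional change in demand per proportional change in price.
elasticity = pct_change_demand / pct_change_price
print(elasticity)
```

A value near zero would suggest rates can rise with little loss of volume, which is the situation described in the last bullet above.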
18
Q

What are the drawbacks for the normal model for multiple linear regression?

A
  • It assumes the response variable has a normal distribution.
  • The normal distribution has a constant variance, which may not be appropriate.
  • It adds together the effects of different explanatory variables, but this is often not what is observed.
  • It becomes long-winded with more than two explanatory variables.