CH 18 (WM) Flashcards

1
Q

List the assumptions of classical linear models. [1.75]

A
  • error terms are independent and come from a normal distribution ✓✓
  • the error terms have constant variance✓✓ (or homoscedasticity) ✓
  • the mean is a linear combination of the explanatory variables ✓✓
2
Q

What are the drawbacks of the normal model for multiple linear regression? [2]

A
  • it assumes the response variable has a normal distribution ✓✓
  • the normal distribution has a constant variance which may not be appropriate ✓✓
  • it adds together the effects of different explanatory variables, but this is often not what is observed ✓✓
  • it becomes long-winded with more than two explanatory variables ✓✓
3
Q

Define the term “explanatory variables”. [1.5]

A

Explanatory variables are inputs into a model that are expected to influence the response variable.✓✓

In a pricing context, the explanatory variables would be rating factors.✓✓

It is important that explanatory variables make intuitive sense.✓✓

4
Q

Define the term “response variables”. [1]

A

Response variables are outputs from the model that are likely to be affected by the explanatory variables.✓✓

In an overall pricing context, the response variable would be the price.✓✓

5
Q

Define the terms “categorical and non-categorical variables”, together with examples of each. [3]

A

Categorical variables are explanatory variables that are used for modelling where the values of each level are distinct✓✓, and often cannot be given any natural ordering or score✓✓. An example of this would be gender, which can take the value of male or female✓✓.

By contrast, non-categorical variables can take numerical values, eg age.✓✓

Categorical variables are sometimes referred to as factors.✓✓ The majority of explanatory variables used in practice within GLMs for insurance are factors.✓✓

6
Q

What is meant by the levels of a categorical variable? [1]

A

The levels of a categorical variable are simply the distinct values that the variable can take.✓✓
So, if gender is a variable in a GLM and it can only take the values “male” or “female”, then gender would be said to have two levels.✓✓

7
Q

Explain how continuous numerical variables like age can be treated. [1.5]

A

Often, continuous numerical variables like age can be treated as categorical variables.✓✓
For example, if the “age of policyholder” variable was grouped into age bands (of 5 years for example), the new variable “age band” would be a categorical variable✓✓. This is because each such band is effectively a discrete category, ie a level of a categorical variable✓✓.
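
The banding described above can be sketched in a few lines of Python. This is only an illustration: the function name `age_band`, the 5-year width and the sample ages are assumptions, not part of the course material.

```python
def age_band(age: int, width: int = 5) -> str:
    """Map an age in whole years to a band label such as '30-34'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = (age // width) * width  # lower bound of the band containing `age`
    return f"{lower}-{lower + width - 1}"


# Hypothetical policyholder ages, for illustration only.
ages = [23, 25, 29, 30, 61]
bands = [age_band(a) for a in ages]
print(bands)  # ['20-24', '25-29', '25-29', '30-34', '60-64']

# The distinct bands are the levels of the new categorical variable.
levels = sorted(set(bands))
print(len(levels))  # 4
```

Each distinct label is one level of the "age band" factor, so the continuous variable has been replaced by a categorical one.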

“Categorical variable” appears in every sentence

8
Q

List the various techniques used to analyse the significance of the explanatory variables used in a model. [1]

A
  • The chi-squared test
  • The F statistic
  • The Akaike Information Criterion (“AIC”)
  • Other
9
Q

Explain what is meant by a nested model. [2.25]

A

Two models are nested if one model contains explanatory variables that are a subset of the explanatory variables in the other model.✓✓

For example, if Model 1 has linear predictor: a + bx ✓✓
and Model 2 has linear predictor: a + bx + cx^2 ✓✓

then Model 1 is a subset of Model 2 ✓✓, ie Models 1 and 2 are nested.✓

10
Q

Describe how you will apply the Chi-squared statistic to analyse the significance of the explanatory variables used in a model. [2]

A

If Models 1 and 2 are nested, then the change in scaled deviance follows a chi-squared distribution, ✓✓ ie:

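$D_1^* - D_2^* \sim \chi^2_{df_1 - df_2}$ ✓✓✓✓

where $D_i^*$ is the scaled deviance and $df_i$ the number of degrees of freedom of Model $i$, and Model 1 is nested within Model 2.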

This measures whether the inclusion of one or more additional explanatory variables in a model improves the model fit significantly.✓✓

11
Q

Suppose that Model A and Model B are nested models with 6 and 10 parameters respectively.
The scaled deviance of Model A is 17.80 and for Model B is 11.08. Explain whether Model B is a significant improvement on Model A. [2]

(Question 18.10)

A

The difference in the scaled deviance is 6.72. ✓✓

The difference in the number of degrees of freedom is the same as the difference in the numbers of parameters in the models, ie 4. ✓✓

Since 6.72 < 9.488, the upper 5% point of the chi-squared distribution with 4 degrees of freedom, ✓✓
there is insufficient evidence at the 5% significance level to reject Model A in favour of Model B. ✓✓

(page 168 of the Tables.)
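
The test above can be checked numerically. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the tail probability uses a closed form that is valid only when the number of degrees of freedom is even (as it is here, with 4).

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with even df.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!,
    which holds only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def deviance_test(dev_small, dev_large, params_small, params_large, alpha=0.05):
    """Compare nested models using the change in scaled deviance."""
    stat = dev_small - dev_large        # change in scaled deviance
    df = params_large - params_small    # extra parameters in the larger model
    p_value = chi2_sf_even_df(stat, df)
    return stat, df, p_value, p_value < alpha


# Model A: 6 parameters, scaled deviance 17.80; Model B: 10 parameters, 11.08.
stat, df, p_value, reject = deviance_test(17.80, 11.08, 6, 10)
print(round(stat, 2), df, round(p_value, 3), reject)  # p-value is about 0.15
```

A p-value above 5% matches the conclusion reached by comparing 6.72 with the tabulated 9.488.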

12
Q

Describe how you will apply “F statistics” to analyse the significance of the explanatory variables used in a model. [3.25]

A

In cases where the scale parameter for the model is unknown, for example when using the gamma distribution, it has to be estimated.✓✓

The estimate of the scale parameter is distributed as a chi-square distribution.✓✓

The ratio of the change in the deviance and the scale parameter estimate is distributed with an F distribution ✓✓, since the F distribution is the ratio of chi-square distributions ✓:

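$\dfrac{(D_1 - D_2)/(df_1 - df_2)}{D_2/df_2} \sim F_{df_1 - df_2,\ df_2}$ ✓✓✓✓

where $D_i$ is the deviance and $df_i$ the number of degrees of freedom of Model $i$, and Model 1 is nested within Model 2.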

Note that the models need to be nested for this result to be valid.✓✓

13
Q

Suppose Model C and Model D are nested models with 8 and 16 parameters respectively, and have been fitted to a set of 50 observations. The deviance for Model C is 40.89 and the deviance for Model D is 26.40. The scale parameter is unknown.
Explain whether Model D is a significant improvement on Model C. [3.25]

Question 18.11

A

The difference in the deviance is 14.49. ✓✓

The difference in the number of degrees of freedom is (50 – 8) – (50 – 16) = 8. ✓✓

The number of degrees of freedom in Model D is 34. ✓✓

So the value of the test statistic is: (14.49 / 8) / (26.40 / 34) = 2.33. ✓✓

From page 172 of the Tables, the upper 5% point of F(8,34) is 2.225. ✓✓

Since our test statistic exceeds this value✓✓, we reject Model C in favour of Model D. ✓
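
The arithmetic in this solution can be reproduced with a short sketch. The function name `f_statistic` is hypothetical, and the critical value is simply the tabulated figure quoted above rather than something computed in code.

```python
def f_statistic(dev_small, dev_large, n_obs, params_small, params_large):
    """F statistic for nested models when the scale parameter is estimated."""
    df_small = n_obs - params_small   # degrees of freedom, smaller model
    df_large = n_obs - params_large   # degrees of freedom, larger model
    num_df = df_small - df_large      # equals the number of extra parameters
    stat = ((dev_small - dev_large) / num_df) / (dev_large / df_large)
    return stat, num_df, df_large


# Model C: 8 parameters, deviance 40.89; Model D: 16 parameters, deviance 26.40.
stat, num_df, den_df = f_statistic(40.89, 26.40, 50, 8, 16)
critical = 2.225  # upper 5% point of F(8, 34), as quoted from the Tables

print(round(stat, 2), num_df, den_df)  # 2.33 8 34
print(stat > critical)                 # True, so Model C is rejected
```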

14
Q

Describe how you will apply the Akaike Information Criterion (AIC) to analyse the significance of the explanatory variables used in a model. [3]

A

In cases where models are not nested, the AIC can be used to compare them.✓✓

The AIC for a model is calculated as: −2 × log-likelihood + 2 × number of parameters.✓✓

The AIC looks at the trade-off of the likelihood of a model against the number of parameters✓✓: the lower the AIC, the better the model.✓

For example, if two models fit the data equally well in terms of the log-likelihood ✓✓, then the model with fewer parameters is the more parsimonious✓✓, ie simpler, (and therefore “better”).✓
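
As a small illustration of the trade-off, the sketch below applies the AIC formula to two models with the same log-likelihood. The log-likelihood value and parameter counts are made up for the example.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical models that fit equally well (same log-likelihood).
aic_simple = aic(-120.0, 5)   # 250.0
aic_complex = aic(-120.0, 8)  # 256.0

# The lower AIC wins, so the model with fewer parameters is preferred.
print(aic_simple < aic_complex)  # True
```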

15
Q

Define the term “generalised linear model” (GLM). [2.75]

A

A generalised linear model (GLM) is a flexible generalisation of linear regression.✓✓

Generalised linear models are used to assess and quantify the relationship between a response variable and a set of possible explanatory variables.✓✓

For example, a GLM can be used to model the behaviour of a random variable✓ that is believed to depend on the values of several characteristics, eg age, gender and chronic condition✓✓.

These kinds of models can be used in a number of applications for private medical insurance✓ including risk modelling, pricing, financial projections and overall modelling of the business.✓✓✓✓

16
Q

Q&A 3.10 [5]

3.841 & 53.38

A
17
Q

Q&A 3.11 [3]

2.077

A
18
Q

Define the term “interaction”. [1]

A

An interaction exists when the effect of one factor varies, depending on the levels of another factor.
[1⁄2]
Interactions would be used where the pattern in the response variable (eg frequency or severity) is better modelled by including extra parameters for each combination of two or more factors.
[1⁄2]

19
Q

Provide an example of the effect of interaction terms. [2]

A

Old individuals may have an x% higher risk than young individuals ✓✓ and individuals with chronic conditions may have a y% higher risk than individuals without chronic conditions✓✓.

However, the combination of being older with a chronic condition may result in a much higher risk than [(1 + x/100) × (1 + y/100) − 1] × 100%. ✓✓

In this case, the effect of age depends on chronic conditions and the effect of a chronic condition depends on age.✓✓
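
A numerical sketch may help here. The values x = 20, y = 50 and the observed combined uplift of 110% are invented for illustration; only the formula comes from the answer above.

```python
def uplift_without_interaction(x_pct: float, y_pct: float) -> float:
    """Combined % uplift if the two effects simply multiply (no interaction)."""
    return ((1 + x_pct / 100) * (1 + y_pct / 100) - 1) * 100


x_pct, y_pct = 20.0, 50.0   # hypothetical age and chronic-condition uplifts
expected = uplift_without_interaction(x_pct, y_pct)   # about 80 (%)

observed = 110.0            # hypothetical uplift seen for older lives with a condition
# Extra multiplier that an interaction term would need to supply.
interaction_multiplier = (1 + observed / 100) / (1 + expected / 100)

print(round(expected, 1), round(interaction_multiplier, 3))
```

When the observed uplift is well above the no-interaction figure, the implied multiplier is materially above 1, which is the situation the answer describes.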

20
Q

Define the term “one-way analysis”. [2.5]

A

Prior to the use of GLMs in pricing✓, it was common to look at the effect on frequency and severity of each rating factor separately.✓✓ This is known as one-way analysis✓.

A one-way analysis ignores correlations and interaction effects between variables✓✓, for example age and disease, age and family size, or maternity and gender✓✓. As a result, the model may underestimate or double count the effects of variables.✓✓

21
Q

You work for a health insurance company and specialise in generalised linear models (GLMs). A colleague in the pricing department has overheard you talking about residuals and is interested to learn a bit more about them.

Describe the following measures that can be used to check that a GLM is appropriate for the data given. You are not required to produce mathematical formulae.

(a) deviance residuals [2.5]

(Q&A 3.21)

A

Deviance residuals
A deviance residual, for a given observation, is a measure of the difference between the observed value and the value fitted by the model. [1⁄2]

The deviance residual considers the square root of each observation’s contribution to the deviance, …[1⁄2]

… adjusted for the direction in which the raw residual (the difference between the observed value and the fitted value) acts. [1⁄2]

The deviance measure corrects for the skewness of the distributions used, … [1⁄2]
… which means that the deviance residuals would be expected to be more closely normally distributed than the raw residuals. [1⁄2]
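
Although no formula is required for this answer, a concrete case can make the description easier to remember. The sketch below assumes a Poisson response (an assumption made only for illustration) and uses the standard Poisson unit deviance; the function name is hypothetical.

```python
import math


def poisson_deviance_residual(y: float, mu: float) -> float:
    """Signed square root of one observation's contribution to the Poisson deviance."""
    if mu <= 0 or y < 0:
        raise ValueError("need mu > 0 and y >= 0")
    # Unit deviance 2 * (y * ln(y / mu) - (y - mu)); y * ln(y / mu) is taken as 0 when y == 0.
    log_term = y * math.log(y / mu) if y > 0 else 0.0
    unit_deviance = 2.0 * (log_term - (y - mu))
    # The sign follows the raw residual (observed minus fitted).
    return math.copysign(math.sqrt(max(unit_deviance, 0.0)), y - mu)


print(poisson_deviance_residual(0, 2.0))   # -2.0 (observed below fitted)
print(poisson_deviance_residual(3, 3.0))   # 0.0 (perfect fit)
print(poisson_deviance_residual(5, 2.0) > 0)  # True (observed above fitted)
```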

22
Q

You work for a health insurance company and specialise in generalised linear models (GLMs). A colleague in the pricing department has overheard you talking about residuals and is interested to learn a bit more about them.

Describe the following measures that can be used to check that a GLM is appropriate for the data given. You are not required to produce mathematical formulae.

(a) Pearson residuals [1.5]

(Q&A 3.21)

A

Pearson residuals

A Pearson residual, for an individual observation, is the difference between the observed value and the fitted value (ie the raw residual), …[1⁄2]

… adjusted for the standard deviation of the predicted value and the leverage of the observed response. [1⁄2]

This measure does not adjust for the shape of the distribution. [1⁄2]

23
Q

You have plotted the deviance residuals from your model, to check that the distribution chosen for the response variable is appropriate.
(ii) Explain how you will determine from the residual plot whether or not your model is likely to be a good fit. [3]

A

Residual plot
The residual plot could be a scatter plot of deviance residuals against the fitted values. [1⁄2]

If the distribution is appropriate for the data that are being modelled, the residual plot will have the following characteristics:

  • the pattern of residuals will be symmetrical about the x-axis [1⁄2]
  • the average residual will be zero, … [1⁄2]
    … so there should be an equal number of points above zero and below zero on the graph [1⁄2]
  • the range of residual values will be fairly constant across the width (the x-axis) of the fitted values. [1⁄2]

A residual plot where the range of residuals narrows or widens as the fitted value increases, or where the range of residuals is not symmetrical about the x-axis, indicates that the model specification is poor. [1]

[Maximum 3]

24
Q

What should we do if the residual checks suggest that our model is not a good fit to the data? [1]

Question 18.12

A

Solution 18.12

The model should be re-specified✓ by choosing a different statistical distribution✓ or a different linear predictor✓, link function✓ etc.

25
Q

Define the term “Aliasing”. [1]

A

Aliasing occurs when there is a linear dependency among the observed covariates X1, X2,…,Xp. ✓✓
There are two types of aliasing: intrinsic aliasing and extrinsic aliasing.✓✓

26
Q

Define the term “Intrinsic Aliasing” and provide an example of when it occurs. [2.5]

A

Intrinsic aliasing occurs because of dependencies inherent in the definition of the covariates.✓✓

This is dealt with by modelling software.✓✓

These intrinsic dependencies arise most commonly whenever categorical variables are included in the model.✓✓

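For example, consider “patient age”, which has the four levels: 0-20 years, 21-40 years, 41-60 years and 60+ years.✓✓

Let X1, X2, X3 and X4 be indicator variables for these four levels, each equal to 1 if the patient falls in that age band and 0 otherwise.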

Clearly, if any of X1, X2, or X3 is equal to 1, then X4 is equal to zero; and if X4 is equal to 1, then the rest are all equal to zero.✓✓

In particular:
X4 = 1 - X1 - X2 - X3.✓✓
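
The dependency can be verified directly. This is a minimal sketch: the level labels are taken from the example above, while the function name `indicators` is hypothetical.

```python
LEVELS = ["0-20", "21-40", "41-60", "60+"]


def indicators(level: str) -> list:
    """Return [X1, X2, X3, X4]: 1 for the patient's age band, 0 elsewhere."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return [1 if level == name else 0 for name in LEVELS]


for level in LEVELS:
    x1, x2, x3, x4 = indicators(level)
    # The fourth indicator is fully determined by the other three.
    assert x4 == 1 - x1 - x2 - x3
    print(level, [x1, x2, x3, x4])
```

Because X4 carries no information beyond X1 to X3 (given an intercept), one level is treated as the base level and its parameter is not fitted.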

27
Q

Define the term “Extrinsic Aliasing”. [1.25]

A

As with intrinsic aliasing, extrinsic aliasing also arises from a dependency among the covariates.✓ However, it arises when the dependency results from the nature of the data itself, rather than as a result of inherent properties of the covariates.✓✓

Extrinsic aliasing occurs when two or more factors contain levels that are perfectly correlated.✓✓

28
Q

Define the term “Near Aliasing”. [0.5]

A

Near aliasing occurs when the correlation is almost, but not quite, perfect.✓✓

29
Q

Define, using formulae where necessary, the total deviance and the scaled deviance in the context of a generalised linear model.
[5]

A

Q&A 3.20 (i)

30
Q

For a particular multiplicative GLM, you believe there is an interaction between two factors, each with three levels. Factor 1 can take values A, B and C, while Factor 2 can take values X, Y and Z.
(ii) Explain, using a numerical example, the difference between a complete interaction and a marginal interaction for these two factors.
[6]

A

Complete and marginal interactions are alternative representations of the same thing. [1⁄2]
A complete interaction is expressed as a single factor that represents every combination of the factors involved.
[1⁄2]
For example, for the two factors given, there would be a new single factor, representing the interaction, which would have nine levels, …
[1⁄2]
… ie AX, AY, AZ, BX, BY, BZ, CX, CY, CZ. [1⁄2]
Each of these levels would have a multiplier attached (since this is a multiplicative model). These could be written in the form of either a one-way or a two-way table. [1⁄2]
For example:

                 Factor 1
Factor 2       A       B       C
X            0.90    1.00    1.10
Y            0.97    1.20    1.45
Z            1.26    1.40    1.85
[1⁄2]
In this case, the base level has been selected to be the level corresponding to Level B of Factor 1 and Level X of Factor 2, … [1⁄2]
… and the interaction term has 8 parameters. [1⁄2]
A marginal interaction considers the additional effect of the interaction term over and above the single factor effects.
[1⁄2]
In this case, the single factor effects will be observed separately from the marginal interaction term effects.
Using the same example as above, the multipliers would look as follows:

Factor 1:      A       B       C
             0.90      -     1.10

Factor 2:      X       Y       Z
               -     1.20    1.40

Marginal interaction:
                 Factor 1
Factor 2       A       B       C
X              -       -       -
Y            0.90      -     1.10
Z            1.00      -     1.20
[2]
So the overall relativity for Factor 1 Level A and Factor 2 Level Y would be 0.90 × 1.20 × 0.90 = 0.97, as under the complete interaction. This marginal interaction would therefore be calculated as 0.97 / (1.20 × 0.90) = 0.90. [1⁄2]
[Maximum 6]
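
The link between the two representations can be sketched in code. The multipliers below are those of the numerical example as read here (the original table layout was unclear, so treat the exact cell positions as an assumption); the variable names are hypothetical.

```python
# Complete interaction: one multiplier per (Factor 1, Factor 2) combination.
complete = {
    ("A", "X"): 0.90, ("B", "X"): 1.00, ("C", "X"): 1.10,
    ("A", "Y"): 0.97, ("B", "Y"): 1.20, ("C", "Y"): 1.45,
    ("A", "Z"): 1.26, ("B", "Z"): 1.40, ("C", "Z"): 1.85,
}
BASE_1, BASE_2 = "B", "X"  # base levels of Factor 1 and Factor 2

# Single factor effects are read off the row and column through the base levels.
factor1 = {a: complete[(a, BASE_2)] for a in "ABC"}
factor2 = {b: complete[(BASE_1, b)] for b in "XYZ"}

# Marginal interaction: what is left after removing the single factor effects.
marginal = {
    (a, b): round(value / (factor1[a] * factor2[b]), 2)
    for (a, b), value in complete.items()
}

print(marginal[("A", "Y")], marginal[("A", "Z")])  # 0.9 1.0
print(marginal[("C", "Y")], marginal[("C", "Z")])  # 1.1 1.2
```

Every cell involving a base level comes out as 1.00, which is why only four marginal interaction parameters need to be shown alongside the four single factor parameters, giving the same total of 8.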

31
Q

Explain why you would want to check consistency of the model over time and describe how you would use interactions to do this.
[3]

A

iii) Consistency over time
When pricing, it is important to check that the patterns of relativities observed in a GLM are not changing too much over time. [1⁄2]
If a trend emerges over time then it is important to identify it, so that the patterns can be projected to the period over which the rates will apply.
[1⁄2]
The time consistency check is also used to determine whether the effect of each factor is consistent from year to year. [1⁄2]
If a factor is consistent then it is likely to be a good predictor of future experience. [1⁄2]
To test the consistency of parameter estimates over time, a GLM can be fitted that includes the interaction of a single factor with a measure of time, … [1⁄2]
… eg calendar year. [1⁄2]
Ideally this would be done for every factor in the model and would test the interaction for statistical significance.
[1⁄2]
Significant factors will have a small error range of relativities and the error ranges for the various factors will not overlap too much. [1⁄2]

[Maximum 3]

32
Q

Question X3.4
A colleague is doing a pricing exercise, using GLMs, for a book of PMI business and has come up with an initial model containing many potential rating factors.
He is unsure whether to keep one particular factor in the model and he has asked for your advice. This factor has five levels: A, B, C, D, Unknown.
The initial model contains 50,000 observations and has 80 parameters fitted. The scaled deviance for this model is 392.45.
Your colleague has fitted a model that excludes the levels of the factor in question and the scaled deviance has now increased to 401.97.
(i) Carry out a statistical test to decide whether or not the different levels of this factor are statistically significant. Explain the rationale behind the test and state your conclusion clearly. [4]

Upper 5% point = 9.488

A

Statistical test

Define Model 1 to be the initial model and Model 2 to be the reduced model. These two models are nested, so a chi-square test can be used to compare the changes in scaled deviance between the models. [1⁄2]

The scaled deviance for Model 1 = 392.45 (given). The scaled deviance for Model 2 = 401.97 (given).

Degrees of freedom for Model 1 = 50,000 – 80 = 49,920. [1⁄2]

Degrees of freedom for Model 2 = 50,000 – 76 = 49,924, … [1⁄2]
… since the reduced model has only 1 level for this factor instead of 5 so there are 4 fewer parameters fitted. [1⁄2]

Under the null hypothesis, there is no difference between Model 1 and Model 2:
$D_2^* - D_1^* \sim \chi^2_{df_2 - df_1}$ [1⁄2]

The difference between the scaled deviances is 401.97 – 392.45 = 9.52. [1⁄2]

This should be compared with the upper 5% point of the $\chi^2_4$ distribution. [1⁄2]
The upper 5% point of the $\chi^2_4$ distribution is 9.488 (from page 169 of the Tables).
The test statistic of 9.52 exceeds this value, … [1⁄2]

… so, at the 5% significance level, the reduced Model 2 would be rejected in favour of the initial Model 1. [1⁄2]

Therefore, based on the statistical test, and assuming a 5% significance level, this factor would be kept in the model.
[1⁄2] [Maximum 4]

33
Q

Question X3.4
A colleague is doing a pricing exercise, using GLMs, for a book of PMI business and has come up with an initial model containing many potential rating factors.
He is unsure whether to keep one particular factor in the model and he has asked for your advice. This factor has five levels: A, B, C, D, Unknown.

The factor has been found to be only just significant, based on the 5% statistical test, so it was decided to conduct a more detailed analysis.

(ii) Discuss the further considerations you would take into account when deciding whether or not to keep this factor in the model. [5.5]

A

ii) Further considerations

It is not known what this factor is, although it is known to have five levels. An investigation would be needed as to whether these five levels are groupings of more detailed levels. [1⁄2]
If so, then the original ungrouped factor could be included in the model instead, to test for significance …
… although this might be difficult if the factor has been grouped due to there being insufficient data. [1⁄2]
The parameter values associated with each of the five levels should be analysed to see if they are as expected, relative to each other. [1⁄2]
A graph of the values could be drawn, to enable this to be seen more clearly. [1⁄2]

For example, it could be that the relativity for the “Unknown” level is so different to the relativities for the other levels that it is this alone that is making the factor appear statistically significant. [1⁄2]
If this is the case, then the factor is not really adding much in terms of predictive power for the future. [1⁄2]
Alternatively, “Unknown” could be grouped with one of the other levels and the model refitted to see whether it is then statistically significant. [1⁄2]
An interaction between this factor and some measure of time should also be fitted, to show whether the pattern observed for the relativities is consistent over time. [1⁄2]
If the pattern is not consistent then this factor may be rejected on the basis that it does not show a stable pattern and is therefore not useful for predicting the future.
Consideration should be given to whether the factor is likely to be acceptable to policyholders … [1⁄2]
… and brokers, eg their systems may be unable to handle an extra rating factor. [1⁄2]
If this factor has been used in previous rating exercises for this book of business, it would be more likely to be kept in the model this time. [1⁄2]
If this factor has not been used before in general, then the practicalities of how easily it could be incorporated into the rating algorithms should be considered. [1⁄2]
If it is likely to take a lot of IT time to build the relevant tables then this would be an argument for not using the factor. [1⁄2]
Consideration should also be given to whether this factor is used by other insurers in the PMI market. [1⁄2]
If this factor is dropped from our model while other insurers continue to use it then this could lead to anti-selection.
[1⁄2]
[Maximum 6]

34
Q

Notes Q18.5
Explain whether a normal distribution would be appropriate for modelling claims costs per policyholder per month for a PMI contract. [1.75]

A

There is likely to be a large number of policyholders with zero or very small claims✓✓ and a small number of people with very large claims✓, ie the true distribution will be positively skewed✓. The normal distribution does not have this property✓.
A normal distribution can also take negative values, which would be inappropriate.✓✓

35
Q

Describe the workings of a “Link Function”. [2.5]

A

The link function acts to remove the assumption that the effects of different variables must simply be added together.✓✓
It must be both differentiable✓ and monotonic✓ (either strictly increasing or strictly decreasing)✓.
Typical link functions include the log, logit and identity functions.✓✓

The log link function is of particular interest in pricing✓ because its use results in a model where the effects of different rating factors are multiplied together✓✓.
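
The multiplicative effect of the log link can be seen in a short sketch. The base claim cost of 100 and the relativities of 1.2 and 1.5 are hypothetical, as are the variable and function names.

```python
import math

# Hypothetical coefficients on the log scale.
intercept = math.log(100.0)     # base level expected cost of 100
beta_older = math.log(1.2)      # older lives: relativity 1.2
beta_chronic = math.log(1.5)    # chronic condition: relativity 1.5


def fitted_mean(older: int, chronic: int) -> float:
    """Invert the log link: mean = exp(linear predictor)."""
    linear_predictor = intercept + beta_older * older + beta_chronic * chronic
    return math.exp(linear_predictor)


# Effects add on the linear predictor but multiply on the mean: 100 * 1.2 * 1.5.
print(round(fitted_mean(1, 1), 2))  # 180.0
print(round(fitted_mean(0, 0), 2))  # 100.0
```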