Chapter 18 Generalised Linear Modelling Flashcards

Describe the principal modelling techniques appropriate to health and care insurance:

1
Q

What is an explanatory variable?

A
  • An input into a model that is expected to influence the response variable.
  • i.e. a rating factor
  • It is important that explanatory variables make intuitive sense.

2
Q

What is a response variable?

A
  • The output variable from a model, which is likely to be influenced by the explanatory variables.
  • i.e. the price
3
Q

What is a categorical variable?

A
  • Explanatory variables that take discrete, distinct values and often cannot be given any natural ordering or score.
  • e.g. gender
4
Q

What is a non-categorical variable?

A

- A variable that can take numerical values, e.g. age.

5
Q

What is an interaction term?

A

- Used where the pattern in the response variable is better modelled by including an extra parameter for each combination of two or more factors.

6
Q

One-way analysis and its drawbacks

A
  • Prior to the use of GLMs, the effect of each rating factor on frequency and severity was considered separately.
  • This one-way analysis ignores correlations and interaction effects between variables, and so may underestimate or double count the effects of variables.
7
Q

Uses of GLMs

A
  • A GLM can be used to model the behaviour of a random variable that is believed to depend on the values of several other characteristics, e.g. age, sex, chronic conditions.
  • It is a generalisation of the normal model for multiple linear regression.
8
Q

What are the drawbacks for the normal model for multiple linear regression?

A
  • it assumes the response variable has a normal distribution
  • the normal distribution has a constant variance, which may not be appropriate
  • it adds together the effects of different explanatory variables, which is often not what is observed
  • it becomes long-winded with more than two explanatory variables.
9
Q

Assumptions of classical linear models

A
  • the error terms are independent and come from a normal distribution
  • the mean is a linear combination of the explanatory variables
  • the error terms have constant variance (homoscedasticity)
10
Q

What are the two properties of any member of the exponential family?

A
  • the distribution is completely specified in terms of its mean and variance.
  • the variance is a function of its mean
11
Q

What is the link function?

A
  • The link function removes the assumption that the effects of different variables must simply be added together.
  • It must be both differentiable and monotonic.
  • Examples include the log, logit and identity functions.
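The additivity point can be illustrated directly: with a log link, effects that add on the linear predictor scale become multiplicative on the response scale. A minimal sketch with made-up parameter values:

```python
import math

# Illustrative (made-up) parameter values on the linear predictor scale.
base, age_effect, region_effect = 0.5, 0.2, 0.3

eta = base + age_effect + region_effect   # linear predictor: effects add
mu = math.exp(eta)                        # inverse of the log link

# On the response scale the same effects act multiplicatively:
# exp(a + b + c) = exp(a) * exp(b) * exp(c)
mu_multiplicative = math.exp(base) * math.exp(age_effect) * math.exp(region_effect)

# The logit link maps a probability in (0, 1) to the whole real line;
# like any valid link function it is differentiable and monotonic.
def logit(p):
    return math.log(p / (1 - p))

def inv_logit(z):
    return 1 / (1 + math.exp(-z))
```
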
12
Q

Steps for obtaining predicted values from a single GLM

A
  • Specify the design matrix X and the vector of parameters beta.
  • Choose a distribution for the response variable and the link function.
  • Write down the likelihood function of the observations.
  • Take logarithms to convert the product of many terms into a sum.
  • Maximise the log-likelihood by taking partial derivatives with respect to each parameter and setting them to zero.
  • Compute the predicted values.
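The steps above can be sketched end-to-end for a Poisson GLM with a log link. The data, rating factor and parameter values below are simulated for illustration; the log-likelihood is maximised by Newton-Raphson rather than a modelling package:

```python
import numpy as np

# Simulated data: Poisson claim counts driven by one rating factor (age).
rng = np.random.default_rng(0)
x = rng.uniform(20, 60, size=500)            # hypothetical policyholder ages
X = np.column_stack([np.ones_like(x), x])    # design matrix with intercept
true_beta = np.array([-2.0, 0.05])
y = rng.poisson(np.exp(X @ true_beta))       # response: claim counts

# Maximise the Poisson log-likelihood under a log link by Newton-Raphson:
# set the partial derivatives (the score) to zero.
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)                    # inverse link: predicted means
    score = X.T @ (y - mu)                   # gradient of the log-likelihood
    info = X.T @ (mu[:, None] * X)           # Fisher information (= -Hessian)
    beta = beta + np.linalg.solve(info, score)

predicted = np.exp(X @ beta)                 # predicted values from the fit
```

At convergence the score is (numerically) zero and `beta` recovers the simulated parameters up to sampling error.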
13
Q

What techniques are used to analyse significance of explanatory variables?

A
  • the chi-squared test (if models are nested and the scale parameter is known)
  • the F-statistic (if models are nested and the scale parameter is unknown)
  • the Akaike Information Criterion (AIC), appropriate where models are not nested
  • other methods (e.g. comparisons over time, or consistency with other factors)
14
Q

Define degrees of freedom

A

- Number of observations minus number of parameters.

15
Q

AIC formula

A

AIC = -2 × log-likelihood + 2 × number of parameters

  • The lower the AIC, the better the model.
  • Fewer parameters are better: the more parsimonious model is preferred.
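A minimal sketch of the comparison, using hypothetical log-likelihoods for two candidate models:

```python
def aic(log_likelihood, n_params):
    """AIC = -2 * log-likelihood + 2 * number of parameters."""
    return -2 * log_likelihood + 2 * n_params

# Hypothetical fitted values: the complex model fits slightly better
# (higher log-likelihood) but carries four extra parameters.
aic_simple = aic(-1050.0, 3)    # = 2106.0
aic_complex = aic(-1048.5, 7)   # = 2111.0

# The penalty for extra parameters outweighs the small improvement in fit,
# so the parsimonious model has the lower AIC and is preferred.
```
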
16
Q

Measuring uncertainty in the estimators of the model parameters

A
  • The Cramér-Rao lower bound (CRLB) is used.
  • The maximum likelihood estimator theta-hat is asymptotically distributed N(theta, CRLB).
  • Standard errors in a GLM are found using the Hessian matrix.
  • This is the matrix of second derivatives of the log-likelihood.
17
Q

What other ways can be used to test significance?

A
  • Comparisons with time
  • Consistency checks with other factors
18
Q

Comparisons with time

A
  • Fit the model to data from several different time periods: a genuinely significant factor would be expected to show a broadly consistent pattern over time.
19
Q

Consistency checks with other factors

A
  • Time is not the only factor that can be used as a consistency check.
  • e.g. an explanatory variable like age would be expected to show the same pattern regardless of geographical region.
20
Q

Testing the appropriateness of models

A
  • The hat matrix is one of the outputs of the model-fitting process.
  • It is the matrix H such that y-hat = Hy.
  • For the normal multiple linear regression model, H = X(X'X)^(-1)X'.
  • The diagonal entries h(i,i) of the matrix are called leverages; each lies between 0 and 1.
  • Leverages measure the influence that each observed value has on the fitted value for that observation.
  • Data points with high leverages or residuals may distort the outcome and accuracy of a model.
21
Q

Deviance residuals

A
  • A measure of the distance between the actual observation and the fitted value.
  • Deviance residuals correct for the skewness of the distribution.
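For a Poisson response (a common choice for claim counts) the deviance residual has a closed form; a sketch, using the usual convention that 0·log 0 = 0:

```python
import numpy as np

def poisson_deviance_residuals(y, mu):
    """Signed deviance residuals for a Poisson response."""
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    # y * log(y / mu), with the convention 0 * log(0) = 0
    ratio = np.where(y > 0, y / mu, 1.0)
    term = np.where(y > 0, y * np.log(ratio), 0.0)
    d2 = 2 * (term - (y - mu))          # squared deviance contribution
    return np.sign(y - mu) * np.sqrt(d2)

# Observations below / at / above their fitted value give
# negative / zero / positive residuals.
resid = poisson_deviance_residuals([0, 1, 3], [1.0, 1.0, 1.0])
```
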
22
Q

Standardised Pearson residuals

A
  • A standardised Pearson residual is the difference between the observed response and the predicted value, adjusted for the standard deviation of the predicted value and the leverage of the observed response.
  • These adjustments make it possible to compare standardised Pearson residuals even where observations have different means.
23
Q

Residual Plots

A

- For a particular model, if the distribution chosen for the response variable is appropriate, then the residual chart should produce residuals that:

  • are symmetrical about the x-axis
  • have an average residual of zero
  • are fairly constant across the width of the fitted values
24
Q

Cook’s distance and leverage

A
  • Cook’s distance is used to estimate the influence of a data point on the model results.
  • Data points with a Cook’s distance of 1 or more are considered to merit closer examination in the analysis.
  • As a result of investigating any data points with a high Cook’s distance, a decision might be made to remove those observations altogether.
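Cook’s distance can be computed from an ordinary least-squares fit using the residuals and leverages; the data below are simulated, with one deliberately inserted high-influence point:

```python
import numpy as np

# Simulated data with one deliberate high-leverage, badly-fitting point.
rng = np.random.default_rng(3)
x = rng.normal(size=30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=30)
x[0], y[0] = 6.0, -10.0

X = np.column_stack([np.ones(30), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))   # leverages
p = X.shape[1]
s2 = resid @ resid / (len(y) - p)                # residual variance estimate

# Cook's distance: D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2
cooks = resid**2 / (p * s2) * h / (1 - h) ** 2
```

The planted outlier produces a Cook’s distance well above the usual threshold of 1, flagging it for closer examination.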
25
Q

Models may be refined using (4)?

A
  • Interactions
  • Aliasing
  • Restrictions
  • Smoothing
26
Q

Model refinement: Interactions

A
  • After choosing a structure for the model and checking that it is appropriate for the factors chosen, the model can be refined further.
  • Interactions may be complete or marginal.
27
Q

Model refinement: Aliasing

A
  • Aliasing occurs when there is a linear dependency among the observed covariates X1, ..., Xp.
  • That is, one covariate can be expressed as a linear combination of the others,
  • e.g. X3 = 5 + 2X1 + 3X2.
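The example can be checked numerically: a covariate that is an exact linear combination of the others (plus a constant) leaves the design matrix rank-deficient, so the parameters are not uniquely identifiable. A sketch with simulated values:

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = 5 + 2 * x1 + 3 * x2          # aliased: exact linear combination

# With an intercept column the design matrix has 4 columns but only rank 3.
X = np.column_stack([np.ones(100), x1, x2, x3])
rank = np.linalg.matrix_rank(X)
```
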
28
Q

There are two types of aliasing

A
  • Intrinsic
  • Extrinsic

29
Q

Intrinsic aliasing

A
  • Occurs because of dependencies inherent in the definition of the covariates.
  • Arises mostly when categorical variables are included.
  • This is dealt with automatically by modelling software.
30
Q

Extrinsic aliasing

A

- Occurs when two or more factors contain levels that are perfectly correlated.

31
Q

Near aliasing

A
  • When modelling in practice, a common problem occurs when two or more factors contain levels that are almost, but not quite, perfectly correlated.
  • To understand problems where the model suggests very large negative or positive parameters, two-way tables of exposure and claim counts can be examined.
  • From these it should be possible to identify the combinations that cause near aliasing.
  • The issue can then be resolved by either deleting or excluding the rogue records, or reclassifying them into another, more appropriate, factor.
32
Q

Parameter smoothing

A
  • A GLM can be improved by smoothing the parameter values. This can be achieved by grouping the levels of factors.
  • The granularity of the data can be kept in modelling, since software can use this granularity and the observed patterns to better group the different levels of variables.
33
Q

Factors can be simplified by?

A
  • Grouping and summarising data prior to loading; this requires knowledge of the expected patterns.
  • Grouping in the modelling package, e.g. grouping age into two age bands.
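The second approach can be sketched as a simple banding rule applied at the modelling stage (the band boundary of 40 is purely illustrative):

```python
def age_band(age):
    """Group exact age into one of two illustrative bands."""
    return "under 40" if age < 40 else "40 and over"

# Exact ages kept in the data; the banding is applied during modelling.
ages = [23, 35, 41, 58, 67]
bands = [age_band(a) for a in ages]
# bands → ['under 40', 'under 40', '40 and over', '40 and over', '40 and over']
```
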
34
Q

Restrictions when pricing: GLM restrictions

A
  • Legal or commercial considerations may impose rigid restrictions on the way particular factors are used in practice.
  • e.g. legal: restricting the use of age and gender in pricing for a medical scheme.
  • When the use of certain factors is restricted, the model can compensate for this artificial restriction, to an extent, by adjusting the fitted relativities for correlated factors.
  • This is achieved via the offset term.
35
Q

Restrictions when pricing: Further restrictions

A
  • The theoretical risk premium resulting from a GLM claims analysis will differ from the rates implemented in practice, since consideration needs to be given to price elasticity of demand and the competitive situation.
  • Price elasticity of demand is a measure of how demand changes in response to a change in price.
  • The competitive situation is relevant because there may be pockets of business where the theoretical risk premium rates would produce a market premium much higher or lower than those of competitors.
  • e.g. if your premiums are much lower than those of your competitors, there may be an opportunity to increase your rates and still be the cheapest in the market, thereby increasing profits without decreasing volumes.