Chapter 14 (GLM) Flashcards

1
Q

Explanatory variables.

A

These are inputs to the model that affect the response variables.
In a pricing context, the explanatory variables would typically be rating factors.

2
Q

Response variables.

A

These are outputs from the model that are likely to be affected by the explanatory variables.
In an overall pricing context, the response variable would be the price.

3
Q

Categorical variables

A

These are explanatory variables that are used for modelling where the values of each level are distinct, and often cannot be given any natural ordering or score. An example of this would be chronic status, which can take the value of yes or no.

4
Q

Non-categorical variables

A

These are explanatory variables that can take numerical values, for example, age.

5
Q

Interaction term

A

This is used where the pattern in the response variable is better modelled by including extra parameters for each combination of two or more factors. An interaction exists when the effect of one factor varies depending on the value of another factor.
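
As a hedged illustration, consider two factors x_1 and x_2 (symbols assumed here, not taken from the source). A linear predictor with an interaction term could then be written as below, where the extra parameter \beta_{12} lets the effect of x_1 depend on the value of x_2.

```latex
\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12}\, x_1 x_2
```

If \beta_{12} = 0, the effect of x_1 on the linear predictor is the same at every value of x_2, so there is no interaction.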

6
Q

Describe a one-way analysis and state its shortcomings. []

A

This is where GLMs are used to look at the effect on frequency and severity of each rating factor separately. Note that a one-way analysis ignores correlations and interaction effects between variables, for example, age and disease, age and family size, or maternity and gender. As a result, the model may underestimate or double count the effects of variables.
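
The double-counting problem can be sketched with a small numerical example. All exposures and frequencies below are hypothetical. The true model is assumed to be multiplicative, and old age is deliberately correlated with large family size.

```python
# Hypothetical exposure and true claim frequency per (age, family size) cell.
# Assumed true model: base frequency 0.10, "old" x2, "large" x3.
cells = {
    ("young", "small"): {"exposure": 800, "freq": 0.10},
    ("young", "large"): {"exposure": 200, "freq": 0.30},
    ("old", "small"): {"exposure": 200, "freq": 0.20},
    ("old", "large"): {"exposure": 800, "freq": 0.60},
}


def one_way_frequency(level, position):
    """Exposure-weighted frequency for one level of one factor, ignoring the other factor."""
    claims = sum(c["exposure"] * c["freq"] for k, c in cells.items() if k[position] == level)
    exposure = sum(c["exposure"] for k, c in cells.items() if k[position] == level)
    return claims / exposure


# One-way relativity of "old" versus "young" (position 0 is the age factor).
one_way_age_relativity = one_way_frequency("old", 0) / one_way_frequency("young", 0)
true_age_relativity = 2.0
print(round(one_way_age_relativity, 2), true_age_relativity)
```

In this sketch the one-way age relativity comes out at roughly 3.7 against an assumed true value of 2, because part of the family-size effect is attributed to age.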

7
Q

Link Function

A

The link function removes the assumption that the effects of different variables must simply be added together. Instead, it defines a more complex yet appropriate relationship between the explanatory and response variables. It must be both differentiable and monotonic (either strictly increasing or strictly decreasing). Typical link functions include the log, logit, and identity functions. The log link function is of particular interest in pricing because its use results in a model in which the effects of different rating factors are multiplied together.
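
A minimal sketch of the three typical link functions and their inverses follows. The parameter values are hypothetical, and the names (LINKS, beta_base and so on) are chosen here for illustration only. It also shows that adding effects on the log scale is the same as multiplying rating factors on the original scale.

```python
import math

# Each entry is (link g, inverse link): g maps the mean mu to the linear
# predictor eta, and the inverse maps eta back to mu.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
    "logit": (lambda mu: math.log(mu / (1 - mu)), lambda eta: 1 / (1 + math.exp(-eta))),
}

# Hypothetical parameters on the log scale: a base level and two rating factors.
beta_base = math.log(0.10)
beta_factor_a = math.log(1.5)
beta_factor_b = math.log(2.0)

_, inverse_log = LINKS["log"]

# Effects are added in the linear predictor...
mean_from_linear_predictor = inverse_log(beta_base + beta_factor_a + beta_factor_b)

# ...which is equivalent to multiplying the factors on the original scale.
mean_from_product = 0.10 * 1.5 * 2.0
```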

9
Q

MLE

A

Maximum likelihood estimation (MLE) is a statistical approach to estimating population parameters such as the mean and variance. The approach uses sample data and estimates the parameter values that maximise the probability of obtaining the observed data.
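
As a rough sketch, the idea can be tried on a Poisson model with made-up claim counts. The grid search is used only for transparency and is not how a GLM package would fit the model. For the Poisson distribution the maximum likelihood estimate of the mean coincides with the sample mean.

```python
import math

claim_counts = [0, 1, 0, 2, 1, 0, 3, 1]  # hypothetical observed data


def poisson_log_likelihood(lam, data):
    """Log-likelihood of independent Poisson observations with mean lam."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in data)


# Crude grid search over candidate means 0.01, 0.02, ..., 5.00.
candidates = [k / 100 for k in range(1, 501)]
mle = max(candidates, key=lambda lam: poisson_log_likelihood(lam, claim_counts))

sample_mean = sum(claim_counts) / len(claim_counts)
```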

10
Q

GLM vs linear regression

A

A GLM is a generalised form of the linear regression model. It is more flexible than linear regression because a GLM can still work even when the response variable is not continuous or unbounded. Furthermore, a GLM allows unconstrained inputs (inputs that can take any value) to affect the response variable.

12
Q

Assumptions of classical linear models

A

Assumptions of classical linear models include:
• The error terms are independent and come from a normal distribution.
• The mean is a linear combination of the explanatory variables.
• The error terms have constant variance (homoscedasticity).
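
These assumptions can be summarised in one line of notation. The symbols are assumed here rather than taken from the source: p explanatory variables x_{i1}, ..., x_{ip}, parameters \beta_j and error terms \varepsilon_i.

```latex
Y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)
```

The single \sigma^2 shared by all observations expresses the constant-variance assumption.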

13
Q

The normal model for multiple linear regression has a number of drawbacks. [ ]

A

• It assumes that the response variable, Y, has a normal distribution, which may not be appropriate for the variable being modelled.
• The normal distribution has a constant variance, which may not be appropriate for the variable being modelled. For example, the variance of claim numbers tends to increase as the expected value increases. The Poisson distribution has this property, so would be a more sensible choice.
• The normal model for multiple linear regression adds together the effects of the different explanatory variables, but this is seldom what is observed in practice. For example, the effects of ‘age’ and ‘family size’ might be multiplicative, rather than additive: if large families have three times as many claims as small ones, and old people have four times as many claims as young people, it might be expected that the combination of ‘old’ and ‘large family size’ results in 12 times as many claims. Indeed, this is often what is observed in practice.
• With more than two explanatory variables, a manual solution becomes increasingly long-winded.
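
The age and family size example can be checked with a few lines of arithmetic. The sketch below contrasts what an additive structure and a multiplicative structure would each predict for the combined group. Setting the base level to 1.0 is an assumption made for illustration.

```python
# Relative claim frequencies from the example in the text (base level = 1.0).
large_family_relativity = 3.0
old_age_relativity = 4.0

# Additive structure: each factor adds its own increment to the base level.
additive_combined = 1.0 + (large_family_relativity - 1.0) + (old_age_relativity - 1.0)

# Multiplicative structure: the factor effects are multiplied together.
multiplicative_combined = 1.0 * large_family_relativity * old_age_relativity

print(additive_combined, multiplicative_combined)  # prints 6.0 12.0
```

Under these assumptions an additive model would predict 6 times the base frequency for old people in large families, not the 12 times described above.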

15
Q

Chapter question 6
Explain whether a normal distribution would be appropriate for modelling claims costs per policyholder per month for a PMI contract.

A

Chapter solution 6
There is likely to be a large number of policyholders with zero or very small claims and a small number of people with very large claims, i.e. the true distribution will be positively skewed. The normal distribution does not have this property. A normal distribution can also take negative values, which would be inappropriate.
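
To get a feel for the negative-values problem, the sketch below uses purely hypothetical figures (a mean monthly cost of 50 and a standard deviation of 200). It computes how much probability a fitted normal distribution would place on negative claim costs.

```python
from statistics import NormalDist

# Hypothetical figures: mean monthly claim cost 50, standard deviation 200.
monthly_cost = NormalDist(mu=50.0, sigma=200.0)

# Probability that the normal model assigns to negative claim costs.
prob_negative = monthly_cost.cdf(0.0)
print(round(prob_negative, 2))
```

With these assumed figures about 40% of the probability falls below zero, which real claim costs can never do.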

17
Q

GLMs address some of these problems.

A

GLMs address some of these problems. They generalise the normal model for multiple linear regression as follows:
1. The response variable can take any distribution from the exponential family; it no longer has to take a normal distribution.
2. A link function is introduced. This acts to remove the assumption that the effects of different variables must simply be added together.
3. Additionally, an offset term is included within the linear predictor.
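
In notation, these three generalisations could be summarised as below. The symbols are assumptions made for illustration: g is the link function, \mu_i the mean of the response Y_i, \beta_j the parameters and \xi_i the offset.

```latex
g(\mu_i) = \eta_i = \sum_{j} x_{ij}\,\beta_j + \xi_i,
\qquad Y_i \sim \text{a member of the exponential family with mean } \mu_i
```

For example, a claim frequency model with a log link might use \xi_i = \log(\text{exposure}_i), so that expected claim numbers scale in proportion to exposure.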

18
Q

Define GLM

A

7.2 Generalised linear models
A GLM is a flexible generalisation of linear regression. Generalised linear models are used to assess and quantify the relationship between a response (dependent) variable and a set of possible explanatory (independent) variables. The relationship between these variables is defined via the link function. In other words, a GLM can be used to model the behaviour of a random variable that is believed to depend on the values of several characteristics, for example, age, gender, and chronic condition.

20
Q

These kinds of models can be used in a number of applications. [ ]

A

These kinds of models can be used in a number of applications for PMI including risk modelling, pricing, financial projections, and overall modelling of the business.
In practice, there are a number of software packages that enable actuaries and underwriters to calculate frequencies, average claim costs, and burning costs to use as a basis for setting future premium rates when working with GLMs.

23
Q

The steps used to obtain the predicted values from a simple GLM, from Anderson et al., 2007

A

The steps used to obtain the predicted values from a simple GLM, from Anderson et al., 2007, are as follows:
1. Specify the design matrix X and the vector of parameters β.
2. Choose the distribution for the response variable and the link function.
3. Identify the log-likelihood function.
4. Take the logarithm to convert the product of many terms into a sum.
5. Maximise the logarithm of the likelihood function by taking partial derivatives with respect to each parameter, setting them to zero and solving the resulting system of equations.
6. Compute the predicted values.
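
The steps above can be sketched for a very small case: a Poisson response with a log link, an intercept and one binary factor. The data are hypothetical. Newton-Raphson iteration is used here to solve the score equations; that is an assumption of this sketch and is not stated in the source.

```python
import math

# Step 1: design matrix X (intercept, one binary factor) and hypothetical responses y.
X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]]
y = [1, 0, 2, 3, 5, 4]


# Step 2: Poisson response with log link, so mu_i = exp(x_i . beta).
# Steps 3-4: log-likelihood (up to a constant) = sum(y_i * eta_i - exp(eta_i)).
def fit_poisson_log_link(X, y, iterations=25):
    beta = [math.log(sum(y) / len(y)), 0.0]  # start from the overall mean
    for _ in range(iterations):
        mu = [math.exp(beta[0] * r[0] + beta[1] * r[1]) for r in X]
        # Step 5: partial derivatives of the log-likelihood (score vector).
        g0 = sum((yi - mi) * r[0] for r, yi, mi in zip(X, y, mu))
        g1 = sum((yi - mi) * r[1] for r, yi, mi in zip(X, y, mu))
        # Information matrix X'WX with W = diag(mu), inverted by hand (2 x 2).
        a = sum(mi * r[0] * r[0] for r, mi in zip(X, mu))
        b = sum(mi * r[0] * r[1] for r, mi in zip(X, mu))
        d = sum(mi * r[1] * r[1] for r, mi in zip(X, mu))
        det = a * d - b * b
        # Newton-Raphson update: beta <- beta + (X'WX)^-1 * score.
        beta[0] += (d * g0 - b * g1) / det
        beta[1] += (a * g1 - b * g0) / det
    return beta


beta = fit_poisson_log_link(X, y)

# Step 6: predicted values for each level of the factor.
pred_level0 = math.exp(beta[0])
pred_level1 = math.exp(beta[0] + beta[1])
```

For this saturated one-factor model the predicted values should match the observed group means (1 and 4 in the made-up data), which gives a quick sanity check on the fit.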

24
Q

Exp distrs

A

The normal, Poisson, gamma, binomial, inverse Gaussian, and exponential distributions all belong to the exponential family.
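
One common textbook way of writing an exponential family density is shown below. The notation is assumed here and not taken from the source: \theta is the natural parameter, \phi a scale parameter, and a, b and c are functions specific to each distribution.

```latex
f(y;\theta,\phi) = \exp\!\left\{ \frac{y\,\theta - b(\theta)}{a(\phi)} + c(y,\phi) \right\}
```

For instance, the Poisson distribution with mean \mu fits this form with \theta = \log\mu, b(\theta) = e^{\theta}, a(\phi) = 1 and c(y,\phi) = -\log y!.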

26
Q

Properties of exp distr

A

1. The distribution is completely specified in terms of its mean and variance.
2. The variance of Y_i is a function of its mean. In other words, the variance of the response often increases in line with the mean in some way. For example, it may be roughly proportional to the mean or to the mean squared. This addresses one of the drawbacks of the normal model for multiple linear regression.
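
The mean-variance relationship can be illustrated with the standard variance functions of a few exponential family members. In this sketch the dictionary, the function name and the default scale parameter of 1 are assumptions made for illustration.

```python
# Standard variance functions V(mu) for some exponential family members.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,                # constant variance
    "poisson": lambda mu: mu,                # proportional to the mean
    "gamma": lambda mu: mu ** 2,             # proportional to the mean squared
    "inverse_gaussian": lambda mu: mu ** 3,  # proportional to the mean cubed
}


def variance(distribution, mu, phi=1.0):
    """Var(Y) = phi * V(mu), where phi is a scale parameter (assumed 1 by default)."""
    return phi * VARIANCE_FUNCTIONS[distribution](mu)


for name in VARIANCE_FUNCTIONS:
    print(name, variance(name, 2.0), variance(name, 4.0))
```

Doubling the mean leaves the normal variance unchanged but doubles the Poisson variance and quadruples the gamma variance, in line with the property described above.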