18. Generalised Linear Modelling Flashcards

1
Q

Explanatory variable

A

Inputs into a model expected to influence the response variable
- in pricing context, these would be rating factors
- for example, clinical and demographic drivers

May be categorical or non-categorical
- categorical: the levels are distinct and cannot be given a natural ordering (e.g. gender); these are called factors
- non-categorical: take numerical values (e.g. age); continuous numerical variables are often grouped into bands and treated as categorical variables

2
Q

Response variable

A

Outputs likely to be affected by the explanatory variables
- the response is the value the model is trying to predict
- in a pricing context, this would be the premium (or its components, claim frequency and severity)

3
Q

Interaction terms

A

The effect of one factor varies depending on the value of another
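
A minimal sketch of what an interaction looks like in a multiplicative rating structure. The factor names and relativities below are made up for illustration and do not come from any real data set.

```python
# Hypothetical relativities, for illustration only.
base = 100.0
age_rel = {"young": 1.5, "old": 1.0}
gender_rel = {"male": 1.2, "female": 1.0}

# The interaction term: an extra multiplier for specific combinations only.
interaction_rel = {("young", "male"): 1.25}


def predict(age, gender, with_interaction):
    """Predicted value from main effects, optionally with the interaction."""
    value = base * age_rel[age] * gender_rel[gender]
    if with_interaction:
        value *= interaction_rel.get((age, gender), 1.0)
    return value


def gender_effect(age, with_interaction):
    """Ratio of the male to the female prediction at a given age."""
    male = predict(age, "male", with_interaction)
    female = predict(age, "female", with_interaction)
    return male / female


# Without the interaction, the gender effect is the same at every age.
# With it, the gender effect depends on the age level.
for age in ("young", "old"):
    print(age, gender_effect(age, False), gender_effect(age, True))
```

Without the interaction term the male/female ratio is about 1.2 at both ages; with it, the ratio for the young group rises to about 1.5 while the old group is unchanged.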

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

One-way analysis

A
  • Looks at the effect of each rating factor on frequency and severity separately
  • Ignores correlations and interaction effects between variables, so it may underestimate or double count the effect of a variable
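
The double-counting problem can be sketched with a small invented portfolio in which claim frequency truly depends on age only, but age and car type are correlated. All exposures and claim counts below are hypothetical.

```python
# Hypothetical cells: (age, car, exposure, claims).
# Within each age group, frequency is the same for both car types.
cells = [
    ("young", "sports", 80, 16),
    ("young", "standard", 20, 4),
    ("old", "sports", 20, 2),
    ("old", "standard", 80, 8),
]


def one_way_frequency(factor_index, level):
    """Claim frequency for one level of one factor, ignoring the other factor."""
    exposure = sum(c[2] for c in cells if c[factor_index] == level)
    claims = sum(c[3] for c in cells if c[factor_index] == level)
    return claims / exposure


# One-way relativities, each factor looked at in isolation.
age_rel = one_way_frequency(0, "young") / one_way_frequency(0, "old")
car_rel = one_way_frequency(1, "sports") / one_way_frequency(1, "standard")

# True car effect, holding age fixed (young drivers only).
true_car_rel = (16 / 80) / (4 / 20)

print(age_rel, car_rel, true_car_rel)
```

The one-way view gives sports cars a relativity of roughly 1.5 even though car type has no effect once age is held fixed. Multiplying the one-way age and car relativities together would therefore count part of the age effect twice.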
5
Q

GLM

A
  • Generalisation of normal model for multiple linear regression
  • can be used to model the behaviour of a random variable believed to be dependent on the values of several characteristics, e.g. age, gender, chronic conditions
  • Produces estimates of true values of relativities by taking account of correlations and allowing investigation of any interactions between variables present in the model

Overcomes issues with the normal model for multiple linear regression
- allows the response variable to take any distribution from the exponential family
- introduces a link function, which removes the assumption that the effects of different explanatory variables must simply be added together

Exponential family of distributions
- properties: the distribution is completely specified by its mean and variance; the variance is a function of the mean
- examples: normal, Poisson, Gamma, Binomial, inverse Gaussian, exponential, Tweedie

Tweedie distribution
- has a point mass at zero, which fits the pure premium distribution: a large spike at zero (policies with no claims) and a wide range of amounts for policies that have had claims
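
The "variance is a function of the mean" property can be written in the usual GLM notation as below. The symbols $\phi$ (dispersion parameter) and $V(\mu)$ (variance function) are introduced here for illustration and are not used elsewhere in these cards; prior weights are ignored.

```latex
\operatorname{Var}(Y) = \phi \, V(\mu), \qquad
V(\mu) =
\begin{cases}
1 & \text{normal} \\
\mu & \text{Poisson} \\
\mu^{2} & \text{gamma} \\
\mu^{3} & \text{inverse Gaussian} \\
\mu(1-\mu) & \text{binomial (as a proportion)} \\
\mu^{p},\ 1 < p < 2 & \text{Tweedie (compound Poisson-gamma)}
\end{cases}
```

The Tweedie case with $1 < p < 2$ is the one with a point mass at zero, sitting between the Poisson ($p = 1$) and the gamma ($p = 2$).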

Link function
- must be differentiable and monotonic (strictly increasing or decreasing)
- log-link function results in a model where the effects of different rating factors are multiplied together
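- logit link function, ln(mu/(1-mu)), is used where the response is a binary outcome or a probability (e.g. logistic regression)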
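
A minimal sketch of why the log link gives a multiplicative model. The coefficient names and values are hypothetical; they are chosen so that the implied relativities are easy to read off.

```python
import math

# Hypothetical fitted coefficients on the log scale (illustrative values only).
intercept = math.log(200.0)
coef = {"age_young": math.log(1.5), "smoker": math.log(1.3)}


def predict_log_link(x):
    """Mean response under a log link: mu = exp(intercept + sum(coef * x))."""
    eta = intercept + sum(coef[name] * value for name, value in x.items())
    return math.exp(eta)


# Effects are added on the log scale, so they multiply on the original scale:
# exp(a + b + c) = exp(a) * exp(b) * exp(c).
young_smoker = predict_log_link({"age_young": 1, "smoker": 1})
product_of_relativities = 200.0 * 1.5 * 1.3

print(young_smoker, product_of_relativities)
```

Each exponentiated coefficient acts as a relativity applied to the base level, which is the form most rating structures take.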

6
Q

Normal model for multiple linear regression

A
  • assumes the response variable has a normal distribution
  • assumes constant variance, which might not be appropriate (e.g. where the variance increases with the mean)
  • adds together the effects of different explanatory variables, which is often not what is observed
  • becomes long-winded with more than two explanatory variables
7
Q

Type of GLM suited to model mortality

A
  • Logistic regression model
  • Logistic models model binary outcomes (0,1) well, and mortality is a binary outcome (dead or alive)
  • Link function would be the logit function, ln(mu/(1-mu))
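
A short sketch of the logit link and its inverse, assuming mu is a mortality probability strictly between 0 and 1. The function names are chosen here for illustration.

```python
import math


def logit(mu):
    """Logit link: maps a probability in (0, 1) to the whole real line."""
    return math.log(mu / (1.0 - mu))


def inverse_logit(eta):
    """Inverse link: maps any linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Whatever value the linear predictor takes, the fitted probability
# stays between 0 and 1.
for eta in (-10.0, 0.0, 10.0):
    print(eta, inverse_logit(eta))
```

This is the reason the logit suits a binary outcome: the linear predictor is unrestricted, while the fitted mortality probability can never fall outside (0, 1).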
8
Q

Analysis of significance of explanatory variables (in explaining response)

A
  • check the software-generated p-value of each variable against the chosen significance level
  • calculate AIC, F-tests, Chi-squared, or some other comparative measure
  • plot odds ratios with their confidence intervals; those whose intervals exclude 1 are statistically significant
  • compare nested models and test whether the change in deviance is significant
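
Two of these checks can be sketched with the standard library alone. The sketch assumes a Wald-type 95% interval for the odds ratio and a nested-model comparison in which the larger model adds exactly one parameter with the dispersion taken as 1 (e.g. Poisson or binomial). All function names and input values are hypothetical.

```python
import math


def odds_ratio_interval(beta, std_err, z=1.96):
    """Approximate 95% interval for the odds ratio exp(beta)."""
    return math.exp(beta - z * std_err), math.exp(beta + z * std_err)


def is_significant(beta, std_err, z=1.96):
    """Significant if the odds ratio interval excludes 1."""
    lower, upper = odds_ratio_interval(beta, std_err, z)
    return lower > 1.0 or upper < 1.0


def deviance_test_p_value_1df(deviance_small, deviance_large):
    """p-value for the drop in deviance when one parameter is added.

    Uses the chi-squared(1) survival function, P(X > x) = erfc(sqrt(x / 2)).
    """
    drop = max(deviance_small - deviance_large, 0.0)
    return math.erfc(math.sqrt(drop / 2.0))


print(is_significant(0.5, 0.1))   # interval lies entirely above 1
print(is_significant(0.1, 0.2))   # interval straddles 1
print(deviance_test_p_value_1df(103.841, 100.0))
```

A drop in deviance of about 3.84 on one degree of freedom corresponds to a p-value of about 0.05, the usual threshold for keeping the extra variable.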