18. Generalised Linear Modelling Flashcards
Explanatory variable
Inputs into a model expected to influence the response variable
- in a pricing context, these would be rating factors
- for example, clinical and demographic drivers
May be categorical or non-categorical
- categorical - the values of each level are distinct and cannot be given a natural ordering (e.g. gender); these variables are called factors
- non-categorical - can take on numerical values (e.g. age); continuous numerical variables are often banded and treated as categorical variables
Response variable
Outputs likely to be affected by the explanatory variables
- response is the value the model is trying to predict
- in a pricing context, this would be the premium
Interaction terms
The effect of one factor varies depending on the value of another
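A minimal sketch of an interaction term, assuming a multiplicative frequency model with made-up coefficients (the factor names and values below are hypothetical, chosen only for illustration). With the interaction included, the age relativity is 1.5 for females but 1.5 x 1.4 = 2.1 for males, so the effect of age depends on gender.

```python
import math

# Hypothetical log-scale coefficients; "young:male" is the interaction term.
coef = {
    "intercept": math.log(0.10),
    "young": math.log(1.5),
    "male": math.log(1.2),
    "young:male": math.log(1.4),
}

def frequency(young, male):
    """Expected claim frequency for 0/1 indicators of young and male."""
    eta = (
        coef["intercept"]
        + coef["young"] * young
        + coef["male"] * male
        + coef["young:male"] * young * male  # only active when both are 1
    )
    return math.exp(eta)

# Effect of being young, measured separately within each gender.
age_effect_female = frequency(1, 0) / frequency(0, 0)
age_effect_male = frequency(1, 1) / frequency(0, 1)
```

Without the "young:male" term, both ratios would equal 1.5 and a single age relativity would apply to everyone.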
One-way analysis
- Looks at the effect on frequency and severity of each rating factor separately
- ignores correlations and interaction effects between variables, so may underestimate or double count the effects of variables
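The distortion described above can be sketched with a small made-up portfolio (all exposures and claim counts are hypothetical). Vehicle size has no true effect here, but because young drivers mostly drive small cars, the one-way analysis credits part of the age effect to the vehicle factor.

```python
# Hypothetical portfolio: exposure and claims per (age, vehicle) cell.
# True frequency is 0.20 for young and 0.10 for old drivers;
# vehicle size has no effect within either age group.
cells = {
    ("young", "small"): {"exposure": 800, "claims": 160},
    ("young", "large"): {"exposure": 200, "claims": 40},
    ("old", "small"): {"exposure": 200, "claims": 20},
    ("old", "large"): {"exposure": 800, "claims": 80},
}

def one_way_frequency(cells, position):
    """Claim frequency by level of one factor, summing over the other."""
    totals = {}
    for key, cell in cells.items():
        level = key[position]
        exposure, claims = totals.get(level, (0, 0))
        totals[level] = (exposure + cell["exposure"], claims + cell["claims"])
    return {level: c / e for level, (e, c) in totals.items()}

def cell_frequency(age, vehicle):
    cell = cells[(age, vehicle)]
    return cell["claims"] / cell["exposure"]

# One-way view: small vs large vehicles, ignoring age.
vehicle_freq = one_way_frequency(cells, position=1)
one_way_relativity = vehicle_freq["small"] / vehicle_freq["large"]

# Controlling for age: small vs large within each age group.
within_age_relativity = {
    age: cell_frequency(age, "small") / cell_frequency(age, "large")
    for age in ("young", "old")
}
```

The one-way relativity comes out at about 1.5, while the within-age relativities are both 1.0. A model that fits age and vehicle together would be expected to recover the latter.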
GLM
- Generalisation of normal model for multiple linear regression
- can be used to model the behaviour of a random variable believed to be dependent on the values of several characteristics, e.g. age, gender, chronic conditions
- Produces estimates of true values of relativities by taking account of correlations and allowing investigation of any interactions between variables present in the model
How GLMs overcome issues with the normal model for multiple linear regression
- allows the response variable to take any distribution from the exponential family
- a link function is introduced, which removes the assumption that the effects of different explanatory variables must simply be added together
Exponential family of distributions
- properties - each distribution is completely specified by its mean and variance, and the variance is a function of the mean
- examples: normal, Poisson, Gamma, Binomial, inverse Gaussian, exponential, Tweedie
Tweedie distribution
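- has a point mass at zero (when the power parameter is between 1 and 2) - this aligns with the pure premium distribution, which has a large spike at zero for policies with no claims and a wide range of amounts for policies that have had claims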
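A rough simulation of the spike at zero, assuming the compound Poisson-gamma form (which is the Tweedie case with power parameter between 1 and 2). The claim rate and gamma parameters are hypothetical, and the Poisson sampler is a hand-written helper because the standard library does not provide one.

```python
import math
import random

def sample_poisson(rng, lam):
    """Draw a Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_pure_premium(rng, lam, shape, scale):
    """Total claim cost for one policy: a Poisson number of gamma claims."""
    n_claims = sample_poisson(rng, lam)
    return sum(rng.gammavariate(shape, scale) for _ in range(n_claims))

rng = random.Random(0)                 # fixed seed for repeatability
lam, shape, scale = 0.1, 2.0, 500.0    # hypothetical parameters

losses = [sample_pure_premium(rng, lam, shape, scale) for _ in range(20000)]
zero_share = sum(1 for x in losses if x == 0) / len(losses)
mean_loss = sum(losses) / len(losses)
```

The share of zero losses should sit close to exp(-lam), roughly 0.90 here, and the mean close to lam x shape x scale = 100, with the remaining policies spread over a wide range of positive amounts.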
Link function
- must be differentiable and monotonic (strictly increasing or decreasing)
- log-link function results in a model where the effects of different rating factors are multiplied together
- logit link function, ln(mu/(1-mu)), is used where the response is a probability or binary outcome
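A short sketch of why the log link gives a multiplicative model: the linear predictor adds coefficients, and exponentiating that sum multiplies the corresponding relativities. The base frequency and relativities below are hypothetical.

```python
import math

# Hypothetical relativities, stored as coefficients on the log scale.
coefficients = {
    "intercept": math.log(0.10),     # base claim frequency of 0.10
    "age:young": math.log(2.0),      # young drivers: relativity 2.0
    "vehicle:large": math.log(1.5),  # large vehicles: relativity 1.5
}

def linear_predictor(active_levels):
    """Additive on the link scale: intercept plus one term per active level."""
    return coefficients["intercept"] + sum(
        coefficients[level] for level in active_levels
    )

def predict_with_log_link(active_levels):
    """Invert the log link: mean = exp(linear predictor)."""
    return math.exp(linear_predictor(active_levels))

predicted = predict_with_log_link(["age:young", "vehicle:large"])
multiplied = 0.10 * 2.0 * 1.5  # base x relativity x relativity
```

Here `predicted` and `multiplied` agree up to floating-point error, which is the multiplicative structure the log link produces.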
Normal model for multiple linear regression
- assumes response variable has normal distribution
- the normal distribution has constant variance, which might not be appropriate (e.g. where the variance increases with the mean)
- adds together the effects of different explanatory variables, which is often not what is observed
- becomes long-winded with more than two explanatory variables
Type of GLM suited to model mortality
- Logistic regression model
- Logistic models model binary outcomes (0,1) well, and mortality is a binary outcome (dead or alive)
- Link function would be the logit function, ln(mu/(1-mu))
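A minimal sketch of fitting a logistic model, assuming made-up grouped mortality data with a single 0/1 explanatory variable (an older age band indicator). The fit uses plain Newton-Raphson written out by hand for two parameters; it is an illustration, not a general-purpose routine.

```python
import math

def logit(mu):
    return math.log(mu / (1.0 - mu))

def inverse_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical data: (x, lives, deaths), where x = 1 marks the older band.
groups = [(0, 1000, 20), (1, 1000, 50)]

def fit_logistic(groups, iterations=25):
    """Newton-Raphson for logit(mu) = b0 + b1 * x on grouped binomial data."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        s0 = s1 = 0.0            # score vector
        i00 = i01 = i11 = 0.0    # information matrix entries
        for x, lives, deaths in groups:
            mu = inverse_logit(b0 + b1 * x)
            residual = deaths - lives * mu
            weight = lives * mu * (1.0 - mu)
            s0 += residual
            s1 += residual * x
            i00 += weight
            i01 += weight * x
            i11 += weight * x * x
        det = i00 * i11 - i01 * i01
        # Update by (information matrix)^-1 times the score.
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    return b0, b1

b0, b1 = fit_logistic(groups)
fitted_young = inverse_logit(b0)       # fitted mortality rate for x = 0
fitted_old = inverse_logit(b0 + b1)    # fitted mortality rate for x = 1
odds_ratio = math.exp(b1)              # odds of death, older vs younger band
```

Because there is one parameter per group in this toy example, the fitted rates reproduce the observed rates of 2% and 5%, and exp(b1) is the odds ratio between the two bands.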
Analysis of significance of explanatory variables (in explaining response)
- check which variables are statistically significant by comparing their computer-generated p-values against the chosen significance level
- calculate AIC, F-tests, Chi-squared, or some other comparative measure
- plot odds ratios with their confidence intervals; those whose intervals exclude 1 are statistically significant
- test whether the change in deviance between nested models is significant
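Two of the checks above can be sketched on a hypothetical 2x2 mortality table: a 95% confidence interval for the odds ratio (using the usual normal approximation on the log scale) and a change-in-deviance test between a model with no explanatory variable and one with a single 0/1 factor. All counts are made up for illustration.

```python
import math

# Hypothetical counts: deaths and survivors, with and without the factor.
deaths_with, alive_with = 50, 950
deaths_without, alive_without = 20, 980

# Odds ratio with a 95% interval from the log-scale standard error.
odds_ratio = (deaths_with * alive_without) / (alive_with * deaths_without)
se_log_or = math.sqrt(
    1 / deaths_with + 1 / alive_with + 1 / deaths_without + 1 / alive_without
)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
or_significant = ci_low > 1 or ci_high < 1   # interval excludes 1

def binomial_loglik(deaths, alive, p):
    """Binomial log-likelihood, dropping the constant combinatorial term."""
    return deaths * math.log(p) + alive * math.log(1 - p)

# Null model: one common mortality rate. Full model: one rate per group.
total_deaths = deaths_with + deaths_without
total_alive = alive_with + alive_without
p_null = total_deaths / (total_deaths + total_alive)
ll_null = binomial_loglik(total_deaths, total_alive, p_null)
ll_full = binomial_loglik(
    deaths_with, alive_with, deaths_with / (deaths_with + alive_with)
) + binomial_loglik(
    deaths_without, alive_without,
    deaths_without / (deaths_without + alive_without),
)

# Change in deviance, compared with chi-squared on 1 degree of freedom.
deviance_change = 2 * (ll_full - ll_null)
p_value = math.erfc(math.sqrt(deviance_change / 2))  # chi-squared(1) tail
```

In this example the interval lies wholly above 1 and the deviance change exceeds the 5% critical value of 3.84, so both checks point to the factor being significant.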