A2. Generalized Linear Models for Insurance Rating Flashcards

1
Q

Problems with one-way analysis rating

A
  1. Can be distorted by correlations between rating variables
    - youth are more concentrated in some areas => territory and age are correlated
  2. Does not consider interdependencies/interactions between rating variables
    - youth+sport is extra risky, but elderly+sport is extra careful => sport interacts with age
2
Q

Problems with minimum bias rating

A
  1. Lack of statistical framework to assess quality of model:
    - cannot test significance of a variable
    - no credibility ranges for parameters (a GLM can provide confidence intervals)
  2. Iterative calculations are computationally inefficient
3
Q

Assumptions in classical linear/additive models

A
  1. all observations are independent
  2. observations are normally distributed
  3. each risk group has constant variance
  4. effects are additive: mean is a linear combination of covariates
4
Q

2 limitations of classical linear/additive models

A
  1. difficult to assert normality and constant variance of response variables
    - if Y>0 => then not normal
    - if Y>0 and E(Y) tends to 0 => then Var(Y) tends to 0 (not constant)
  2. mean is not always a linear combination of covariates
    - many insurance risks tend to vary multiplicatively with rating variables
    - additive assumptions are not realistic for insurance applications
5
Q

Assumptions in GLMs

A
  1. all observations are independent
  2. observations follow a distribution from the exponential family
  3. link function is differentiable and monotonic
  4. effects may be non-linear: the mean is the inverse link function applied to a linear combination of covariates

So GLMs are no longer tied to:

  1. NORMALITY Assumption
  2. CONSTANT VARIANCE Assumption
  3. ADDITIVITY OF EFFECTS Assumption

pro : adjusts for correlation, with less restrictive assumptions than classical linear models

con : Often difficult to explain results

6
Q

How to transform an additive model into a multiplicative rating plan

A

log link function
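A quick worked check of why the log link does this: the linear predictor is additive on the log scale, so exponentiating turns it into a product of factors (the coefficients below are made up):

```python
import math

# hypothetical fitted coefficients, stored on the log scale
b0 = math.log(100.0)   # base premium
b1 = math.log(1.25)    # relativity for rating variable 1
b2 = math.log(0.90)    # relativity for rating variable 2

# additive on the log scale...
log_premium = b0 + b1 + b2
# ...multiplicative after applying the inverse of the log link
premium = math.exp(log_premium)     # 100 * 1.25 * 0.90 = 112.5
```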

7
Q

Advantages of GLMs with log link rating plan as compared to additive model

A
  • simple and practical to implement ***
  • guarantee positive premiums
  • multiplicative impact of risk characteristic more intuitive
8
Q

Advantages/disadvantages of modeling frequency and severity separately

A

Advantages:

  • more insight/intuition about the impact of each predictor
  • more stability of both models, since a predictor affecting only frequency may be diluted in a pure premium model
  • less overfitting, since a predictor affecting only frequency might otherwise pick up the noise of severity in a pure premium model
  • frequency may not move in the same direction as severity, but a Tweedie/pure premium model implicitly assumes they do

Disadvantages:

  • more detailed data required => may not be available
  • need to build 2 models => more time
9
Q

Why coverage related variables (deductible, limit) should be first priced outside of GLMs

A
  1. Violation of the Tweedie assumption: frequency and severity never move in the same direction for those variables. A higher deductible => frequency decreases, severity increases
  2. Counterintuitive results: may indicate a lower rate for higher coverage, caused by:
    - 1: correlations with other variables outside the model
    - 2: adverse selection, with insureds self-selecting higher deductibles because they know they have higher loss potential and want to reduce the premium
    - 3: underwriters forcing high-risk insureds to select higher deductibles

*Deductible relativities should be determined based purely on loss elimination, outside of the GLM, then included in the GLM as an offset in the log link function (+ ln(relativities))

relativities = factor y / factor base level

factor = 1 - LER
not LER !!
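A small worked example of this arithmetic (the LER values are hypothetical):

```python
import math

# hypothetical loss elimination ratios by deductible; base level = 500
ler = {500: 0.10, 1000: 0.18, 2500: 0.30}

# factor = 1 - LER (NOT the LER itself)
factor = {d: 1.0 - r for d, r in ler.items()}

# relativity = factor at the level / factor at the base level
base = 500
relativity = {d: factor[d] / factor[base] for d in factor}

# offset term added to the log-link linear predictor: + ln(relativity)
offset = {d: math.log(rel) for d, rel in relativity.items()}
```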

10
Q

What happens if coverage related variables (deductible, limit) are priced within GLMs

A
  • rates will reflect other things than pure loss elimination
  • insureds will change their behavior
  • therefore rate based on past experience (and past behaviors) will no longer be predictive of new policies
11
Q

Why territories should be priced outside of GLMs

A
  • may be a large number of territories
    - but aggregating the territories into a smaller number of groups may cause loss of information

12
Q

How territories should be priced

A

Step 1: Estimate territory relativities using spatial smoothing and by including the rest of the classification model in offset

Step 2: Estimate the rest of the classification plan using GLM and by including the territory relativities in offset

Iterate steps 1 and 2 until both converge

13
Q

Impact of choosing a level with fewer observations as the base level of a categorical variable

A
  • higher standard error and p-value => wider confidence intervals around the estimated coefficients
  • but the predicted relativities will be the same (rebased to the chosen base level)
14
Q

When using a log link function, why continuous predictor variables should usually be logged and exceptions

A

Reason:
- logging the predictor allows more flexibility in fitting different curve shapes to the data (if not logged => only allows for exponential growth)

Exceptions:

  • year variable (used to pick up trend effects)
  • variable containing values of 0 (ln(0) is undefined, unless 1 is added to all observations before taking the log)
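A one-line illustration of the zero-value exception, using NumPy:

```python
import numpy as np

vehicle_age = np.array([0.0, 1.0, 3.0, 10.0])   # made-up predictor containing a 0

# np.log(vehicle_age) would produce -inf for the 0; add 1 to all observations first
logged = np.log1p(vehicle_age)                  # log1p(x) = ln(x + 1)
```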
15
Q

3 cautions when plotting actual vs predicted values for model selection

A
  • use holdout data to prevent overfit
  • aggregate data before plotting based on percentiles of predicted values
  • take the log of all values before plotting to prevent large values from skewing picture
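A sketch of the aggregation step (simulated holdout data; assumes pandas), bucketing records by percentiles of the predicted value before comparing:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
pred = rng.gamma(2.0, 500.0, 10_000)                 # made-up predicted pure premiums
actual = pred * rng.lognormal(0.0, 0.5, 10_000)      # noisy "actual" holdout outcomes

df = pd.DataFrame({"pred": pred, "actual": actual})

# aggregate into deciles of the predicted value before plotting
df["bucket"] = pd.qcut(df["pred"], q=10, labels=False)
plot_data = df.groupby("bucket")[["pred", "actual"]].mean()

# plotting np.log(plot_data) keeps large values from skewing the picture
```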
16
Q

GLM outputs for each estimated coefficient

A
  1. standard error:
    - Definition: estimated standard deviation of the random process generating the estimated coefficient
    - Use: p-value and confidence interval
    - Limitation: based on the Cramer-Rao lower bound => could be understated
  2. p-value:
    - Definition: probability of an estimated coefficient being at least this different from 0 by pure chance, given that the true coefficient is 0
    - Focus: variable significance
    - if the p-value is small, the variable should be included in the model
    - Limitation: does not give the probability that the true coefficient is 0
    - more observations => smaller p-value
    - smaller dispersion parameter => smaller p-value
  3. confidence interval:
    - Definition: range of estimates that would not be rejected given a selected threshold for the p-value
    - if the interval is very narrow (and excludes 0), the variable should be added to the model
    - Focus: variable significance
17
Q

Problem and options for GLMs with highly correlated variables

A

Problems:

  • unstable model
  • erratic coefficients
  • high standard errors

Option 1: remove all highly correlated variables except one

  • this eliminates the high correlation
  • disadvantage: potentially loses some unique information in the eliminated variables

Option 2: use dimensionality-reduction techniques (principal component analysis)

  • creates a new subset of uncorrelated variables from the correlated variables by identifying the combinations that explain the most variance
  • allows the other highly correlated variables to be removed, resulting in a simpler model
  • use this subset of uncorrelated variables in the GLM
  • disadvantage: additional time required
  • suited for developing individual, aggregate variables that summarize signal
18
Q

Define multicollinearity and give a way to detect it

A

Definition:
- two or more predictors are strongly predictive of a third => near-perfect linear dependency among 3 or more predictors

Problem:

  • erratic coefficients
  • unstable model
  • model may even not converge

Detection: use the variance inflation factor (VIF)
- VIF measures how much the squared standard error for a predictor is increased due to the presence of collinearity with the other predictors
- it is determined by running a linear model for each predictor using all the other predictors as inputs
- if VIF > 10 => variable has multicollinearity
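A minimal sketch of that definition, computing VIF directly from its R-squared formulation on simulated data (pure NumPy; not a production implementation):

```python
import numpy as np

def vif(X):
    """VIF per column: regress it on the others; VIF = 1 / (1 - R^2)."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(2)
x1 = rng.normal(size=1000)
x2 = x1 + rng.normal(scale=0.1, size=1000)   # nearly collinear with x1
x3 = rng.normal(size=1000)                   # unrelated

vifs = vif(np.column_stack([x1, x2, x3]))
# VIFs for x1 and x2 exceed 10 (multicollinearity); x3 stays near 1
```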

19
Q

Define aliasing and its solutions

A

Definition:
- perfect linear dependency among predictor variables

Problem:
-model will never converge

Solutions:

  • Manually: remove or reclassify aliased records into another factor level
  • GLM software: automatically removes one of the aliased variables
20
Q

Types of aliasing

A
  1. Intrinsic aliasing:
    - perfect dependency between predictors is inherent to the definition of the variables
    - ex 1: if the model includes all levels of a categorical variable, last = 1 - sum(others)
    - ex 2: using age & birth date together
  2. Extrinsic aliasing:
    - perfect dependency between predictors arises from the nature of the data
    - ex: all red cars in the data happen to be 2-door sedans AND vice-versa
  3. Near-aliasing: (same as multicollinearity)
    - almost perfect dependency between 2 or more predictors
    - ex: all red cars in the data happen to be 2-door sedans (but not vice-versa)
    - convergence problems may occur
21
Q

Deviance residual

A

this is the amount that a given observation contributes to the overall deviance

in a well-fit model, deviance residuals will:

  • follow no predictable pattern
  • be normally distributed
  • have constant variance

22
Q

Possible transformations after reviewing partial residual graph

A
  1. Binning into a categorical variable with separate bins
    - helps capture differences in residuals between ranges of the variable
    - Disadvantages:
    - increases degrees of freedom
    - can result in inconsistent/impractical patterns
    - ignores variations within bins ***
  2. Adding polynomial terms
    - Disadvantage: loss of interpretability without a graph
  3. Adding piecewise linear/hinge functions
    - allows tracking the changing slope of the residuals
    - Disadvantage: break points must be chosen manually (judgmental)
23
Q

3 options for measuring model stability

A
  1. Cook’s distance
    - measures the influence of an individual record on the model
    - check whether records with the highest Cook’s distance should be excluded (higher distance = more influence the record has on the model)
  2. Cross-validation
    - creates sub-datasets with fewer records by sampling without replacement
    - split data into k parts; run the model on k-1 parts, then validate the results on the remaining part
    - check whether the parameter estimates that vary the most across model runs should be excluded
    - superior since more data is used to train and test
    - extremely time-consuming
    - less common in insurance since variables are often hand-picked
  3. Bootstrapping
    - creates new datasets with the same number of records by sampling with replacement
    - run the model on each sampled dataset
    - check parameter means and variances after refitting the model on many new datasets
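A bootstrap sketch on a toy linear model (simulated data): resample with replacement, refit, and summarize the refitted coefficient:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)       # made-up response with true slope 0.5

# bootstrap: resample n records WITH replacement, refit, collect the coefficient
slopes = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    X = np.column_stack([np.ones(n), x[idx]])
    coef, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
    slopes.append(coef[1])

slopes = np.array(slopes)
# the mean and spread of `slopes` across refits measure the estimate's stability
```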
24
Q

2 measures used in diagnostic tests of overall model

A
  1. Log-likelihood
    - Definition: log of the product of the likelihoods of all observations
    - Lower bound: log-likelihood of the null model (no predictors)
    - Upper bound: log-likelihood of the saturated model (one parameter for each observation)
  2. Deviance
    - Definition: generalized form of the SSE
    - Lower bound: 0 (saturated model)
    - Upper bound: deviance of the null model (no predictors); represents the total deviance inherent in the data
    - can test between two “nested” models to see if the inclusion of the additional factor improves the model enough given the extra parameter it adds to the model
25
Q

Define underfitting and overfitting

A

Underfitting:

  • too few parameters
  • does not use enough of the useful info
  • does not capture enough of the signal

Overfitting:

  • too many parameters (or too few obs)
  • captures too much of the noise
26
Q

What is the purpose of a holdout/testing sample

A

a holdout sample is a separate sample of data not used in training

if the model predicts the holdout sample poorly, either:

  1. the model has been overfit to the training sample data
  2. OR it has poor predictive power in general

Uses:

  1. Validate how well the results of a predictive model generalize to other data:
    - to test how accurately the predictive model will perform in practice
    - test the predictive power/stability of a model
  2. Prevent overfitting:
    - the model will always fit the training data better with more parameters, but it could also pick up random variation as if it were a predictive signal
    - this is particularly likely when (1) the size of the training data set is small or (2) the number of parameters in the model is large
    - make sure the model has not captured noise of the training data during the fitting procedure
27
Q

Criteria to select a holdout/testing sample

A
  1. unbiased sample from the same population as the training dataset ***
  2. large enough to fit a model
  3. can select randomly or split from training data by time
    - if split randomly => any weather event will be in both training and test => over-optimistic validation results
    - if split using time (even vs odd) => better since training and holdout will be equally impacted by seasonality and trends => realistic validation results. If there is a large weather event, it will only be included in one of the two sets, so we won't get the overly optimistic results that would occur if the event affected both sets.
28
Q

Degrees of freedom for a model

A

number of parameters that need to be estimated
model complexity is defined by its degrees of freedom
a more complex model has more degrees of freedom

29
Q

How is GLM estimation affected by the number of variables

A

GLM estimation:

  • maximising the log-likelihood
  • or minimizing the deviance (equivalent)

Impact of number of variables:
-adding more variables will always increase the log-likelihood (and reduce the deviance) because there is more freedom to explain randomness of outcomes from non-systematic effects.

  • since GLMs maximize the log-likelihood, they will always use the additional variables
  • however, these additional variables may only pick up noise of the training set and therefore deteriorate model performance on the testing set
30
Q

2 measures used in diagnostic tests of rating variable selection

A
  1. Beta test:
    - test if the parameter is significantly different from 0 (or the relativity different from 1)
    - threshold defined by a Student's t-test if using betas, or a chi-square test if using relativities
  2. Deviance test:
    - test if the inclusion of the additional variable improves the model significantly (i.e. decreases the deviance enough)
    - threshold defined by a chi-square test if the scale parameter is known, or an F-test if the scale parameter is unknown
31
Q

Required conditions for a valid comparison of deviance between models

A

Standard conditions:

  • same number of records in datasets
  • same distribution
  • same dispersion parameter

If we want to compare models using the F-test, additional condition:
- one model must be a subset of (nested within) the other

32
Q

Why deviance residuals are not useful for discrete distributions (or distributions with a point mass, like Tweedie)

A

deviance residuals do not adjust for the discrete nature of those distributions

33
Q

3 strategies to test model

A
  1. Train and test:
    - split data into a single training set and a single test set
  2. Train, validate and test:
    - split data into 3: a training set, a validation set and a test set
    - validation set used to refine and tweak the model
  3. Cross-validation:
    - split data into k folds; train on k-1 folds and validate on the remaining fold, rotating through all k

** for all strategies:

  • only use the test data set when the model is complete
  • if too many decisions are made based on the test set, it effectively becomes a training set (leads to overfitting)

34
Q

2 reasons that model refinement techniques are not appropriate for model selection

A
  • some models may be proprietary
  • final model may be a business decision instead of a technical decision

35
Q

3 reasons GLMs may not be appropriate

A
  • if there is significant correlation in the data => aliasing problems => model will not converge
  • need to select an error structure => not clear which to use
  • if new product => may not have losses => no response variable to fit a GLM **
36
Q

2 important limitations of GLM

+ solution for the 1st problem

A
  1. GLMs give full credibility:
    - the estimated coefficients are not credibility-weighted to recognize low volumes or high variability
    - solution: look at standard errors and p-values **
  2. GLMs assume the randomness of outcomes is uncorrelated: **
    - if several renewals of the same policy in dataset => the same insured is likely to have correlated outcomes
    - if there are weather events in dataset => the same weather event is likely to cause similar outcomes to risks in the same areas
37
Q

Steps to combine separate models by peril/coverage

A
  1. Run each peril model separately to get expected losses from each peril for the same group of exposures
  2. Aggregate the expected losses across all perils for all observations
  3. Run a model using the all-peril loss cost as the target variable and the union of all predictors as the predictors.
  4. This target variable will be more stable since volatility was fit away, therefore you can focus on using only data reflecting the future mix of business (latest year instead of all historical years)
38
Q

Why ensemble models can offer improved predictions

A

Reasons:

  • some models will over-predict for some segments of the book
  • other models will under-predict for other segments of the book
  • therefore using an average can balance the predictions for all segments

Conditions:

  • the errors of the models must be as uncorrelated as possible (otherwise they will over-predict the same segments at the same time)
  • therefore the models should be built separately, by different people, without sharing information
39
Q

Considerations in data preparation when merging policy and claims data

A
  • claims matching: need to match specific vehicles or specific coverages?
  • record matching: are there timing differences between the datasets?
  • time dimension: is the level of aggregation CY or AY consistent between datasets before merging?
  • fields required/necessary: are there fields not needed or desired fields not present?
40
Q

Considerations in modifying the data

A
  • check for duplicate records => remove them
  • check for codes of categorical fields against documentation => document new codes or correct errors
  • check for reasonability of numerical fields => correct negative premiums or significant outliers
  • check for convergence problems/confusing results => handle errors/missing values by replacing them by average values/error flag/reclassify them/remove them
  • check for binning of continuous fields
41
Q

Other possible data adjustments before modeling

A
  • capping large losses
  • removing cats (or giving less weight)
  • developing losses to ultimate (or add a year variable to pick up trends/development)
  • trending exposures and losses (or add a year variable to pick up trends/development)
  • on-leveling premiums
42
Q

When to use weights in a GLM

A
  • When a single observation contains grouped information
  • Different observations represent different time periods

If neither apply, weights are all 1

43
Q

lift

A

economic value of the model

ability to prevent adverse selection **

44
Q

ROC CURVE

A

Used for logistic models ***
shows the trade-off between the true positive rate and false positive rate at different discrimination thresholds.

A better model will be pushed out further from the line of equality, meaning that a small increase in the false positive rate will yield a larger increase in the true positive rate as the threshold is lowered
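A small self-contained sketch of that TPR/FPR trade-off at a few thresholds (simulated scores; not a full ROC curve implementation):

```python
import numpy as np

def roc_points(y_true, score, thresholds):
    """True/false positive rates at each discrimination threshold."""
    pts = []
    for t in thresholds:
        flagged = score >= t
        tpr = (flagged & (y_true == 1)).sum() / (y_true == 1).sum()
        fpr = (flagged & (y_true == 0)).sum() / (y_true == 0).sum()
        pts.append((fpr, tpr))
    return pts

rng = np.random.default_rng(5)
y = rng.integers(0, 2, 1000)                 # made-up binary outcomes
score = y + rng.normal(size=1000)            # a useful (but noisy) model score

pts = roc_points(y, score, thresholds=[0.25, 0.5, 0.75])
# a good model sits above the 45-degree line: tpr > fpr at every threshold
```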

45
Q

OFFSET

A
  • makes the GLM aware of non-modeled rating factors so that they are reflected in the model and the estimated coefficients for the new variables are optimal in their presence

if there are multiple offsets, they can simply be added together into a total offset

46
Q

OFFSET USE

A

Offset is useful for deductibles, which are better estimated outside GLMs (e.g. LER analysis), since
the GLM often produces counterintuitive results due to the effect of selection and correlation with
variables outside the model.

Territory rating is impractical to use in a GLM since there are hundreds or even thousands of
territories with no easy way to group them without losing signal. However, territory differences
are significant so it’s important that the rating plan be offset for territory rates. Thus it’s best to
include territory factors as an offset in GLM.

If you’re creating a model on renewal business after having already made a model for new
business only, you would likely use an offset for many of the variables. This would ensure
consistency between the sets of business that you do not expect to change over time.

When including the effect of a coverage limit in a pure premium model. Limits may be correlated
with other covariates not being accounted for in the model and this might lead to inconsistent
ILFs based on model results, so it’s better to do loss elimination analysis outside of the modeling
process and include the effect of a coverage limit as an offset.

** when the target variable varies by an exposure base
example: claim count per policy is the target
  - policies have different policy lengths in years
  - a policy of 2 years will have an offset of ln(2)

47
Q

centering variable at their base level

A

Intercept represents all variables at their base levels ***
-> easier to interpret

Sign of interacted variables:
- when a variable is not centered, a coefficient may sometimes have the opposite sign than expected; this is especially true when an interaction term is present, so coefficients are more intuitive to understand when variables are centered

48
Q

Methods for Estimating Distribution Parameters

A

method of moments

maximum likelihood

minimum chi squared

minimum distance ***