Shapland Flashcards
What is the purpose of the GLM model?
To model the incremental losses q(w,d)
How to handle negative incremental values in the triangle?
- If the sum of the column is negative:
- subtract the largest negative value from every cell in the triangle
- this sets the most negative cell to 0
- solve the GLM using the modified triangle
- add the largest negative value back to the fitted incremental values
- If a cell is negative but the column sum is positive overall: use the modified log link (ln q if q > 0, 0 if q = 0, -ln|q| if q < 0),
then solve the GLM
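A minimal Python sketch of both adjustments. The function names are hypothetical, and the positive-column-sum case is assumed to use the usual modified log link (ln q for positive values, 0 for zero, -ln|q| for negative values). The GLM fit itself is not shown.

```python
import math

def modified_log(q):
    """Modified log link for one incremental value (column sum still positive)."""
    if q > 0:
        return math.log(q)
    if q < 0:
        return -math.log(abs(q))
    return 0.0

def shift_triangle(triangle):
    """Subtract the largest negative value (psi) from every cell.

    Returns the shifted triangle and psi. The most negative cell becomes 0.
    """
    psi = min(0.0, min(v for row in triangle for v in row))
    shifted = [[v - psi for v in row] for row in triangle]
    return shifted, psi

def unshift_fitted(fitted, psi):
    """Add psi back to the fitted incremental values after solving the GLM."""
    return [[m + psi for m in row] for row in fitted]

# Hypothetical incremental triangle with negative cells
triangle = [[100.0, -20.0, 5.0], [80.0, -50.0], [90.0]]
shifted, psi = shift_triangle(triangle)
```

Here psi is -50, so the shifted triangle has 0 in the cell that held -50, and unshift_fitted reverses the shift on the fitted values.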
How to calculate the standardized residual?
Residual divided by its standard deviation (a.k.a. standard error)
Model Result Reasonability check for Standard Error
Standard Error
- S.E. should increase from older to more recent years (because s.e. follows the magnitude of the results)
- Total s.e. should be larger than any individual year's s.e.
- Total s.e. should be less than the sum of s.e. across all AYs (since the model assumes independence between AYs)
Model Result Reasonability check for CoV
CoV
- CoV should GENERALLY decrease when moving from oldest to most recent years
- CoV may rise in the most recent years because:
1. The increasing number of parameters brings higher parameter uncertainty to the more recent years
2. The model may be overestimating the variability in the most recent years
- Total CoV should be smaller than any individual year’s CoV
- Total CoV should be less than the sum of CoV across all AYs
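The standard-error and CoV checks above can be sketched in Python as follows. The figures are made up for illustration. The total s.e. is taken as the root of the summed variances, which holds only under the independence between AYs that the checks assume.

```python
import math

# Hypothetical results by accident year, oldest to most recent (illustrative only)
mean_unpaid = [50.0, 200.0, 800.0, 3000.0]
std_error = [10.0, 25.0, 60.0, 140.0]

def total_se_if_independent(se_by_ay):
    """Total s.e. under independence between AYs: root of the summed variances."""
    return math.sqrt(sum(s * s for s in se_by_ay))

def reasonability_checks(means, ses):
    """Return each reasonability check as a boolean."""
    covs = [s / m for s, m in zip(ses, means)]
    total_se = total_se_if_independent(ses)
    total_cov = total_se / sum(means)
    return {
        "se_increases": all(a < b for a, b in zip(ses, ses[1:])),
        "total_se_above_max": total_se > max(ses),
        "total_se_below_sum": total_se < sum(ses),
        "cov_decreases": all(a > b for a, b in zip(covs, covs[1:])),
        "total_cov_below_min": total_cov < min(covs),
    }

checks = reasonability_checks(mean_unpaid, std_error)
```

A failed check is a prompt to look closer at the model, not proof of an error. For example, CoV may legitimately rise in the most recent years.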
Why is homoscedasticity needed for bootstrapping?
Bootstrapping assumes the residuals are independent and identically distributed. Heteroscedasticity violates this assumption because the residuals then have different variances
What are the solutions for heteroscedasticity?
- Stratified sampling
- Calculating Variance Params
- Calculating Scale Params
How to handle heteroscedasticity with stratified sampling
Group development periods with homogeneous variances
Sample with replacement from the groups only
Disadvantage: some groups may have only a few residuals, which reduces credibility
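A small Python sketch of stratified sampling, assuming the groups have already been chosen. The group labels and residual values are hypothetical.

```python
import random

def stratified_resample(residuals_by_group, cell_groups, rng):
    """Draw one residual per cell, with replacement, from that cell's own group."""
    return [rng.choice(residuals_by_group[g]) for g in cell_groups]

# Hypothetical standardized residuals, grouped by development period
residuals_by_group = {
    "dev 1-2": [-3.0, 2.5, 3.5, -3.0],
    "dev 3+": [-0.5, 0.6],
}
# Group of each cell to be filled in the sample triangle
cell_groups = ["dev 1-2", "dev 1-2", "dev 3+", "dev 1-2", "dev 3+"]

sample = stratified_resample(residuals_by_group, cell_groups, random.Random(0))
```

The "dev 3+" group has only two residuals to draw from, which illustrates the disadvantage noted above.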
How to handle heteroscedasticity by calculating variance parameters
Group development periods with homogeneous variances
Calculate the standard deviation of the standardized residuals in each of the “hetero” groups
Calculate the hetero-adjustment factor h_i = (standard deviation of all standardized residuals combined)/(standard deviation of the standardized residuals in group i)
Multiply all residuals in group i by h_i
Sample with replacement from the ENTIRE triangle. Divide each resampled residual by the h_i of the cell it is placed in
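The steps above can be sketched in Python as below. The sketch assumes h_i is the ratio of standard deviations (all standardized residuals over those in group i). Group names and residual values are hypothetical.

```python
import random
import statistics

def hetero_factors(residuals_by_group):
    """h_i = stdev(all standardized residuals) / stdev(residuals in group i)."""
    all_res = [r for grp in residuals_by_group.values() for r in grp]
    sd_all = statistics.stdev(all_res)
    return {g: sd_all / statistics.stdev(res) for g, res in residuals_by_group.items()}

# Hypothetical standardized residuals: early periods are far more volatile
groups = {"early": [-3.0, 2.5, 3.5, -3.0], "late": [-0.5, 0.4, 0.6, -0.5]}
h = hetero_factors(groups)

# Multiply each group's residuals by its own factor and pool them
pool = [r * h[g] for g, res in groups.items() for r in res]

rng = random.Random(0)

def resample_for(group):
    """Draw from the whole pool, then divide by the factor of the target cell's group."""
    return rng.choice(pool) / h[group]
```

After the multiplication each group has the same standard deviation, so the pooled residuals can be treated as identically distributed. The division restores the original volatility of the target cell's group.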
How to handle heteroscedasticity with scale parameters
Hetero-adj factors based on different scale params:
use the ratio SQRT(overall scale param) / SQRT(scale param for group i).
N = total number of cells in the triangle
p = alpha (AYs) + beta (development periods, usually alpha - 1) + # hetero-adj factors
# hetero-adj factors = # of hetero groups - 1
n_i = # of cells in group i
r_{w,d} here is the Pearson (unscaled) residual, not the standardized residual
The hetero-adj factors are then used the same way as the hetero-adj factors based on the s.e. of residuals
see below for scale param for hetero group i
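A reconstruction of the scale-parameter formulas from the definitions above (N, p, n_i, and the unscaled Pearson residual r_{w,d}). It is assumed to match the monograph and should be checked against the source.

```latex
\phi = \frac{\sum_{w,d} r_{w,d}^{2}}{N - p},
\qquad
\phi_i = \frac{\dfrac{N}{N - p} \sum_{(w,d) \in \text{group } i} r_{w,d}^{2}}{n_i},
\qquad
h_i = \frac{\sqrt{\phi}}{\sqrt{\phi_i}}
```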
How to handle Exposure Change in triangle
Divide all loss data by exposure for each AY to get Pure Premium
Run the model on the pure premiums
Multiply the results by exposure to get back to total losses
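A minimal Python sketch of the exposure adjustment, with made-up losses and exposures. The model run between the two steps is not shown.

```python
def to_pure_premium(triangle, exposures):
    """Divide every cell in an accident-year row by that year's exposure."""
    return [[v / e for v in row] for row, e in zip(triangle, exposures)]

def apply_exposure(triangle, exposures):
    """Multiply pure-premium results by exposure to return to total losses."""
    return [[v * e for v in row] for row, e in zip(triangle, exposures)]

# Hypothetical incremental losses and exposures by accident year
losses = [[100.0, 50.0, 15.0], [120.0, 66.0], [150.0]]
exposures = [1000.0, 1200.0, 1500.0]

pure_premium = to_pure_premium(losses, exposures)
```

In this example the first-period pure premium is 0.1 for every accident year, so the growth in losses is explained entirely by exposure.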
How to handle heteroecthesious data in the triangle (partial first development period data)
Partial first development period data -
reduce future incremental losses for the latest AY to correspond to the earned exposure
-> then simulate process variance
How to handle heteroecthesious data (partial last calendar period data)
Partial last calendar period data -
- Annualize the exposures in the last diagonal so that they are in line with the rest of the triangle
- Calculate the fitted triangle and residuals
- During the ODP bootstrap simulation, calculate the LDFs from the fully annualized sample triangles and interpolate them
- De-annualize the last diagonal of the sample triangle
- Project future values by multiplying the new cumulative values by the interpolated LDFs
- Reduce the future incremental values for the latest AY to remove future exposure
Formula for unscaled Pearson residual
General ODP Bootstrap
- calculate the age-to-age factors
- calculate the fitted cumulative losses by starting from the latest diagonal and working backwards through the triangle (dividing by the age-to-age factors)
- calculate the fitted incremental losses
- calculate the actual incremental losses
- calculate the Pearson residuals
- calculate the hat matrix adjustment factors
- calculate the standardized residuals
- randomly sample from the standardized residuals with replacement
- convert the random standardized residuals into sample incremental losses
- use the sample incremental losses to create a triangle of sample cumulative losses
- project the sample cumul losses to ultimate
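The steps above can be sketched in plain Python for a single bootstrap iteration. This is an illustration under simplifying assumptions, not a full implementation. It uses volume-weighted age-to-age factors, takes the hat matrix adjustment factor as 1, keeps the zero residuals in the pool, and adds no process variance. All function names and the triangle are hypothetical.

```python
import math
import random

def age_to_age(cum):
    """Volume-weighted age-to-age factors from a cumulative triangle."""
    factors = []
    for d in range(len(cum) - 1):
        rows = [r for r in cum if len(r) > d + 1]
        factors.append(sum(r[d + 1] for r in rows) / sum(r[d] for r in rows))
    return factors

def incremental(cum):
    return [[row[0]] + [b - a for a, b in zip(row, row[1:])] for row in cum]

def cumulate(inc):
    out = []
    for row in inc:
        total, acc = 0.0, []
        for v in row:
            total += v
            acc.append(total)
        out.append(acc)
    return out

def fitted_incremental(cum, factors):
    """Back out fitted cumulatives from the latest diagonal, then difference them."""
    fitted_cum = []
    for row in cum:
        back = [row[-1]]
        for d in range(len(row) - 2, -1, -1):
            back.insert(0, back[0] / factors[d])
        fitted_cum.append(back)
    return incremental(fitted_cum)

def bootstrap_once(cum, rng):
    """One iteration: resample residuals, rebuild a triangle, project to ultimate."""
    m = fitted_incremental(cum, age_to_age(cum))
    q = incremental(cum)
    # Unscaled Pearson residuals with z = 1; hat matrix factor assumed to be 1
    resid = [(qv - mv) / math.sqrt(mv)
             for qr, mr in zip(q, m) for qv, mv in zip(qr, mr)]
    sample_inc = [[mv + rng.choice(resid) * math.sqrt(mv) for mv in mr] for mr in m]
    sample_cum = cumulate(sample_inc)
    sample_f = age_to_age(sample_cum)
    ultimates = []
    for row in sample_cum:
        ult = row[-1]
        for d in range(len(row) - 1, len(sample_f)):
            ult *= sample_f[d]
        ultimates.append(ult)
    return ultimates

# Hypothetical cumulative triangle (3 accident years)
cum = [[100.0, 150.0, 165.0], [110.0, 170.0], [120.0]]
ultimates = bootstrap_once(cum, random.Random(1))
```

Repeating bootstrap_once many times gives a distribution of ultimates. The fitted increments in each row sum to that row's latest cumulative value, which is a quick check on the back-fitting step.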
GLM model setup
- set up below graph
- once set up, fit the model to incr loss triangle using iterated least squares or maximum likelihood
Formula for unscaled Pearson residuals
Note that z = 1 for Poisson (most of the time)
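The formula, written out in LaTeX with q(w,d) as the actual incremental loss and m_{w,d} as the fitted incremental loss (notation assumed to follow the rest of these cards):

```latex
r_{w,d} = \frac{q(w,d) - m_{w,d}}{\sqrt{m_{w,d}^{\,z}}}
```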
Formula for standardized residuals
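A sketch of the formula in LaTeX, assuming the standardized residual is the unscaled Pearson residual multiplied by the hat matrix adjustment factor, with H_{i,i} the diagonal element of the hat matrix for that cell:

```latex
r^{H}_{w,d} = r_{w,d} \times f^{H}_{w,d},
\qquad
f^{H}_{w,d} = \sqrt{\frac{1}{1 - H_{i,i}}}
```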
The power z in the estimated variance for each distribution
Poisson z = 1
Gamma z = 2
Inverse Gaussian z = 3
Ways to handle Outliers
- if extreme values exist in the original triangle, we can remove their impact from the model
- if using the ODP bootstrap model, the options are:
- exclude the outliers completely, treating them as missing values
- exclude the outliers from the ATA factor and residual calculations, but include the outlier cells when projecting the resampled triangles
Three options to remove outliers when calculating ATA factors
- exclude the row only if the outlier is in the numerator
- exclude the row only if the outlier is in the denominator
- exclude the row if outlier is in either the numerator or the denominator
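The three options can be compared with a short Python sketch for a single age-to-age step. The row layout, the outlier flags and the mode names are hypothetical.

```python
def ata_factor(rows, mode):
    """Volume-weighted ATA factor for one development step.

    rows: (denominator, numerator, denominator_is_outlier, numerator_is_outlier)
    mode: "numerator", "denominator" or "either" (which outliers drop the row)
    """
    kept = []
    for den, num, den_out, num_out in rows:
        drop = {
            "numerator": num_out,
            "denominator": den_out,
            "either": den_out or num_out,
        }[mode]
        if not drop:
            kept.append((den, num))
    return sum(n for _, n in kept) / sum(d for d, _ in kept)

# Hypothetical cumulative values at ages d and d+1 for four accident years
rows = [
    (100.0, 150.0, False, False),
    (110.0, 400.0, False, True),   # outlier in the numerator
    (500.0, 600.0, True, False),   # outlier in the denominator
    (120.0, 180.0, False, False),
]

factors = {mode: ata_factor(rows, mode) for mode in ("numerator", "denominator", "either")}
```

With these numbers the "either" option keeps only the two clean rows and gives a factor of 1.5. The other two options each leave one outlier in the calculation.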
What do we do if there is a significant number of outliers?
May indicate a poor fit of the model
For the GLM bootstrap, choose new params or change the distribution
For the ODP bootstrap, an L-year weighted average can be used to provide a better model fit, but if the skewness is real, the bootstrap will keep it
How to handle missing values?
GLM Bootstrap Model -
Missing data simply reduces the number of observations in the data
ODP Bootstrap Model -
- Solution 1: estimate the missing value from surrounding values
- Solution 2: modify the LDFs to exclude the missing value; there is no residual for the missing value, so don’t resample from it
- Solution 3: if the missing value is on the latest diagonal, estimate the value or use the value in the second-to-last diagonal
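How to handle negative values during simulation of process variance (i.e. m_{w,d} is negative)?
Option 1:
- simulate using the absolute value of the mean, then change the sign of the simulated value
Option 2:
- simulate using the absolute value of the mean, then shift the entire distribution so that it has a mean of m_{w,d}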
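Both options can be sketched in Python. The sketch assumes process variance is simulated from a gamma distribution with mean |m| and variance phi * |m|. The gamma choice and the function name are assumptions, not something stated on the card.

```python
import random

def simulate_incremental(m, phi, rng, option=1):
    """Simulate one incremental value with mean m and variance phi * |m|."""
    if m == 0:
        return 0.0
    # random.gammavariate(shape, scale): mean = shape * scale, var = shape * scale**2
    x = rng.gammavariate(abs(m) / phi, phi)
    if m > 0:
        return x
    if option == 1:
        return -x          # Option 1: flip the sign of the simulated value
    return x + 2 * m       # Option 2: shift the distribution so its mean is m

rng = random.Random(42)
flipped = [simulate_incremental(-50.0, 2.0, rng, option=1) for _ in range(1000)]
shifted = [simulate_incremental(-50.0, 2.0, rng, option=2) for _ in range(20000)]
```

Option 1 mirrors the distribution, so every simulated value is negative and the skew points the other way. Option 2 keeps the right-skewed shape and allows positive values, at the cost of a lower bound of 2m.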