Shapland Flashcards
What is the purpose of the GLM?
to model the incremental losses q(w,d)
How to handle negative incremental values in the triangle?
- If the sum of a column is negative:
- subtract the largest negative value from every cell in the triangle (this sets the most negative cell to 0 and makes every other cell non-negative)
- solve the GLM using the modified triangle
- add the largest negative value back to the fitted incremental values
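- If only individual values are negative but the column sum is positive, use the modified log link: $\theta_{w,d} = \ln[q(w,d)]$ if $q(w,d) > 0$; $\theta_{w,d} = 0$ if $q(w,d) = 0$; $\theta_{w,d} = -\ln[|q(w,d)|]$ if $q(w,d) < 0$
then solve the GLM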
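A minimal sketch of both adjustments, assuming the incremental triangle is stored as a list of rows with `None` marking future cells (a hypothetical layout). `shift_triangle`, `unshift_fitted` and `modified_log` are illustrative helpers; the GLM fit itself happens between the shift and the unshift and is not shown.

```python
import math

def shift_triangle(tri):
    """Subtract the largest negative value (psi) from every observed cell,
    so the most negative cell becomes 0. Returns the shifted triangle and psi."""
    psi = min(v for row in tri for v in row if v is not None)
    psi = min(psi, 0)  # no shift needed if nothing is negative
    shifted = [[None if v is None else v - psi for v in row] for row in tri]
    return shifted, psi

def unshift_fitted(fitted, psi):
    """Add psi back to the fitted incrementals from the GLM run on the shifted triangle."""
    return [[None if v is None else v + psi for v in row] for row in fitted]

def modified_log(q):
    """Modified log link for single negative cells when the column sum is positive."""
    if q > 0:
        return math.log(q)
    if q == 0:
        return 0.0
    return -math.log(abs(q))
```

For example, a triangle whose most negative cell is -30 is shifted up by 30 before fitting, and 30 is taken back off every fitted value afterwards.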
How to calculate a standardized residual
Divide the residual by its standard deviation (aka standard error)
Model Result Reasonability check for Standard Error
Standard Error
- S.E. should increase from older to more recent years (because the s.e. follows the magnitude of the results)
- Total s.e. should be larger than any individual year's s.e.
- Total s.e. should be less than the sum of the s.e. across all AYs (since the model assumes independence between AYs)
Model Result Reasonability check for CoV
CoV
- CoV should GENERALLY decrease when moving from oldest to most recent years
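- CoV may rise in the most recent years due to
1. Increasing parameter uncertainty: as more parameters enter the projections, parameter uncertainty grows from the oldest to the most recent years and can outweigh the decline in process uncertainty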
2. The model may be overestimating the variability in the most recent years
- Total CoV should be smaller than any individual year’s CoV
- Total CoV should be less than the sum of CoV across all AYs
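The standard error and CoV checks above can be run as a quick screen over the simulated results. This is a sketch under stated assumptions: `review_unpaid` is a hypothetical helper, inputs are ordered from the oldest to the most recent AY, every AY mean is positive, and the totals come from the simulated all-years distribution. It returns warnings rather than pass/fail, since the CoV pattern only holds "generally".

```python
def review_unpaid(means, ses, total_mean, total_se):
    """Return a list of warnings for results that break the s.e./CoV reasonability checks."""
    flags = []
    if any(b < a for a, b in zip(ses, ses[1:])):
        flags.append("s.e. does not increase from older to more recent AYs")
    if total_se <= max(ses):
        flags.append("total s.e. is not larger than every individual AY s.e.")
    if total_se >= sum(ses):
        flags.append("total s.e. is not less than the sum of AY s.e.")
    covs = [s / m for s, m in zip(ses, means)]  # assumes all means are positive
    if any(b > a for a, b in zip(covs, covs[1:])):
        flags.append("CoV rises somewhere; check the most recent AYs")
    if total_se / total_mean >= min(covs):
        flags.append("total CoV is not smaller than every individual AY CoV")
    return flags
```

A rising CoV in the last AY is not automatically wrong (see the reasons listed above); the flag just says to look.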
Why is homoscedasticity needed for bootstrapping?
The bootstrap assumes residuals are independent and identically distributed. Heteroscedasticity violates this assumption because residuals from different development periods have different variances
What are the solutions for heteroscedasticity?
- Stratified sampling
- Calculating Variance Params
- Calculating Scale Params
How to handle heteroscedasticity with stratified sampling
Group development periods with homogeneous variances
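Sample with replacement only from the residuals in the same group as the cell being filled
Disadvantage - some groups may contain only a few residuals, which limits the variability of possible outcomes (reduced credibility)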
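A sketch of stratified sampling, assuming residuals have already been grouped by hetero group and each triangle cell `(w, d)` is mapped to a group label. `stratified_resample` and the dictionary layout are hypothetical; the point is only that each cell draws from its own group's residuals.

```python
import random

def stratified_resample(residuals_by_group, cell_groups, rng=None):
    """Draw one residual per cell, sampling with replacement only from that cell's group.
    residuals_by_group: {group label: [residuals]}; cell_groups: {(w, d): group label}."""
    rng = rng or random.Random()
    return {cell: rng.choice(residuals_by_group[g]) for cell, g in cell_groups.items()}
```

With a group of only two residuals, every cell in that group can only ever receive one of those two values, which is the credibility problem noted above.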
How to handle heteroscedasticity by calculating variance parameters
Group development periods with homogeneous variances
Calculate the s.e. of the standardized residuals in each of the “hetero” groups
Calculate the hetero-adjustment factor h_i = s.e.(all standardized residuals combined) / s.e.(standardized residuals in group i)
Multiply all residuals in group i by h_i
Sample with replacement from the ENTIRE triangle, then divide each resampled residual by the h_i of the cell it is assigned to
How to handle heteroscedasticity with scale parameters
Hetero-adjustment factors are based on different scale parameters:
use the ratio of SQRT(overall scale param) to SQRT(scale param for age group i), i.e. $h_i = \sqrt{\phi} / \sqrt{\phi_i}$
N = total cells in the triangle
p = # alpha params (AYs) + # beta params (development periods, usually # of AYs - 1) + # hetero adj factors
# hetero adj factors = # of hetero groups - 1
n_i = # cells in group i
r_{w,d} here is the Pearson (unscaled) residual, not the standardized residual
Then the hetero-adj factors are used the same way as in the hetero-adjustment based on the s.e. of residuals
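Scale parameters: overall $\phi = \frac{\sum r_{w,d}^2}{N - p}$; for hetero group i, $\phi_i = \frac{\sum_{(w,d) \in i} \left( \sqrt{\frac{N}{N-p}} \cdot r_{w,d} \right)^2}{n_i}$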
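A sketch of the scale-parameter factors, assuming unscaled Pearson residuals grouped by hetero group, one residual per triangle cell (so N is the residual count), and a caller-supplied parameter count p. `scale_param_factors` is a hypothetical helper.

```python
import math

def scale_param_factors(pearson_by_group, p):
    """Return the overall scale parameter phi and h_i = sqrt(phi) / sqrt(phi_i) per group."""
    all_r = [r for rs in pearson_by_group.values() for r in rs]
    N = len(all_r)
    phi = sum(r * r for r in all_r) / (N - p)
    adj = N / (N - p)  # square of the sqrt(N / (N - p)) degrees-of-freedom factor
    factors = {}
    for g, rs in pearson_by_group.items():
        phi_i = adj * sum(r * r for r in rs) / len(rs)
        factors[g] = math.sqrt(phi) / math.sqrt(phi_i)
    return phi, factors
```

A useful sanity check on this form: the n_i-weighted average of the group scale parameters equals the overall phi.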
How to handle an exposure change in the triangle
Divide all loss data by each AY's exposure to get pure premiums
Run the model on the pure premiums
Multiply the results back by each AY's exposure
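The conversion to and from pure premiums is simple row-wise arithmetic. A sketch, assuming a triangle stored as a list of rows with `None` for future cells and one exposure per AY; the helper names are hypothetical.

```python
def to_pure_premium(tri, exposures):
    """Divide each AY row by that AY's exposure (None marks future cells)."""
    return [[None if v is None else v / e for v in row] for row, e in zip(tri, exposures)]

def from_pure_premium(tri_pp, exposures):
    """Multiply modeled pure premiums back by AY exposure to get total losses."""
    return [[None if v is None else v * e for v in row] for row, e in zip(tri_pp, exposures)]
```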
How to handle heteroecthesious data in the triangle (partial first development period data)
Partial first development period data -
reduce the projected future incremental losses for the latest AY to correspond to its earned exposure (i.e., remove the unearned exposure)
-> then simulate process variance
How to handle heteroecthesious data (partial last calendar period data)
Partial last calendar period data -
- Annualize the last diagonal so that it is in line with the rest of the triangle
- Calculate the fitted triangle and residuals
- During the ODP bootstrap simulation, calculate and interpolate LDFs from the fully annualized sample triangles
- De-annualize the last diagonal of each sample triangle
- Project future values by multiplying the interpolated LDFs by the new cumulative values
- Reduce the future incremental values for the latest AY to remove future exposure
Formula for unscaled Pearson residual: $r_{w,d} = \frac{q(w,d) - m_{w,d}}{\sqrt{m_{w,d}^{z}}}$
General ODP Bootstrap
- calculate the age-to-age factors
- calculate the fitted cumulative losses by starting with the latest diagonal and backing up the triangle (dividing by the age-to-age factors)
- calculate the fitted incremental losses
- calculate the actual incremental losses
- calculate the Pearson residuals
- calculate the hat matrix adjustment factors
- calculate the standardized residuals
- randomly sample from the standardized residuals with replacement
- convert the random standardized residuals into sample incremental losses
- use the sample incremental losses to create a triangle of sample cumulative losses
- project the sample cumul losses to ultimate
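A compact sketch of the steps above for a cumulative paid triangle stored as a NumPy array (rows = AY, columns = development age, `np.nan` for future cells). Simplifications are assumptions, not part of the card: the hat-matrix step is replaced by the simpler degrees-of-freedom factor sqrt(N/(N-p)), z = 1, all fitted incrementals are assumed positive, the zero corner residuals stay in the pool, and the process-variance step is left out. `odp_bootstrap` is a hypothetical name.

```python
import numpy as np

def odp_bootstrap(cum, n_iter=1000, seed=0):
    """Return simulated total unpaid (no process variance) for a cumulative triangle."""
    cum = np.asarray(cum, dtype=float)
    n = cum.shape[0]
    obs = ~np.isnan(cum)
    rng = np.random.default_rng(seed)

    def vw_ldfs(c):
        # volume-weighted age-to-age factors over rows with both ages observed
        return np.array([c[obs[:, d + 1], d + 1].sum() / c[obs[:, d + 1], d].sum()
                         for d in range(n - 1)])

    def to_incremental(c):
        return np.diff(np.hstack([np.zeros((n, 1)), c]), axis=1)

    f = vw_ldfs(cum)
    # fitted cumulative: start from the latest diagonal and back up each row
    fit_cum = np.full_like(cum, np.nan)
    for w in range(n):
        last = n - 1 - w
        fit_cum[w, last] = cum[w, last]
        for d in range(last - 1, -1, -1):
            fit_cum[w, d] = fit_cum[w, d + 1] / f[d]
    m = to_incremental(fit_cum)          # fitted incrementals
    q = to_incremental(cum)              # actual incrementals
    r = (q - m) / np.sqrt(m)             # unscaled Pearson residuals, z = 1

    N = obs.sum()
    p = 2 * n - 1                        # n alphas + (n - 1) betas
    # stand-in for the hat-matrix adjustment: degrees-of-freedom factor
    pool = (r * np.sqrt(N / (N - p)))[obs]

    unpaid = np.empty(n_iter)
    for i in range(n_iter):
        r_star = rng.choice(pool, size=cum.shape)                 # resample with replacement
        q_star = np.where(obs, r_star * np.sqrt(m) + m, np.nan)   # sample incrementals
        c_star = np.cumsum(q_star, axis=1)                        # sample cumulatives
        f_star = vw_ldfs(c_star)
        latest = np.array([c_star[w, n - 1 - w] for w in range(n)])
        cdf = np.array([np.prod(f_star[n - 1 - w:]) for w in range(n)])
        unpaid[i] = np.sum(latest * cdf - latest)                 # project to ultimate
    return unpaid
```

If the triangle already follows the chain ladder exactly, every residual is zero and each iteration reproduces the deterministic chain-ladder reserve, which makes a handy check.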
GLM model setup
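- set up one equation per cell of the incremental triangle, $\ln(m_{w,d}) = \alpha_w + \sum_{k=2}^{d} \beta_k$, and write the system in matrix form (a design matrix of 0/1 indicators for the $\alpha$ and $\beta$ parameters)
- once set up, fit the model to the incremental loss triangle using iteratively weighted least squares or maximum likelihood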
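A sketch of the GLM setup and fit: one row of the design matrix per observed incremental cell, an alpha column per AY, and beta columns that switch on cumulatively with age, solved by iteratively reweighted least squares for a Poisson error with log link. Development ages are counted from 0 in the code, so the card's $\beta_2$ is the code's first beta column. `fit_odp_glm` is a hypothetical helper and assumes all incrementals are positive. The over-dispersion scale does not change the fitted means, so the Poisson fit gives the ODP means.

```python
import numpy as np

def fit_odp_glm(inc, max_iter=50, tol=1e-10):
    """Fit ln(m[w,d]) = alpha_w + beta_1 + ... + beta_d to an incremental triangle
    (np.nan for future cells) by IRLS; returns the parameters and fitted triangle."""
    inc = np.asarray(inc, dtype=float)
    n = inc.shape[0]
    cells = [(w, d) for w in range(n) for d in range(n) if not np.isnan(inc[w, d])]
    y = np.array([inc[w, d] for w, d in cells])
    X = np.zeros((len(cells), 2 * n - 1))
    for row, (w, d) in enumerate(cells):
        X[row, w] = 1.0            # alpha for accident year w
        X[row, n:n + d] = 1.0      # betas 1..d are all switched on at age d
    mu = y + 0.1                   # positive starting values
    eta = np.log(mu)
    params = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu    # working response for the log link
        XtW = X.T * mu             # Poisson working weights (variance = mean)
        new_params = np.linalg.solve(XtW @ X, XtW @ z)
        done = np.max(np.abs(new_params - params)) < tol
        params = new_params
        eta = X @ params
        mu = np.exp(eta)
        if done:
            break
    fitted = np.full_like(inc, np.nan)
    for (w, d), m in zip(cells, mu):
        fitted[w, d] = m
    return params, fitted
```

With the canonical log link, the fitted row sums match the actual row sums, which is one way to see why these fitted values line up with the volume-weighted chain ladder.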
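Formula for unscaled Pearson residuals: $r_{w,d} = \frac{q(w,d) - m_{w,d}}{\sqrt{m_{w,d}^{z}}}$
note that z = 1 for the (over-dispersed) Poisson, the usual ODP choice
Formula for standardized residuals: $r^{H}_{w,d} = r_{w,d} \cdot f^{H}_{w,d}$, where $f^{H}_{w,d} = \sqrt{\frac{1}{1 - H_{i,i}}}$ and $H_{i,i}$ is the diagonal element of the hat matrix for that cell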
The power z in the variance function ($\text{Var} = \phi \cdot m^{z}$) for each distribution:
Poisson z = 1
Gamma z = 2
Inverse Gaussian z = 3
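The two residual formulas translate directly into code. A sketch with hypothetical helper names; `h_ii` is the hat-matrix diagonal element for the cell, assumed to be computed elsewhere.

```python
import math

def pearson_residual(q, m, z=1):
    """Unscaled Pearson residual (q - m) / sqrt(m**z); z = 1 ODP, 2 gamma, 3 inverse Gaussian."""
    return (q - m) / math.sqrt(m ** z)

def standardized_residual(q, m, h_ii, z=1):
    """Pearson residual times the hat-matrix adjustment f_H = sqrt(1 / (1 - H_ii))."""
    return pearson_residual(q, m, z) * math.sqrt(1.0 / (1.0 - h_ii))
```

For example, an actual incremental of 110 against a fitted 100 gives a Pearson residual of 1.0 under z = 1, and 2.0 after standardizing with H_ii = 0.75.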
Ways to handle Outliers
- if extreme values exist in the original triangle, we can remove their impact on the model
- if using the ODP bootstrap model, the options are:
- exclude the outliers completely and treat them as missing values
- exclude the outliers from ATA factors and residual calculations, but include the outlier cells during the resample triangle projection process
Three options to remove outliers when calculating ATA factors
- exclude the row only if the outlier is in the numerator
- exclude the row only if the outlier is in the denominator
- exclude the row if outlier is in either the numerator or the denominator
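A sketch of the three exclusion options for a volume-weighted ATA factor, assuming a cumulative triangle as a list of rows with `None` for future cells and outliers given as a set of `(w, d)` cells. `ata_factor` and its `option` strings are hypothetical.

```python
def ata_factor(cum, d, outliers, option="either"):
    """Volume-weighted factor from age d to d+1, dropping a row's pair when an outlier
    is in the numerator (age d+1), the denominator (age d), or either."""
    num = den = 0.0
    for w, row in enumerate(cum):
        if d + 1 >= len(row) or row[d + 1] is None:
            continue  # pair not observed yet
        in_num = (w, d + 1) in outliers
        in_den = (w, d) in outliers
        drop = {"numerator": in_num, "denominator": in_den, "either": in_num or in_den}[option]
        if not drop:
            num += row[d + 1]
            den += row[d]
    return num / den
```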
What do we do if there is a significant number of outliers?
May indicate a poor model fit
For the GLM bootstrap, choose new parameters or change the distribution
For the ODP bootstrap, an L-year weighted average can be used to provide a better model fit, but if the skewness is real, the bootstrap will keep it
how to handle missing values?
GLM Bootstrap Model -
Missing data simply reduces the number of observations in the data
ODP Bootstrap Model -
estimate from surrounding values
or, modify LDFs to exclude missing values
Solution 1: estimate missing values from surrounding values
Solution 2: modify the LDFs to exclude the missing value; there is no residual for the missing value, so don't resample from that cell
Solution 3: if the missing value is on the latest diagonal, estimate the value or use the value in the 2nd-to-last diagonal
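Negative values during the simulation of process variance (i.e., m_{w,d} is negative): how to handle?
Option 1:
- simulate from the gamma distribution using the absolute value of m_{w,d}, then change the sign of the simulated value
Option 2:
- shift the distribution so it is positive, simulate, then shift back so the simulated values have a mean of m_{w,d}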
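A sketch of both options using NumPy's gamma sampler, parameterized with shape m/phi and scale phi so the mean is m and the variance is phi times m. `process_variance` is a hypothetical helper, and the default shift of 2|m| in option 2 is an assumed choice (it keeps the shifted mean positive and the variance at phi times |m|), not a prescribed value.

```python
import numpy as np

def process_variance(m, phi, rng, option=1, shift=None):
    """Draw one simulated incremental with mean m from a gamma distribution."""
    if m > 0:
        return rng.gamma(m / phi, phi)        # mean m, variance phi * m
    if m == 0:
        return 0.0
    if option == 1:
        return -rng.gamma(-m / phi, phi)      # simulate with mean |m|, then flip the sign
    c = 2 * abs(m) if shift is None else shift  # assumed shift; caller must keep m + c > 0
    return rng.gamma((m + c) / phi, phi) - c  # shifted gamma, moved back to mean m
```

Option 1 gives a left-skewed negative distribution; option 2 keeps the usual right skew, just moved down to the negative mean.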
Advantages of the bootstrap model
Generates a distribution of possible outcomes rather than a single point estimate
- provides more information on potential results; can be used for capital modeling
Can be adapted to the statistical features of the data under analysis
Preserves the original data distribution (skewness, etc.)
Reasons for more focus by actuaries on unpaid claims distributions
SEC is looking for more reserving risk information from publicly traded companies
Major rating agencies have dynamic risk models for rating and welcome input from company actuaries about reserve distributions
Companies use dynamic risk models for internal risk management and need unpaid claim distributions
ODP Model overview
Incremental claims q(w,d) are modeled directly using a GLM
GLM structure:
Log link
Over-dispersed Poisson error distribution
Steps
1) Use the model to estimate parameters
2) Use bootstrapping (sampling residuals with replacement) to estimate the total distribution
Simplified GLM model
Fitted (expected) incrementals using a Poisson error distribution are the same as the fitted incrementals from the volume-weighted average LDFs (chain ladder)
Simplified GLM Method
1) use cumulative claim triangle to calculate LDF
2) Develop losses to ultimate
3) Calculate the expected cumulative triangle
4) Calculate the expected incremental triangle from the cumulative triangle
Bootstrapping BF and Cape Cod Models
With the ODP bootstrap model, iterations for the latest few accident years can result in more variance than expected
BF Method
Incorporate BF model by using a priori loss ratios for each AY with standard deviations for each loss ratio and an assumed distribution
During simulation, for each iteration simulate a new a priori loss ratio
Cape Cod Method
Apply the Cape Cod algorithm to each iteration of the bootstrap model
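A sketch of one BF iteration inside the bootstrap, assuming each iteration supplies its own cumulative development factors (CDFs) by AY and that the a priori loss ratio is drawn from a normal distribution (an assumed choice; the card only says "an assumed distribution"). `bf_unpaid` is a hypothetical helper.

```python
import numpy as np

def bf_unpaid(premium, cdf, elr_mean, elr_sd, rng):
    """BF unpaid by AY for one iteration: draw a new a priori loss ratio per AY,
    then apply it to the unreported portion implied by this iteration's CDFs."""
    elr = rng.normal(elr_mean, elr_sd)                # assumed normal a priori LR
    pct_unreported = 1.0 - 1.0 / np.asarray(cdf, dtype=float)
    return np.asarray(premium, dtype=float) * elr * pct_unreported
```

Repeating the call once per bootstrap iteration gives a BF-based unpaid distribution that reflects both the resampled development and the a priori LR uncertainty.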
Generalizing the ODP Model: pros/cons of using fewer parameters
Pros
Helps avoid over-parameterizing the model
Allows adding parameters for calendar-year trends
Can be used to model data shapes other than data in triangle form
Cons
GLM must be solved for each iteration of the bootstrap model, slowing simulations
The model is no longer directly explainable to others using age-to-age factors
Options to address negative incremental values during simulation
1) Remove extreme iterations from results
2) Recalibrate the model after identifying the sources of negative incrementals
E.g., remove a row with sparse data from when the product was first written
3) Limit incremental losses to zero
Replace negative incrementals in original data with zero
How to treat a non-zero sum of residuals
Residuals calculated in a bootstrap model are just error terms, so they should be identically distributed with a mean of zero (although in practice their average is usually not exactly zero)
A non-zero average doesn't mean the residuals are incompatible with the true distribution
But to force the average of the residuals to zero, add a single constant to all residuals
Using N year wtd average
With GLM:
Exclude the first few diagonals and only use N+1 diagonal to parameterize the model
Run bootstrap simulations and only sample residuals for the trapezoid that’s used to parameterize the model
With simplified GLM
Calculate N year avg LDFs
Run bootstrap simulation, sampling residuals for the entire triangle in order to calculate cumulative values
Use N year avg factors to project future expected values for each iteration
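A sketch of N-year volume-weighted LDFs, assuming a cumulative triangle as a list of rows with `None` for future cells; only the latest `n_years` AYs with both ages observed feed each factor. `n_year_ldfs` is a hypothetical helper.

```python
def n_year_ldfs(cum, n_years):
    """Volume-weighted LDFs using only the latest n_years diagonals in each column pair."""
    n = len(cum)
    factors = []
    for d in range(n - 1):
        rows = [w for w in range(n) if d + 1 < len(cum[w]) and cum[w][d + 1] is not None]
        rows = rows[-n_years:]  # most recent AYs with both ages observed
        factors.append(sum(cum[w][d + 1] for w in rows) / sum(cum[w][d] for w in rows))
    return factors
```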
Pros and Cons of adjusting heteroscedasticity using hetero-adjustment factors
Pro: can now resample with replacement from the entire triangle
Con: adds parameters, affecting the degrees of freedom and the scale parameter
Exposure adjustments
Issue: Exposures changed significantly over the years (e.g., a rapidly growing line or a line in runoff)
Adjustment
If earned exposures exist, divide all claims data by the exposures for each accident year and run the model on pure premiums. After the process variance step, multiply back by the AY exposures to get total claims
Parametric Bootstrapping
Purpose:
Parametric bootstrapping is a way to overcome a lack of extreme residuals in an ODP bootstrap model
Steps
Fit a parameterized distribution to the residuals
Resample residuals from the distribution instead of observed residuals
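A sketch of parametric resampling, assuming a normal distribution is fitted to the residuals by matching the mean and sample standard deviation (the choice of distribution is an assumption; any fitted distribution would slot in). `parametric_resample` is a hypothetical helper.

```python
import numpy as np

def parametric_resample(residuals, size, rng):
    """Fit a normal to the residuals and draw new residuals from it."""
    r = np.asarray(residuals, dtype=float)
    return rng.normal(r.mean(), r.std(ddof=1), size=size)
```

Unlike resampling the observed values, the draws can exceed the largest observed residual, which is the point of the technique.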
Purposes of Bootstrap Diagnostics
Test the assumption in the model
Gauge the quality of the model's fit to the data
Help guide adjustments of the model parameters to improve the fit of the model
Purpose:
Find a set of models and parameters that results in the most realistic and most consistent simulations based on the statistical features of the data
Examples of residual graphs that help test the IID assumption
Residuals v Development Period -> look for heteroscedasticity
Residuals v Accident Period
Residuals v Payment Period
Residuals v Predicted
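AIC and BIC formulas
$AIC = 2p + n \left[ \ln\left( \frac{2\pi \cdot RSS}{n} \right) + 1 \right]$
$BIC = n \cdot \ln\left( \frac{RSS}{n} \right) + p \cdot \ln(n)$
Smaller values indicate a better fit
More parameters are penalized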
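A sketch of both criteria computed from the residual sum of squares, where n is the number of residuals and p the number of parameters. `aic_bic` is a hypothetical helper and follows the formulas on this card.

```python
import math

def aic_bic(residuals, p):
    """AIC and BIC from the residual sum of squares (RSS)."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    aic = 2 * p + n * (math.log(2 * math.pi * rss / n) + 1)
    bic = n * math.log(rss / n) + p * math.log(n)
    return aic, bic
```

Comparing two candidate models on the same data, the one with the smaller values is preferred, and each extra parameter has to buy enough RSS reduction to pay its penalty.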
How to identify Outliers
Box-whisker plot
Whiskers extend to the largest values within 3 times the inter-quartile range
Values outside the whiskers are outliers
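A sketch of the box-whisker rule, assuming the whiskers are measured from the quartiles (3 times the IQR below Q1 and above Q3) and NumPy's default linear percentile interpolation. `box_whisker_outliers` is a hypothetical helper.

```python
import numpy as np

def box_whisker_outliers(residuals, k=3.0):
    """Return residuals beyond k times the inter-quartile range outside the quartiles."""
    r = np.asarray(residuals, dtype=float)
    q1, q3 = np.percentile(r, [25, 75])
    iqr = q3 - q1
    return r[(r < q1 - k * iqr) | (r > q3 + k * iqr)]
```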
Reviewing estimated unpaid model results
Standard error should increase from the oldest to most recent years
Standard error for all years combined should be larger than for any individual year
CoV should decrease from oldest to most recent years due to independence in incremental payment stream
If not, it may be due to
- Increasing parameter uncertainty in the most recent years
- The model overestimating uncertainty in recent years; we may want to switch to BF/CC
Min/Max simulations should be reasonable
Methods for combining results of multiple models
Run models with the same random variables:
1) Simulate r.v. for each iteration
2) Use same set of r.v. for each model
3) Use model weights to weight incremental values from each model for each iteration by accident year
Run models with independent random variables
1) Run each model separately with different r.v.
2) Use weights to randomly select a model for each iteration by accident year so that the result is a weighted mixture of models
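A sketch of the independent-random-variables approach, assuming each model's simulated unpaid amounts are stacked in an array of shape (models, iterations, AYs). For every iteration and AY one model is picked at random with probability equal to its weight. `mix_models` is a hypothetical helper.

```python
import numpy as np

def mix_models(results, weights, rng):
    """Weighted mixture of independently simulated models, picked per iteration and AY."""
    results = np.asarray(results, dtype=float)
    k, n_iter, n_ay = results.shape
    pick = rng.choice(k, size=(n_iter, n_ay), p=weights)   # chosen model index per cell
    it, ay = np.meshgrid(np.arange(n_iter), np.arange(n_ay), indexing="ij")
    return results[pick, it, ay]
```

The output keeps the (iterations, AYs) shape, so it can be summed across AYs per iteration just like a single model's results.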
Estimated Cash Flow results
Simulation of unpaid losses by calendar year have the following characteristics:
S.E. of CY unpaid decreases as the CY moves further into the future
CoV increases as CY increases
Estimated Ult LR results
Estimated ult loss ratios by AY are calculated using all simulated values, not just the future unpaid
Represents the complete variability in LR for each AY
LR distributions can be used for projecting pricing risk
Issues with correlation methods
Both location-mapping and re-sorting methods use residuals of incremental future losses to correlate segments
Both tend to create overall correlations of close to zero
For reserve risk, the correlation that is desired is between total unpaid amounts for two segments so there may be a disconnect
Correlation between segments: Location Mapping
For each iteration, sample the residuals from the residual triangle using the same locations for all segments
Advantages:
Method is easily implemented -> doesn't require an estimated correlation matrix
Disadvantages:
All segments need to have the same size data triangles with no missing data
Correlation of original residuals is used, so we can’t test other correlation assumptions
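A sketch of location mapping for two segments, assuming each segment's residuals are supplied as a flat array in the same cell order (same-size triangles, no missing cells). Each iteration draws cell locations once and uses them for both segments. `location_mapped_samples` is a hypothetical helper.

```python
import numpy as np

def location_mapped_samples(resid_a, resid_b, n_iter, rng):
    """Resample both segments' residuals using the same sampled cell locations."""
    a = np.asarray(resid_a, dtype=float).ravel()
    b = np.asarray(resid_b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("location mapping needs same-size residual triangles")
    idx = rng.integers(0, a.size, size=(n_iter, a.size))  # shared sampled locations
    return a[idx], b[idx]
```

Because the locations are shared, whatever correlation exists between the original residual triangles carries through; there is no parameter to try a different correlation assumption.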