Shapland Flashcards
Advantages of a Bootstrap Model
- Generates a distribution of possible outcomes as opposed to a single
point estimate
→ Provides more information about potential results; can be used for capital modeling
- Can be modified to reflect the statistical features of the data under analysis
- Can reflect the fact that insurance loss distributions are generally
skewed right. This is because the sampling process doesn’t require a
distribution assumption.
→ Model reflects the level of skewness in the underlying data
Reasons for more focus by actuaries on unpaid claims distributions
- SEC is looking for more reserving risk information from publicly traded companies
- Major rating agencies have dynamic risk models for rating and welcome input from company actuaries about reserve distributions
- Companies use dynamic risk models for internal risk management and need unpaid claim distributions
ODP Model Overview
- Incremental claims q(w,d) are modeled directly using a GLM
- GLM structure:
- log link
- Over-dispersed Poisson error distribution
Steps
1) Use the model to estimate parameters
2) Use bootstrapping (sampling residuals with replacement) to estimate
the total distribution
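A compact statement of the GLM structure (the linear predictor shown here, with one α per accident year w and one β per development period d, is a common parameterization and an assumption, not a quotation of the source):

$$ E[q(w,d)] = m_{w,d}, \qquad \ln(m_{w,d}) = \alpha_w + \beta_d, \qquad \mathrm{Var}[q(w,d)] = \phi \, m_{w,d}^{\,z} $$

with z = 1 giving the over-dispersed Poisson (z = 0 corresponds to a normal and z = 2 to a gamma error structure).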
ODP GLM Model
GLM Model Setup (3x3 triangle)
GLM Model:
Solving for Weight Matrix
Solve for the α and β parameters of the Y = X × A matrix equation that
minimizes the squared difference between the vector of the log of actual
incremental losses (Y) and the log of expected incremental losses (Solution
Matrix).
Solve using maximum likelihood or the Newton-Raphson method.
GLM Model:
Fitted Incrementals
Simplified GLM Method
Fitted (expected) incrementals from the GLM with a Poisson error distribution are the same as the fitted incrementals obtained from the cumulative triangle using volume-weighted average LDFs.
Simplified GLM Method
1) Use cumulative claim triangle to calculate LDFs
2) Develop losses to ultimate
3) Calculate the expected cumulative triangle
4) Calculate the expected incremental triangle from the cumulative
triangle
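A minimal sketch of these steps in Python (the 3×3 triangle values are illustrative, not from the source; backing down from the latest diagonal is equivalent to developing to ultimate and dividing by cumulative LDFs when there is no tail):

```python
import numpy as np

# Illustrative 3x3 cumulative paid triangle (rows = accident years, cols = ages);
# NaN marks unobserved future cells.
cum = np.array([
    [100.0, 180.0, 200.0],
    [110.0, 190.0, np.nan],
    [120.0, np.nan, np.nan],
])
n = cum.shape[0]

# 1) Volume-weighted average LDFs from the cumulative triangle
ldf = np.zeros(n - 1)
for d in range(n - 1):
    rows = ~np.isnan(cum[:, d + 1])          # accident years observed at both ages
    ldf[d] = cum[rows, d + 1].sum() / cum[rows, d].sum()

# 2)-3) Fitted (expected) cumulative triangle: start from the latest observed
#       diagonal and "back down" by dividing by the LDFs
fitted_cum = np.full_like(cum, np.nan)
for w in range(n):
    last = n - 1 - w                          # latest observed age for accident year w
    fitted_cum[w, last] = cum[w, last]
    for d in range(last - 1, -1, -1):
        fitted_cum[w, d] = fitted_cum[w, d + 1] / ldf[d]

# 4) Fitted incremental triangle from the fitted cumulative triangle
fitted_inc = np.diff(np.nan_to_num(fitted_cum, nan=0.0), prepend=0.0, axis=1)
fitted_inc[np.isnan(cum)] = np.nan            # keep future cells empty

print(np.round(ldf, 4))
print(np.round(fitted_inc, 2))
```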
Advantages of the Simplified GLM Framework
1) GLM can be replaced with the simpler link ratio approach while still
being grounded in the underlying GLM framework
2) Using age-to-age ratios serves as a “bridge” to the deterministic
framework and allows the model to be more easily explained to others
3) We can still use link ratios to get a solution if there are negative
incrementals, whereas the GLM with a log link might not have a
solution
Unscaled Pearson residual
Scale Parameter
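The usual forms of these quantities (with q the actual incremental, m the fitted incremental, N the number of cells, p the number of parameters, and z = 1 for the ODP):

$$ r_{w,d} = \frac{q(w,d) - m_{w,d}}{\sqrt{m_{w,d}^{\,z}}}, \qquad \phi = \frac{\sum r_{w,d}^{2}}{N - p} $$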
The assumption about residuals necessary for bootstrapped samples
Residuals are independent and identically distributed
Note:
No particular distribution is necessary. Whatever distribution the residuals
have will flow into the simulated data.
Sampled incremental loss for a bootstrap model
Standardized Pearson Residuals
Process to create a distribution of point estimates
Adding process variance to
future incremental values in a bootstrap model
Sampling Residuals
Standardized Pearson scale parameter
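A minimal sketch of the residual-sampling and process-variance steps in Python (the gamma approximation to the ODP, the variable names, and the illustrative inputs are assumptions consistent with common practice, not taken from the source):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_triangle(fitted_inc, residuals):
    """Build one pseudo incremental triangle: q* = m + r* * sqrt(m)."""
    obs = ~np.isnan(fitted_inc)
    r_star = rng.choice(residuals, size=obs.sum(), replace=True)
    q_star = fitted_inc.copy()
    q_star[obs] = fitted_inc[obs] + r_star * np.sqrt(fitted_inc[obs])
    return q_star

def add_process_variance(future_mean, phi):
    """Add process variance to a projected future incremental using a gamma
    distribution with mean m and variance phi * m."""
    m = max(future_mean, 1e-6)        # guard against non-positive means
    shape = m / phi                   # gamma: mean = shape*scale, var = shape*scale^2
    return rng.gamma(shape, scale=phi)

# Example with illustrative fitted triangle, residuals, and scale parameter
fitted = np.array([[102.2, 77.8, 20.0],
                   [107.8, 82.2, np.nan],
                   [120.0, np.nan, np.nan]])
resids = np.array([-0.9, -0.3, 0.1, 0.2, 0.5, 0.4])
print(sample_triangle(fitted, resids))
print(add_process_variance(future_mean=25.0, phi=1.5))
```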
Bootstrapping BF and Cape Cod Models
With the ODP bootstrap model, iterations for the latest few accident years can result in more variance than expected.
BF Method
* Incorporate BF model by using a priori loss ratios for each AY with
standard deviations for each loss ratio and an assumed distribution
* During simulation, for each iteration simulate a new a priori loss ratio
Cape Cod Method
* Apply the Cape Cod algorithm to each iteration of the bootstrap model
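A small sketch of the BF piece (the premiums, a priori loss ratios, standard deviations, unpaid percentages, and the normal assumption below are illustrative, not from the source):

```python
import numpy as np

rng = np.random.default_rng(1)

premium    = np.array([1000.0, 1100.0, 1200.0])  # earned premium by accident year
elr_mean   = np.array([0.65, 0.65, 0.70])        # a priori expected loss ratios
elr_sd     = np.array([0.05, 0.05, 0.07])        # standard deviations of the loss ratios
pct_unpaid = np.array([0.10, 0.35, 0.70])        # 1 - 1/CDF, i.e., expected % unpaid

n_iterations = 10_000
unpaid = np.zeros((n_iterations, len(premium)))
for i in range(n_iterations):
    elr_sim = rng.normal(elr_mean, elr_sd)       # new a priori loss ratio each iteration
    unpaid[i] = premium * elr_sim * pct_unpaid   # BF expected unpaid by accident year

print(unpaid.mean(axis=0), unpaid.std(axis=0))
```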
Generalizing the ODP Model:
Pros/cons of using fewer parameters
Pros
1) Helps avoid potentially over-parameterizing the model
2) Allows adding parameters for calendar-year trends
3) Can be used to model data shapes other than data in triangle form
→ e.g. missing incrementals in first few diagonals
Cons
1) GLM must be solved for each iteration of the bootstrap model, slowing
simulations
2) The model is no longer directly explainable to others using age-to-age
factors
Negative Incremental Values:
Modified log-link
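A common form of the modified link, assumed here, keeps the log of the absolute value and restores the sign:

$$ \ln^{*}(q) = \begin{cases} \ln(q) & \text{if } q > 0 \\ 0 & \text{if } q = 0 \\ -\ln|q| & \text{if } q < 0 \end{cases} $$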
Negative Incremental Values:
Negative Development Periods
Negative Incremental Values:
Simplified GLM Adjustments
Negative values during simulation:
Process Variance
The problem with negative incremental
values during simulation
Negative incrementals may cause extreme outcomes for some iterations.
Example
They may cause cumulative values in an early development column to sum
to near zero and the next column to be much larger.
→ This results in extremely large LDFs and central estimates for an iteration.
Options to address negative incremental
values during simulation
1) Remove extreme iterations from results
→ BUT only remove truly unreasonable iterations
2) Recalibrate the model after identifying the sources of negative incrementals
→ e.g. remove a row with sparse data when the product was first written
3) Limit incremental losses to zero
→ Replace negative incrementals in original data with a zero incremental loss
Non-zero sum of residuals
Using an N-Year Weighted Average
With GLM Framework
* Exclude the first few diagonals and only use N+1 diagonals to
parameterize the model (data is now a trapezoid)
* Run bootstrap simulations and only sample residuals for the trapezoid
that’s used to parameterize the model
With Simplified GLM
* Calculate N-year average LDFs
* Run bootstrap simulation, sampling residuals for the entire triangle in
order to calculate cumulative values
* Use N-year average factors to project future expected values for each
iteration
Handling Missing Values
Examples: Missing the oldest diagonals (if data was lost) or missing values
in the middle of the triangle
Calculations affected:
LDFs, fitted triangle (if missing the latest diagonal), residuals, degrees of freedom
Solution 1: Estimate missing value from surrounding values
Solution 2: Modify LDFs to exclude the missing value; there is no residual for the missing value → Don't resample from missing values
Solution 3: If the missing value is on the latest diagonal, estimate the value or, using judgment, use the value in the second-to-last diagonal to complete the triangle
Handling Outliers
There may be outliers that are not representative of the variability of the
dataset in the future, so we may want to remove them.
- Outliers could be removed and treated as missing values
- Identify outliers and exclude from LDFs and residual calculations, but
resample the corresponding incremental when simulating triangles
Remove outliers cautiously and only after understanding the data.
Heteroscedasticity
When Pearson residuals have different levels of variability at different ages.
Why heteroscedasticity is a problem:
ODP bootstrap model assumes standardized Pearson residuals are IID.
* With heteroscedasticity, we can’t take residuals from one development
period and use them in other development periods
Considerations when assessing heteroscedasticity:
* Account for the credibility of the observed data
* Account for the fact that there are fewer residuals in older dev. periods
Adjusting for Heteroscedasticity:
Stratified Sampling
Option 1: Stratified Sampling
1) Organize development periods into groups with homogeneous variances
2) For each group: Sample with replacement only from the residuals in
that group
BUT: Some groups only have a few residuals in them, which limits the
amount of variability in possible outcomes
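A minimal sketch of stratified sampling (the residual values and the grouping of development periods are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# residuals[d] = residuals observed in development period d (illustrative values)
residuals = {
    0: np.array([-1.2, 0.4, 0.8, -0.1]),
    1: np.array([0.3, -0.5, 0.2]),
    2: np.array([1.1, -0.9]),
}
# Development periods grouped so that variances are roughly homogeneous within a group
groups = {0: [0], 1: [1, 2]}

def sample_residual(dev_period):
    """Sample with replacement only from residuals in the same variance group."""
    group = next(g for g, devs in groups.items() if dev_period in devs)
    pool = np.concatenate([residuals[d] for d in groups[group]])
    return rng.choice(pool)

print([sample_residual(d) for d in (0, 1, 2)])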
Adjusting for Heteroscedasticity:
Standard Deviation
Adjusting for Heteroscedasticity:
Scale Parameter
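One common way to state the standard-deviation version of the adjustment (assumed here): compute a hetero-adjustment factor for each variance group i,

$$ h_i = \frac{\text{std. dev. of all residuals}}{\text{std. dev. of residuals in group } i} $$

multiply each residual in group i by h_i so that all groups have comparable variability, sample from the combined pool, and divide the sampled residual by h_j for the group j of the cell being simulated. The scale-parameter version is analogous, with group-level scale parameters in place of the standard deviations.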
Pros and Cons of adjusting for heteroscedasticity using hetero-adjustment
factors
Pro
Can resample with replacement from the entire triangle
Con
Adds parameters, affecting Degrees of Freedom and scale parameter
Heteroecthesious Data:
Partial first development period
This occurs when the first development period has a different exposure
period length than other columns.
→ e.g. 6 months in the first column, then 12 months in the rest
Adjustments
Reduce the latest accident year’s future incremental losses to be proportional
to the level of earned exposure in the first period.
→ Then simulate process variance (or reduce after the process variance step)
Heteroecthesious Data:
Partial last calendar period data
Adjustments
a) Annualize exposures in the last partial diagonal.
b) Calculate the fitted triangle and residuals.
c) During the ODP bootstrap simulation, calculate and interpolate LDFs
from the fully annualized sample triangles.
d) Adjust the last diagonal of the sample triangles to de-annualize
incrementals on the last diagonal.
e) Project future values by multiplying the interpolated LDFs with the
new cumulative values.
f) Reduce the future incremental values for the latest accident year to
remove future exposure.
Exposure Adjustment
Issue: Exposures changed significantly over the years (e.g. rapidly growing
line or line in runoff)
Adjustment
* If earned exposures exist, divide all claims data by exposures for each
accident year to run the model with pure premiums
* After the process variance step, multiply the result by accident year
exposures to get total claims
Parametric Bootstrapping
Purpose
Parametric bootstrapping is a way to overcome a lack of extreme residuals in
an ODP bootstrap model.
Steps
1) Fit a parameterized distribution to the residuals.
2) Resample residuals from the distribution instead of the observed
residuals.
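A minimal sketch of these two steps, using a normal fit purely for illustration (a heavier-tailed distribution could be fit instead):

```python
import numpy as np

rng = np.random.default_rng(3)

observed_residuals = np.array([-1.8, -0.6, -0.2, 0.1, 0.4, 0.7, 1.5, 2.9])  # illustrative

# 1) Fit a parameterized distribution to the residuals (normal shown for simplicity)
mu, sigma = observed_residuals.mean(), observed_residuals.std(ddof=1)

# 2) Resample residuals from the fitted distribution rather than the observed set
sampled = rng.normal(mu, sigma, size=1000)
print(sampled.min(), sampled.max())   # can produce values more extreme than observed
```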
Purposes of Bootstrap Diagnostics
- Test the assumptions in the model.
- Gauge the quality of the model fit to the data.
- Help guide adjustments of the model parameters to improve the fit of
the model.
Purpose
Find a set of models and parameters that results in the most realistic and
most consistent simulations based on the statistical features of the data.
Residual Graphs
Residual graphs help test the assumption that residuals are IID.
Plots to Look at
* Residuals vs Development Period → Look for heteroscedasticity
* Residuals vs Accident Period
* Residuals vs Payment Period
* Residuals vs Predicted
→ Look for issues with trends
→ Plot relative std dev of residuals and range of residuals to further test for
heteroscedasticity
Normality Test
The normality test compares residuals to the normal distribution. If
residuals are close to normal, you should see:
* Normality plot with residuals in line with the diagonal line (normally
distributed)
* High R^2 value and p-value greater than 5%
Note:
In the ODP bootstrap, residuals don’t need to be normally distributed.
AIC and BIC formulas
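A hedged statement of the standard forms (with p parameters, n observations, and maximized likelihood L of the fitted GLM; the source may express these in an algebraically rearranged form):

$$ \mathrm{AIC} = 2p - 2\ln(L), \qquad \mathrm{BIC} = p\ln(n) - 2\ln(L) $$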
Identifying Outliers
Identify outliers with a box-whisker plot:
* Box shows the 25th to 75th percentile (the inter-quartile range)
* Whiskers extend to the largest values within 3 times the inter-quartile
range
* Values outside whiskers are outliers
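A small sketch of the corresponding calculation (illustrative residual values):

```python
import numpy as np

residuals = np.array([-2.5, -1.1, -0.4, 0.0, 0.3, 0.8, 1.2, 6.0])  # illustrative
q1, q3 = np.percentile(residuals, [25, 75])
iqr = q3 - q1

lower, upper = q1 - 3 * iqr, q3 + 3 * iqr      # whiskers extend to values within 3 x IQR
outliers = residuals[(residuals < lower) | (residuals > upper)]
print(outliers)
```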
Handling Outliers
- If outliers represent scenarios that can’t be expected to happen again, then it may make sense to remove them.
- Use extreme caution when removing outliers because they may represent realistic extremes that should be kept in the analysis.
Reviewing Estimated-Unpaid Model Results
- Standard error should increase from the oldest to most recent years
- Standard error for all years should be larger than any individual year
- Coefficients of variation should decrease from the oldest to most recent years due to independence in the incremental payment stream
- A reversal in coefficients of variation in recent years could be due to:
  - Increasing parameter uncertainty in more recent years
  - The model overestimating uncertainty in recent years; we may want to switch to a BF or Cape Cod model
- Minimum/maximum simulations should be reasonable
Methods for combining results of multiple models
Run models with the same random variables:
1) Simulate random variables for each iteration
2) Use same set of random variables for each model
3) Use model weights to weight incremental values from each model for
each iteration by accident year
Run models with independent random variables
1) Run each model separately with different random variables
2) Use weights to randomly select a model for each iteration by accident
year so that the result is a weighted mixture of models
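A minimal sketch of the second approach (the model results and weights are illustrative placeholders for full bootstrap output):

```python
import numpy as np

rng = np.random.default_rng(4)

n_iter, n_ay = 5000, 3
# Simulated unpaid by iteration and accident year from two models, run independently
model_a = rng.gamma(shape=4.0, scale=25.0, size=(n_iter, n_ay))
model_b = rng.gamma(shape=6.0, scale=20.0, size=(n_iter, n_ay))
weights = np.array([0.5, 0.3, 0.7])            # weight given to model A by accident year

# For each iteration and accident year, randomly select a model according to the weights,
# so the combined result is a weighted mixture of the models
pick_a = rng.random(size=(n_iter, n_ay)) < weights
combined = np.where(pick_a, model_a, model_b)
print(combined.mean(axis=0))
```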
Estimated Cash Flow Results
Simulations of unpaid losses by calendar year have the following
characteristics:
* Standard error of calendar-year unpaid losses decreases as the calendar year increases into the future
* Coefficient of variation increases as calendar year increases
→ This is because the final payments projected farthest out will be the
smallest and most uncertain.
Estimated Ultimate Loss Ratio Results
- Estimated ultimate loss ratios by accident year are calculated using all simulated values, not just the future unpaid values
- Represents the complete variability in loss ratio for each accident year
- Loss ratio distributions can be used for projecting pricing risk
Issues with correlation methods
- Both location mapping and re-sorting methods use residuals of incremental future losses to correlate segments
→ Both tend to create overall correlations close to zero
- For reserve risk, the correlation that is desired is between total unpaid amounts for two segments, so there may be a disconnect
Correlation between segments:
Re-sorting
Uses algorithms such as a copula or Iman-Conover re-sorting to add correlation
Advantages
* Data triangles can be different shapes/sizes by segment
* Can use different correlation assumptions
* Different correlation algorithms may have other beneficial impacts on
the aggregate distribution
* Ex. Can use a copula with a heavy tail distribution to strengthen the
correlation between segments in the tails, which is important for
risk-based capital modeling
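A simplified sketch of re-sorting with a normal copula (the target correlation and the simulated totals are illustrative; a heavier-tailed copula such as a t-copula could be used instead):

```python
import numpy as np

rng = np.random.default_rng(5)
n_iter = 10_000

# Independently simulated total unpaid for two segments (placeholder distributions)
seg1 = np.sort(rng.gamma(shape=5.0, scale=100.0, size=n_iter))
seg2 = np.sort(rng.gamma(shape=8.0, scale=60.0, size=n_iter))

# Draw correlated normals and use their ranks to re-sort each segment's simulations,
# inducing (approximately) the target rank correlation between segments
rho = 0.5
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_iter)
ranks = z.argsort(axis=0).argsort(axis=0)       # rank of each draw within its column

seg1_corr = seg1[ranks[:, 0]]
seg2_corr = seg2[ranks[:, 1]]
print(np.corrcoef(seg1_corr, seg2_corr)[0, 1])  # close to rho
```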
Correlation between Segments:
Location Mapping
For each iteration, sample the residuals from the residual triangles using the
same locations for all segments.
Advantages
* Method is easily implemented → Doesn’t require an estimated
correlation matrix and preserves the correlation of the original residuals
Disadvantages
* All segments need to have same size data triangles with no missing data
* Correlation of original residuals is used, so we can’t test other
correlation assumptions
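A minimal sketch of location mapping for one iteration (the two residual triangles are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

# Residual triangles for two segments (same shape; NaN = unobserved cell)
res_seg1 = np.array([[ 0.5, -0.3,  0.1],
                     [-0.8,  0.6, np.nan],
                     [ 1.2, np.nan, np.nan]])
res_seg2 = np.array([[-0.2,  0.9, -0.5],
                     [ 0.4, -1.1, np.nan],
                     [ 0.7, np.nan, np.nan]])

obs = ~np.isnan(res_seg1)                       # observed cell locations
locations = np.argwhere(obs)

# Sample cell locations once (with replacement), then pull residuals from the SAME
# locations in every segment, preserving the correlation of the original residuals
idx = rng.integers(len(locations), size=len(locations))
picked = locations[idx]
sampled_seg1 = res_seg1[picked[:, 0], picked[:, 1]]
sampled_seg2 = res_seg2[picked[:, 0], picked[:, 1]]
print(sampled_seg1, sampled_seg2, sep="\n")
```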