Shapland Flashcards
Property of the over-dispersed Poisson model
the fitted incremental claims will exactly equal the fitted incremental claims derived using the standard chain-ladder factors
Advantage of the ODP bootstrap model
Although sampling with replacement assumes the residuals are independent and identically distributed, it does not require the residuals to be normally distributed.
This allows the distributional form of the residuals to flow through the simulation process. (This is sometimes referred to as a 'semi-parametric' bootstrap model, since we are not parameterizing the residuals.)
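The resampling step above can be sketched in a few lines. This is an illustrative simplification, not Shapland's full algorithm: standardized residuals are sampled with replacement and inverted back to pseudo-incremental claims via q* = m + r * sqrt(m), where m is the fitted incremental value. The residual and fitted values below are made-up numbers.

```python
import math
import random

def resample_incrementals(residuals, fitted, rng):
    """Build one set of pseudo incremental claims from resampled residuals."""
    return [m + rng.choice(residuals) * math.sqrt(m) for m in fitted]

residuals = [-1.2, -0.4, 0.1, 0.5, 1.0]   # hypothetical standardized residuals
fitted = [100.0, 80.0, 60.0, 40.0]        # hypothetical fitted incrementals

rng = random.Random(42)
pseudo = resample_incrementals(residuals, fitted, rng)
print(pseudo)  # one sampled set of pseudo incremental claims
```

Because the residuals themselves are resampled, whatever distributional shape they carry (skewness, heavy tails) is passed through to the pseudo data without assuming normality.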
How to include process variance in the future incremental claims
We assume that each future incremental claim follows a gamma distribution.
This revised model incorporates process variance and parameter variance in the simulation of the historical and future data
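A minimal sketch of the process-variance step: each future incremental claim is drawn from a gamma distribution with mean m (the projected value) and variance phi * m, where phi is the scale parameter. The values of m and phi below are assumptions for illustration.

```python
import random

def simulate_incremental(m, phi, rng):
    """Draw one future incremental claim with mean m and variance phi * m."""
    shape = m / phi   # k = m / phi
    scale = phi       # theta = phi, so mean = k*theta = m, var = k*theta^2 = phi*m
    return rng.gammavariate(shape, scale)

rng = random.Random(1)
phi = 50.0            # hypothetical scale parameter
draws = [simulate_incremental(1000.0, phi, rng) for _ in range(10000)]
mean = sum(draws) / len(draws)
print(round(mean, 1))  # the simulated average should sit near the projected mean
```

Parameterizing the gamma by shape = m/phi and scale = phi is what ties the simulated variance back to the ODP assumption that variance is proportional to the mean.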
Approach 1 for modeling an unpaid loss distribution using incurred data
We run a paid data model in conjunction with the incurred data model.
Then we use the random payment pattern from each iteration of the paid data model to convert the ultimate values from each corresponding incurred model iteration to develop paid losses by AY.
Advantage: it allows us to use the case reserves to help predict the ultimate losses, while still focusing on the payment stream for measuring risk
An improvement to this approach would be the inclusion of correlation between the paid and incurred models
Approach 2 for modeling an unpaid loss distribution using incurred data
Apply the ODP bootstrap to the Munich chain-ladder (MCL) model. The MCL uses the inherent relationship/correlation between the paid and incurred losses to predict ultimate losses.
When paid losses are low relative to incurred losses, then future paid loss development tends to be higher than average. When paid losses are high relative to incurred losses, then future paid loss development tends to be lower than average.
2 advantages:
1. it does not require us to model paid losses twice.
2. it explicitly measures the correlation between paid and incurred losses
Issue with using the ODP bootstrap
Iterations for the latest few accident years tend to be more variable than what we would expect given the simulations for earlier accident years.
This is due to the fact that MORE age-to-age factors are used to extrapolate the sampled values to develop point estimates for each iteration.
How to fix the issue with the ODP bootstrap
Future incremental values can be extrapolated using the BF or Cape Cod method.
Two drawbacks of GLM bootstrap
- The GLM must be solved for each iteration of the bootstrap model, which may slow down the simulation
- The model is no longer directly explainable to others using age-to-age factors
4 benefits of GLM bootstrap
- Fewer parameters helps avoid over-parameterizing the model
- Gives us the ability to add parameters for calendar year trends.
- Gives us the ability to model data shapes other than triangles
- Allows us to match the model parameters to the statistical features found in the data, and to extrapolate those features
How do we produce point estimates using the GLM bootstrap model
Unlike the ODP bootstrap, which replicates the chain-ladder model, we do not apply age-to-age factors to each sample triangle to produce point estimates.
Instead, we fit the same GLM model underlying the residuals to each sample triangle. Then we use the resulting parameters to produce ultimates and reserve point estimates.
Drawback: the additional time required to fit a GLM to each sample triangle
3 options to deal with extreme outcomes
- identify the extreme iterations and remove them.
- Recalibrate the model (identify the source of the negative incremental losses and remove it if necessary)
- Limit incremental losses to zero
Should the residuals be adjusted so that their average is zero?
If the average of the residuals is positive, then re-sampling from the residuals will add variability to the resampled incremental losses. It may also cause the resampled incremental losses to have an average greater than the fitted losses. In this case, the residuals should be adjusted so that their average is zero.
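The zero-mean adjustment itself is a one-line shift; the residual values below are hypothetical.

```python
# Shift the residuals so their average is zero before resampling.
residuals = [-0.8, -0.2, 0.3, 0.9, 1.3]   # average is +0.3
mean = sum(residuals) / len(residuals)
adjusted = [r - mean for r in residuals]

print(sum(adjusted) / len(adjusted))       # zero, up to floating-point rounding
```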
Using an L-year weighted average for the GLM bootstrap
- We use L years of data by excluding the first few diagonals in the triangle (which leaves us with L+1 included diagonals)
- This changes the shape of the triangle to a trapezoid
- The excluded diagonals are given zero weight in the model and fewer calendar year parameters are required.
- When running the bootstrap simulations, we only need to sample residuals for the trapezoid that was used to parameterize the original model, since the GLM models incremental claims directly and can be parameterized using a trapezoid. Each parameter set is then used to project the sampled triangles to ultimate.
Using an L-year weighted average for the ODP bootstrap
- We calculate L year average factors instead of all year factors
- We exclude the first few diagonals when calculating residuals
- We still sample residuals for the entire triangle when running the bootstrap, since the ODP bootstrap requires cumulative values in order to calculate link ratios. Once we have cumulative values for each sample triangle, we use the L-year average factors to project the sample triangles to ultimate
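An L-year volume-weighted average factor only uses the latest L ratios contributing to each development column. A small sketch, with a hypothetical cumulative triangle:

```python
def l_year_factor(col_from, col_to, L):
    """Volume-weighted age-to-age factor using at most the latest L accident years."""
    pairs = list(zip(col_from, col_to))[-L:]
    return sum(b for _, b in pairs) / sum(a for a, _ in pairs)

# cumulative claims at ages 12 and 24 for four accident years (made up)
age_12 = [100.0, 110.0, 120.0, 130.0]
age_24 = [150.0, 160.0, 170.0, 180.0]

all_year = l_year_factor(age_12, age_24, L=4)    # 660 / 460
three_year = l_year_factor(age_12, age_24, L=3)  # 510 / 360
print(round(all_year, 4), round(three_year, 4))
```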
What do missing values affect
- Loss development factors
- fitted triangle (if the missing value lies on the last diagonal)
- Residuals
- Degrees of freedom
Dealing with missing values for ODP bootstrap
- Estimate the missing value using surrounding values
- Exclude the missing value when calculating the loss development factors. No corresponding residual will be calculated for the missing value. Similar to the L-year weighted average, we still sample residuals for the entire triangle. Once the sample triangles are calculated, we exclude the cells corresponding to the missing values from the projection process
- If the missing value lies on the last diagonal, we can either estimate the value OR use the value in the second-to-last diagonal to construct the fitted triangle
Dealing with missing values for GLM bootstrap
The missing data simply reduces the number of observations used in the model.
Similar to the ODP bootstrap, we could use any one of the three methods above to estimate the missing data
Managing outliers for ODP bootstrap
- Exclude the outliers completely (proceed in the same manner as a missing value)
- Exclude the outliers when calculating the age-to-age factors and the residuals (similar to missing values), BUT include the outlier cells during the sample triangle projection process. (remove the extreme impact of the incremental cell by excluding the outlier during the fitting process while still including some non-extreme variability by including the cell in the sample triangle projections)
3 options when excluding outliers to calculate age-to-age factors
- Exclude in the numerator
- Exclude in the denominator
- Exclude in the numerator and denominator
Managing outliers for GLM bootstrap
Outliers are treated similarly to missing data.
If the data is not considered representative of real variability, the outliers should be excluded and the model should be parameterized without them
What do we do if there are a significant number of outliers
- Might indicate that the model is a poor fit to the data
- For GLM, new parameters could be chosen OR the distribution of the error could be changed.
- For ODP, an L-year weighted average could be used to provide a better model fit.
3 options to adjust for heteroscedasticity
- Stratified sampling
- Calculating variance parameters
- Calculating scale parameters
Describe stratified sampling
- Group development periods with homogeneous variances
- Sample with replacement from the residuals in each group separately
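The two steps above can be sketched directly; the groupings and residual values are illustrative, not from any real triangle.

```python
import random

# Residuals grouped by development periods with similar variance (hypothetical).
groups = {
    "dev_1_2": [-1.1, -0.3, 0.4, 1.0],   # early, more variable periods
    "dev_3_plus": [-0.2, 0.0, 0.1],      # later, less variable periods
}

def stratified_sample(groups, rng):
    """Resample with replacement within each variance group separately."""
    return {g: [rng.choice(res) for _ in res] for g, res in groups.items()}

rng = random.Random(7)
sample = stratified_sample(groups, rng)
print(sample)
```

Because each group is resampled only from its own residuals, the high-variance periods can never draw a low-variance residual, and vice versa.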
Advantage of stratified sampling
It’s straightforward and easy to implement
Disadvantage of stratified sampling
Some groups may only have a few residuals in them, which limits the amount of variability in the possible outcomes
What is heteroecthesious data
incomplete or uneven exposures at interim evaluation dates
Describe partial first development period data
Occurs when the first development column has a different exposure period than the rest of the columns.
This is NOT a problem for parameterizing the ODP bootstrap model since the Pearson residuals use the square root of the fitted value to make them all exposure independent
How to adjust for the partial first development period data
In a deterministic analysis (not bootstrapping), the most recent accident year needs to be adjusted to remove exposures beyond the evaluation date. For example, with a 6/30 evaluation we can reduce the projected future payments by half to remove the exposures from 6/30 to 12/31.
During ODP bootstrap simulation process, we do the same thing. Once the projected future values have been reduced by half, we simulate the process variance as usual.
Alternatively, we can reduce the future values by half AFTER simulating the process variance
Describe the partial last calendar period data
Occurs when the latest diagonal only has a six-month development period
How to adjust for the partial last calendar period data
In a deterministic analysis, we can exclude the latest diagonal when calculating age-to-age factors, interpolate those factors for the exposures in the latest diagonal, and use the interpolated factors to project the future values.
When parameterizing the ODP bootstrap model, we annualize the exposures in the last diagonal to make them consistent with the rest of the triangle. The fitted triangle is calculated based on this annualized triangle to obtain residuals
During the ODP bootstrap simulation process, age-to-age factors are calculated from the annualized sample triangles and interpolated. Then, the latest diagonal in the sample triangle is adjusted back to a six month period. The cumulative values are then multiplied by the interpolated age-to-age factors to project future values. We must reduce the future values for the latest accident year by half
Exposure adjustments under the ODP bootstrap model
We divide the claim data by earned exposures for each AY. This normally improves the fit of the model.
The simulation process is then run on the adjusted data.
After the process variance step is completed, we multiply the results by the earned exposures to restate them in terms of total values
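The normalize-then-restate steps are simple to sketch; the exposures and claims below are hypothetical.

```python
exposures = {2021: 1000.0, 2022: 1250.0}       # earned exposures by AY
claims = {2021: 480_000.0, 2022: 615_000.0}    # claim data by AY

# Step 1: normalize — claims per unit of exposure.
per_unit = {ay: claims[ay] / exposures[ay] for ay in claims}

# ... run the bootstrap simulation (including process variance) on per_unit ...

# Step 2: restate simulated results in total-value terms.
restated = {ay: per_unit[ay] * exposures[ay] for ay in per_unit}
print(per_unit, restated)
```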
Exposure adjustments under the GLM bootstrap model
Similar to the ODP, the GLM model is fit to the exposure adjusted losses.
Main difference: exposure adjusted losses with HIGHER exposures are assumed to have LOWER variance when fitting the GLM.
Exposure adjustments could allow fewer AY parameters for the GLM bootstrap model
Selecting tail factors for ODP bootstrap model
The tail factor can be extrapolated from the data.
The standard deviation of the tail factor can be assumed to be 50% or less of (tail factor - 1).
Selecting tail factors for GLM bootstrap model
Assume that the final development period will continue to apply incrementally until its effect on the future incremental claims is negligible
Diagnostic tool 1 - Residual graphs
Testing the assumption that residuals are independent and identically distributed
We can graph the residuals by development period, accident period or calendar period or against the fitted incremental losses
Trends in residual graphs
We should be able to draw a relatively flat line through the residuals.
Residuals should appear random
Adjusting Heteroscedasticity in residual graphs
We should group residuals into hetero groups and adjust them to a common standard deviation.
To help visualize how the residuals should be grouped, we can graph the relative standard deviations and look for natural groupings
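A minimal sketch of the hetero adjustment: each group's residuals are rescaled to a common standard deviation (here, the all-group standard deviation). The groupings and values are hypothetical.

```python
from statistics import stdev

groups = {
    "low_var": [-0.2, -0.1, 0.1, 0.2],
    "high_var": [-1.5, -0.5, 0.5, 1.5],
}

all_res = [r for res in groups.values() for r in res]
target_sd = stdev(all_res)

# Scale each group by (target sd / group sd) so all groups share one sd.
adjusted = {
    g: [r * target_sd / stdev(res) for r in res]
    for g, res in groups.items()
}

print({g: round(stdev(res), 4) for g, res in adjusted.items()})
```

After adjustment all residuals can be sampled from one pool; the adjustment is reversed (divide by the same factors) when residuals are applied back to their cells.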
Diagnostic tool 2 - Normality test
Although the ODP model does not require residuals to be normally distributed, it’s still helpful to compare residuals against a normal distribution
This allows us to compare parameter sets and assess the skewness of the residuals.
This test uses both graphs AND calculated test values
Describe normality plots
If the data points are tightly distributed around the diagonal line, then the residuals can be assumed to be normally distributed
Describe calculated test values for testing normality
- P-value: the p-value should be large (greater than 5%); it is typically based on a Shapiro test for normality
- R^2: R^2 should be close to 1
- AIC & BIC: these adjust for the number of parameters used in the model; they should be small
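The R^2 metric for a normality plot can be sketched with the standard library alone: sort the residuals, pair them with standard normal quantiles at plotting positions (i + 0.5)/n, and compute the squared correlation. (The Shapiro p-value, AIC, and BIC are not reproduced here; the residual values are hypothetical.)

```python
from statistics import NormalDist, mean

def qq_r_squared(residuals):
    """Squared correlation between sorted residuals and normal quantiles."""
    xs = sorted(residuals)
    n = len(xs)
    nd = NormalDist()
    ys = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]  # theoretical quantiles
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

residuals = [-1.4, -0.9, -0.4, -0.1, 0.2, 0.5, 0.8, 1.3]
print(round(qq_r_squared(residuals), 4))  # near 1 for roughly normal residuals
```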
How to identify outliers
Use a box-whisker plot.
The values beyond the whiskers (which extend to the largest values within 3 times the inter-quartile range) are considered outliers
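The box-whisker flagging rule can be sketched as follows. The quartile calculation is a crude positional simplification and the sample values are made up; real implementations interpolate quartiles.

```python
def iqr_outliers(values, k=3.0):
    """Flag values more than k inter-quartile ranges beyond the quartiles."""
    xs = sorted(values)
    n = len(xs)
    q1, q3 = xs[n // 4], xs[(3 * n) // 4]   # crude quartile positions
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < lo or x > hi]

values = [-0.9, -0.5, -0.2, 0.0, 0.1, 0.3, 0.6, 0.8, 9.5]
print(iqr_outliers(values))  # only the extreme 9.5 is flagged
```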
Describe the principle of parsimony
A model with fewer parameters is preferred as long as the goodness of fit is not markedly different
How to find the optimal mix of parameters in the GLM bootstrap model
- Start with a basic GLM model which includes one parameter for accident, development, and calendar period
- Check the residual plots. If it doesn’t look right, we add more parameters.
The implied development patterns for the GLM should look like a smoothed version of the ODP bootstrap chain-ladder development pattern
When reviewing the estimated unpaid model results
The standard error should increase when moving from the oldest years to the most recent years (because the standard error follows the magnitude of the results)
The total standard error should be larger than any individual error
The coefficient of variation should generally decrease when moving from the oldest years to the most recent years (because the older AYs have fewer payments remaining, which causes all of the variability to be reflected in the coefficient)
The total coefficient of variation should be smaller than any individual year’s coefficient of variation
The standard error (or coefficient of variation) for all years combined will be LESS than the sum of the standard errors (or coefficients of variation) for the individual years, because accident years are assumed to be independent
Why the coefficient of variation may rise in the most recent years
- With an increasing number of parameters in the model, parameter uncertainty increases when moving from the oldest years to the most recent years. This parameter uncertainty may overpower the process uncertainty, causing an increase in variability
- The model may simply be overestimating the variability in the most recent years. In this case, the BF or Cape Cod models may need to be used in place of the CL method.
Two methods for combining the results from multiple models
- Run models with the same random variables. Once all the models have been run, the incremental values for each model are weighted together (for each iteration by AY)
- Run models with independent random variables. once all the models have been run, the weights are used to select a model (for each iteration by AY) by randomly sampling the specified percentage of iterations from each model. The result is a weighted mixture of models
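The second method (independent random variables) can be sketched as follows: for each iteration, one model is selected at random in proportion to its weight, producing a weighted mixture of model results. Model outputs and weights below are hypothetical.

```python
import random

def mixture(model_results, weights, n_iter, rng):
    """Pick one model per iteration in proportion to the weights."""
    names = list(model_results)
    out = []
    for _ in range(n_iter):
        name = rng.choices(names, weights=[weights[n] for n in names])[0]
        out.append(rng.choice(model_results[name]))
    return out

models = {
    "chain_ladder": [100.0, 105.0, 110.0],        # simulated unpaid, model 1
    "bornhuetter_ferguson": [95.0, 98.0, 101.0],  # simulated unpaid, model 2
}
weights = {"chain_ladder": 0.5, "bornhuetter_ferguson": 0.5}

rng = random.Random(3)
combined = mixture(models, weights, n_iter=1000, rng=rng)
print(len(combined))
```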
How can we use a smoothed results after fitting the distribution
- Assess the quality of the fit
- Parameterize a DFA (dynamic financial analysis) model
- Estimate extreme values
- Estimate TVaR
Benefit of using smoothed results
Some of the random noise is prevented from distorting the calculations of specific metrics
Reviewing estimated cash flow results
For AYs, standard errors increase and CoVs decrease as we move from older to more recent years.
For CYs, standard errors decrease and CoVs increase as we move from older to more recent years.
How to simulate correlated variables
Using a multivariate distribution whose parameters and correlations have been specified.
However, we don’t know the distribution of each BU
2 correlation processes for the bootstrap model
- Location mapping
- Re-sorting
Describe location mapping
- Pick a BU
- For each iteration, sample residuals and note the location of each one in the original residual triangle
- Each of the other segments is then sampled using the residuals at the same locations in their respective residual triangles
This preserves the correlation of the original residuals in the sampling process
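A minimal sketch of location mapping, with two hypothetical business segments whose residual triangles share the same shape (as the method requires): the sampled cell positions are drawn once and reused for every segment.

```python
import random

# Residuals keyed by (accident period, development period) cell — made-up values.
seg_a = {(0, 0): -0.5, (0, 1): 0.2, (1, 0): 0.9}
seg_b = {(0, 0): -0.4, (0, 1): 0.3, (1, 0): 1.1}

rng = random.Random(11)
cells = list(seg_a)
positions = [rng.choice(cells) for _ in cells]  # sampled once, shared by all segments

sample_a = [seg_a[p] for p in positions]
sample_b = [seg_b[p] for p in positions]
print(positions, sample_a, sample_b)
```

Because both segments draw from the same positions, whatever correlation exists between their original residual cells is carried into the sampled sets.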
Pros and Cons of location mapping
Benefit: it can be easily implemented in a spreadsheet and it does not require us to estimate a correlation matrix.
Cons: it requires all of the business segments to have residual triangles that are the same size with no missing values or outliers
Describe re-sorting
To induce correlation among BUs in a bootstrap model, the residuals are re-sorted until the rank correlation between each business unit matches the desired correlation.
P-values can be calculated for each correlation coefficient to test its significance
Benefits of re-sorting
Residual triangles may have different shapes/sizes, different correlation assumptions may be employed AND different correlation algorithms may have beneficial impacts on the aggregate distribution
Cons of re-sorting
need to specify a desired correlation matrix
Advantages of the GLM framework
- Can tailor the model to the statistical features of the data
- Can use fewer parameters to avoid over-parameterization
- Can model data that’s not in a loss triangle
Disadvantages of the GLM Framework
- Simulation is slower because the GLM must be solved for in each iteration
- Can’t directly explain the model using LDFs
Advantages of the ODP bootstrap
- Can use the simpler LDF method and the model will still be based on the GLM framework
- Using LDFs makes the model more easily explainable to others
- The GLM uses a log-link and may not work with negative incremental values, but the simplified GLM will still produce a solution
Disadvantages of the ODP bootstrap
- Unable to adjust for calendar-year effects
- Requires many parameters and can over-fit the data