Shapland Flashcards
3 advantages of Bootstrap Model
- Generates a distribution of possible outcomes as opposed to a single point estimate.
Provides more information about potential results and can be used for capital modeling.
- Can be modified to match the statistical features of the data under analysis.
- Can reflect the fact that insurance loss distributions are generally skewed right. This is because the sampling process does not require a distribution assumption.
Model reflects the level of skewness in the underlying data.
3 reasons for more focus by actuaries on unpaid claims distributions
- SEC is looking for more reserving risk information from publicly traded companies
- Major rating agencies have dynamic risk models for rating and welcome input from company actuaries about reserve distributions
- Companies use dynamic risk models for internal risk management and need unpaid claim distributions
Briefly describe the ODP Model
Incremental claims q(w,d) are modelled directly using a GLM.
Link function: Log
Distribution: ODP
Steps:
1. Use the model to estimate parameters
2. Use bootstrapping (sampling residuals with replacement) to estimate total distribution
Calculate E(q(w,d)) and V(q(w,d)) using ODP GLM Model
E(q(w,d)) = m_w,d
ln(m_w,d) = n_w,d
n_w,d = a_w + sum(b_d)
V(q(w,d)) = phi*m^z_w,d
z = 0 if Normal error distribution
z = 1 if Poisson error distribution
z = 2 if Gamma error distribution
Explain the GLM Model Setup for a 3x3 triangle
See image
Briefly explain how to solve for the parameters in the GLM model
Solve for a and b parameters of the Y = X*A matrix equation that minimizes the squared difference between vector of the log of actual incremental losses (Y) and the log of expected incremental losses (Yhat).
Use Maximum Likelihood or the Newton-Raphson method.
Calculate fitted incremental using GLM Model
ln(E(IncLoss_AY,d)) = ln(m_w,d) = n_w,d = a_w + sum(b_d)
E(IncLoss_AY,d) = m_w,d = exp(n_w,d)
Explain what is the simplified GLM Method and its steps
Fitted (expected) incremental values using a Poisson error distribution are the same as incremental losses calculated from volume-weighted average LDFs.
Steps:
1. Use cumulative claim triangle to calculate LDFs
2. Develop losses to ultimate
3. Calculate expected cumulative triangle
4. Calculate expected incremental triangle from cumulative triangle
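A minimal sketch of the four steps, assuming a hypothetical 3x3 cumulative paid triangle (`None` marks future cells; numbers are illustrative only):

```python
# Illustrative 3x3 cumulative paid triangle (hypothetical values).
cum = [
    [100.0, 150.0, 165.0],
    [110.0, 168.0, None],
    [120.0, None,  None],
]
n = len(cum)

# Step 1: volume-weighted average LDFs from the cumulative triangle.
ldfs = []
for d in range(n - 1):
    num = sum(cum[w][d + 1] for w in range(n) if cum[w][d + 1] is not None)
    den = sum(cum[w][d] for w in range(n) if cum[w][d + 1] is not None)
    ldfs.append(num / den)

# Step 2: develop each year's latest diagonal to ultimate.
ult = []
for w in range(n):
    latest_d = n - 1 - w
    val = cum[w][latest_d]
    for d in range(latest_d, n - 1):
        val *= ldfs[d]
    ult.append(val)

# Step 3: expected cumulative triangle, dividing ultimates back down the LDFs.
fitted_cum = []
for w in range(n):
    row = [ult[w]]
    for d in range(n - 2, -1, -1):
        row.append(row[-1] / ldfs[d])
    fitted_cum.append(list(reversed(row)))

# Step 4: expected incremental triangle from the cumulative one.
fitted_inc = [[row[0]] + [row[d] - row[d - 1] for d in range(1, n)]
              for row in fitted_cum]
```

For a Poisson error distribution these fitted incremental values match the GLM's fitted values, which is the point of the simplification.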
3 advantages of the simplified GLM framework
- GLM can be replaced with simpler link ratio approach while still being grounded in the underlying GLM framework.
- Using age-to-age ratios serves as a “bridge” to the deterministic framework and allows the model to be more easily explained to others.
- We can still use link ratios to get a solution if there are negative incremental values, whereas the GLM with a log link might not have a solution.
Calculate Unscaled Pearson residual
r_w,d = (q(w,d) - m_w,d) / sqrt(m^z_w,d)
Unscaled Pearson residual = (Actual IncLoss - Expected IncLoss)/sqrt(Expected IncLoss^z)
Calculate Scale Parameter
phi = sum(unscaled residuals ^2)/(N-p)
N = # incremental values in triangle
p = #AYs + #LDFs + #hetero groups - 1
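The residual and scale-parameter formulas above can be sketched together (actual and fitted values are hypothetical; z = 1 for the ODP case):

```python
import math

# Hypothetical actual and fitted (expected) incremental losses, z = 1 (ODP).
actual = [100.0, 50.0, 15.0, 110.0, 58.0, 120.0]
fitted = [99.06, 50.94, 15.00, 108.94, 56.03, 119.00]

# Unscaled Pearson residuals: r = (q - m) / sqrt(m^z)
resid = [(q - m) / math.sqrt(m) for q, m in zip(actual, fitted)]

# Scale parameter: phi = sum(r^2) / (N - p)
N = len(resid)        # number of incremental values in the triangle
p = 3 + 2 - 1         # e.g. 3 AY params + 2 LDF params - 1, no hetero groups
phi = sum(r * r for r in resid) / (N - p)
```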
State the assumption about residuals necessary for bootstrapped samples
Residuals are independent and identically distributed (iid).
Note: no particular distribution is necessary. Whatever distribution the residuals have will flow into the simulated data.
Calculate sampled incremental loss for bootstrap model
q*(w,d) = r* * sqrt(m^z_w,d) + m_w,d
SimIncLoss(AY,d) = SimResidual * sqrt(E(IncLoss)^z) + E(IncLoss)
If m_w,d is negative, take the absolute value inside the square root.
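A minimal sketch of the sampling formula (the residual pool and fitted values are hypothetical; z = 1):

```python
import math
import random

random.seed(42)

# Hypothetical pool of standardized residuals to resample from.
residual_pool = [-1.2, -0.4, 0.0, 0.3, 0.8, 1.5]
fitted = [99.06, 50.94, 15.0]   # m_w,d for one row (illustrative)

# q*(w,d) = r* * sqrt(|m|^z) + m  -- abs() guards a negative fitted value
sampled = [random.choice(residual_pool) * math.sqrt(abs(m)) + m
           for m in fitted]
```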
Calculate Standardized Pearson Residuals
rH_w,d = r_w,d * fH_w,d
fH_w,d = sqrt(1 / (1-H_w,d)) = hat matrix adjustment factors
Explain the steps to create a distribution of point estimates
- Create a sample triangle of incremental losses using sample standardized Pearson residuals, r*, and expected incremental from model m_w,d.
- Calculate cumulative triangle and LDFs for simulated triangle.
- Calculate point estimate of unpaid losses for sampled data.
- Run steps 1-3 for many samples to get a distribution of point estimates.
Note: these steps ignore process variance. We can add process variance to future incremental values using a Gamma distribution.
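The steps above can be sketched as a minimal simulation loop (the fitted triangle and residual pool are hypothetical; z = 1, and process variance is ignored as the note says):

```python
import math
import random

random.seed(0)

# Hypothetical fitted incremental triangle m_w,d (None = future cells)
# and a pool of standardized residuals to resample from.
fitted_inc = [
    [99.06, 50.94, 15.00],
    [108.94, 56.03, None],
    [119.00, None, None],
]
resid_pool = [-1.1, -0.5, -0.2, 0.1, 0.4, 1.2]
n = len(fitted_inc)

point_estimates = []
for _ in range(1000):
    # 1. Sample a pseudo-triangle: q* = r* * sqrt(m) + m (z = 1).
    sample = [[random.choice(resid_pool) * math.sqrt(m) + m
               if m is not None else None for m in row]
              for row in fitted_inc]
    # 2. Cumulate the sample and compute volume-weighted LDFs.
    cum = []
    for row in sample:
        total, c = 0.0, []
        for v in row:
            if v is None:
                c.append(None)
            else:
                total += v
                c.append(total)
        cum.append(c)
    ldfs = []
    for d in range(n - 1):
        num = sum(cum[w][d + 1] for w in range(n) if cum[w][d + 1] is not None)
        den = sum(cum[w][d] for w in range(n) if cum[w][d + 1] is not None)
        ldfs.append(num / den)
    # 3. Chain-ladder point estimate of unpaid losses for this sample.
    unpaid = 0.0
    for w in range(n):
        latest = cum[w][n - 1 - w]
        ultimate = latest
        for d in range(n - 1 - w, n - 1):
            ultimate *= ldfs[d]
        unpaid += ultimate - latest
    # 4. Collect into the distribution of point estimates.
    point_estimates.append(unpaid)
```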
Explain how to add process variance to future incremental values in bootstrap model
qsim(w,d) follows Gamma (mean = m_w,d, var = phi*m_w,d)
m_w,d is the expected future incremental for this iteration, calculated from sampled bootstrap triangle.
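A sketch of the (mean, variance) Gamma parametrization (phi and m are hypothetical values):

```python
import random

random.seed(1)

def gamma_mean_var(mean, var, rng=random):
    # Python's gammavariate takes (shape k, scale theta):
    # mean = k*theta, var = k*theta^2  =>  theta = var/mean, k = mean^2/var
    theta = var / mean
    k = mean * mean / var
    return rng.gammavariate(k, theta)

phi = 1.8     # hypothetical scale parameter
m = 56.03     # hypothetical expected future incremental for this iteration
simulated = gamma_mean_var(m, phi * m)
```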
Calculate Standardized Pearson Scale Parameter
phiH = sum(rH_w,d ^2) / N
In the bootstrap model, use unscaled Pearson scale parameter.
Standardized Pearson scale parameter could be used to approximate scale parameter.
Explain the Bootstrapping BF Model
With the ODP bootstrap model, iterations for the latest few AYs can result in more variance than expected.
Incorporate BF model by using a priori loss ratios for each AY with standard deviations for each loss ratio and an assumed distribution.
During simulation, for each iteration simulate a new a priori loss ratio.
Explain the Bootstrapping Cape Cod Model
With the ODP bootstrap model, iterations for the latest few AYs can result in more variance than expected.
Apply the Cape Cod algorithm to each iteration of the bootstrap model.
3 pros of using fewer parameters in generalizing ODP model
- Helps avoid potentially over-parameterizing the model
- Allows ability to add parameters for calendar-year trends
- Can be used to model data shapes other than data in triangle form (e.g. missing incremental in first few diagonals)
2 cons of using fewer parameters in generalizing ODP model
- GLM must be solved for each iteration of the bootstrap model, slowing simulations.
- Model is no longer directly explainable to others using age-to-age factors
Explain how to correct for negative incremental values using the modified log-link.
When the sum of incremental losses in a development column is positive, modify the log-link triangle calculations:
ln(q(w,d)) = -ln(abs(q(w,d))) if q(w,d) is negative
Explain how to correct for negative incremental values using the negative development periods.
When the sum of the incremental losses in a development column is negative:
- Shift all incremental losses up by the size of the largest negative incremental value (psi):
q+(w,d) = q(w,d) - psi, where psi is negative
- Calculate the log-link triangle, run the GLM and calculate fitted incremental values m+_w,d
- Shift the fitted incremental losses back down by psi:
m_w,d = m+_w,d + psi
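A sketch of the shift (column values and fitted results are hypothetical; the GLM fit itself is elided):

```python
# Hypothetical incremental development column whose sum is negative.
col = [-8.0, -3.0, 2.0]

psi = min(col)                     # largest negative increment (psi < 0)
shifted = [q - psi for q in col]   # q+ = q - psi, all values now >= 0

# ... run the log-link GLM on the shifted triangle to get m+_w,d ...
m_plus = [1.2, 4.9, 9.9]           # hypothetical fitted values

m = [v + psi for v in m_plus]      # shift fitted back: m = m+ + psi
```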
Explain how to correct for negative incremental values using simplified GLM adjustments
When fitted incremental values (m_w,d) are negative, make the following adjustments to the formulas for residuals and simulated incremental values:
r_w,d = (q(w,d) - m_w,d) / sqrt(abs(m^z_w,d))
q*(w,d) = r* * sqrt(abs(m^z_w,d)) + m_w,d
i.e., switch m_w,d to abs(m_w,d) inside the square roots.
Explain how to correct for negative values during simulation using process variance
When a fitted incremental value (m_w,d) is negative, there are 2 options to simulate the incremental value from a Gamma distribution using absolute values:
1. Change the sign of the simulated value:
-Gamma(mean = abs(m_w,d), var = phi*abs(m_w,d))
BUT this results in a left-skewed distribution
2. Shift the entire distribution to have a mean of m_w,d:
Gamma(mean = abs(m_w,d), var = phi*abs(m_w,d)) + 2*m_w,d
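Both options can be sketched with Python's `random.gammavariate` (m and phi are hypothetical; note both options preserve the mean m, differing only in skew direction):

```python
import random

random.seed(2)

def simulate_negative_m(m, phi, option, rng=random):
    # m < 0: draw from Gamma(mean=|m|, var=phi*|m|), then adjust.
    a = abs(m)
    k = a / phi          # shape:  mean^2 / var = a^2 / (phi*a)
    theta = phi          # scale:  var / mean  = (phi*a) / a
    g = rng.gammavariate(k, theta)
    if option == 1:
        return -g        # sign change -> mean -|m| = m, but left-skewed
    return g + 2 * m     # shift -> mean |m| + 2m = m, keeps right skew
```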
Explain the problem with negative incremental values during simulation
Negative incremental may cause extreme outcomes for some iterations.
Example:
They may cause cumulative values in an early development column to sum to near zero and the next column to be much larger.
This results in extremely large LDFs and central estimates for an iteration.
3 options to address negative incremental values during simulation
- Remove extreme iterations from results
BUT only remove truly unreasonable iterations
- Recalibrate the model after identifying the sources of negative incremental values (ex: remove a row with sparse data when the product was first written)
- Limit incremental losses to zero (replace negative incremental in original data with a zero incremental loss)
Explain how to account for non-zero sum of residuals
Residuals calculated in the bootstrap model are just error terms, so they should be iid with a mean of zero.
But, usually the average of all residuals will be non-zero.
Since residuals are random observations, a non-zero sum of all residuals is not necessarily incompatible with true distribution.
To set the average residual to zero, one option is to add a constant to all residuals:
r* = r - rbar
3 solutions to handle missing values
Ex: missing oldest diagonals (if data was lost) or missing values in middle of the triangle.
Calculations affected: LDFs, fitted triangle, residuals, degrees of freedom
Solutions:
1. Estimate missing value from surrounding values
2. Modify LDFs to exclude missing value, no residual for missing value (do not resample from missing values)
3. If missing value on latest diagonal, estimate value or use value in second to last diagonal to get filled triangle, using judgment.
2 ways to handle outliers
There may be outliers that are not representative of the variability of the dataset in the future, so we may want to remove them.
- Outliers could be removed and treated as missing values.
- Identify outliers and exclude them from LDF and residual calculations, but resample the corresponding incremental values when simulating triangles.
Remove outliers cautiously and only after understanding data since they may represent realistic extremes that should be kept in analysis.
Explain heteroscedasticity and why it is a problem.
When Pearson residuals have different levels of variability at different ages.
The ODP bootstrap model assumes standardized Pearson residuals are iid. With heteroscedasticity, we cannot take residuals from one development period and use them in other development periods.
2 considerations when assessing heteroscedasticity
- Account for credibility of observed data
- Account for the fact that there are fewer residuals in older development periods.
Explain how to use stratified sampling to adjust for heteroscedasticity
Group periods together with similar residual variance and only sample residuals from the corresponding group for the model.
- Organize development periods by groups with homogeneous variances
- For each group, sample with replacement only from residuals in that group
BUT: some groups only have a few residuals in them, which limits the amount of variability in possible outcomes
Explain how to use standard deviation to adjust for heteroscedasticity
Group residuals with similar variances. Divide the total residual standard deviation by the standard deviation for each group to get each group’s hetero factor. Multiply residuals by the factor for each group to get all residuals to the same variance level. Then, sample residuals from the entire triangle and back out the adjustment.
- Calculate hetero-adjustment factors by group
hi = stddev(union of all rH_w,d) / stddev(union of rH_w,d in group i)
- Adjust residuals in each group
riH_w,d = rH_w,d * hi
- Resample residuals and back out hi
q*(w,d) = (r*/hi) * sqrt(m^z_w,d) + m_w,d
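The standard-deviation approach can be sketched as follows (residual groups are hypothetical):

```python
import statistics

# Hypothetical standardized residuals grouped by development age.
groups = {
    1: [-1.4, -0.6, 0.2, 0.9, 1.6],   # early ages: higher variance
    2: [-0.5, -0.1, 0.3, 0.6],
    3: [-0.2, 0.1],                   # older ages: lower variance
}

all_resid = [r for g in groups.values() for r in g]
total_sd = statistics.pstdev(all_resid)

# h_i = stddev(all residuals) / stddev(residuals in group i)
h = {i: total_sd / statistics.pstdev(g) for i, g in groups.items()}

# Multiply each group's residuals by h_i to bring all groups
# to the same variance level before resampling.
adjusted = {i: [r * h[i] for r in g] for i, g in groups.items()}
```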
Explain how to use scale parameter to adjust for heteroscedasticity
- Calculate overall scale parameter
phi = sum(residuals^2) / (N - p)
- Calculate scale parameter by group
phi_i = (N/(N-p)) * sum(residuals^2 in group i) / n_i
where n_i is the number of residuals in group i
- Calculate hetero-adjustment factor hi
hi = sqrt(phi/phi_i)
1 pro and 1 con of adjusting for heteroscedasticity using hetero-adjustment factors
Pro: can resample with replacement from entire triangle
Con: adds parameters, affecting degrees of freedom and scale parameter
Explain what first development period heteroecthesious data is and how to adjust for it.
This occurs when first development period has a different exposure period length than other columns
Ex: 6 months in the first column and 12 months in the rest
Adjustment:
Reduce the latest accident year's future incremental losses to be proportional to the level of earned exposure in the first period
Then simulate process variance (or reduce after process variance step)
Explain what last period heteroecthesious data is and how to adjust for it.
Partial last calendar period data.
- Annualize exposures in last partial diagonal
- Calculate fitted triangle and residuals
- Using ODP bootstrap simulation, calculate and interpolate LDFs from the fully annualized sample triangles
- Adjust last diagonal of the sample triangles to de-annualize incremental on the last diagonal
- Project future values by multiplying the interpolated LDFs with the new cumulative values
- Reduce future incremental values for the latest accident year to remove future exposure
Explain how to adjust for exposures changing significantly over the years (ex: rapidly growing line or line in runoff)
If earned exposures exist, divide all claims data by exposures for each accident year to run the model with pure premiums.
After process variance step, multiply the result by accident year exposures to get total claims.
Explain parametric bootstrapping
Purpose: way to overcome a lack of extreme residuals in an ODP bootstrap model
- Fit parametrized distribution to the residuals
- Resample residuals from distribution instead of the observed residuals
Explain the purposes of bootstrap diagnostics
Purpose:
Find a set of models and parameters that results in the most realistic and most consistent simulations based on statistical features of data.
- Test assumptions in model
- Gauge the quality of model fit to data
- Help guide adjustments of the model parameters to improve fit of the model
How can we use graphs to test assumption that residuals are IID
Plots to look at:
1. Residuals vs Development Period (look for heteroscedasticity)
2. Residuals vs Accident Period
3. Residuals vs Payment Period
4. Residuals vs Predicted
Look for issues with trends
Plot relative std dev of residuals and range of residuals to further test for heteroscedasticity
Explain the Normality Test
Compares residuals to the normal distribution. If residuals are close to normal, you should see:
1. Normality plot with residuals in line with diagonal line (normally distributed)
2. High R^2 and p-value greater than 5%
Note: in ODP bootstrap, residuals do not need to be normally distributed
Calculate AIC and BIC to test normality of residuals
AIC = 2p + n*(ln(2*pi*RSS/n) + 1)
BIC = nln(RSS/n) + pln(n)
Smaller values indicate that residuals fit a normal distribution better.
AIC and BIC add a penalty for more parameters.
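A sketch of the two formulas, where RSS is the residual sum of squares and p counts model parameters (residual values are hypothetical):

```python
import math

def aic_bic(residuals, p):
    # Smaller AIC/BIC -> residuals fit a normal distribution better;
    # both add a penalty that grows with the parameter count p.
    n = len(residuals)
    rss = sum(r * r for r in residuals)   # residual sum of squares
    aic = 2 * p + n * (math.log(2 * math.pi * rss / n) + 1)
    bic = n * math.log(rss / n) + p * math.log(n)
    return aic, bic
```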
Explain how to identify outliers graphically
Use a box-whisker plot:
The box spans the 25th to 75th percentiles
Whiskers extend to the largest values within 3 times the inter-quartile range
Values outside whiskers are outliers
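The whisker rule can be sketched as follows (data values are hypothetical; `statistics.quantiles` uses its default exclusive method):

```python
import statistics

def outliers(values, k=3.0):
    # Box-whisker rule: whiskers extend k x IQR beyond the quartile box;
    # anything outside the whiskers is flagged as an outlier.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]
```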
Explain how to review the estimated-unpaid model results
- Standard error should increase from oldest to most recent years
- Standard error for all years should be larger than any individual year
- Coeff of variation should decrease from oldest to most recent years due to independence in incremental payment stream.
- A reversal in coeff of variation in recent years could be due to:
a) Increasing parameter uncertainty in more recent years
b) Model may overestimate uncertainty in recent years; we may want to switch to the BF or Cape Cod model
- Min/Max simulations should be reasonable
Explain 2 methods to combine results of multiple models
- Run models with same random variables
a) Simulate random variables for each iteration
b) Use same set of random variables for each model
c) Use model weights to weight incremental values from each model for each iteration by accident year
- Run models with independent random variables
a) Run each model separately with different random variables
b) Use weights to randomly select a model for each iteration by accident year so that the result is a weighted mixture of models
2 characteristics of estimated cash flow results
- Std error of calendar year unpaid decreases as calendar year increases in future
- Coeff of variation increases as calendar year increases
This is because the final payments projected farthest out will be the smallest and most uncertain
Explain how to estimate ultimate loss ratio
Estimated ultimate loss ratios by accident year are calculated using all simulated values, not just the future unpaid values.
Represents the complete variability in loss ratio for each accident year.
Loss ratio distributions can be used for projecting pricing risk.
2 issues with correlation methods
Both location mapping and re-sorting methods use residuals of incremental future losses to correlate segments.
Both tend to create overall correlations of close to zero.
For reserve risk, the correlation that is desired is between total unpaid amounts for two segments so there may be a disconnect.
Explain how to account for correlation between segments (2)
- Re-sorting
Use algorithms such as a Copula or Iman-Conover to add correlation
- Location Mapping
For each iteration, sample residuals from residual triangles using the same locations for all segments
3 advantages to account for correlation between segments using Re-sorting
- Data triangles can be different shapes/sizes by segment
- Can use different correlation assumptions
- Different correlation algorithms may have other beneficial impacts on aggregate distribution
Ex: can use a copula with heavy tail distribution to strengthen the correlation between segments in tails, which is important for risk-based capital modeling.
3 advantages to account for correlation between segments using location mapping
- Method is easily implemented
- Does not require an estimated correlation matrix
- Preserves the correlation of original residuals
2 disadvantages to account for correlation between segments using location mapping
- All segments need to have same size data triangles with no missing data
- Correlation of original residuals is used, so we cannot test other correlation assumptions
List potential data issues associated with applying ODP bootstrap model
- Negative incremental values
- Non-zero sum of residuals
- Using n-year weighted average
- Missing values
- Outliers
- Heteroscedasticity
- Heteroecthesious data
- Exposures changing over time
- Lack of extreme residuals
Briefly describe 4 diagnostic tests to evaluate the GLM model for reasonableness
- Standard error should increase from older to more recent AYs, because the std error should follow the size of the increasing unpaid loss reserve.
- Total, all-year std error should be larger than the standard error for any individual year.
- CV should decrease moving from older to more recent AYs, with a possible exception in the most recent year (where parameter uncertainty might overpower process uncertainty). This is because older years have a smaller loss reserve and few claim payments remaining.
- CV of total reserve should be less than any individual year because the model assumes accident years are independent.
Describe 2 practical limitations of a log-link GLM bootstrap modeling framework.
- It cannot handle negative incremental values
- GLM bootstrap must be solved at each iteration making it time consuming.