Meyers Flashcards
Why do models fail to accurately predict test data?
- The insurance process is too dynamic to be captured in a single model
- There could be other models that better fit the data
- The data used to calibrate the model is missing crucial information needed to make a reliable prediction
Test 1: Histogram
If the percentiles are uniformly distributed, the height of the bars should be equal
A symmetric histogram implies that the expected value is accurate
Test 2: p-p Plot & Kolmogorov-Smirnov (K-S) Test
Tests whether the departure of the predicted percentiles from uniformity is statistically significant
K-S Test
n = number of predicted percentiles
K-S statistic D = max(ABS(p_i - f_i))
where {p_i} is the set of predicted percentiles sorted into increasing order and {f_i} = 100 * {1/n, 2/n, …, n/n}
Reject the hypothesis of uniformity at the 5% level if D > 136 / SQRT(n)
Model is validated if it passes the K-S test
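A minimal sketch of the K-S computation, assuming the predicted percentiles are on a 0-100 scale (the function and variable names are illustrative):

```python
import numpy as np

def ks_test(percentiles):
    """K-S test of uniformity for a set of predicted percentiles (0-100 scale)."""
    p = np.sort(np.asarray(percentiles, dtype=float))  # sorted predicted percentiles
    n = len(p)
    f = 100.0 * np.arange(1, n + 1) / n                # f_i = 100 * i/n
    d = np.max(np.abs(p - f))                          # K-S statistic D
    critical = 136.0 / np.sqrt(n)                      # 5% critical value
    return d, critical, d > critical                   # True -> reject uniformity

# Percentiles drawn from a uniform distribution should usually pass
d, crit, reject = ks_test(np.random.default_rng(0).uniform(0, 100, 50))
```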
p-p Plot
- Sort a sample of n predicted percentiles into increasing order
- Plot the expected percentiles e_i = 100 * {1/(n+1), 2/(n+1),…,n/(n+1)} on the x axis and the sorted predicted percentiles on the y axis
- If these predicted percentiles are uniformly distributed, we expect this plot to lie along a 45-degree line
Reject the hypothesis of uniformity if the p-p plot lies outside the bands y = x ± 136/SQRT(n), which run parallel to the 45-degree line y = x
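A sketch of the p-p plot construction following the steps above, with bands drawn at y = x ± 136/SQRT(n) (names are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

def pp_plot(percentiles):
    """p-p plot of sorted predicted percentiles against expected percentiles."""
    p = np.sort(np.asarray(percentiles, dtype=float))
    n = len(p)
    e = 100.0 * np.arange(1, n + 1) / (n + 1)       # expected percentiles e_i
    band = 136.0 / np.sqrt(n)                       # half-width of the bands
    plt.plot(e, p, "o")                             # predicted vs expected
    plt.plot([0, 100], [0, 100], "k-")              # 45-degree line y = x
    plt.plot([0, 100], [band, 100 + band], "k--")   # upper band
    plt.plot([0, 100], [-band, 100 - band], "k--")  # lower band
    plt.xlabel("Expected percentile")
    plt.ylabel("Predicted percentile")
```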
Light-tailed distribution
Actual outcomes fall into the smaller and larger percentiles of the distributions
Forms an S shape on the p-p plot - actual outcomes are falling into percentiles that are lower than expected in the left tail and higher than expected in the right tail
Underestimates the variability of ultimate loss estimates
Confidence intervals will be too small
Heavy-tailed distribution
Actual outcomes fall into the middle percentiles of the distributions
Forms a backwards S shape on the p-p plot - actual outcomes are falling into percentiles that are higher than expected in the left tail and lower than expected in the right tail
Overestimates the variability of ultimate loss estimates
Confidence intervals will be too wide
Validating the Mack Model - Incurred Losses
Meyers used the Mack model to calculate the mean and SD of the ultimate losses, fit a lognormal distribution with those parameters, and recorded the actual outcome as a percentile of that distribution (sketched below)
Histogram: light-tailed
p-p Plot: light-tailed
K-S Statistic: reject hypothesis of uniformity
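A sketch of the percentile calculation above, assuming the lognormal is fit to Mack's mean and SD by moment matching (the helper name is hypothetical):

```python
import numpy as np
from scipy.stats import lognorm

def outcome_percentile(mack_mean, mack_sd, actual_outcome):
    """Percentile of the actual outcome under a lognormal moment-matched
    to the Mack mean and SD of ultimate losses."""
    cv2 = (mack_sd / mack_mean) ** 2
    sigma = np.sqrt(np.log(1.0 + cv2))          # log SD from moment matching
    mu = np.log(mack_mean) - 0.5 * sigma ** 2   # log mean from moment matching
    return 100.0 * lognorm.cdf(actual_outcome, s=sigma, scale=np.exp(mu))
```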
Validating the Bootstrap ODP Model - Paid Losses
Actual outcomes occur in the lower percentiles of the model distributions more often than expected
Produces expected loss estimates that are biased high - more of the actual outcomes will fall in lower percentiles because the model distributions are shifted too far to the right
Expected loss estimates are too high
K-S Statistic: reject hypothesis of uniformity
Validating the Mack Model - Paid Losses
Actual outcomes occur in the lower percentiles of the model distributions more often than expected
Produces expected loss estimates that are biased high - more of the actual outcomes will fall in lower percentiles because the model distributions are shifted too far to the right
Expected loss estimates are too high
K-S Statistic: reject hypothesis of uniformity
Bayesian Models for Incurred Loss Data
Increases the variability of the predictive distribution and extends the tails
- Treats the level of each AY as a random quantity, which adds risk to the prediction - in contrast to Mack, where the observed losses act as fixed level parameters
- Allows for correlation between AYs - in contrast to Mack, where AYs are independent
Leveled Chain-Ladder (LCL) Model
The level of each AY is defined as mu_w,d = alpha_w + beta_d
The simulated cumulative loss C_w,d has a lognormal distribution with log mean mu_w,d and log SD sigma_d, subject to the constraint that sigma_1 > sigma_2 > … > sigma_10
SD is larger for earlier development periods where there are more claims open and more variability
Each parameter is given a wide prior distribution so that the posterior distributions will be highly influenced by the data during the Bayesian MCMC process
Correlated Chain-Ladder (CCL) Model
Allows for correlation between each subsequent mu parameter
The level of each AY is defined as mu_w,d = alpha_w + beta_d + p * (LN(C_{w-1,d}) - mu_{w-1,d})
The correlation parameter p is given a wide prior distribution and when p = 0, the CCL model reduces to the LCL model
- For each parameter set, start with the given C_1,10 and calculate the log mean mu_2,10
- Simulate C_2,10 from a lognormal distribution with log mean mu_2,10 and log SD sigma_10
- Use the result of this simulation to simulate the ultimate loss for the next AY
- Repeat this process many times to form a predictive distribution for each AY and in total (see the sketch below)
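A sketch of this simulation for a single posterior parameter draw (parameter names are illustrative; repeating it across many posterior draws builds the predictive distribution):

```python
import numpy as np

def simulate_ccl_ultimates(c_1_10, alphas, beta_10, rho, sigma_10, rng):
    """One simulated path of ultimates C_{w,10} for w = 2..10 under CCL,
    given a single posterior parameter draw (alphas[w-1] = alpha_w)."""
    ultimates = [c_1_10]                              # AY 1 ultimate is given
    mu_prev = alphas[0] + beta_10                     # mu_{1,10}, no correlation term
    c_prev = c_1_10
    for w in range(1, 10):                            # AYs 2..10
        mu = alphas[w] + beta_10 + rho * (np.log(c_prev) - mu_prev)
        c = rng.lognormal(mean=mu, sigma=sigma_10)    # simulate C_{w,10}
        ultimates.append(c)
        mu_prev, c_prev = mu, c                       # feed into the next AY
    return np.array(ultimates)
```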
Validating the LCL Model - Incurred Losses
Histogram: light-tailed
p-p Plot: light-tailed
K-S Statistic: reject hypothesis of uniformity
Better results than Mack with higher SD since additional variability was introduced
Validating the CCL Model - Incurred Losses
Histogram: light-tailed
p-p Plot: light-tailed, but all points lie inside the bounds
K-S Statistic: fail to reject the hypothesis of uniformity
Better results than Mack with higher SD since additional variability was introduced (also higher than LCL)
Validating the CCL Model - Paid Losses
Produces expected loss estimates that are biased high - more of the actual outcomes will fall in lower percentiles because the model distributions are shifted too far to the right
CY Trend in Paid Losses
- The model should be based on incremental paid loss amounts since cumulative losses include settled claims which do not change with time
- Incremental paid loss amounts tend to be skewed to the right and can be negative, so we need a loss distribution that allows for both features
Skew Normal Distribution Form 1
Location parameter mu
Scale parameter omega
Shape parameter delta
X = mu + omega * delta * Z + omega * SQRT(1 - delta^2) * e
where Z is a standard normal truncated to positive values and e is a standard normal random variable
Delta = 0 -> normal distribution
As delta approaches 1, the distribution becomes more skewed
Delta = 1 -> truncated normal distribution
This form caps the coefficient of skewness at that of the truncated normal distribution
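A quick simulation sketch of form 1. It uses the fact that a standard normal truncated to positive values can be sampled as the absolute value of a standard normal, and shows the coefficient of skewness growing with delta toward the truncated normal cap:

```python
import numpy as np

def skew_normal_form1(mu, omega, delta, size, rng):
    """X = mu + omega*delta*Z + omega*sqrt(1 - delta^2)*e,
    with Z a standard normal truncated to positive values and e a standard normal."""
    z = np.abs(rng.standard_normal(size))      # truncated normal, Z > 0
    e = rng.standard_normal(size)              # ordinary standard normal
    return mu + omega * delta * z + omega * np.sqrt(1.0 - delta ** 2) * e

rng = np.random.default_rng(0)
for delta in (0.0, 0.5, 1.0):
    x = skew_normal_form1(0.0, 1.0, delta, 200_000, rng)
    skew = ((x - x.mean()) ** 3).mean() / x.std() ** 3
    print(delta, round(float(skew), 3))        # skewness rises with delta, capped near 1
```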
Skew Normal Distribution Form 2
Replaces the truncated normal distribution with the lognormal distribution
Correlated Incremental Trend (CIT) Model
mu_w,d = alpha_w + beta_d + tau * (w + d -1)
Z_w,d ~ lognormal(mu_w,d, sigma_d), subject to the constraint that sigma_1 < sigma_2 < … < sigma_10
I_w,d ~ normal(Z_w,d + p * (I_{w-1,d} - Z_{w-1,d}) * EXP(tau), delta)
For w = 1 -> I_1,d ~ normal(Z_1,d, delta)
Distribution is skewed, allows for negative values, and has payment trend tau
Each parameter is given a wide prior distribution so that the posterior distributions will be highly influenced by the data during the Bayesian MCMC process, EXCEPT for tau and sigma_d, which were given more restrictive prior distributions
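A minimal sketch of one CIT draw for a single cell, given a posterior parameter draw (names are illustrative). It highlights the two-layer structure: a lognormal level Z and a normal layer for I that admits negative values and carries the AY correlation:

```python
import numpy as np

def simulate_cit_cell(i_prev, z_prev, mu, sigma_d, rho, tau, delta, rng):
    """One CIT draw of the incremental loss I_{w,d}, given the prior AY's
    incremental loss i_prev and level z_prev at the same development age."""
    z = rng.lognormal(mean=mu, sigma=sigma_d)         # level Z_{w,d}
    loc = z + rho * (i_prev - z_prev) * np.exp(tau)   # correlation on the normal layer
    i = rng.normal(loc=loc, scale=delta)              # I_{w,d}, can be negative
    return i, z
```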
Comparing the CIT & CCL Models
- Since CCL model is applied to cumulative losses, sigma_d decreases as d increases since a greater proportion of claims are settled (less variability)
- Since CIT model is applied to incremental losses, sigma_d increases as d increase since smaller, less volatile claims tend to be settled earlier
- Since incremental losses can be negative, the CIT model applies the correlation feature outside of the lognormal distribution (on the normal layer); in the CCL model it is applied inside, to the log of the cumulative losses
Leveled Incremental Trend (LIT) Model
CIT model without AY correlation
Validating the CIT Model - Paid Losses
Produces estimates that are biased high
No improvement over Mack or ODP models
Validating the LIT Model - Paid Losses
Produces estimates that are biased high
No improvement over Mack or ODP models
Changing Settlement (CSR) Model
Reflects the speedup in claim settlement due to technology
Uses cumulative paid losses, since the model no longer includes a payment trend
No correlation or trend terms
mu_w,d = alpha_w + beta_d * (1 - y)^(w-1)
C_w,d has a lognormal distribution with log mean mu_w,d and log SD sigma_d, subject to the constraint sigma_1 > sigma_2 > … > sigma_10
y > 0 indicates a speedup in claim settlement
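An illustrative computation of the CSR log mean, assuming for illustration that beta_d < 0 before maturity; with y > 0, the development effect shrinks toward 0 for later AYs, i.e., the pattern reaches its ultimate level faster:

```python
def csr_log_mean(alpha_w, beta_d, y, w):
    """CSR log mean: mu_{w,d} = alpha_w + beta_d * (1 - y)^(w - 1)."""
    return alpha_w + beta_d * (1.0 - y) ** (w - 1)

# Hypothetical values: beta_d = -0.3 at an early age, speedup y = 0.05
for w in (1, 5, 10):
    print(w, round(csr_log_mean(8.0, -0.3, 0.05, w), 4))  # beta effect shrinks with w
```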
Validating the CSR Model - Paid Losses
Histogram, p-p Plot, and K-S Statistic indicate uniformity
Suggests that the incurred data already recognized the speedup in the claims settlement rate, which is why the CCL model validated on incurred losses
Process Risk
Represents the average variance of the outcomes around the expected result
E[Var(X|theta)]
Parameter Risk
Represents the variance due to uncertainty in the parameters, reflecting the spread of the posterior distribution of the parameters
Var[E(X|theta)]
Total Risk
Total Risk = Process Risk + Parameter Risk
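A toy simulation of the decomposition, with hypothetical distributions: outcomes X are drawn conditionally on posterior parameter draws theta, and the total variance splits into the two pieces above:

```python
import numpy as np

rng = np.random.default_rng(1)

theta = rng.normal(100.0, 10.0, size=200_000)  # posterior draws of the parameter
x = rng.normal(theta, 5.0)                     # outcomes given each theta

parameter_risk = theta.var()                   # Var[E(X|theta)], since E(X|theta) = theta
process_risk = 5.0 ** 2                        # E[Var(X|theta)], constant here
total_risk = x.var()                           # approximately the sum of the two
print(round(process_risk + parameter_risk, 1), round(total_risk, 1))
```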
Model Risk
The risk that we did not select the right model
A special case of parameter risk, since the model weights are treated as parameters
- Formulate a model that is a weighted average of the various candidate models, where the weights are the parameters
- If the posterior distribution of the weights assigned to each model has significant variability, then model risk exists
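A toy sketch of this idea with two hypothetical candidate models: each posterior draw supplies a weight on model A, and a widely spread posterior for the weight signals model risk:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

w_posterior = rng.beta(2.0, 2.0, size=n)    # posterior draws of the weight on model A
pick_a = rng.uniform(size=n) < w_posterior  # choose model A with probability w
x = np.where(pick_a,
             rng.normal(100.0, 10.0, n),    # hypothetical model A prediction
             rng.normal(120.0, 10.0, n))    # hypothetical model B prediction
print(round(float(w_posterior.var()), 4))   # large spread -> model risk exists
```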
Incurred Data Models
- Mack model understates variability
- CCL model allows for AY correlation and predicts the distribution of outcomes correctly within a specified confidence level
Paid Data Models
- Bootstrap ODP, Mack, and CCL models give estimates of the expected ultimate loss that are biased high, suggesting there is a change in the loss environment that is not being captured in the models
- CIT and LIT introduce CY trends but fail to improve on the Mack or ODP paid-loss results
- CSR model introduces a parameter to account for speedup in claims settlement rates and predicts the distribution of outcomes correctly within a specified confidence level