Meyers Flashcards
When models do not accurately predict dist of outcomes for test data, 3 explanations
- Insurance process is too dynamic to be captured by single model
- Could be other models that better fit data
- Data used to calibrate model is missing crucial info needed to make a reliable prediction
3 tests to validate models
- histogram
- p-p plot
- K-S statistic
histogram
- if percentiles are uniformly distributed, height of bars should be equal
- for small sample, not perfectly level
- if level, model is appropriate
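The histogram check can be sketched as follows — count outcome percentiles in equal-width bins and look for roughly level counts (the data below is illustrative, not from the monograph):

```python
import numpy as np

def percentile_histogram(percentiles, n_bins=10):
    """Count percentiles (0-100 scale) falling in each equal-width bin.

    Under a well-specified model the bar heights should be roughly equal.
    """
    counts, _ = np.histogram(percentiles, bins=n_bins, range=(0, 100))
    return counts

# Illustrative data: perfectly uniform percentiles
p = np.linspace(0.5, 99.5, 200)
print(percentile_histogram(p))  # every bin holds 200/10 = 20 outcomes
```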
p-p plot
- tests for stat significance of uniformity
- plot expected percentiles on x and sorted predicted percentiles on y -> if predicted percentiles are uniformly dist, plot lies along 45 degree line
ie model is appropriate if p-p plot lies along 45 degree line
expected value e = {1/(n+1),…,n/(n+1)}
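A minimal sketch of building the p-p plot coordinates from the definitions above (the three input percentiles are hypothetical):

```python
import numpy as np

def pp_plot_points(predicted_percentiles):
    """Return (expected, sorted predicted) pairs for a p-p plot.

    Expected e_i = 100 * i/(n+1); a well-specified model's points
    hug the 45-degree line.
    """
    n = len(predicted_percentiles)
    expected = 100.0 * np.arange(1, n + 1) / (n + 1)
    observed = np.sort(np.asarray(predicted_percentiles, dtype=float))
    return expected, observed

# Illustrative: three percentiles (hypothetical values)
e, o = pp_plot_points([75.0, 25.0, 50.0])
print(e)  # [25. 50. 75.]
print(o)  # [25. 50. 75.]
```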
K-S statistic
D = max|p_i - f_i|
f_i = 100*{1/n, …, n/n}
- can reject hypothesis that set of percentiles is uniform @ 5% level if D > critical value = 136/sqrt(n)
- critical values appear as 45 degree bands that run parallel to y=x
- Meyers deems model validated if passes K-S test
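The K-S check above can be sketched directly from the definitions (input percentiles are made up for illustration):

```python
import numpy as np

def ks_uniformity_test(percentiles):
    """Meyers' K-S check: D = max|p_i - f_i| with f_i = 100*i/n,
    where p_i are the sorted outcome percentiles (0-100 scale).
    Reject uniformity at the 5% level if D > 136/sqrt(n)."""
    p = np.sort(np.asarray(percentiles, dtype=float))
    n = len(p)
    f = 100.0 * np.arange(1, n + 1) / n
    D = np.max(np.abs(p - f))
    critical = 136.0 / np.sqrt(n)
    return D, critical, D > critical

# Illustrative check with hypothetical percentiles
D, crit, reject = ks_uniformity_test([10, 35, 48, 62, 90])
print(D, crit, reject)  # D = 18.0, well inside the critical value
```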
Validating Mack: results
- incurred data
- on histogram, percentiles show little uniformity and actual outcomes are falling into smaller and larger percentiles more often -> Mack produces dist that is light tailed
- in p-p plot, predicted percentiles form S shape -> light tailed because actual outcomes are falling into percentiles that are lower than expected in the left tail and higher in the right tail
- D > critical value
Validating ODPB: results
- paid data
- actual outcomes are occurring in lower percentiles more often -> implies both models produce expected loss estimates that are biased high when modeling paid losses
- with expected loss estimates biased high, the left tail becomes lighter
- D > critical value
Possible reasons for the observations on incurred and paid data, ie the Mack and ODPB results
- insurance loss environment has experience changes that are not yet observable
- other models that can be validated
Bayesian models for Incurred loss data
-Mack model underestimates variability of predictive distribution which leads to light tails
Leveled Chain Ladder (LCL)
Correlated Chain Ladder (CCL)
Leveled Chain Ladder (LCL)
- treats level of each AY as random while maintaining independence between AYs -> model will predict more risk
- sigma is larger for earlier DPs where more claims open and more variability
Correlated Chain Ladder (CCL)
- allows for correlation between AYs -> model will predict more risk than LCL
- should result in larger standard deviation for predicted distribution (heavier in tails), which would result in percentiles of outcomes to be more uniform than LCL
LCL results
- produce higher std dev than Mack
- has S shape and some points lie outside K-S bounds, but improvement over Mack, & D is closer to critical value
CCL results
- produce higher std dev than Mack
- CCL produced higher std dev for each AY than LCL
- CCL has S shape and all points within bounds and D is smaller than critical value -> model validates against data and exhibits uniformity
Bayesian models for Paid loss data
-CCL model produced estimates that were biased high
Correlated Incremental Trend (CIT)
Leveled Incremental Trend (LIT)
Correlated Incremental Trend (CIT)
- introduces a payment-year trend; since the model should be based on incremental paid losses, it uses a distribution that is skewed right yet allows for negative values
- sigma is smaller for earlier DPs
- opposite from LCL b/c increm loss
Leveled Incremental Trend (LIT)
-similar to CIT but does not have AY correlation
results for CIT and LIT
- CIT and LIT produce estimates that are biased high
- neither show noticeable improvement over ODP and Mack
Changing Settlement Rate (CSR)
- claims are reported and settled faster due to technology; the CIT model might not fully reflect this change
- allows for changing settlement rates which can reflect speedup in claim settlement for more recent AY
- uses cumulative paid losses since it no longer considers a payment-year trend
CSR results
- histogram is nearly level
- p-p plot closely tracks with y=x
- indicates that the incurred data already recognized the speedup in the claims settlement rate, which led to the good fit with CCL
total risk
total risk = process risk + parameter risk
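This decomposition is the law of total variance: total variance = E[Var(outcome | parameters)] (process risk) + Var(E[outcome | parameters]) (parameter risk). A minimal sketch with a hypothetical one-parameter "posterior":

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior: the mean loss parameter mu is itself uncertain
n_sims = 100_000
mu = rng.normal(100.0, 5.0, size=n_sims)   # parameter draws ("posterior")
x = rng.normal(mu, 10.0)                   # outcome given each parameter draw

total = x.var()
process = 10.0 ** 2    # E[Var(X | mu)]: constant process variance here
parameter = 5.0 ** 2   # Var(E[X | mu]) = Var(mu)
# law of total variance: total risk ~ process risk + parameter risk
print(total, process + parameter)
```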
process risk vs parameter risk
- process risk = average variance of outcomes around the expected result
- parameter risk = variance due to uncertainty in the parameters, as captured by the posterior distribution of the parameters
Meyers found what risk is close to total risk for several insurers
parameter risk
model risk
- model risk = risk that we did not select right model
- model risk is special case of parameter risk
if p-p plot shows S curve
actual outcomes fall into high and low percentiles more often than expected -> predicted distribution is too light in the tails
45 degree = uniformly distributed
CCL: distribution for ultimate loss
start with the first AY's observed losses C(1,d) and calculate u(1,d) using:
u(1,d) = alpha(1) + beta(d)
calculate parameters of distribution for ~C by correlating with AY given:
~u(w,d) = alpha(w) + beta(d) + rho*(ln(C(w-1,d)) - u(w-1,d))
C(w-1,d) and u(w-1,d) are from the previous step
~C(w,d) is simulated from lognormal with log mean ~u(w,d) and log std dev σ(d)
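One step of the recursion above can be sketched as follows (all parameter values are hypothetical, not fitted to any triangle):

```python
import numpy as np

rng = np.random.default_rng(42)

def ccl_next_logmean(alpha_w, beta_d, rho, C_prev, mu_prev):
    """CCL recursion: u(w,d) = alpha(w) + beta(d) + rho*(ln C(w-1,d) - u(w-1,d))."""
    return alpha_w + beta_d + rho * (np.log(C_prev) - mu_prev)

def simulate_lognormal_loss(mu, sigma_d):
    """Draw C(w,d) from a lognormal with log mean mu and log std dev sigma_d."""
    return rng.lognormal(mean=mu, sigma=sigma_d)

# Hypothetical parameters: alpha(1)=8.0, alpha(2)=8.1, beta(d)=0.2, rho=0.5
mu1 = 8.0 + 0.2                       # u(1,d) = alpha(1) + beta(d)
C1 = simulate_lognormal_loss(mu1, 0.3)
mu2 = ccl_next_logmean(8.1, 0.2, 0.5, C1, mu1)
C2 = simulate_lognormal_loss(mu2, 0.3)
```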
CSR: distribution for ultimate losses
u(w,d)=alpha(w)+beta(d)*(1-gamma)^(w-1)
C(w,d) is simulated from lognormal dist with log mean=u(w,d) and log std dev=σ(d)
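The CSR simulation step can be sketched directly from the two formulas above (parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)

def csr_logmean(alpha_w, beta_d, gamma, w):
    """CSR log mean: u(w,d) = alpha(w) + beta(d) * (1 - gamma)^(w-1)."""
    return alpha_w + beta_d * (1.0 - gamma) ** (w - 1)

# Hypothetical values: alpha(w)=8.0, beta(d)=-0.5, gamma=0.1, AY w=3
mu = csr_logmean(alpha_w=8.0, beta_d=-0.5, gamma=0.1, w=3)
C = rng.lognormal(mean=mu, sigma=0.3)   # simulated cumulative paid loss
```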
CSR: how gamma parameter impacts claims payment pattern
- development-period portion of the log-mean formula is beta(d)*(1-gamma)^(w-1)
- with gamma > 0, this term's absolute value is smaller for later AYs
- since beta(d) is negative, shrinking its magnitude makes the log-mean larger
- the larger this portion is, the larger the log-mean, resulting in higher simulated losses
- higher simulated losses at the same development period indicate a speedup in settlement rate for more recent AYs
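The gamma effect can be seen with a quick numeric illustration (beta(d) and gamma values are hypothetical):

```python
# Hypothetical values: beta(d) = -0.5, gamma = 0.1
beta_d, gamma = -0.5, 0.1
term_ay1 = beta_d * (1 - gamma) ** 0   # AY 1: -0.5
term_ay5 = beta_d * (1 - gamma) ** 4   # AY 5: -0.5 * 0.9**4 = -0.32805
# the term's absolute value shrinks for later AYs; since beta(d) < 0,
# the log-mean rises, producing higher simulated losses at the same
# development period, ie a faster settlement pattern for recent AYs
```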
LCL compared to MACK
- compared to Mack, LCL uses random level parameters instead of a fixed level given by the most recent cumulative loss for each AY
CCL compared to MACK
- model uses random level parameters instead of fixed level parameters for most recent cumulative loss for the AY
- model incorporates correlation between AY
procedure used by CCL to create loss distribution for ultimate losses
- use loss triangle and prior distributions for CCL model, run MCMC script to estimate posterior distributions
- create sample sets of parameters from posterior distributions
- for each parameter set, simulate the ultimate losses, iterated for each AY
- simulated loss C(w-1,10) is used to calculate u(w,10) for the next AY; C(w,10) is then simulated from the lognormal with log mean u(w,10) and log std dev sigma(10)
- the distribution and any summary statistics are calculated from total ultimate losses across all AYs
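The steps above can be sketched as a loop over parameter sets and AYs, assuming posterior draws are already in hand (all values below are hypothetical stand-ins for MCMC output):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_total_ultimate(param_sets, n_ay=10, sigma10=0.2):
    """For each posterior parameter set, walk the CCL recursion across AYs
    at the final development period (d = 10) and sum the ultimates.

    Each param set is (alpha array, beta(10), rho); values hypothetical.
    """
    totals = []
    for alpha, beta10, rho in param_sets:
        mu = alpha[0] + beta10               # log mean for AY 1
        C = rng.lognormal(mu, sigma10)       # simulated ultimate, AY 1
        total = C
        for w in range(1, n_ay):
            # correlate next AY's log mean with the prior AY's simulated loss
            mu = alpha[w] + beta10 + rho * (np.log(C) - mu)
            C = rng.lognormal(mu, sigma10)
            total += C
        totals.append(total)
    return np.array(totals)

# Two made-up parameter sets standing in for posterior draws
params = [(np.full(10, 8.0), 0.1, 0.4), (np.full(10, 8.05), 0.1, 0.4)]
totals = simulate_total_ultimate(params)
```

Summary statistics of the predictive distribution would then be taken over `totals` across many such draws.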
incorporating expert knowledge about expected losses in CCL
prior distribution for level parameters and logelr parameter can be specified to be more restrictive instead of using vague priors so that model better reflects expected losses