Shapland Flashcards
ODP model
- uses a GLM to model incremental claims q(w,d)
- uses a log link function and an over-dispersed Poisson (ODP) error distribution
fitted incremental claims using ODP
= fitted incremental claims derived using CL factors
- start with the latest diagonal and divide backwards successively by each development factor to obtain fitted cumulative claims, then use subtraction to get fitted incremental claims (see the sketch below)
- this model is known as the ODP bootstrap (ODPB) model
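A minimal numpy sketch of this fitting step (the triangle values are hypothetical, not from the paper):

```python
import numpy as np

# Hypothetical 3x3 cumulative triangle (rows = AYs, columns = ages);
# np.nan marks the unobserved lower-right cells
cum = np.array([[100., 180., 200.],
                [110., 200., np.nan],
                [120., np.nan, np.nan]])
n = cum.shape[0]

# All-year volume-weighted development factors f(d):
# sum of column d+1 over sum of column d, using rows observed in both
f = [cum[:n - d - 1, d + 1].sum() / cum[:n - d - 1, d].sum()
     for d in range(n - 1)]

# Start with the latest diagonal and divide backwards by each factor
fitted_cum = np.full_like(cum, np.nan)
for w in range(n):
    d_last = n - 1 - w                      # age of AY w on the latest diagonal
    fitted_cum[w, d_last] = cum[w, d_last]
    for d in range(d_last - 1, -1, -1):
        fitted_cum[w, d] = fitted_cum[w, d + 1] / f[d]

# Fitted incremental claims by subtraction
fitted_inc = np.diff(fitted_cum, axis=1, prepend=0.0)
```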
ODPB benefit
- a simple link ratio algorithm can be used in place of the more complicated GLM while maintaining the underlying GLM framework
robust GLM: expected incremental formula
eta = ln[m(w,d)] = alpha(w) + Σbeta(d)
Bootstrap process
- calc fitted cumulative losses using the actual DFs, then calc fitted incremental losses
- calc actual incremental losses
- calc residuals
- Pearson residuals are used since they are calculated consistently with the scale parameter phi
- sample with replacement from the residuals of the data
- sampling can be used to create new sample triangles of incremental claims (see the sketch after this card)
- sample triangles can be cumulated, and DFs can be calculated and applied to produce a point estimate for each sample
- the result is a distribution of point estimates that incorporates process variance and parameter variance in the simulation of the historical data
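A minimal sketch of one bootstrap iteration, assuming the fitted incrementals m and the standardized residuals resid (both hypothetical names) have already been computed:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_bootstrap_sample(m, resid, observed_mask):
    """One sample triangle of incremental claims: q* = r* * sqrt(m) + m,
    with r* drawn with replacement from the pooled residuals (z = 1 for ODP)."""
    r_star = rng.choice(resid, size=m.shape, replace=True)
    q_star = r_star * np.sqrt(m) + m
    return np.where(observed_mask, q_star, np.nan)

# Each sample triangle is then cumulated, its DFs are recalculated and
# applied, and the resulting point estimates across many iterations form
# the distribution of unpaid claim estimates.
```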
sampling with replacement assumes
residuals are independent and identically distributed; it does not require them to be normally distributed
sampling can be used to create new sample triangles of increm claims -> formula for incremental loss: q*(w,d) = r* * sqrt(m^z) + m, where r* is a sampled residual
Adjustments to unscaled Pearson residuals
- DoF adj factor
- hat matrix adjustment factor
DoF adj factor
- the DoF adjustment factor corrects for bias in the residuals up front by adding more dispersion (more variance) -> scaled Pearson residuals
- factor = sqrt(N / (N - p)); N = # of data cells in the triangle, p = # of parameters = 2*(# of AYs) - 1
hat matrix adjustment factor
- the hat matrix adjustment factor is considered a replacement for, and an improvement over, the DoF factor
- only the diagonal elements of the hat matrix are used: fH = sqrt(1 / (1 - H(w,d)))
Standardized residuals ensure
that each residual has the same variance
Negative incremental values if the column sum is positive: use a modified log link (see the sketch below)
ln(q) for q > 0
0 for q = 0
-ln(|q|) for q < 0
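A small sketch of the modified log link as defined above:

```python
import math

def modified_log(q):
    """Modified log link for a triangle containing some negative
    incremental cells (while each column sum stays positive)."""
    if q > 0:
        return math.log(q)
    if q < 0:
        return -math.log(abs(q))
    return 0.0
```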
Negative incremental values if the column sum is negative
q+ = q - psi
m = m+ + psi
- psi is the largest negative value in the triangle (largest individual cell or column sum)
Heteroscedasticity
- model errors do not share a common variance
- violates assumption that residuals are i.i.d.
heteroscedasticity: 3 options
stratified sampling, variance parameters, scale parameters
Stratified sampling
group development periods with homogeneous variances (similar residual variances)
for each simulated incremental loss, only sample residuals from development periods in the same group -> some groups may lack credibility
Calc variance parameters
group the development periods, calc the std dev of the residuals in each hetero group, and calc a hetero-adjustment factor for each group: h(i) = stdev(all residuals) / stdev(group i residuals) -> STANDARDIZED residuals rH = r * h(i)
*this gives the residuals a constant variance
- sample with replacement among all residuals, and divide each sampled residual by the adjustment factor of the group it is resampled into (see the sketch below)
**e.g., a residual that goes from group 3 to group 2 is divided by group 2's h(i) when computing q*(i)
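A minimal sketch of the variance parameters option (the residuals and grouping are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical residuals split into two hetero groups of development periods
groups = {1: np.array([-1.2, 0.8, 0.4, -0.5]),
          2: np.array([-0.2, 0.1, 0.05])}

all_resid = np.concatenate(list(groups.values()))
# Hetero-adjustment factor per group: h(i) = stdev(all) / stdev(group i)
h = {i: all_resid.std() / g.std() for i, g in groups.items()}

# Multiply each group's residuals by its factor -> common variance
pooled = np.concatenate([g * h[i] for i, g in groups.items()])

def simulate_cell(m, group):
    """q* for a cell in `group`: sample from the pooled standardized
    residuals, divide by that group's factor to restore its variance,
    then invert the residual formula (z = 1 for ODP)."""
    r_star = rng.choice(pooled) / h[group]
    return r_star * np.sqrt(m) + m
```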
Calc scale parameters:
similar, but the hetero-adjustment factor is based on the scale parameter -> have to look at the unscaled PEARSON residuals r
**use the same formulas for rH(i) and q*(i)
modify phi so that each hetero group has its own scale parameter when adding future process variance, and use the hetero factor to adjust simulated losses, similar to the variance parameters option
residual plots
- tests the assumption that residuals are i.i.d.
- plot against DP, AY, or CY
- do residuals exhibit any trends?
- should show a random pattern
- do residuals have different variances? -> heteroscedasticity
- if so, group them into hetero groups and adjust them to a common std deviation
Standard errors
- should increase over time (from the oldest to the most recent years)
- the total reserve std error should be larger than that of any individual year
CoV
- should decrease over time (the newest AY could be large because parameter uncertainty can overpower process uncertainty)
- the total reserve CoV should be less than that of any individual year
Normality test
- allows comparison of parameter sets and an assessment of skewness
- normality plot: data points should be tightly distributed around the diagonal line
- calc test values; for normally distributed residuals (see the sketch below):
p-value > 5%
R^2 close to 1
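A sketch of how the two test values could be computed in scipy; Shapland's paper may use a different specific normality test, so Shapiro-Wilk here is just one common choice:

```python
import numpy as np
from scipy import stats

resid = np.random.default_rng(2).normal(size=50)  # stand-in for standardized residuals

# Q-Q plot fit: probplot's r measures how tightly the points hug the
# diagonal line; r**2 close to 1 supports normality
(_, _), (slope, intercept, r) = stats.probplot(resid, dist="norm")
r_squared = r ** 2

# Normality test p-value; > 5% fails to reject normality
stat, p_value = stats.shapiro(resid)
```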
Parsimony
a model with fewer parameters is preferred as long as the goodness of fit is not markedly different
options for using multiple models
- run models with same RVs
- run models with independent RVs
If there are a lot of outliers?
- the model is a poor fit
- GLMB: choose new parameters or change the error distribution
- ODPB: use an L-year weighted average
Heteroecthesious data
- the ODPB requires a symmetrical triangle shape and homoecthesious data (similar exposures)
- the symmetrical-shape requirement can be relaxed by using an L-year weighted average or excluding diagonals
- common heteroecthesious data triangles:
- partial first development period data
- partial last calendar period data
matrix notation and how to solve for parameters
model specification:
Y = X*A
Y = [ln(q(1,1)); ln(q(2,1)); …]
A = [alpha1; alpha2; …; beta2; …; gamma; …]
X = design matrix of 0s and 1s (a constant CY trend gets a column of diagonal counts 0, 1, 2, …, so the fitted values pick up gamma, 2*gamma, … across diagonals)
*alpha is AY parameter
model with fitted parameters and soln matrix W:
W = X*A
W = [ln(m(1,1)); ln(m(2,1)); …]
*solve for the parameters in A that minimize the squared difference between the vectors Y (actual log loss) and W (expected log loss)
*use iteratively weighted least squares to solve for A (see the sketch below)
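A minimal sketch of building Y, X, and solving for A on a hypothetical 3x3 triangle; ordinary least squares is shown as a first pass, with the GLM itself solved by iteratively (re)weighted least squares on the same design matrix:

```python
import numpy as np

# Hypothetical incremental triangle q(w,d), all positive
q = {(1, 1): 100., (1, 2): 80., (1, 3): 20.,
     (2, 1): 110., (2, 2): 90.,
     (3, 1): 120.}

cells = sorted(q)                        # row order of Y and X
Y = np.log(np.array([q[c] for c in cells]))

# Parameter order in A: [alpha1, alpha2, alpha3, beta2, beta3]
# ln m(w,d) = alpha_w + sum of beta_k for k = 2..d
X = np.zeros((len(cells), 5))
for i, (w, d) in enumerate(cells):
    X[i, w - 1] = 1.0                    # alpha_w indicator
    for k in range(2, d + 1):
        X[i, 3 + k - 2] = 1.0            # cumulative beta indicators

A, *_ = np.linalg.lstsq(X, Y, rcond=None)
```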
how can CY trend be incorporated in matrix/model
include a CY trend parameter gamma(k) in the model specification, with 0 for the first diagonal, 1 for the second diagonal, and so on
*fitted parameter will be CY trend
benefit and disadvantage to using single AY parameter ie alpha
advantage = using fewer parameters helps avoid overfitting and can improve the model's performance on future losses
disadvantage = if there are significant differences in level by AY, the model can't directly reflect those differences and may fit the data worse
Standardized Pearson residuals and scale parameter process
cumulative losses
back out the LDFs
fit incremental losses
calc unscaled Pearson residuals: r = (q - m) / sqrt(m^z) (z = 1 for ODP)
calc the scale parameter: phi = sum(r^2) / (N - p), where N is the number of data points and p the number of parameters
calc the hat matrix triangle
calc the fH triangle -> fH = sqrt(1 / (1 - H))
standardized Pearson residuals: rH = r * fH (see the sketch below)
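A compact sketch of the whole residual pipeline, assuming q, m, the hat-matrix diagonal, and the parameter count p are given (hypothetical names):

```python
import numpy as np

def standardized_residuals(q, m, hat_diag, p):
    """Unscaled Pearson residuals, scale parameter, and hat-matrix
    standardization (z = 1 for ODP). hat_diag holds the diagonal
    elements H(w,d) of the hat matrix, mapped back to the triangle."""
    r = (q - m) / np.sqrt(m)               # unscaled Pearson residuals
    N = q.size
    phi = np.sum(r ** 2) / (N - p)         # scale parameter
    f_H = np.sqrt(1.0 / (1.0 - hat_diag))  # hat matrix adjustment factor
    return r * f_H, phi                    # standardized residuals rH, phi
```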
how process variance can be incorporated in simulation of incremental losses
projected cumulative losses = apply the CDFs from the simulation to the latest diagonal of the simulated triangle
calc the projected incremental losses
simulate future incrementals from a Gamma distribution to add process variance (see the sketch below)
q(w,d) ~ Gamma(mean = m(w,d), var = phi*m(w,d))
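A one-function sketch of the Gamma draw, converting the (mean, variance) parameterization to numpy's (shape, scale):

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_future_incremental(m_proj, phi):
    """Draw q(w,d) ~ Gamma with mean m_proj and variance phi * m_proj.
    Gamma(shape k, scale theta) has mean k*theta and variance k*theta**2,
    so k = m_proj / phi and theta = phi."""
    return rng.gamma(shape=m_proj / phi, scale=phi)
```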
process of adjusting for negative column sum
calc adjusted incremental losses: q+ = q - psi
calc fitted adjusted losses with the simplified GLM -> calc LDFs from the adjusted cumulative losses and divide backwards to get fitted adjusted cumulative losses
calc fitted adjusted incremental losses: m+
calc fitted incrementals: m = m+ + psi
calc residuals as normal (see the sketch below)
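A sketch of the shift-fit-unshift steps; fit_simplified_glm is a hypothetical helper standing in for the CL-factor fitting sketched earlier:

```python
import numpy as np

def psi_shift_fit(q_triangle, fit_simplified_glm):
    """Negative-column-sum adjustment: shift the triangle so all values
    are positive, fit, then shift the fitted values back."""
    col_sums = np.nansum(q_triangle, axis=0)
    psi = min(np.nanmin(q_triangle), col_sums.min())  # largest negative value
    q_plus = q_triangle - psi                         # q+ = q - psi
    m_plus = fit_simplified_glm(q_plus)               # fitted adjusted m+
    return m_plus + psi                               # m = m+ + psi
```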
multiple models: same random variables
calc unpaid loss for each iteration
incremental loss = sum(weight(model) * inc loss(model)) (see the sketch below)
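A tiny numeric sketch of the blend (hypothetical weights and losses):

```python
import numpy as np

weights = np.array([0.6, 0.4])            # model weights
inc_by_model = np.array([1200., 1500.])   # same cell, same iteration, per model
inc_blended = float(np.sum(weights * inc_by_model))  # 0.6*1200 + 0.4*1500 = 1320.0
```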
multiple models: weighted mixture of models AKA independent RVs
calc unpaid loss for each iteration
draw random values from [0,1] for each accident year and iteration to mix the models
*make a table of the selected model for each AY and iteration
if the random value < model weight #1, pick model 1
if the random value > model weight #1, pick model 2
model weights are given by AY
*make a table of the best estimates of unpaid losses (see the sketch below)
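A short sketch of the model-picking step (AYs and weights hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)

def pick_model(weight_model_1):
    """Draw u ~ U[0,1]; use model 1 if u < its weight, else model 2."""
    u = rng.random()
    return 1 if u < weight_model_1 else 2

# model 1 weights by accident year -> table of selected models for one iteration
ay_weights = {2019: 0.7, 2020: 0.5, 2021: 0.3}
selected = {ay: pick_model(w) for ay, w in ay_weights.items()}
```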
problem: differing levels of residual variability at different development periods impacts the appropriateness of the model
if earlier periods have greater standardized Pearson residual variability than later periods (heteroscedasticity), then during simulation residuals from the later periods could be sampled to simulate losses at the earlier development periods
when this happens, the simulated results show less variability than they should for the earlier periods, and the model is not appropriate
if they ask you to calculate point estimate of projected unpaid loss before process variance is added
use eta = ln[m] = alpha + Σbeta
to get estimates of the projected incrementals, i.e. the cells outside the triangle
then add these together to get the unpaid estimate
these point estimates are also m_projected(w,d)
if incorporating process variance: q_sim(w,d) ~ Gamma(m_projected, phi*m_projected)
AKA simulated future incrementals (see the sketch below)
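A minimal sketch of producing the point estimate from fitted parameters (parameter values hypothetical):

```python
import numpy as np

# Hypothetical fitted parameters for 3 AYs / 3 ages
alpha = np.array([4.6, 4.7, 4.8])        # alpha_1..alpha_3
beta = np.array([-0.2, -1.4])            # beta_2, beta_3

# ln m(w,d) = alpha_w + sum of beta_k for k = 2..d
beta_cum = np.concatenate([[0.0], np.cumsum(beta)])  # 0, beta2, beta2+beta3
m = np.exp(alpha[:, None] + beta_cum[None, :])       # full rectangle of m(w,d)

# Point estimate of unpaid = sum of projected cells below the latest diagonal
n = 3
unpaid = sum(m[w, d] for w in range(n) for d in range(n) if w + d > n - 1)
```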
if they ask you to estimate expected incremental loss
use eta = ln[m] = alpha + Σbeta
to calculate incremental loss triangle m(w,d)
simulated incremental losses
q*(w,d) = r* * sqrt(m^z) + m (z = 1 for ODP)
where r* is a sampled residual
2 advantages and disadvantages of using GLM framework compared to simplified GLM
ADV: the GLM can use fewer parameters than the simplified GLM to avoid overparameterization
the GLM is more flexible; we can add different parameters such as CY trend parameters
DISADV: the GLM framework requires re-fitting the GLM for each iteration of sampled data -> this can slow down the process
the model isn't directly explainable to others using LDFs the way the simplified GLM is