Shapland Flashcards

1
Q

3 advantages of Bootstrap Model

A
  1. Generates a distribution of possible outcomes as opposed to a single point estimate.
    Provides more information about potential results; can be used for capital modeling
  2. Can be tailored to the statistical features of the data under analysis
  3. Can reflect the fact that insurance loss distributions are generally skewed right. This is because the sampling process does not require a distribution assumption.
    Model reflects the level of skewness in the underlying data.
2
Q

3 reasons for more focus by actuaries on unpaid claims distributions

A
  1. SEC is looking for more reserving risk information from publicly traded companies
  2. Major rating agencies have dynamic risk models for rating and welcome input from company actuaries about reserve distributions
  3. Companies use dynamic risk models for internal risk management and need unpaid claim distributions
3
Q

Briefly describe the ODP Model

A

Incremental claims q(w,d) are modelled directly using a GLM.

Link function: Log
Distribution: ODP

Steps:
1. Use the model to estimate parameters
2. Use bootstrapping (sampling residuals with replacement) to estimate total distribution

4
Q

Calculate E(q(w,d)) and V(q(w,d)) using ODP GLM Model

A

E(q(w,d)) = m_w,d
ln(m_w,d) = n_w,d
n_w,d = a_w + sum(b_d)
V(q(w,d)) = phi*m^z_w,d
z = 0 if Normal error distribution
z = 1 if Poisson error distribution
z = 2 if Gamma error distribution

5
Q

Explain the GLM Model Setup for a 3x3 triangle

A

See image

6
Q

Briefly explain how to solve for the weight matrix in the GLM model

A

Solve for the a and b parameters of the matrix equation Y = X*A that minimize the squared difference between the vector of logs of actual incremental losses (Y) and the vector of logs of expected incremental losses (Yhat).

Use Maximum Likelihood or the Newton-Raphson method.

7
Q

Calculate fitted incremental using GLM Model

A

ln(E(IncLoss_AY,d)) = ln(m_w,d) = n_w,d = a_w + sum(b_d)
E(IncLoss_AY,d) = m_w,d = exp(n_w,d)
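
A minimal Python sketch of this calculation for a 3x3 triangle. The parameter values are made up for illustration, and the function name `fitted_incremental` is hypothetical. It also assumes the development sum runs over b_2 through b_d (no b_1 parameter), which is one common way to set up the model.

```python
import math

def fitted_incremental(alpha, beta, w, d):
    """Return m_w,d = exp(a_w + b_2 + ... + b_d) for 1-based w and d.

    alpha[w - 1] is the accident-year level parameter a_w.
    beta[i - 2] is the development parameter b_i for i = 2..n
    (assumption: there is no b_1, so the first column uses a_w alone).
    """
    eta = alpha[w - 1] + sum(beta[: d - 1])  # linear predictor n_w,d
    return math.exp(eta)                     # invert the log link

# Hypothetical parameter values, for illustration only.
alpha = [math.log(100.0), math.log(110.0), math.log(120.0)]
beta = [math.log(0.5), math.log(0.4)]

m_11 = fitted_incremental(alpha, beta, 1, 1)  # exp(a_1)
m_13 = fitted_incremental(alpha, beta, 1, 3)  # exp(a_1 + b_2 + b_3)
m_22 = fitted_incremental(alpha, beta, 2, 2)  # exp(a_2 + b_2)
```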

8
Q

Explain what is the simplified GLM Method and its steps

A

Fitted (expected) incrementals from the GLM with a Poisson error distribution are the same as the incrementals implied by volume-weighted average LDFs.

Steps:
1. Use cumulative claim triangle to calculate LDFs
2. Develop losses to ultimate
3. Calculate expected cumulative triangle
4. Calculate expected incremental triangle from cumulative triangle
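
The steps above can be sketched in Python as follows. The triangle is made-up data and `fitted_incremental_triangle` is a hypothetical helper. Instead of developing to ultimate and dividing by cumulative factors, the sketch divides the latest diagonal backwards by the age-to-age factors, which gives the same expected cumulative values.

```python
def fitted_incremental_triangle(cumulative):
    """Back-fit an expected incremental triangle from volume-weighted LDFs.

    `cumulative` is a list of rows (one per accident year); each row holds
    the observed cumulative losses, oldest year first (a standard triangle).
    """
    n = len(cumulative)
    # Step 1: volume-weighted age-to-age factors.
    ldfs = []
    for d in range(n - 1):
        rows = [row for row in cumulative if len(row) > d + 1]
        ldfs.append(sum(r[d + 1] for r in rows) / sum(r[d] for r in rows))
    fitted = []
    for row in cumulative:
        # Steps 2-3: start from the latest diagonal, divide back by the LDFs.
        cum = [0.0] * len(row)
        cum[-1] = row[-1]
        for d in range(len(row) - 2, -1, -1):
            cum[d] = cum[d + 1] / ldfs[d]
        # Step 4: difference the expected cumulative values.
        fitted.append([cum[0]] + [cum[d] - cum[d - 1] for d in range(1, len(row))])
    return ldfs, fitted

triangle = [[100.0, 150.0, 165.0], [120.0, 180.0], [130.0]]  # made-up data
ldfs, fitted = fitted_incremental_triangle(triangle)
```

By construction, each fitted row sums to that accident year's latest observed cumulative value.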

9
Q

3 advantages of the simplified GLM framework

A
  1. GLM can be replaced with simpler link ratio approach while still being grounded in the underlying GLM framework.
  2. Using age-to-age ratios serves as a “bridge” to the deterministic framework and allows the model to be more easily explained to others.
  3. We can still use link ratios to get a solution if there are negative incrementals, whereas the GLM with a log link might not have a solution.
10
Q

Calculate Unscaled Pearson residual

A

r_w,d = (q(w,d) - m_w,d) / sqrt(m^z_w,d)
Unscaled Pearson residual = (Actual IncLoss - Expected IncLoss)/sqrt(Expected IncLoss^z)
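
As a quick illustration, the formula can be written as a small Python function. The name `unscaled_pearson_residual` and the input values are hypothetical; z = 1 corresponds to the ODP case.

```python
import math

def unscaled_pearson_residual(actual, expected, z=1):
    """(actual - expected) / sqrt(expected ** z); z = 1 is the ODP case."""
    return (actual - expected) / math.sqrt(expected ** z)

# Made-up values: actual incremental 110 against a fitted incremental of 100.
r = unscaled_pearson_residual(actual=110.0, expected=100.0)
```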

11
Q

Calculate Scale Parameter

A

phi = sum(unscaled residuals ^2)/(N-p)
N = # incremental values in triangle
p = #AYs + #LDFs + #hetero groups - 1
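
A short Python sketch of this calculation, assuming a 4x4 triangle (N = 10 incremental values, 4 accident years, 3 LDFs, one hetero group). The residual values are made up and `scale_parameter` is a hypothetical helper name.

```python
def scale_parameter(residuals, n_accident_years, n_ldfs, n_hetero_groups=1):
    """phi = sum of squared unscaled residuals / (N - p)."""
    n = len(residuals)                                   # N
    p = n_accident_years + n_ldfs + n_hetero_groups - 1  # parameter count
    return sum(r * r for r in residuals) / (n - p)

# Made-up unscaled Pearson residuals for the 10 cells of a 4x4 triangle.
residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
phi = scale_parameter(residuals, n_accident_years=4, n_ldfs=3)  # 6 / (10 - 7)
```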

12
Q

State the assumption about residuals necessary for bootstrapped samples

A

Residuals are independent and identically distributed (iid).

Note: no particular distribution is necessary. Whatever distribution the residuals have will flow into the simulated data.

13
Q

Calculate sampled incremental loss for bootstrap model

A

q*(w,d) = r* × sqrt(m^z_w,d) + m_w,d
SimIncLoss(AY,d) = SimResidual * sqrt(E(IncLoss)^z) + E(IncLoss)

If m_w,d is negative, take its absolute value inside the square root
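
A minimal Python sketch of building one sampled set of incrementals. The residual pool and fitted values are made up, and `sampled_incremental` is a hypothetical helper. The absolute value is applied inside the square root so that a negative fitted value does not break the calculation.

```python
import math
import random

def sampled_incremental(sampled_residual, expected, z=1):
    """q* = r* x sqrt(|m| ** z) + m."""
    return sampled_residual * math.sqrt(abs(expected) ** z) + expected

rng = random.Random(0)                        # fixed seed for repeatability
residual_pool = [-1.2, -0.4, 0.3, 0.9, 1.5]   # made-up residuals
fitted = [100.0, 50.0, 15.0]                  # made-up fitted incrementals

# Sample one residual (with replacement) for each fitted cell.
sample = [sampled_incremental(rng.choice(residual_pool), m) for m in fitted]
```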

14
Q

Calculate Standardized Pearson Residuals

A

rH_w,d = r_w,d * fH_w,d

fH_w,d = sqrt(1 / (1-H_w,d)) = hat matrix adjustment factors

15
Q

Explain the steps to create a distribution of point estimates

A
  1. Create a sample triangle of incremental losses using sample standardized Pearson residuals, r*, and expected incremental from model m_w,d.
  2. Calculate cumulative triangle and LDFs for simulated triangle.
  3. Calculate point estimate of unpaid losses for sampled data.
  4. Run steps 1-3 for many samples to get a distribution of point estimates.

Note: these steps ignore process variance. We can add process variance to future incremental values using a Gamma distribution.

16
Q

Explain how to add process variance to future incremental values in bootstrap model

A

qsim(w,d) follows Gamma (mean = m_w,d, var = phi*m_w,d)
m_w,d is the expected future incremental for this iteration, calculated from sampled bootstrap triangle.
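
One way to sketch this draw in Python uses the standard library's `random.gammavariate(shape, scale)`. Matching the stated mean and variance gives shape = m / phi and scale = phi. The values of m and phi below are made up and `simulate_process_variance` is a hypothetical name.

```python
import random

def simulate_process_variance(mean, phi, rng):
    """Draw from a Gamma with the given mean and variance phi * mean."""
    shape = mean / phi   # mean^2 / variance
    scale = phi          # variance / mean
    return rng.gammavariate(shape, scale)

rng = random.Random(42)
draws = [simulate_process_variance(100.0, 2.0, rng) for _ in range(20000)]
avg = sum(draws) / len(draws)
var = sum((x - avg) ** 2 for x in draws) / (len(draws) - 1)
```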

17
Q

Calculate Standardized Pearson Scale Parameter

A

phiH = sum(rH_w,d ^2) / N

In the bootstrap model, use unscaled Pearson scale parameter.
Standardized Pearson scale parameter could be used to approximate scale parameter.

18
Q

Explain the Bootstrapping BF Model

A

With the ODP bootstrap model, iterations for the latest few AYs can result in more variance than expected.

Incorporate BF model by using a priori loss ratios for each AY with standard deviations for each loss ratio and an assumed distribution.

During simulation, for each iteration simulate a new a priori loss ratio.

19
Q

Explain the Bootstrapping Cape Cod Model

A

With the ODP bootstrap model, iterations for the latest few AYs can result in more variance than expected.

Apply the Cape Cod algorithm to each iteration of the bootstrap model.

20
Q

3 pros of using fewer parameters in generalizing ODP model

A
  1. Helps avoid potentially over-parameterizing the model
  2. Allows ability to add parameters for calendar-year trends
  3. Can be used to model data shapes other than data in triangle form (e.g. missing incrementals in first few diagonals)
21
Q

2 cons of using fewer parameters in generalizing ODP model

A
  1. GLM must be solved for each iteration of the bootstrap model, slowing simulations.
  2. Model is no longer directly explainable to others using age-to-age factors
22
Q

Explain how to correct for negative incremental values using the modified log-link.

A

When the sum of incremental losses in a development column is positive, modify the log-link triangle calculations:
ln(q(w,d)) = ln(q(w,d)) if q(w,d) positive
ln(q(w,d)) = 0 if q(w,d) is zero
ln(q(w,d)) = -ln(abs(q(w,d))) if q(w,d) negative
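
A small Python sketch of the modified log-link. The function name `modified_log` is hypothetical, and returning 0 for a zero incremental is an assumption that follows the usual statement of this adjustment.

```python
import math

def modified_log(q):
    """Log-link value for an incremental that may be zero or negative."""
    if q > 0:
        return math.log(q)
    if q < 0:
        return -math.log(abs(q))  # mirror the log for negative incrementals
    return 0.0                    # assumption: zero incrementals map to 0

values = [modified_log(q) for q in (math.e, 0.0, -math.e)]
```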

23
Q

Explain how to correct for negative incremental values using the negative development periods.

A

Use when the sum of the incremental losses in a development column is negative.

  1. Shift incremental losses up by the size of the largest negative increment value (psi)
    q+(w,d) = q(w,d) - psi
    where psi is negative
  2. Calculate log-link triangle, run the GLM and calculate fitted incremental values m+_w,d
  3. Shift fitted incremental losses back down by psi
    m_w,d = m+_w,d + psi
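
The shift and its reversal can be sketched in Python as below. The column values are made up, the helper names are hypothetical, and the GLM fit in step 2 is not shown (the shifted values stand in for the fitted ones only to show that the two shifts cancel).

```python
def shift_up(incrementals, psi):
    """q+ = q - psi, where psi is the (negative) largest negative incremental."""
    return [q - psi for q in incrementals]

def shift_back(fitted_shifted, psi):
    """m = m+ + psi, undoing the shift on the fitted values."""
    return [m + psi for m in fitted_shifted]

column = [-30.0, -10.0, 5.0]       # made-up incrementals
psi = min(column)                  # -30.0
shifted = shift_up(column, psi)    # all values are now non-negative
restored = shift_back(shifted, psi)
```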
24
Q

Explain how to correct for negative incremental values using simplified GLM adjustments

A

When fitted incrementals (m_w,d) are negative, make the following adjustments to the formulas for residuals and simulated incremental values:
r_w,d = (q(w,d) - m_w,d) / sqrt(abs(m^z_w,d))
q*(w,d) = r* × sqrt(abs(m^z_w,d)) + m_w,d

In other words, replace m_w,d with abs(m_w,d) inside the square roots

25
Q

Explain how to correct for negative values during simulation using process variance

A

When fitted incremental (m_w,d) is negative, there are 2 options to simulate the incremental from a Gamma distribution using absolute values:
  1. Change the sign of the simulated value:
    -Gamma(mean = abs(m_w,d), var = phi*abs(m_w,d))
    BUT this results in a distribution that is skewed left
  2. Shift the entire distribution to have a mean of m_w,d:
    Gamma(mean = abs(m_w,d), var = phi*abs(m_w,d)) + 2*m_w,d
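
Both options can be sketched in Python with the standard library's `random.gammavariate(shape, scale)`. The function names and the values of m and phi are hypothetical. Since abs(m) + 2*m = m when m is negative, the shifted version keeps the right skew while restoring the negative mean.

```python
import random

def gamma_draw(mean, phi, rng):
    """Gamma draw with the given mean and variance phi * mean."""
    return rng.gammavariate(mean / phi, phi)

def simulate_negative_flip(m, phi, rng):
    """Option 1: flip the sign of the draw (result is skewed left)."""
    return -gamma_draw(abs(m), phi, rng)

def simulate_negative_shift(m, phi, rng):
    """Option 2: shift the draw by 2m so the mean is m (still skewed right)."""
    return gamma_draw(abs(m), phi, rng) + 2.0 * m

rng = random.Random(7)
flip = [simulate_negative_flip(-50.0, 2.0, rng) for _ in range(20000)]
shift = [simulate_negative_shift(-50.0, 2.0, rng) for _ in range(20000)]
```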
26
Q

Explain the problem with negative incremental values during simulation

A

Negative incrementals may cause extreme outcomes for some iterations.

Example:
They may cause cumulative values in an early development column to sum to near zero and the next column to be much larger.
This results in extremely large LDFs and central estimates for an iteration.

27
Q

3 options to address negative incremental values during simulation

A
  1. Remove extreme iterations from results
    BUT only remove truly unreasonable iterations
  2. Recalibrate model after identifying the sources of negative incremental (ex: remove a row with sparse data when product was first written)
  3. Limit incremental losses to zero (replace negative incremental in original data with a zero incremental loss)
28
Q

Explain how to account for non-zero sum of residuals

A

Residuals calculated in the bootstrap model are just error terms, so they should be iid with a mean of zero.

But, usually the average of all residuals will be non-zero.

Since residuals are random observations, a non-zero sum of all residuals is not necessarily incompatible with true distribution.

To set the average residual to zero, one option is to add a constant (the negative of the average residual, rbar) to all residuals:
r* = r - rbar
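
In Python, the centering step might look like this. The residual values are made up and `center_residuals` is a hypothetical name.

```python
def center_residuals(residuals):
    """Subtract the average residual so the adjusted residuals sum to zero."""
    rbar = sum(residuals) / len(residuals)
    return [r - rbar for r in residuals]

centered = center_residuals([0.5, -0.25, 1.0, -0.75])  # rbar = 0.125
```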

29
Q

3 solutions to handle missing values

A

Ex: missing oldest diagonals (if data was lost) or missing values in middle of the triangle.

Calculations affected: LDFs, fitted triangle, residuals, degrees of freedom

Solutions:
1. Estimate missing value from surrounding values
2. Modify LDFs to exclude missing value, no residual for missing value (do not resample from missing values)
3. If missing value on latest diagonal, estimate value or use value in second to last diagonal to get filled triangle, using judgment.

30
Q

2 ways to handle outliers

A

There may be outliers that are not representative of the future variability of the dataset, so we may want to remove them.

  1. Outliers could be removed and treated as missing values.
  2. Identify outliers and exclude them from LDFs and residual calculations, but resample the corresponding incremental when simulating triangles.

Remove outliers cautiously and only after understanding data since they may represent realistic extremes that should be kept in analysis.

31
Q

Explain heteroscedasticity and why it is a problem.

A

When Pearson residuals have different levels of variability at different ages.

The ODP bootstrap model assumes standardized Pearson residuals are IID. With heteroscedasticity, we cannot take residuals from one development period and use them in other development periods.

32
Q

2 considerations when assessing heteroscedasticity

A
  1. Account for credibility of observed data
  2. Account for the fact that there are fewer residuals in older development periods.
33
Q

Explain how to use stratified sampling to adjust for heteroscedasticity

A

Group periods together with similar residual variance and only sample residuals from the corresponding group for the model.

  1. Organize development periods by groups with homogeneous variances
  2. For each group, sample with replacement only from residuals in that group

BUT: some groups only have a few residuals in them, which limits the amount of variability in possible outcomes

34
Q

Explain how to use standard deviation to adjust for heteroscedasticity

A

Group residuals with similar variances. Divide the total residual standard deviation by the standard deviation for each group to get each group’s hetero factor. Multiply residuals by the factor for each group to get all residuals to the same variance level. Then, sample residuals from the entire triangle and back out the adjustment.

  1. Calculate hetero-adjustment factors by group
    hi = stddev(union of all rH_w,d) / std dev(union of rH_w,d in group i)
  2. Adjust residuals in each group
    riH_w,d = rH_w,d * hi
  3. Resample residuals and back out hi
    q*i(w,d) = (r* / hi) × sqrt(m^z_w,d) + m_w,d
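
The three steps can be sketched in Python as follows. The grouped residuals are made up, the helper names are hypothetical, and using the sample standard deviation (`statistics.stdev`) is an assumption.

```python
import math
import statistics

def stdev_hetero_factors(groups):
    """h_i = stdev(all residuals) / stdev(residuals in group i)."""
    all_residuals = [r for res in groups.values() for r in res]
    total_sd = statistics.stdev(all_residuals)
    return {g: total_sd / statistics.stdev(res) for g, res in groups.items()}

def sampled_incremental(adjusted_residual, h_i, expected, z=1):
    """Back out h_i for the target cell's group: q* = r*/h_i x sqrt(m^z) + m."""
    return adjusted_residual / h_i * math.sqrt(expected ** z) + expected

# Made-up standardized residuals, grouped by development period.
groups = {"early": [-2.0, 2.0, -2.0, 2.0], "late": [-0.5, 0.5, -0.5, 0.5]}
factors = stdev_hetero_factors(groups)
adjusted = {g: [r * factors[g] for r in res] for g, res in groups.items()}
```

After the adjustment, every group has the same standard deviation as the pooled residuals, so the adjusted residuals can be sampled from the whole triangle.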
35
Q

Explain how to use scale parameter to adjust for heteroscedasticity

A
  1. Calculate overall scale parameter
    phi = sum(residuals^2) / (N-p)
  2. Calculate the scale parameter for each group
    phi_i = N/(N-p) * sum(residuals^2 in group i) / ni
    ni is the number of residuals in group i
  3. Calculate hetero-adjustment factor hi
    hi = sqrt(phi/phi_i)
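
A Python sketch of the same calculation, with made-up grouped residuals and a hypothetical parameter count p = 3. The function name `scale_hetero_factors` is also hypothetical.

```python
import math

def scale_hetero_factors(groups, p):
    """Return (phi, {group: h_i}) with h_i = sqrt(phi / phi_i)."""
    all_residuals = [r for res in groups.values() for r in res]
    n = len(all_residuals)
    phi = sum(r * r for r in all_residuals) / (n - p)   # overall scale
    factors = {}
    for g, res in groups.items():
        # Group scale parameter, with the same N / (N - p) adjustment.
        phi_i = n / (n - p) * sum(r * r for r in res) / len(res)
        factors[g] = math.sqrt(phi / phi_i)
    return phi, factors

groups = {"early": [-2.0, 2.0, -2.0, 2.0], "late": [-0.5, 0.5, -0.5, 0.5]}
phi, factors = scale_hetero_factors(groups, p=3)
```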
36
Q

1 pro and 1 con of adjusting for heteroscedasticity using hetero-adjustment factors

A

Pro: can resample with replacement from entire triangle

Con: adds parameters, affecting degrees of freedom and scale parameter

37
Q

Explain what first development period heteroecthesious data is and how to adjust for it.

A

This occurs when first development period has a different exposure period length than other columns
Ex: 6 months in the first column and 12 months in the rest

Adjustment:
Reduce the latest accident year’s future incremental losses to be proportional to the level of earned exposure in the first period
Then simulate process variance (or reduce after process variance step)

38
Q

Explain what last period heteroecthesious data is and how to adjust for it.

A

Partial last calendar period data.

  1. Annualize exposures in last partial diagonal
  2. Calculate fitted triangle and residuals
  3. Using ODP bootstrap simulation, calculate and interpolate LDFs from fully annualized sample triangles
  4. Adjust last diagonal of the sample triangles to de-annualize incremental on the last diagonal
  5. Project future values by multiplying the interpolated LDFs with the new cumulative values
  6. Reduce future incremental values for the latest accident year to remove future exposure
39
Q

Explain how to adjust for exposures changing significantly over the years (ex: rapidly growing line or line in runoff)

A

If earned exposures exist, divide all claims data by exposures for each accident year to run the model with pure premiums.

After process variance step, multiply the result by accident year exposures to get total claims.

40
Q

Explain parametric bootstrapping

A

Purpose: way to overcome a lack of extreme residuals in an ODP bootstrap model

  1. Fit parametrized distribution to the residuals
  2. Resample residuals from distribution instead of the observed residuals
41
Q

Explain the purposes of bootstrap diagnostics

A

Purpose:
Find a set of models and parameters that results in the most realistic and most consistent simulations based on statistical features of data.

  1. Test assumptions in model
  2. Gauge the quality of model fit to data
  3. Help guide adjustments of the model parameters to improve fit of the model
42
Q

How can we use graphs to test assumption that residuals are IID

A

Plots to look at:
1. Residuals vs Development Period (look for heteroscedasticity)
2. Residuals vs Accident Period
3. Residuals vs Payment Period
4. Residuals vs Predicted

Look for issues with trends
Plot relative std dev of residuals and range of residuals to further test for heteroscedasticity

43
Q

Explain the Normality Test

A

Compares residuals to the normal distribution. If residuals are close to normal, you should see:
1. Normality plot with residuals in line with diagonal line (normally distributed)
2. High R^2 and p-value greater than 5%

Note: in ODP bootstrap, residuals do not need to be normally distributed

44
Q

Calculate AIC and BIC to test normality of residuals

A

AIC = 2p + n * [ln(2*pi*RSS/n) + 1]

BIC = n * ln(RSS/n) + p * ln(n)

Smaller values indicate that residuals fit a normal distribution better.

AIC and BIC add a penalty for more parameters.
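
A short Python sketch of the two statistics, assuming RSS is the residual sum of squares, n the number of residuals and p the number of parameters. The input values are made up.

```python
import math

def aic(rss, n, p):
    """AIC = 2p + n * [ln(2 * pi * RSS / n) + 1]."""
    return 2 * p + n * (math.log(2 * math.pi * rss / n) + 1)

def bic(rss, n, p):
    """BIC = n * ln(RSS / n) + p * ln(n)."""
    return n * math.log(rss / n) + p * math.log(n)

# Same fit quality, one extra parameter: both statistics get worse (larger).
aic_small, aic_large = aic(10.0, 10, 2), aic(10.0, 10, 3)
bic_small, bic_large = bic(10.0, 10, 2), bic(10.0, 10, 3)
```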

45
Q

Explain how to identify outliers graphically

A

Use a box-whisker plot:
Box shows the 25th to 75th percentiles
Whiskers extend to the largest values within 3 times the inter-quartile range
Values outside whiskers are outliers
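
The same rule can be applied numerically. In this Python sketch the whiskers are assumed to be measured from the edges of the box, and the quartiles use the default method of `statistics.quantiles`; both are assumptions, as are the function name and the residual values.

```python
import statistics

def find_outliers(residuals, multiple=3.0):
    """Return residuals lying beyond `multiple` x IQR from the box edges."""
    q1, _, q3 = statistics.quantiles(residuals, n=4)  # 25th, 50th, 75th
    iqr = q3 - q1
    low, high = q1 - multiple * iqr, q3 + multiple * iqr
    return [r for r in residuals if r < low or r > high]

residuals = [-1.0, -0.5, -0.2, 0.0, 0.1, 0.4, 0.8, 1.1, 9.0]  # made-up
flagged = find_outliers(residuals)
```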

46
Q

Explain how to review the estimated-unpaid model results

A
  1. Standard error should increase from oldest to most recent years
  2. Standard error for all years should be larger than any individual year
  3. Coeff of variation should decrease from oldest to most recent years due to independence in incremental payment stream.
  4. A reversal in coeff of variation in recent years could be due to:
    a) Increasing parameter uncertainty in more recent years
    b) Model may overestimate uncertainty in recent years, we may want to switch to BF or Cape Cod model
  5. Min/Max simulations should be reasonable
47
Q

Explain 2 methods to combine results of multiple models

A
  1. Run models with same random variables
    a) Simulate random variables for each iteration
    b) Use same set of random variables for each model
    c) Use model weights to weight incremental values from each model for each iteration by accident year
  2. Run models with independent random variables
    a) Run each model separately with different random variables
    b) Use weights to randomly select a model for each iteration by accident year so that the result is a weighted mixture of models
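
The second method can be sketched in Python for a single accident year. The model names, weights and simulated values are made up, and `mix_models` is a hypothetical helper; `random.Random.choices` does the weighted selection.

```python
import random

def mix_models(results_by_model, weights, rng):
    """For each iteration, pick one model at random according to its weight.

    results_by_model[name][i] is the simulated unpaid amount from model
    `name` at iteration i, for a single accident year.
    """
    names = list(results_by_model)
    n_iter = len(results_by_model[names[0]])
    w = [weights[name] for name in names]
    mixed = []
    for i in range(n_iter):
        chosen = rng.choices(names, weights=w, k=1)[0]
        mixed.append(results_by_model[chosen][i])
    return mixed

rng = random.Random(1)
results = {"odp": [10.0, 20.0, 30.0], "bf": [11.0, 21.0, 31.0]}  # made-up
mixed = mix_models(results, {"odp": 0.5, "bf": 0.5}, rng)
```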
48
Q

2 characteristics of estimated cash flow results

A
  1. Std error of calendar year unpaid decreases as calendar year increases in future
  2. Coeff of variation increases as calendar year increases

This is because the final payments projected farthest out will be the smallest and most uncertain

49
Q

Explain how to estimate ultimate loss ratio

A

Estimated ultimate loss ratios by accident year are calculated using all simulated values, not just the future unpaid values.

Represents the complete variability in loss ratio for each accident year.

Loss ratio distributions can be used for projecting pricing risk.

50
Q

2 issues with correlation methods

A

Both location mapping and re-sorting methods use residuals of incremental future losses to correlate segments.
Both tend to create overall correlations of close to zero.

For reserve risk, the correlation that is desired is between total unpaid amounts for two segments so there may be a disconnect.

51
Q

Explain how to account for correlation between segments (2)

A
  1. Re-sorting
    Use algorithms such as Copula or Iman-Conover to add correlation
  2. Location Mapping
    For each iteration, sample residuals from residual triangles using the same locations for all segments
52
Q

3 advantages to account for correlation between segments using Re-sorting

A
  1. Data triangles can be different shapes/sizes by segment
  2. Can use different correlation assumptions
  3. Different correlation algorithms may have other beneficial impacts on aggregate distribution
    Ex: can use a copula with heavy tail distribution to strengthen the correlation between segments in tails, which is important for risk-based capital modeling.
53
Q

2 advantages to account for correlation between segments using location mapping

A

  1. Method is easily implemented
  2. Does not require an estimated correlation matrix; it preserves the correlation of the original residuals

54
Q

2 disadvantages to account for correlation between segments using location mapping

A
  1. All segments need to have same size data triangles with no missing data
  2. Correlation of original residuals is used, so we cannot test other correlation assumptions
55
Q

List potential data issues associated with applying ODP bootstrap model

A
  1. Negative incremental values
  2. Non-zero sum of residuals
  3. Using n-year weighted average
  4. Missing values
  5. Outliers
  6. Heteroscedasticity
  7. Heteroecthesious data
  8. Exposures changing over time
  9. Lack of extreme residuals
56
Q

Briefly describe 4 diagnostics tests to evaluate GLM model for reasonableness

A
  1. Standard error should increase from older to more recent AYs, because std error should follow the size of the increasing unpaid loss reserve.
  2. Total, all-year std error should be larger than the standard error for any individual year.
  3. CV should decrease moving from older to more recent AYs, with the possible exception of the most recent year (where parameter uncertainty might overpower process uncertainty). This is because older years have a smaller loss reserve and there are few claims payments remaining.
  4. CV of total reserve should be less than any individual year because the model assumes accident years are independent.
57
Q

Describe 2 practical limitations of a log-link GLM bootstrap modeling framework.

A
  1. It cannot handle negative incremental values
  2. The GLM must be re-solved at each bootstrap iteration, making simulations time consuming.