Mahler Flashcards

Question 1

Q

Describe 3 advantages of Baseball data over insurance data

Answer

A

Constant set of risks (teams)
In insurance, insureds leave and enter database
Baseball loss data is readily available, accurate and not subject to development
Insurance loss data is sometimes hard to compile/obtain and is subject to possible reporting errors and loss development
Each team plays roughly the same number of games (equal size loss experience)

Question 2

Q

Describe 2 methods to identify if risk parameters are shifting over time

Answer

A

Chi-Square Test
H0: each year's results were drawn from a distribution with the same mean frequency
X^2 = sum of (actual - expected)^2 / expected
If test statistic > table value with df = n-1, we reject H0 and conclude that risk parameters are shifting over time
Compare correlations between years
a. Compute correlations between results for pairs of years of all risks
b. Take straight average of correlations for each separation in time
c. Examine how avg correlation depends on time
If the correlation between years closer together is higher than between years further apart, we can conclude risk parameters are shifting over time
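
Steps a to c above can be sketched in a few lines of Python. This is only an illustration: the function names, the data layout (one list of results per year) and the toy numbers are assumptions, and a plain Pearson correlation is used.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def avg_correlation_by_lag(results):
    """results[t][k] = losing percentage of risk k in year t.

    Returns {lag: straight average of the correlations between all
    pairs of years that are `lag` years apart}.
    """
    by_lag = {}
    for s, t in combinations(range(len(results)), 2):
        by_lag.setdefault(t - s, []).append(pearson(results[s], results[t]))
    return {lag: sum(c) / len(c) for lag, c in sorted(by_lag.items())}


# Made-up losing percentages for 4 teams over 4 years (illustration only).
toy = [
    [0.40, 0.45, 0.55, 0.60],
    [0.42, 0.46, 0.53, 0.59],
    [0.47, 0.50, 0.48, 0.55],
    [0.55, 0.52, 0.45, 0.48],
]
lags = avg_correlation_by_lag(toy)
```

If the average in `lags` falls as the lag grows, that pattern is the sign of shifting risk parameters described in the card.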

Using the 2 tests, Mahler concluded risk parameters are shifting.
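
The chi-square test from this card could be sketched as follows for a single risk. The function names and loss counts are made up for illustration, and the expected value for each year is assumed to be games played times the all-years average loss rate. The table value is passed in rather than computed; 7.815 is the standard 5% critical value for df = 3.

```python
def chi_square_statistic(actual, games):
    """Chi-square statistic for one risk observed over several years.

    actual[t] = observed losses in year t, games[t] = games played in year t.
    Under H0 every year shares one mean loss frequency, so the expected
    losses in year t are games[t] * (total losses / total games).
    """
    rate = sum(actual) / sum(games)
    expected = [g * rate for g in games]
    return sum((a - e) ** 2 / e for a, e in zip(actual, expected))


def shifting_over_time(actual, games, table_value):
    """Reject H0 when the statistic exceeds the table value for df = n - 1."""
    return chi_square_statistic(actual, games) > table_value


# Made-up loss counts for one team over 4 seasons of 150 games each.
losses = [60, 75, 90, 75]
games = [150, 150, 150, 150]
stat = chi_square_statistic(losses, games)       # expected is 75 each year
reject = shifting_over_time(losses, games, 7.815)
```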

Question 3

Q

What is the first question Mahler wanted to answer?

Answer

A

Is there a difference between teams?

Elementary analysis shows there is a non-random difference between teams (different avg experience, only a small number have losing % between 49% and 51%)

A team that has been worse than average over one period of time is more likely to be worse than average over another period of time

Conclusion: if we wish to predict future experience of a team, there is useful information in past experience of that team

Question 4

Q

If we conclude risk parameters are shifting over time, what does it imply?

Answer

A

We want to use credibility-weighting formulas that apply more credibility on recent years and less on older years.

Question 5

Q

Describe the ways to estimate losing percentage for next year

Answer

A

u = grand mean = 50%
Yi = actual value of X in year i (Y1 = most recent year)

Xest = u
Every risk is average, ignores past data (Z=0%)
Xest = Y1
Assume most recent year repeats (Z1=100%)
Xest = ZY1 + (1-Z)u
Cred-wtg of last year and u
Xest = (Z/n) * (sum of Yi) + (1-Z)u
Give equal weight Z/n to each of the n most recent years of data
Xest,i+1 = ZYi + (1-Z)Xest,i
Exponential smoothing: give latest year of data weight Z and (1-Z) to prior estimate
Xest = Sum of ZiYi + (1 - Sum of Zi)u
Most general formula
Would be calculated by computer
Increasing n will never produce inferior estimate since you can always give oldest years of data 0 weight
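
The formulas on this card can be written as small Python functions. This is a sketch only: the function names, the 60% credibility and the three-year history are made-up assumptions. Every formula except exponential smoothing is a special case of the general one.

```python
def credibility_estimate(y, weights, u=0.5):
    """General formula: Xest = sum(Zi * Yi) + (1 - sum(Zi)) * u.

    y[0] is the most recent year (Y1), y[1] the year before, and so on.
    """
    z_total = sum(weights)
    return sum(z * yi for z, yi in zip(weights, y)) + (1 - z_total) * u


def equal_weight_estimate(y, z, n, u=0.5):
    """Give weight z / n to each of the n most recent years."""
    return credibility_estimate(y[:n], [z / n] * n, u)


def exponential_smoothing(y_oldest_first, z, start=0.5):
    """Xest,i+1 = Z * Yi + (1 - Z) * Xest,i, seeded with `start`."""
    est = start
    for yi in y_oldest_first:
        est = z * yi + (1 - z) * est
    return est


history = [0.60, 0.55, 0.45]  # made-up losing percentages, most recent first

every_risk_average = credibility_estimate(history, [])     # Z = 0
last_year_repeats = credibility_estimate(history, [1.0])   # Z1 = 100%
one_year_cred = credibility_estimate(history, [0.6])       # Z = 60%
three_year_equal = equal_weight_estimate(history, 0.6, 3)
smoothed = exponential_smoothing(reversed(history), 0.6)
```

Passing a weight of 0 for the oldest years reproduces the estimate from fewer years, which is why a larger n can never do worse under the general formula.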

Question 6

Q

Explain how to determine which Z to use in the methods

Answer

A

Using either Bühlmann/Bayesian or classical/limited fluctuation credibility methods, one determines which Z is expected to optimize the selected criterion in the future

One can also empirically investigate which credibility would have optimized selected criterion if it had been used in past (retrospective tests)
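
A retrospective test could look like the sketch below. It assumes the one-year credibility formula, a least squared error criterion and a simple grid search; the function names and the grid step are illustrative choices, not taken from the source.

```python
def retrospective_mse(history, z, u=0.5):
    """MSE of Xest = z * (prior year) + (1 - z) * u over all teams and years.

    history[k] = losing percentages of team k, oldest year first.
    """
    errors = [
        (z * team[t - 1] + (1 - z) * u - team[t]) ** 2
        for team in history
        for t in range(1, len(team))
    ]
    return sum(errors) / len(errors)


def best_credibility(history, step=0.05):
    """Grid-search Z in [0, 1] for the smallest retrospective MSE."""
    grid = [round(i * step, 10) for i in range(int(round(1 / step)) + 1)]
    return min(grid, key=lambda z: retrospective_mse(history, z))


# Made-up data: teams whose results never change make Z = 100% optimal.
stable = [[0.4, 0.4, 0.4], [0.6, 0.6, 0.6]]
z_star = best_credibility(stable)
```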

Question 7

Q

Describe 3 criteria used to evaluate quality of estimate

Answer

A

Least Squared Error
SSE = sum of (Xest - Xactual)^2
MSE = SSE / n
n is number of teams * number of years
The smaller the MSE, the better solution
Method 2 is preferred under this criterion
Used by B&S
Small Chance of Large Errors (Limited Fluctuations)
Measures prob that observed result differed by more than certain % from predicted result
The smaller this prob, the better the solution
Method 2 is preferred under this criterion
Meyers/Dorweiler
Calculate correlation between predictions and prediction errors
The closer the corr is to 0, the better the solution
Vector 1 = Actual%/Pred%
Vector 2 = Pred%/50%
Method 2 is preferred under this criterion
Not interested in size of errors, only in correlation
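
The three criteria can be sketched as Python functions over paired predictions and actual results. The function names, the 20% error threshold and the toy numbers are assumptions for illustration. The card only says "correlation", so a plain Pearson correlation is used here as a stand-in.

```python
from math import sqrt


def mean_squared_error(pred, actual):
    """Least squared error criterion: average of (Xest - Xactual)^2."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred)


def prob_large_error(pred, actual, k=0.20):
    """Limited fluctuation criterion: share of observations where the
    actual result differs from the prediction by more than k (as a
    fraction of the prediction)."""
    misses = sum(abs(a - p) > k * p for p, a in zip(pred, actual))
    return misses / len(pred)


def meyers_dorweiler_corr(pred, actual, overall=0.5):
    """Correlation between Actual/Pred and Pred/overall average.

    Pearson correlation is a stand-in chosen for this illustration.
    """
    v1 = [a / p for p, a in zip(pred, actual)]
    v2 = [p / overall for p in pred]
    n = len(v1)
    m1, m2 = sum(v1) / n, sum(v2) / n
    cov = sum((x - m1) * (y - m2) for x, y in zip(v1, v2))
    var1 = sum((x - m1) ** 2 for x in v1)
    var2 = sum((y - m2) ** 2 for y in v2)
    return cov / sqrt(var1 * var2)


# Made-up case: predictions spread out, actual results all average.
pred = [0.4, 0.5, 0.6]
actual = [0.5, 0.5, 0.5]
mse = mean_squared_error(pred, actual)
p_large = prob_large_error(pred, actual)
md = meyers_dorweiler_corr(pred, actual)
```

In this toy case `md` is negative: the risks predicted to be worse than average did better than predicted, so the predictions were too spread out.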

Question 8

Q

Explain why use of more years of data does not result in higher Z

Answer

A

Since parameters are shifting substantially over time, use of older data (with equal weight) leads to a worse estimate

Question 9

Q

Explain why M/D cannot help choose the optimal number of years

Answer

A

For each choice of number of historical years used, there can be a choice of credibility that results in 0 correlation

Question 10

Q

Describe the impact of delay in data on Z and prediction accuracy

Answer

A

Not having the most recent year of historical data significantly increases squared error of estimate

Optimal credibility typically decreases when there is a delay in getting data

Less current info is less valuable for estimating future

Question 11

Q

Explain the results of tests on Baseball data

Answer

A

Credibilities in the range of 50% to 70% perform relatively well under all 3 criteria.

If Z is close to, but not exactly at, the optimal level, there is only a relatively small impact on the result.

Question 12

Q

In which case would the 3 tests not agree on the optimal Z?

Answer

A

LSE & Limited Fluctuation are focused on limiting large errors

M/D is focused on pattern (corr) between errors and mod

A situation where errors are small but correlated with the ratio of predicted to overall avg would be preferable under the first 2 but not under M/D

Question 13

Q

Contrast hierarchical clustering versus non-hierarchical

Answer

A

Non-hierarchical clustering means new HG represents the best partition for the given number of clusters

Hierarchical clustering requires that each new group be a subset of an existing group
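
The subset requirement can be checked mechanically. The sketch below uses hypothetical class codes A to F and an illustrative helper name; it only tests whether a new set of groups could have come from a hierarchical step.

```python
def is_hierarchical_step(old_groups, new_groups):
    """True if every new group is a subset of some existing group."""
    return all(
        any(set(new) <= set(old) for old in old_groups)
        for new in new_groups
    )


# Hypothetical class codes A-F grouped into clusters (illustration only).
two_groups = [{"A", "B", "C"}, {"D", "E", "F"}]
split_one = [{"A", "B"}, {"C"}, {"D", "E", "F"}]   # splits an existing group
regrouped = [{"A", "B"}, {"C", "D"}, {"E", "F"}]   # {"C", "D"} straddles both
```

A non-hierarchical method is free to produce something like `regrouped` if that is the best three-group partition, while a hierarchical method is limited to results like `split_one`.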

Question 14

Q

How do you determine the df for the Chi-Square table search?