Mahler Flashcards

1
Q

Describe 3 advantages of Baseball data over insurance data

A
  1. Constant set of risks (teams)
    In insurance, insureds leave and enter the database
  2. Baseball loss data is readily available, accurate, and not subject to development
    Insurance loss data is sometimes hard to compile/obtain and is subject to possible reporting errors and loss development
  3. Each team plays roughly the same number of games (equal-size loss experience)
2
Q

Describe 2 methods to identify if risk parameters are shifting over time

A
  1. Chi-Square Test
    H0: each year's data was drawn from a distribution with the same mean
    X^2 = sum of (actual - expected)^2 / expected
    If the test statistic > the table value with df = n-1, we reject H0 and conclude that risk parameters are shifting over time
  2. Compare correlations between years
    a. Compute correlations between results for pairs of years across all risks
    b. Take the straight average of the correlations for each separation in time
    c. Examine how the average correlation depends on the separation in time
    If correlations between years closer together are higher than those further apart, we conclude risk parameters are shifting over time

Using the 2 tests, Mahler concluded risk parameters are shifting.
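The chi-square test above can be sketched as follows. This is a minimal illustration only: the yearly loss counts and games played are made-up numbers, not Mahler's actual baseball data.

```python
# Chi-square test for shifting risk parameters (sketch, illustrative data).
# H0: each year's losses were drawn from a distribution with the same mean.

losses = [78, 85, 92, 70, 88]   # hypothetical losses per year for one team
games = [162] * len(losses)     # games played each year

overall_rate = sum(losses) / sum(games)

# Expected losses each year under H0 (same mean losing percentage every year).
expected = [g * overall_rate for g in games]

chi_sq = sum((a - e) ** 2 / e for a, e in zip(losses, expected))
df = len(losses) - 1            # degrees of freedom = n - 1

# Compare chi_sq to the chi-square table value at df = n - 1;
# if chi_sq exceeds it, reject H0: risk parameters are shifting over time.
print(f"chi-square = {chi_sq:.2f}, df = {df}")
```

With a chi-square table (or a statistics library), the computed statistic is compared against the critical value at the chosen significance level.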

3
Q

What is the first question Mahler wanted to answer?

A

Is there a difference between teams?

Elementary analysis shows there is a non-random difference between teams (different average experience; only a small number of teams have a losing percentage between 49% and 51%)

A team that has been worse than average over one period of time is more likely to be worse than average over another period of time

Conclusion: if we wish to predict future experience of a team, there is useful information in past experience of that team

4
Q

If we conclude risk parameters are shifting over time, what does it imply?

A

We want to use credibility-weighting formulas that give more credibility to recent years and less to older years.

5
Q

Describe the 6 ways to estimate the losing percentage for next year

A

u = grand mean = 50%
Yi = actual observed value for year i (Y1 = most recent year)

  1. Xest = u
    Every risk is average; ignores past data (Z = 0%)
  2. Xest = Y1
    Assume the most recent year repeats (Z1 = 100%)
  3. Xest = Z*Y1 + (1-Z)u
    Credibility-weight the latest year against u
  4. Xest = (Z/n) * sum of Yi + (1-Z)u
    Give equal weight Z/n to each of the n most recent years of data
  5. Xest,i+1 = Z*Yi + (1-Z)*Xest,i
    Exponential smoothing: give the latest year of data weight Z and the prior estimate weight (1-Z)
  6. Xest = sum of Zi*Yi + (1 - sum of Zi)u
    Most general formula
    Would be calculated by computer
    Increasing n will never produce an inferior estimate, since you can always give the oldest years of data 0 weight
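Methods 3 and 5 above can be sketched as follows. The losing percentages and the Z value are hypothetical inputs for illustration, not figures from the paper.

```python
# Credibility-weighted estimates of next year's losing percentage (sketch).

GRAND_MEAN = 0.50                # u: grand mean losing percentage

def cred_weight(y1, z):
    """Method 3: Z * most recent year + (1 - Z) * grand mean."""
    return z * y1 + (1 - z) * GRAND_MEAN

def exp_smooth(years, z):
    """Method 5: exponential smoothing, oldest year first.
    Each step gives weight Z to the latest year and (1 - Z)
    to the prior estimate; start from the grand mean."""
    est = GRAND_MEAN
    for y in years:
        est = z * y + (1 - z) * est
    return est

history = [0.55, 0.58, 0.60]     # hypothetical losing %, oldest first
print(cred_weight(history[-1], 0.6))   # 0.6*0.60 + 0.4*0.50 = 0.56
print(exp_smooth(history, 0.6))
```

Note how exponential smoothing implicitly assigns geometrically declining weights Z, Z(1-Z), Z(1-Z)^2, ... to successively older years, which suits shifting risk parameters.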
6
Q

Explain how to determine which Z to use in the methods

A

Using either Buhlmann/Bayesian or classical/limited-fluctuation credibility methods, one determines which Z is expected to optimize the selection criterion in the future.

One can also empirically investigate which credibility would have optimized the selected criterion had it been used in the past (retrospective tests).

7
Q

Describe 3 criteria used to evaluate quality of estimate

A
  1. Least Squared Error
    SSE = sum of (Xest - Xactual)^2
    MSE = SSE / n
    n = number of teams * number of years
    The smaller the MSE, the better the solution
    Method 2 is preferred under this criterion
    Used by B&S
  2. Small Chance of Large Errors (Limited Fluctuations)
    Measures the probability that the observed result differs by more than a certain % from the predicted result
    The smaller this probability, the better the solution
    Method 2 is preferred under this criterion
  3. Meyers/Dorweiler
    Calculate the correlation between predictions and prediction errors
    The smaller the correlation, the better the solution
    Vector 1 = Actual% / Predicted% (errors)
    Vector 2 = Predicted% / 50% (modifications)
    Method 2 is preferred under this criterion
    Not interested in the size of errors, only in the correlation
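Criteria 1 and 3 can be sketched as follows. The prediction and actual vectors are made-up toy data, and a plain Pearson correlation is used in the Meyers/Dorweiler sketch for simplicity (the original test is based on a rank correlation).

```python
# Evaluating predictions under two of the criteria (sketch, toy data).

def mse(preds, actuals):
    """Criterion 1: mean squared error; smaller is better."""
    return sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(preds)

def meyers_dorweiler(preds, actuals, grand_mean=0.50):
    """Criterion 3: correlation between error ratios (actual/predicted)
    and modifications (predicted/grand mean); a correlation near 0
    is better.  Pearson correlation used here for simplicity."""
    v1 = [a / p for a, p in zip(actuals, preds)]   # error ratios
    v2 = [p / grand_mean for p in preds]           # modifications
    n = len(v1)
    m1, m2 = sum(v1) / n, sum(v2) / n
    cov = sum((x - m1) * (y - m2) for x, y in zip(v1, v2)) / n
    s1 = (sum((x - m1) ** 2 for x in v1) / n) ** 0.5
    s2 = (sum((y - m2) ** 2 for y in v2) / n) ** 0.5
    return cov / (s1 * s2)

preds = [0.52, 0.48, 0.55]       # hypothetical predicted losing %
actuals = [0.54, 0.47, 0.53]     # hypothetical actual losing %
print(mse(preds, actuals))
print(meyers_dorweiler(preds, actuals))
```

A positive correlation here would mean teams predicted to be worse than average actually came in even worse than predicted, i.e. the estimator is under-responding.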
8
Q

Explain why using more years of data does not result in a higher Z

A

Since parameters are shifting substantially over time, giving older years of data equal weight leads to a worse estimate.

9
Q

Explain why the Meyers/Dorweiler criterion cannot help choose the optimal number of years

A

For each choice of number of historical years used, there can be a choice of credibility that results in 0 correlation

10
Q

Describe the impact of delay in data on Z and prediction accuracy

A

Not having the most recent year of historical data significantly increases the squared error of the estimate.

The optimal credibility typically decreases when there is a delay in receiving data.

Less current information is less valuable for estimating the future.

11
Q

Explain the results of tests on Baseball data

A

Credibilities in the range of 50% to 70% perform relatively well under all 3 criteria.

If Z is close to, but not exactly at, the optimal level, there is only a relatively small impact on the result.

12
Q

In which case would the 3 criteria not agree on the optimal Z?

A

LSE & Limited Fluctuations are focused on limiting large errors.

M/D is focused on the pattern (correlation) between errors and modifications.

A situation where errors are small but correlated with the modifications (predicted-to-overall-average ratios) would be preferable under the first 2 criteria but not under M/D.

13
Q

Contrast hierarchical clustering versus non-hierarchical

A

Non-hierarchical clustering: each new HG represents the best partition for the given number of clusters.

Hierarchical clustering requires that each new group be a subset of an existing group.

14
Q

How do you determine the degrees of freedom for the Chi-Square table lookup?

A

df = n - 1, where n is the number of years of data
