Robertson - NCCI's 2007 HG mapping Flashcards
Summarize the process used in the 2007 NCCI HG mapping study
- Developed excess ratios for each class at five selected limits
- Grouped classes with similar credibility-weighted excess ratio vectors using weighted k-means cluster analysis
- Enhanced the clusters/groupings using principal components (PC) analysis
- Determined the optimal number of groups (7) using weighted k-means cluster analysis
- Had an underwriter panel review the initial groups, then revised the groupings based on their input
Why were five limits selected?
- ELFs at any pair of limits are highly correlated
- Limits below $100K are heavily represented in the list of 17 limits
Why standardize?
When is standardization appropriate?
Why: when variables have different units or spreads, standardizing prevents a variable with large values from exerting undue influence on the cluster results.
It’s appropriate when the spread of values is due to normal random variation.
It’s not appropriate when the spread is due to the presence of sub-classes.
Why did NCCI decide not to standardize?
- Excess ratios share a common unit of measure ($ of excess loss / $ of total loss); standardizing results in a new variable without a common unit interpretation
- Standardizing could result in values outside the range [0, 1], which excess ratios cannot take
- Standardizing reduced the influence of the lower loss limits, where the bulk of the data is
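The second point above can be seen with a small sketch. The four excess ratios and the `standardize` helper are made up for illustration and are not NCCI data; the point is only that z-scores no longer read as a share of total loss.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    m = mean(values)
    s = pstdev(values)
    return [(x - m) / s for x in values]

# Hypothetical excess ratios for four classes at a single loss limit.
excess_ratios = [0.10, 0.20, 0.30, 0.60]
z = standardize(excess_ratios)

# Every raw excess ratio lies in [0, 1]; the standardized values do not.
print("raw range:         ", min(excess_ratios), max(excess_ratios))
print("standardized range:", round(min(z), 3), round(max(z), 3))
```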
How the k-means algorithm works
1. Assign classes to k arbitrary groups
2. Calculate the centroid Ri of each group (the weighted average excess ratio vector)
3. Compare the excess ratio vector of each class to all of the centroids
4. Move each class to the group with the closest centroid
5. If any class moves, go back to step 2 and repeat
- This is analogous to maximizing R^2
- It minimizes the within variance and maximizes the between variance
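The steps above can be sketched in plain Python. This is a minimal illustration, not NCCI's implementation: the round-robin starting groups, the squared Euclidean distance, the weights, and the six two-limit excess ratio vectors are all assumptions made for the example.

```python
def weighted_centroid(vectors, weights):
    """Weighted average of a list of equal-length vectors."""
    total = sum(weights)
    dims = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total
            for d in range(dims)]

def squared_distance(a, b):
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weighted_kmeans(vectors, weights, k, max_iter=100):
    # Step 1: arbitrary starting groups (round-robin here).
    groups = [i % k for i in range(len(vectors))]
    centroids = {}
    for _ in range(max_iter):
        # Step 2: weighted centroid of each non-empty group.
        centroids = {}
        for g in set(groups):
            members = [i for i, gi in enumerate(groups) if gi == g]
            centroids[g] = weighted_centroid(
                [vectors[i] for i in members],
                [weights[i] for i in members])
        # Steps 3-4: move each class to the group with the closest centroid.
        new_groups = [
            min(centroids, key=lambda g: squared_distance(v, centroids[g]))
            for v in vectors
        ]
        # Step 5: stop once no class moves; otherwise repeat from step 2.
        if new_groups == groups:
            break
        groups = new_groups
    return groups, centroids

# Hypothetical excess ratio vectors (two limits) and class weights.
vectors = [(0.10, 0.02), (0.12, 0.03), (0.11, 0.02),
           (0.40, 0.15), (0.42, 0.16), (0.38, 0.14)]
weights = [100, 50, 80, 120, 60, 90]

groups, centroids = weighted_kmeans(vectors, weights, k=2)
print(groups)
```

The weights only enter the centroid calculation; each class is still assigned by its unweighted distance to the centroids. A group that becomes empty is simply dropped in this sketch.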
How did the NCCI decide to use seven as the new number of HGs?
- Two test statistics selected:
- Calinski/Harabasz
- Cubic Cluster Criterion
For both tests, a higher statistic => a better clustering
- Three scenarios tested
- all classes
- only classes with over 50% credibility
- only 100% credibility classes
- Number of groups tested was between 4 and 9
- 7 groups were indicated in 5 of the 6 tests
The exception was the CCC test on all classes, which indicated 9 groups.
This was given little emphasis because:
- CH test outperforms CCC
- CCC deserves less weight when correlation is present as is the case in all NCCI scenarios
- Selection should be driven by large, credible classes
- There was crossover in ELFs in the 9-HG scenario
On what basis does NCCI define HGs?
Why are HGs defined on a countrywide basis rather than varying by state?
An HG is a collection of WC classifications that have similar ELFs over a wide range of limits.
NCCI defines HGs on a countrywide basis; HGs do not vary by state. NCCI takes the view that classes are homogeneous with respect to the operations of the insureds, and therefore the relative mix of injuries within a class should not vary much from state to state.
Describe the desirable optimality properties that result from k-means to determine clusters
It is equivalent to maximizing R-squared in linear regression. It maximizes the variance between groups while minimizing the variance within groups.
Credibility by class is determined by the following formula: Z = min(1.5 * n/(n+k), 1)
What is one consideration when deciding whether to use this credibility formula?
Describe two alternative methods.
Consideration: what size of class is required to achieve full credibility
k is based on the average # of claims per class
Alternative methods: (1) eliminate med-only claims; (2) replace k with the median number of claims per class
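A short sketch of the formula, to show where full credibility is reached. It assumes n is the number of claims in the class; the value k = 200 and the function name are made up for illustration. Setting 1.5 * n/(n+k) = 1 gives n = 2k, so a class needs twice the average claim count for full credibility.

```python
def class_credibility(n, k):
    """Z = min(1.5 * n / (n + k), 1), with n assumed to be the class claim count."""
    return min(1.5 * n / (n + k), 1.0)

k = 200  # hypothetical average number of claims per class

for n in (50, 200, 400, 1000):
    print(n, round(class_credibility(n, k), 3))
```

Swapping in the median number of claims per class for k only changes the value passed as `k`; the form of the formula stays the same.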
One advantage of PC analysis over GLM
PC analysis identifies variables that are most predictive of the outcome, allowing one to eliminate other correlated variables from the model. It makes the model simpler without much loss of function.
Describe two test statistics that could be used to determine the optimal number of groups from the cluster analysis.
- Calinski-Harabasz: measures the between variance divided by the within variance
- Cubic Clustering Criterion (CCC): compares the variance explained by the clusters to that explained by randomly assigned clusters.
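As a rough illustration of the first statistic, the sketch below computes the standard unweighted Calinski-Harabasz value, (between sum of squares / (k - 1)) / (within sum of squares / (n - k)). The data and the two candidate groupings are hypothetical, and this unweighted version is an assumption; it is not claimed to match the weighting used in the NCCI study.

```python
def calinski_harabasz(vectors, groups):
    """Between-group variance divided by within-group variance."""
    n = len(vectors)
    labels = sorted(set(groups))
    k = len(labels)
    dims = len(vectors[0])
    overall = [sum(v[d] for v in vectors) / n for d in range(dims)]

    between = 0.0
    within = 0.0
    for g in labels:
        members = [v for v, gi in zip(vectors, groups) if gi == g]
        centroid = [sum(v[d] for v in members) / len(members)
                    for d in range(dims)]
        # Between: group size times squared distance from the overall mean.
        between += len(members) * sum(
            (c - o) ** 2 for c, o in zip(centroid, overall))
        # Within: squared distance of each member from its group centroid.
        within += sum(
            sum((x - c) ** 2 for x, c in zip(v, centroid)) for v in members)

    return (between / (k - 1)) / (within / (n - k))

# Hypothetical excess ratio vectors at two limits.
vectors = [(0.10, 0.02), (0.12, 0.03), (0.11, 0.02),
           (0.40, 0.15), (0.42, 0.16), (0.38, 0.14)]

good = calinski_harabasz(vectors, [0, 0, 0, 1, 1, 1])  # tight, well-separated
poor = calinski_harabasz(vectors, [0, 1, 0, 1, 0, 1])  # mixed groups
print(good > poor)
```

The tighter, better-separated grouping scores higher, consistent with "higher statistic => better cluster".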