A5. Robertson Flashcards
NCCI credibility weighted XS ratios
Rc Final = Z * Rc + (1-Z) * Rhg
Z = min(n/(n+k) * 1.5, 1)
where Rc = excess ratio for the class, Rhg = excess ratio for the class's current hazard group, n = # of claims in the class, k = average # of claims per class
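The credibility formula above can be sketched in a few lines. This is only an illustration: the function names and the input numbers are made up for the example and do not come from the NCCI study.

```python
def credibility(n, k):
    """Z = min(1.5 * n / (n + k), 1), with n = claims in the class
    and k = average number of claims per class."""
    return min(1.5 * n / (n + k), 1.0)


def credibility_weighted_ratio(r_class, r_hg, n, k):
    """Blend the class excess ratio with its hazard group excess ratio."""
    z = credibility(n, k)
    return z * r_class + (1.0 - z) * r_hg


# Hypothetical inputs: a class with 100 claims when the average class has 300.
z_small = credibility(100, 300)
blended = credibility_weighted_ratio(0.40, 0.20, 100, 300)
# A class with twice the average claim count already reaches full credibility,
# because of the 1.5 multiplier.
z_large = credibility(600, 300)
print(z_small, blended, z_large)
```

Note that the 1.5 factor means Z hits 1 as soon as n = 2k, so large classes rely entirely on their own experience.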
Other credibility options:
- using the median instead of the average for k
- excluding medical-only claims from n and k
- including only serious claims in n and k
- requiring min # of claims for classes used
- various square root rules
Why did the NCCI use only 5 limits?
- XS ratios at any pair of limits are highly correlated across classes
- Limits < $100K were heavily represented in the 17 prior limits
- Using 1 limit would not have captured the full variability in XS ratios
- These 5 limits are commonly used for rating
Advantage of L1 distance
- Minimizes the relative error in estimating excess premium
- Many small errors have the same effect as one large error, so outliers have less impact on the results
Advantage of L2 distance
- Penalizes large errors
- Minimizes squared error
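A small sketch of the two distance measures, comparing many small errors against one large error. The excess ratio vectors below are hypothetical values chosen only to make the contrast visible.

```python
import math


def l1_distance(x, y):
    """Sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def l2_distance(x, y):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical excess ratios at 5 limits.
base = [0.50, 0.40, 0.30, 0.20, 0.10]
many_small = [r + 0.02 for r in base]   # five errors of 0.02 each
one_large = base[:4] + [0.20]           # a single error of 0.10

# L1 treats both cases (about) the same; L2 penalizes the single large error more.
print(l1_distance(base, many_small), l1_distance(base, one_large))
print(l2_distance(base, many_small), l2_distance(base, one_large))
```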
What’s the goal of k-means clustering?
Minimize the variance within the k clusters and maximize the variance between the k clusters
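One common way to pursue this goal is a Lloyd-style iteration: assign each point to its nearest centroid, recompute the centroids, and repeat until the assignments stop changing. The sketch below is a minimal weighted version under that assumption. It is not claimed to be the exact procedure used by the NCCI, and the points, weights and starting centroids are hypothetical.

```python
def sq_dist(x, y):
    """Squared L2 distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def weighted_mean(points, weights):
    """Weighted centroid of a list of vectors."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(dim)]


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def weighted_kmeans(points, weights, init_centroids, max_iter=100):
    centroids = [list(c) for c in init_centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        for j in range(len(centroids)):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = weighted_mean([points[i] for i in members],
                                             [weights[i] for i in members])
        new_labels = assign(points, centroids)
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
    return labels, centroids


# Hypothetical classes described by excess ratios at two limits.
points = [[0.10, 0.10], [0.12, 0.10], [0.90, 0.80], [0.88, 0.82]]
weights = [3.0, 1.0, 1.0, 1.0]
labels, centroids = weighted_kmeans(points, weights, [[0.0, 0.0], [1.0, 1.0]])
print(labels, centroids)
```

Because the result depends on the starting centroids, a real analysis would normally try several starting points.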
Differences between hierarchical and non-hierarchical clustering
Hierarchical analysis moves from k to k+1 clusters by subdividing one existing cluster into two.
Non-hierarchical analysis seeks the best partition for a pre-specified number of clusters.
Total variance formula
summation of (Wc * (Rc - overall R )^2) / summation of (Wc)
Within variance formula
summation of (Wc * (Rc - avg R for HGi)^2) / summation of (Wc)
Between variance formula
summation of (Wc * (avg R for HGi - overall R)^2) / summation of (Wc)
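The three formulas can be checked numerically: when the overall and hazard group averages are weighted with the same Wc, total variance equals within variance plus between variance. The sketch below uses a single excess ratio per class for simplicity (with several limits, the squared difference would be summed over limits), and all data values are hypothetical.

```python
def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def variance_decomposition(ratios, weights, groups):
    """Return (total, within, between) weighted variances.

    ratios[i]  : excess ratio Rc for class i
    weights[i] : weight Wc for class i
    groups[i]  : hazard group label for class i
    """
    overall = weighted_mean(ratios, weights)
    group_mean = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        group_mean[g] = weighted_mean([ratios[i] for i in idx],
                                      [weights[i] for i in idx])
    w_total = sum(weights)
    total = sum(w * (r - overall) ** 2
                for r, w in zip(ratios, weights)) / w_total
    within = sum(w * (r - group_mean[g]) ** 2
                 for r, w, g in zip(ratios, weights, groups)) / w_total
    between = sum(w * (group_mean[g] - overall) ** 2
                  for w, g in zip(weights, groups)) / w_total
    return total, within, between


# Hypothetical data: four classes in two hazard groups.
ratios = [0.10, 0.14, 0.30, 0.34]
weights = [1.0, 3.0, 2.0, 2.0]
groups = [0, 0, 1, 1]
total, within, between = variance_decomposition(ratios, weights, groups)
print(total, within, between)
```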
2 statistics used by NCCI to decide on # of proposed HGs
- Calinski and Harabasz (C-H) statistic
= [trace(B) / (k-1)] / [trace(W) / (n-k)]
= corrected between variance / corrected within variance
- Cubic Clustering Criterion (CCC) statistic
- compare the amount of variance explained by a given set of clusters to that expected when clusters are formed at random based on the multi-dimensional uniform distribution
- less reliable when data is highly correlated
For both statistics, a higher value indicates a better # of clusters
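The C-H formula can be sketched as below, with n = number of points and k = number of clusters. This version is unweighted for simplicity, which is an assumption of the example rather than something stated in the notes, and the data points are hypothetical. It requires 2 <= k < n and a nonzero within-cluster sum of squares.

```python
def calinski_harabasz(points, labels):
    """[trace(B) / (k - 1)] / [trace(W) / (n - k)] for an unweighted partition."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    dim = len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    trace_b = 0.0  # between-cluster sum of squares
    trace_w = 0.0  # within-cluster sum of squares
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        trace_b += len(members) * sum((m - o) ** 2
                                      for m, o in zip(mean, overall))
        trace_w += sum(sum((p[d] - mean[d]) ** 2 for d in range(dim))
                       for p in members)
    return (trace_b / (k - 1)) / (trace_w / (n - k))


# Hypothetical points with two obvious groups.
points = [[0.10, 0.10], [0.12, 0.10], [0.90, 0.80], [0.88, 0.82]]
good = calinski_harabasz(points, [0, 0, 1, 1])  # natural split
poor = calinski_harabasz(points, [0, 1, 0, 1])  # split that mixes the groups
print(good, poor)
```

The natural split scores far higher than the mixed one, which matches the rule that higher values indicate a better partition.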