C.3 Multivariate Classification Flashcards
Advantages and disadvantage of univariate analysis
Advantages: Simple to calculate and intuitive
Disadvantage: It doesn’t properly account for exposure correlation between rating variables. This is of key importance because many variables in insurance are correlated.
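A minimal Python sketch of a univariate pure premium analysis, assuming hypothetical data in which losses depend only on territory but young drivers are concentrated in the higher-cost territory. The cell layout and the function `univariate_relativities` are illustrative assumptions, not part of the source.

```python
from collections import defaultdict

# Hypothetical cells: (territory, age_group, earned exposures, losses).
# True pure premium is 100 in territory A and 200 in territory B; age has no effect,
# but young drivers are concentrated in territory B (exposure correlation).
CELLS = [
    ("A", "young", 100, 10_000),
    ("A", "adult", 900, 90_000),
    ("B", "young", 900, 180_000),
    ("B", "adult", 100, 20_000),
]


def univariate_relativities(cells, var_index, base_level):
    """Pure premium by level of one variable, divided by the base level's pure premium."""
    exposure = defaultdict(float)
    losses = defaultdict(float)
    for cell in cells:
        level = cell[var_index]
        exposure[level] += cell[2]
        losses[level] += cell[3]
    pure_premium = {lvl: losses[lvl] / exposure[lvl] for lvl in exposure}
    base = pure_premium[base_level]
    return {lvl: pp / base for lvl, pp in pure_premium.items()}


territory_rel = univariate_relativities(CELLS, 0, "A")
age_rel = univariate_relativities(CELLS, 1, "adult")
print(territory_rel)  # {'A': 1.0, 'B': 2.0}
print({lvl: round(r, 3) for lvl, r in age_rel.items()})  # {'young': 1.727, 'adult': 1.0}
```

Because age is correlated with territory in this toy data, the univariate age relativity (about 1.73) picks up part of the territory effect even though age has no effect of its own.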
Three reasons GLMs have grown in popularity
- Increased computing power
- Better data availability
- Competitive pressure
Benefits of multivariate methods (particularly GLMs)
- They properly adjust for exposure correlations between rating variables.
- They attempt to focus on the “signal” in the data (systematic effects) and ignore the “noise” (unsystematic effects).
- They provide statistical diagnostics (e.g., confidence intervals).
- They allow for the consideration of interactions between rating variables.
Advantage and disadvantages of Minimum Bias procedures
Advantage: They properly adjust for exposure correlation.
Disadvantages: They do not provide a way to test whether variables are statistically significant, and they are computationally inefficient.
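The minimum bias idea can be sketched with one common variant, Bailey's multiplicative balance principle: solve for each variable's factors so that actual losses balance to exposure-weighted fitted losses, holding the other variable's factors fixed, and iterate until the factors stop changing. This sketch assumes two rating variables and hypothetical territory/age cells in which losses depend only on territory while young drivers cluster in the higher-cost territory; `minimum_bias` is an illustrative name, not a library function.

```python
# Hypothetical cells: (territory, age_group, earned exposures, losses).
# Losses depend only on territory, but young drivers cluster in territory B.
CELLS = [
    ("A", "young", 100, 10_000),
    ("A", "adult", 900, 90_000),
    ("B", "young", 900, 180_000),
    ("B", "adult", 100, 20_000),
]


def minimum_bias(cells, tol=1e-10, max_iter=100):
    """Multiplicative balance-principle minimum bias for two rating variables."""
    terr_levels = sorted({c[0] for c in cells})
    age_levels = sorted({c[1] for c in cells})
    x = {t: 1.0 for t in terr_levels}  # territory factors (absorb the base rate)
    y = {a: 1.0 for a in age_levels}  # age factors
    for _ in range(max_iter):
        # Balance each territory: actual losses = sum of exposure * x_t * y_a.
        new_x = {
            t: sum(c[3] for c in cells if c[0] == t)
            / sum(c[2] * y[c[1]] for c in cells if c[0] == t)
            for t in terr_levels
        }
        # Balance each age group using the updated territory factors.
        new_y = {
            a: sum(c[3] for c in cells if c[1] == a)
            / sum(c[2] * new_x[c[0]] for c in cells if c[1] == a)
            for a in age_levels
        }
        change = max(
            [abs(new_x[t] - x[t]) for t in terr_levels]
            + [abs(new_y[a] - y[a]) for a in age_levels]
        )
        x, y = new_x, new_y
        if change < tol:  # stop once an iteration no longer moves the factors
            break
    return x, y


x, y = minimum_bias(CELLS)
territory_rel = {t: f / x["A"] for t, f in x.items()}
age_rel = {a: f / y["adult"] for a, f in y.items()}
print(territory_rel)  # {'A': 1.0, 'B': 2.0}
print(age_rel)  # {'adult': 1.0, 'young': 1.0}
```

Here the age factor converges to 1.0, whereas a univariate analysis of the same cells would credit young drivers with part of territory B's higher loss cost. Note that nothing in the loop says whether either variable is statistically significant.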
Describe Sequential Analysis
First, you perform a standard univariate analysis to obtain indicated relativities for a single variable. Next, you use the Adjusted Pure Premium Approach to obtain indicated relativities for a second variable, with exposures adjusted by the first variable’s selected relativities. You then repeat the Adjusted Pure Premium Approach for each remaining variable, adjusting for all previously analyzed variables at each step. Only one pass through the variables is made; the method is not iterative.
While this method partially accounts for exposure correlation, the main criticism is that it does not produce a unique solution: because each variable is adjusted only for the variables analyzed before it, the results change depending on the order in which the variables are analyzed.
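A minimal sketch of sequential analysis, assuming hypothetical territory/age cells in which losses depend only on territory while young drivers cluster in the higher-cost territory. `adjusted_relativities` and `sequential_analysis` are illustrative names; the code runs the same data in two different variable orders to show the order dependence.

```python
from collections import defaultdict

# Hypothetical cells: (territory, age_group, earned exposures, losses).
# Losses depend only on territory, but young drivers cluster in territory B.
CELLS = [
    ("A", "young", 100, 10_000),
    ("A", "adult", 900, 90_000),
    ("B", "young", 900, 180_000),
    ("B", "adult", 100, 20_000),
]
BASE_LEVELS = {0: "A", 1: "adult"}  # column index -> base level


def adjusted_relativities(cells, var_index, base_level, prior):
    """Adjusted pure premium approach for one variable.

    Exposures are multiplied by the relativities already selected for prior
    variables; with an empty `prior` this is a standard univariate analysis.
    """
    adj_exposure = defaultdict(float)
    losses = defaultdict(float)
    for cell in cells:
        factor = 1.0
        for idx, rels in prior.items():
            factor *= rels[cell[idx]]
        level = cell[var_index]
        adj_exposure[level] += cell[2] * factor
        losses[level] += cell[3]
    pure_premium = {lvl: losses[lvl] / adj_exposure[lvl] for lvl in losses}
    return {lvl: pp / pure_premium[base_level] for lvl, pp in pure_premium.items()}


def sequential_analysis(cells, order, base_levels):
    """One non-iterative pass, adjusting each variable for all earlier ones."""
    selected = {}
    for var_index in order:
        selected[var_index] = adjusted_relativities(
            cells, var_index, base_levels[var_index], selected
        )
    return selected


territory_first = sequential_analysis(CELLS, [0, 1], BASE_LEVELS)
age_first = sequential_analysis(CELLS, [1, 0], BASE_LEVELS)
print(territory_first[1])  # {'young': 1.0, 'adult': 1.0}
print({lvl: round(r, 3) for lvl, r in age_first[0].items()})  # {'A': 1.0, 'B': 1.297}
```

Analyzing territory first gives relativities of 2.0 (territory B) and 1.0 (young), while analyzing age first gives roughly 1.30 and 1.73: the same data, two different answers.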
Some important steps in solving GLMs
Compiling a dataset with enough data for modeling, selecting a link function, specifying the distribution of the underlying random process, and using maximum likelihood to calculate the parameters of the model.
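The steps above can be sketched for a claim frequency model under stated assumptions: a Poisson distribution, a log link, log(exposure) as an offset, and maximum likelihood solved by Newton-Raphson (for this canonical link, equivalent to iteratively reweighted least squares). The data, indicator columns, and helper names are hypothetical; a real analysis would normally use a statistical package rather than this hand-rolled solver.

```python
import math

# Hypothetical frequency data: (territory_b, young, exposures, claim_count).
# territory_b and young are 0/1 indicators; territory A and adult are the base levels.
ROWS = [
    (0, 1, 100, 5),
    (0, 0, 900, 45),
    (1, 1, 900, 90),
    (1, 0, 100, 10),
]


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson_glm(rows, iterations=25):
    """Maximum likelihood for a Poisson GLM with log link and log(exposure) offset."""
    X = [[1.0, tb, yg] for tb, yg, _, _ in rows]  # intercept + two indicators
    exposure = [e for _, _, e, _ in rows]
    counts = [n for _, _, _, n in rows]
    p = len(X[0])
    # Start the intercept at the overall frequency and the other coefficients at 0.
    beta = [math.log(sum(counts) / sum(exposure))] + [0.0] * (p - 1)
    for _ in range(iterations):
        # Fitted claim counts: mu = exposure * exp(X beta).
        mu = [e * math.exp(sum(b * v for b, v in zip(beta, xi))) for e, xi in zip(exposure, X)]
        # Score X^T (y - mu) and Fisher information X^T diag(mu) X.
        score = [sum(xi[j] * (y - m) for xi, y, m in zip(X, counts, mu)) for j in range(p)]
        info = [[sum(xi[j] * xi[k] * m for xi, m in zip(X, mu)) for k in range(p)] for j in range(p)]
        step = solve(info, score)  # Newton-Raphson update
        beta = [b + s for b, s in zip(beta, step)]
    return beta


beta = fit_poisson_glm(ROWS)
# Base frequency, territory B relativity, young-driver relativity.
print([round(math.exp(b), 4) for b in beta])  # [0.05, 2.0, 1.0]
```

Exponentiating the coefficients of a log-link model gives multiplicative relativities, which is why the printed values read directly as a base frequency and two rating factors.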
Why GLMs are usually run on frequency and severity instead of loss ratios
- There is no need to on-level premiums at the granular level.
- Actuaries have a priori expectations of frequency and severity patterns, but not of loss ratio patterns.
- Loss ratio models become obsolete when rates are changed.
- There is no standard distribution for modeling loss ratios.
Some common GLM diagnostic tests
-Looking at standard errors (confidence intervals) around estimates.
-Using Chi-Square tests, F-tests, and other deviance tests to choose between competing models with different variables.
-Running the model on separate consecutive time periods of data to see if the estimated parameters are consistent over time.
-Building the model on one subset of historical data and then comparing the model’s predictions with the actual results on a second subset (known as a holdout sample). This can identify whether the model is over-fitting or under-fitting the original dataset.
-Judgmentally deciding whether the results seem reasonable.
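A minimal sketch of two of the diagnostics above: fitting competing models on one subset of data and comparing them on a holdout subset, using Poisson deviance and actual-to-expected claim ratios. The data, the two candidate models (one overall frequency versus one frequency per territory), and the function names are hypothetical; this is a simplified illustration, not a formal chi-square or F-test.

```python
import math

# Hypothetical claim data by territory: (territory, exposures, claim_count).
TRAIN = [("A", 1000, 50), ("B", 1000, 100)]   # data used to fit the models
HOLDOUT = [("A", 800, 44), ("B", 1200, 114)]  # data withheld from fitting


def fit_flat(rows):
    """Candidate 1: one overall claim frequency, ignoring territory."""
    freq = sum(n for _, _, n in rows) / sum(e for _, e, _ in rows)
    return {t: freq for t, _, _ in rows}


def fit_by_territory(rows):
    """Candidate 2: a separate claim frequency for each territory."""
    return {t: n / e for t, e, n in rows}


def poisson_deviance(actual, expected):
    """2 * sum(y * ln(y / mu) - (y - mu)); the y * ln term is taken as 0 when y == 0."""
    total = 0.0
    for y, mu in zip(actual, expected):
        total += (y * math.log(y / mu) if y > 0 else 0.0) - (y - mu)
    return 2.0 * total


def evaluate(model, rows):
    """Score a fitted model on data it did not see: deviance and actual/expected ratios."""
    expected = [model[t] * e for t, e, _ in rows]
    actual = [n for _, _, n in rows]
    return poisson_deviance(actual, expected), [a / x for a, x in zip(actual, expected)]


for name, fit in [("flat", fit_flat), ("by territory", fit_by_territory)]:
    deviance, actual_to_expected = evaluate(fit(TRAIN), HOLDOUT)
    print(name, round(deviance, 2), [round(r, 2) for r in actual_to_expected])
```

On this toy data the territory model has the lower holdout deviance and actual-to-expected ratios closer to 1.0, which is the kind of evidence used to prefer one competing model over another.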
How actuaries can play a key role in using GLMs
-Obtaining reliable data for use in modeling (i.e., GIGO: Garbage In, Garbage Out).
-Exploring anomalous results in the GLM with additional analysis.
-Considering model results from both a statistical and business perspective.
-Developing appropriate methods to communicate the model results based on the company’s ratemaking objectives.
Common types of external data used in GLMs
-Geo-demographic information: such as population density
-Weather data: such as average rainfall or number of days below freezing
-Property characteristics: such as square footage or quality of the local fire department
-Information about insured individuals or businesses: such as credit scores
Some data mining techniques
-Factor Analysis: A technique to reduce the number of variables needed in a classification ratemaking analysis. An example is the vehicle symbol in auto insurance, which summarizes several vehicle characteristics in a single variable.
-Cluster Analysis: A method to combine similar risks into groups. An example is creating territories by grouping zip codes.
-CART: Stands for Classification and Regression Trees. This can build a set of if-then rules for use in classification.
-MARS: Stands for Multivariate Adaptive Regression Splines. This helps turn continuous variables into categorical variables.
-Neural Networks: Training algorithms are given a set of data and learn to identify patterns in it. This can help identify previously unknown interactions between variables.
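Cluster analysis can be sketched with a one-dimensional k-means on hypothetical zip-code pure premiums, grouping zip codes with similar loss costs into candidate territories. k-means is one clustering technique chosen here for illustration; the zip codes, values, and k = 3 are made up, and the sketch ignores exposure weighting and geographic contiguity, which a real territory analysis may also need to address.

```python
# Hypothetical zip-code-level indicated pure premiums.
ZIP_PURE_PREMIUM = {
    "10001": 210.0, "10002": 205.0, "10003": 120.0, "10004": 118.0,
    "10005": 125.0, "10006": 300.0, "10007": 310.0,
}


def kmeans_1d(values, k, max_iter=100):
    """Plain 1-D k-means (k >= 2); returns the final cluster centroids."""
    ordered = sorted(values)
    # Start the centroids at evenly spaced positions in the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def assign_territories(zip_pp, k):
    """Map each zip code to the index of its nearest centroid (a candidate territory)."""
    centroids = kmeans_1d(list(zip_pp.values()), k)
    return {z: min(range(k), key=lambda c: abs(pp - centroids[c])) for z, pp in zip_pp.items()}


print(assign_territories(ZIP_PURE_PREMIUM, 3))
# {'10001': 1, '10002': 1, '10003': 0, '10004': 0, '10005': 0, '10006': 2, '10007': 2}
```

The three resulting groups (centroids of about 121, 207.5, and 305) would be candidate territories, to be reviewed by the actuary before any rates are based on them.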