GLMs Flashcards
Advantages of GLMs
Adjust for correlations among rating variables, with less restrictive assumptions than classical linear models
Disadvantages of GLMs
Often difficult to explain results
Components of GLMs
Random component
Systematic component
Random component of GLM
Each yi assumed to be independent and come from exponential family of distributions
Mean = µi
Variance = φV(µi)/wi
Variance within a GLM, formula
Variance = φV(µi)/wi
φ
Dispersion/Scale factor
Same for all observations
V(µ)
Variance function, given for selected distribution type
Describes relationship between variance and mean
Same distribution type must be used for all observations
Systematic component of GLM
g(µi) = ß0 + ß1xi1 +… + ßpxip + offset
The offset term allows you to manually specify estimates for certain variables, which the GLM then takes as given
Link functions, g(x)
Allow for transformations of the linear predictor
Link function often used for binary target variables
Logit link:
g(µ) = ln [µ / (1 - µ)]
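A minimal sketch of fitting a logit-link GLM using statsmodels; the data and coefficients below are invented purely for illustration:

```python
# Minimal sketch: binomial GLM with a logit link via statsmodels.
# All data below is invented for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))   # true relationship on the logit scale
y = rng.binomial(1, p)                    # binary target

X = sm.add_constant(x)                    # adds the intercept column
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()  # logit is the default link
print(fit.params)                         # estimates of ß0 and ß1
```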
Link function used in rating plans
g(µ) = ln(µ)
Allows transformation of linear predictor into multiplicative structure
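A quick numeric check, with invented coefficients, that the log link turns the additive linear predictor into a multiplicative rating structure:

```python
# Invented coefficients: intercept (base rate) plus two rating variables.
import numpy as np

beta0, beta1, beta2 = 5.0, 0.3, -0.1
x1, x2 = 1.0, 2.0

additive = np.exp(beta0 + beta1 * x1 + beta2 * x2)
multiplicative = np.exp(beta0) * np.exp(beta1 * x1) * np.exp(beta2 * x2)
print(additive, multiplicative)   # identical: exp turns sums into products of factors
```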
Advantages of multiplicative rating plans
Simple/practical to implement
Guarantees positive premiums
Impact of risk characteristics more intuitive
Variance functions for exponential families
Normal: V(µ) = 1
Poisson: V(µ) = µ
Gamma: V(µ) = µ²
Inverse Gaussian: V(µ) = µ³
Negative binomial: V(µ) = µ(1 + κµ)
Binomial: V(µ) = µ(1 − µ)
Tweedie: V(µ) = µ^p
When to use weights in a GLM
When a single observation contains grouped information
Different observations represent different time periods
(If neither apply, weights are all 1)
Weights are usually the denominator of the modeled quantity
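A sketch of a weighted fit, assuming each record is an average severity over several claims so that claim count serves as the weight wi; the data is invented, and statsmodels' var_weights argument supplies the weights:

```python
# Sketch: Gamma severity GLM where each row is an average severity over
# several claims, so claim count enters as the weight (data invented).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
claim_counts = rng.integers(1, 10, size=n)   # w_i: claims behind each record
mu = np.exp(7.0 + 0.4 * x)                   # true mean severity
# Average of w gammas has mean mu and variance proportional to 1/w:
avg_severity = rng.gamma(shape=2.0 * claim_counts,
                         scale=mu / (2.0 * claim_counts))

X = sm.add_constant(x)
fit = sm.GLM(avg_severity, X,
             family=sm.families.Gamma(link=sm.families.links.Log()),
             var_weights=claim_counts).fit()
print(fit.params)
```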
Severity model distributions
Tend to be right-skewed
Lower bound at zero
Gamma and inverse Gaussian distributions exhibit these properties
Gamma vs. inverse Gaussian distributions
Gamma most commonly used for severity
Inverse Gaussian has sharper peak, wider tail (more appropriate for more skewed distributions)
Frequency model distributions
Poisson (technically, ODP) - most common; φ can be > 1
Negative binomial - φ = 1, but a κ appears in the variance function: V(µ) = µ(1 + κµ)
Pure premium distributions
Large point mass at zero (most policies have no claims)
Right-skewness due to severity distribution
Tweedie distribution most commonly used
Tweedie distribution
p = “Power parameter”
1 < p < 2: Poisson frequency and Gamma severity
Assumes frequency and severity move in same direction (not realistic)
Mean, Tweedie
Poisson mean x Gamma mean
= λ x αθ
Variance, Tweedie
φµ^p
p, Tweedie
p = (α + 2) / (α + 1)
Dispersion factor, Tweedie
φ = [λ^(1−p) (αθ)^(2−p)] / (2 − p)
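A small worked example (invented λ, α, θ) deriving the Tweedie parameters from the underlying Poisson frequency and Gamma severity, with a cross-check of the variance against the compound Poisson-Gamma form:

```python
# Sketch: Tweedie parameters from assumed Poisson frequency (lam) and
# Gamma severity (shape alpha, scale theta); the values are invented.
lam, alpha, theta = 0.2, 1.5, 4000.0

mu = lam * alpha * theta                    # Tweedie mean = Poisson mean x Gamma mean
p = (alpha + 2) / (alpha + 1)               # power parameter, 1 < p < 2
phi = lam**(1 - p) * (alpha * theta)**(2 - p) / (2 - p)  # dispersion

variance = phi * mu**p
# Cross-check against the compound Poisson-Gamma variance, lam * E[X^2]:
check = lam * (alpha * theta**2 + (alpha * theta)**2)
print(mu, p, phi, variance, check)          # variance == check
```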
Probability model distributions
Binomial distribution used
Typically use logit function
µ / (1 - µ) known as odds
Odds, logit function
µ / (1 - µ)
Each unit increase in a predictor variable with coefficient ß increases the odds by e^ß − 1 in percentage terms
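A one-line numeric illustration with an invented coefficient:

```python
# Invented coefficient: ß = 0.18 means each unit increase in the predictor
# raises the odds by about 19.7%.
import math
print(math.exp(0.18) - 1)   # ≈ 0.197
```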
Offset term
Incorporates pre-determined values into the linear predictor, which the GLM takes as given (coefficient fixed at 1)
Continuous predictor variables
If log link function is used, continuous variables should be logged (flexibility in fitting curve shapes)
Exceptions: year variables for trends, variables containing values of 0
Categorical predictor variables
Have 2 or more levels, converted to binary variables
Level with the highest number of observations usually deemed the base level
Using a sparsely populated level as the base results in wider CIs
Matrix notation of GLM
g(µ) = Xß
µ is the vector of µi values
ß is the vector of ß parameters (coefficients)
X is the design matrix (each row holds an observation's predictor values, i.e., its linear predictor terms without the coefficients)
Degrees of freedom
Number of parameters that need to be estimated for the model
If a variable is not significant in a GLM
Should be removed, grouped with the base level
Options for dealing with very high correlation in GLMs
- Remove all but one of the correlated variables
- Use dimensionality-reduction techniques (e.g., principal components analysis) to combine the correlated variables into a new variable
Multicollinearity
Near-perfect linear dependency among 3 or more predictor variables
Ex: x1 + x2 ~ x3
Detected with the variance inflation factor (VIF) statistic; VIF > 10 is considered high
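A sketch of flagging multicollinearity with VIFs, using statsmodels' variance_inflation_factor on invented data containing a deliberate near-dependency:

```python
# Sketch: VIFs on invented data where x3 is nearly x1 + x2.
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + rng.normal(scale=0.05, size=500)   # near-perfect dependency
X = np.column_stack([x1, x2, x3])

for i in range(X.shape[1]):
    print(f"x{i + 1}: VIF = {variance_inflation_factor(X, i):.1f}")  # x3 far above 10
```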
Aliased variables
Perfect linear dependency among predictor variables
GLM will not converge, but most GLM software will detect the dependency and remove one of the variables from the model
GLM limitations
- GLMs give the data full credibility (partially addressed by p-values, SEs)
- GLMs assume the randomness of outcomes is uncorrelated (violated if the dataset has several renewals of the same policy, or by weather events)
Model-building process
- Setting goals and objectives
- Communication (IT, legal, UWs)
- Collecting/processing data
- Exploratory data analysis
- Specifying the form of the model
- Evaluating model output
- Validation
- Translation into a product
- Maintenance and rebuild
Splitting data for testing
Training set and test (holdout) set
Model testing strategies
Train and test
Train, validate, test
Cross-validation
Train and test
Split into single training and single test sets (60/40 or 70/30)
Can split randomly or on time (if not done by time, could lead to over-optimistic validation results)
Train, validate, and test
Validation set can be used to refine model and make tweaks
Test set should be left until model is final
Typically 40/30/30
Cross-validation
Less common in insurance (hand-picked variables)
Most common is k-fold:
1. Pick a k and split the data into k folds
2. For each fold, train the model using the other k − 1 folds, and test using that fold
Superior (more data for training and testing) but more time-consuming (models built completely separately)
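A sketch of k-fold cross-validation using scikit-learn's KFold splitter; the Poisson regression is just a stand-in for whatever GLM is being tested, and the data is invented:

```python
# Sketch: 5-fold cross-validation with an invented frequency dataset.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import PoissonRegressor  # stand-in GLM

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 4))
y = rng.poisson(np.exp(0.2 + X @ np.array([0.3, -0.2, 0.1, 0.0])))

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    model = PoissonRegressor().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))  # score on the held-out fold
print(np.mean(scores))
```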
When to use test dataset
Only when model is complete
If too many decisions are made based on test set, it is effectively a training set (leads to overfitting)
Advantages of modeling freq/sev over PP
Gain more insight and intuition about each
Each is more stable separately
PP modeling can lead to overfitting if a predictor variable impacts only frequency or only severity, since the randomness of the other component may be mistaken for signal
Tweedie assumes both freq and sev move in same direction
Handling perils in a GLM
- Run each peril model separately
- Aggregate expected losses
- Run model using all-peril LC as target variable and union of all predictor variables as predictors (focus on dataset more reflective of future mix of business)
Criteria for variable inclusion in GLM
p-values
Cost-effectiveness of collecting data
ASOPs/legal requirements
IT constraints
Partial residuals for predictor variables
Plot r against x and see if the points match y = ßx
Transformations if residuals do not match line
Binning (increases DOF; variation within bins is ignored)
Adding polynomial terms (loses interpretability without a graph)
Adding piecewise linear functions: hinge function max(0, x − c) at each break point c
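A sketch of the hinge transformation, with an invented break point at 40:

```python
# Sketch: hinge transformation max(0, x - c), letting the fitted
# relationship change slope at the break point (c = 40 is invented).
import numpy as np

def hinge(x, c):
    """Piecewise-linear basis: 0 below the break point, linear above it."""
    return np.maximum(0.0, x - c)

age = np.array([18, 25, 40, 55, 70], dtype=float)
# Design columns: the original variable plus a hinge term at age 40.
design = np.column_stack([age, hinge(age, 40.0)])
print(design)
```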
Interaction term combinations
Two categorical: product of the binary indicators (1/0)
Continuous and categorical: continuous term per level (f(x) or 0)
Two continuous: product of the two
Log-likelihood
Log of the product of the likelihoods of all observations under the model (equivalently, the sum of the individual log-likelihoods)
Deviance
2 × (LL of the saturated model − LL of the fitted model)
Measures how far the model falls short of a perfect fit; lower deviance = closer fit to training data
Comparing models using LL and deviance
Only valid if datasets used are identical
Comparisons of deviance only valid if assumed distribution and dispersion are the same
Nested models (F-test)
When one model's predictors are a subset of another's, the F-test assesses whether the added variables are significant:
F = (D_small − D_big) / (number of added parameters × φ̂_big)
Akaike Information Criterion (AIC)
AIC = -2LL + 2p
Bayesian Information Criterion (BIC)
BIC = -2LL + p ln (n)
Less reasonable for insurance data (with large n, the ln(n) penalty per parameter is large, so BIC tends to reject too many variables)
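A small worked example (invented LL, p, n) showing why BIC's penalty dominates for large insurance datasets:

```python
# Invented values: log-likelihood, parameter count, and observation count.
import math

ll = -10432.7   # log-likelihood of the fitted model
p = 12          # number of estimated parameters
n = 250_000     # number of observations

aic = -2 * ll + 2 * p
bic = -2 * ll + p * math.log(n)
print(aic, bic)  # BIC's per-parameter penalty (ln n ≈ 12.4) dwarfs AIC's (2)
```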
Q-Q plot
Sort deviance residuals in ascending order (y-axis)
Φ⁻¹[(i − 0.5) / n] for x-coordinates
If model is well-fit, points will appear on a straight line
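A sketch of the Q-Q plot coordinates, using stand-in residuals; scipy's norm.ppf supplies Φ⁻¹:

```python
# Sketch: Q-Q plot coordinates for deviance residuals (residuals invented).
import numpy as np
from scipy.stats import norm

dev_resid = np.sort(np.random.default_rng(4).normal(size=100))  # stand-in residuals
n = len(dev_resid)
theoretical = norm.ppf((np.arange(1, n + 1) - 0.5) / n)         # Φ⁻¹[(i − 0.5)/n]

# A well-fit model puts the points (theoretical, dev_resid) near a straight line.
print(np.corrcoef(theoretical, dev_resid)[0, 1])                # close to 1 here
```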
Model stability measures
Cook’s distance
Cross-validation
Bootstrapping
Model selection methods
Plotting actual vs. predicted
Simple quantile plots
Double lift charts
Loss ratio charts
Gini Index
Lift
Economic value of the model (ability to prevent adverse selection)
Creating simple quantile plot
Sort holdout dataset based on predicted LC
Bucket into quantiles by exposure
Calculate average predicted LC and average actual LC for each bucket and plot (divide both values by the overall average predicted LC)
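A sketch of the quantile-plot calculation; the column names (predicted_lc, actual_lc, exposure) are invented for illustration:

```python
# Sketch: exposure-weighted quantile plot data from a holdout dataset.
import numpy as np
import pandas as pd

def quantile_plot_data(df, n_buckets=10):
    """One row per bucket: normalized average predicted and actual LC."""
    df = df.sort_values("predicted_lc").copy()
    cum_expo = df["exposure"].cumsum() / df["exposure"].sum()
    df["bucket"] = np.minimum((cum_expo * n_buckets).astype(int), n_buckets - 1)

    rows = []
    for _, g in df.groupby("bucket"):
        rows.append({
            "avg_predicted": np.average(g["predicted_lc"], weights=g["exposure"]),
            "avg_actual": np.average(g["actual_lc"], weights=g["exposure"]),
        })
    overall = np.average(df["predicted_lc"], weights=df["exposure"])
    return pd.DataFrame(rows) / overall   # divide both by overall avg predicted LC
```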
Winning model, simple quantile plots
- Predictive accuracy
- Monotonicity
- Vertical distance of actual LC between first and last quantiles
Double lift chart
- Calculate sort ratio (Model 1 predicted LC ÷ Model 2 predicted LC)
- Sort by sort ratio
- Bucket into quantiles by exposure
- Calculate average predicted LC for each model and average actual LC for each bucket, divide by overall average LC
Loss ratio chart
- Sort holdout dataset based on predicted LC
- Bucket into quantiles by exposure
- Calculate actual loss ratio (using current rating plan)
The greater the distance between the lowest and highest buckets' loss ratios, the better the model is at identifying further segmentation opportunities
Gini Index
- Sort holdout dataset based on predicted LC
- Plot the Lorenz curve: x-axis is cumulative % of exposures, y-axis is cumulative % of actual losses
- Gini index = 2 × the area between the line of equality and the Lorenz curve
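A sketch of the Gini index computed from the Lorenz curve; the function name and arguments are illustrative, not from the source:

```python
# Sketch: Gini index as twice the area between the line of equality
# and the Lorenz curve, built from a holdout dataset.
import numpy as np

def gini_index(predicted_lc, exposure, actual_loss):
    order = np.argsort(predicted_lc)    # sort by the model's predictions
    cx = np.insert(np.cumsum(exposure[order]) / exposure.sum(), 0, 0.0)
    cy = np.insert(np.cumsum(actual_loss[order]) / actual_loss.sum(), 0, 0.0)
    # Area under the Lorenz curve via the trapezoid rule:
    area_under_lorenz = np.sum(np.diff(cx) * (cy[1:] + cy[:-1]) / 2)
    return 1.0 - 2.0 * area_under_lorenz   # = 2 x area between equality line and curve
```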
True positive
Correct prediction that the event occurs
False positive
Prediction that the event occurs, but it does not
False negative
Prediction that the event does not occur, but it does
True negative
Correct prediction that the event does not occur
Sensitivity of a model
Ratio of true positives to total event occurrences
Sometimes called the true positive rate or hit rate
Specificity
Ratio of true negatives to total event non-occurrences
Receiver Operating Characteristic (ROC) Curve
Plots sensitivity as a function of 1 - specificity
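A sketch of computing ROC-curve points from predicted probabilities, using scikit-learn on invented data:

```python
# Sketch: ROC curve points and AUC for a probability model (data invented).
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(5)
y_true = rng.binomial(1, 0.3, size=1000)
y_prob = np.clip(y_true * 0.3 + rng.uniform(size=1000) * 0.7, 0, 1)  # noisy scores

fpr, tpr, thresholds = roc_curve(y_true, y_prob)   # fpr = 1 - specificity
print(roc_auc_score(y_true, y_prob))               # area under the ROC curve
```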
Ensemble models
If two or more models perform roughly equally well, they can be combined; this only works well when the models' errors are as uncorrelated as possible