GLM Flashcards
Model Specifications
Target Variable
Predictors
Link Function
Error Distribution
Weights
Why log continuous variables when using log link?
Otherwise each unit increase in the predictor multiplies the target by a constant factor (an exponential effect); logging the predictor gives a power relationship instead, so the effect scales with the size of the predictor
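A minimal sketch of the difference, using hypothetical coefficients for a log-link model:

```python
import math

# Hypothetical fitted coefficients for a log-link GLM
b0, b1 = 2.0, 0.3

def pred_unlogged(x):
    # exp(b0 + b1*x): each +1 on x multiplies the prediction by e**b1
    return math.exp(b0 + b1 * x)

def pred_logged(x):
    # exp(b0 + b1*ln(x)) = e**b0 * x**b1: a power curve, scale-invariant
    return math.exp(b0 + b1 * math.log(x))

# Unlogged: +1 on x has the same multiplicative effect at any level of x
r_low = pred_unlogged(11) / pred_unlogged(10)
r_high = pred_unlogged(101) / pred_unlogged(100)

# Logged: doubling x multiplies the prediction by 2**b1, at any level of x
s_low = pred_logged(20) / pred_logged(10)
s_high = pred_logged(200) / pred_logged(100)
```

The unlogged form compounds: moving x from 100 to 110 multiplies the prediction by the same factor ten times over, which is rarely a reasonable shape for rating variables.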
When to use weights
If each row in the dataset represents an average of multiple data points (e.g., average severity over several claims)
When to use offsets
- Variable modeled elsewhere (territory model, deductible, limits)
- Want to change only some variables in rating plan
- Target variable varies directly with exposure (e.g. modeling claim counts)
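The exposure case can be sketched directly; coefficients here are hypothetical. With a log link, including ln(exposure) as an offset (its coefficient fixed at 1, not estimated) makes predicted claim counts directly proportional to exposure:

```python
import math

b0, b1 = -2.0, 0.5   # hypothetical fitted coefficients
x = 1.0              # some rating variable

def predicted_counts(exposure):
    # ln(exposure) enters with a fixed coefficient of 1 (the offset),
    # so mu = exposure * exp(b0 + b1*x)
    return math.exp(b0 + b1 * x + math.log(exposure))
```

Doubling exposure doubles the expected count; the fixed coefficient of 1 is what distinguishes an offset from an ordinary predictor.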
Correlated Vars (How Identify and Adjust)
Identify with two-way correlation table
Can adjust with principal component analysis
Multicollinearity
When 2 or more predictors together are strongly predictive of a third (near-linear dependence)
Two-way correlation tables may not reveal it
Identify with Variance Inflation Factor
Aliasing
Perfect correlation (one predictor is an exact linear combination of others)
Must remove one of the aliased variables
Effects of highly correlated variables
Unstable Model (large std errs)
Unreasonable coefficients
Model may not converge
GLM Limitations
Assign full credibility to the data (no shrinkage for thinly populated segments)
Alt: GLMMs or Elastic Net GLMs
Assume randomness of outcomes uncorrelated
(Untrue if multiple years of data or cat risk)
Freq/Sev Models vs PP
Freq/sev models are more stable
PP can overfit if variable only affects one or other
Tweedie dist assumes freq/sev move in same direction
Target Variable Considerations
Split coverages, perils
Capping
Remove CATs and model separately
Trend/Develop
On-Level Premium
Predictor Selection Criteria
Significance
Cost of collecting
IT constraints
Regulatory requirements
Partial Residual Plots
Detect non-linearity
Correcting Non-Linearity
- Binning
- Polynomial terms
- Piecewise
- Splines
Loglikelihood
How well model explains data
Con: only comparable between models fit to the identical dataset
Deviance
How far model is from saturated model
Use unscaled deviance when comparing different models (fit to the same dataset)
Cons:
-Need to assume same error dist. in models
-Always decreases when add more params so overfits
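A sketch of the unscaled deviance for the Poisson case, using hypothetical fitted means; note the saturated model (mu_i = y_i) has deviance exactly 0:

```python
import math

def poisson_unscaled_deviance(y, mu):
    # D = 2 * sum[ y*ln(y/mu) - (y - mu) ],
    # with y*ln(y/mu) taken as 0 when y = 0
    d = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        d += term - (yi - mi)
    return 2.0 * d

y = [0, 1, 2, 4]            # observed counts
mu = [0.5, 1.2, 1.8, 3.5]   # hypothetical fitted means
```

The closer the fitted means sit to the observations, the smaller the deviance; the saturated model is the benchmark at zero.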
F-Test
Only for nested models
AIC and BIC
Can compare any models
BIC's penalty per parameter is ln(n), so it may over-penalize added parameters on large datasets
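The two formulas side by side (loglik and parameter counts here are hypothetical inputs, not fitted values):

```python
import math

def aic(loglik, n_params):
    # Penalty of 2 per parameter, regardless of dataset size
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    # Penalty of ln(n) per parameter: exceeds AIC's penalty once
    # n > e**2 (about 7.4), hence the over-penalization on big data
    return -2.0 * loglik + n_params * math.log(n_obs)
```

Both are only meaningful as comparisons between models fit to the same dataset; lower is better.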
Working Residual Plots
vs Linear Predictor (look for systemic over/under prediction)
vs One Predictor (look for non-linearity)
vs Weight (look for homoscedasticity)
3 Ways to Assess Model Stability
- Cook’s Distance (influential records)
- Cross-Validation (consistent param estimates - not good for insurance when more manual intervention needed)
- Bootstrapping (consistent param estimates)
Quantile Plot
How well model differentiates between best and worst risks
Sort records into quintiles by predicted value, then plot average actual vs predicted per quintile
Want:
Actual close to predicted
Actual increasing monotonically
Large lift (wide spread between the first and last quintiles' actuals)
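A sketch of the quintile calculation behind the plot, on small hypothetical data where the model ranks risks well:

```python
def quintile_lift(actual, predicted):
    # Sort records by model prediction, split into 5 equal buckets,
    # and compare average actual vs average predicted in each bucket
    pairs = sorted(zip(predicted, actual))
    k = len(pairs) // 5
    out = []
    for i in range(5):
        bucket = pairs[i * k:(i + 1) * k]
        avg_pred = sum(p for p, _ in bucket) / len(bucket)
        avg_act = sum(a for _, a in bucket) / len(bucket)
        out.append((avg_pred, avg_act))
    return out

predicted = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical predictions
actual    = [1, 3, 2, 5, 4, 7, 6, 9, 8, 12]   # hypothetical outcomes
buckets = quintile_lift(actual, predicted)
actuals = [a for _, a in buckets]
```

Here the bucket actuals increase monotonically with a wide first-to-last spread, which is what the card says to look for.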
Double Lift Chart
Compares two models directly
Sort into quintiles by the ratio of Model A's prediction to Model B's, then plot actual, A, and B
Want model that’s closer to actual
Harder to interpret since compares where models disagree most
Loss Ratio Chart
Whether model better at segmenting than current
Sort policies into buckets by predicted LR, then plot actual LR per bucket
Want more spread
Only tells if good at segmenting not predicting
Gini Index and Lorenz Curve
Sort risks by model prediction; plot cumulative % of exposures vs cumulative % of losses
Lorenz curve is formed by these points
Gini Index = 2 * area between the Lorenz curve and the line of equality (y = x)
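A sketch of the calculation, assuming the exposure and loss lists are already in model-sorted order (best risks first); the example data are hypothetical:

```python
def gini_index(exposures, losses):
    tot_e, tot_l = sum(exposures), sum(losses)
    # Build the Lorenz curve points, starting from the origin
    pts = [(0.0, 0.0)]
    ce = cl = 0.0
    for e, l in zip(exposures, losses):
        ce += e
        cl += l
        pts.append((ce / tot_e, cl / tot_l))
    # Area under the Lorenz curve via the trapezoid rule
    area = sum((x1 - x0) * (y1 + y0) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    # Gini = 2 * (area between y = x and the curve)
    return 2.0 * (0.5 - area)

# Extreme segmentation: all loss concentrated in the worst-ranked risk
g = gini_index([1, 1, 1, 1, 1], [0, 0, 0, 0, 10])
```

A model with no segmentation power puts the curve on y = x (Gini = 0); more concentration of losses in the worst-ranked risks bows the curve down and raises the index.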
ROC Curve
Used for logistic models
Plots sensitivity (y-axis) vs 1 - specificity (x-axis) as the classification threshold varies
Higher AUROC is better (a model with no predictive power lies on y = x, AUROC = 0.5)
Specificity
True Neg/Total Negative
Sensitivity
True Pos/Total Positive
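A sketch of both rates from a hypothetical set of scored records; sweeping the threshold over these scores is what traces out the ROC curve:

```python
def confusion_rates(actual, predicted_prob, threshold=0.5):
    # actual: 0/1 outcomes; classify as positive when prob >= threshold
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted_prob):
        pred = 1 if p >= threshold else 0
        if a == 1 and pred == 1:
            tp += 1
        elif a == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)   # true positives / total actual positives
    specificity = tn / (tn + fp)   # true negatives / total actual negatives
    return sensitivity, specificity

actual = [1, 1, 1, 0, 0, 0]            # hypothetical outcomes
probs  = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]  # hypothetical model scores
sens, spec = confusion_rates(actual, probs)
```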
Why shouldn’t model ILFs or Deds
Policy Options chosen by insured
May give counterintuitive results, e.g. indicating a higher rate for less coverage
Should never charge more for less coverage
The insured's choice is correlated with results but does not cause them (selection effect)
Ensembling
Averaging models together
Improves performance when errors uncorrelated
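A toy illustration with made-up errors: two equally accurate models whose errors are uncorrelated, so averaging their predictions averages (and partially cancels) the errors:

```python
def mse(errors):
    return sum(e * e for e in errors) / len(errors)

# Hypothetical prediction errors for two models of equal accuracy;
# these two error vectors have zero correlation
err_a = [10, -10, 10, -10]
err_b = [10, 10, -10, -10]

# Averaging the two models' predictions averages their errors
err_ens = [(a + b) / 2 for a, b in zip(err_a, err_b)]
```

The ensemble's mean squared error is half either model's; if the errors were perfectly correlated, averaging would gain nothing.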
GLM Shortcomings
- Predictions must be based on linear function of predictors
- Instability if data thin or highly correlated vars
- Full credibility for each predictor coefficient
- Assumes randomness uncorrelated
- Dispersion param must be constant
GLMM
Allows credibility in coeff estimates
Fixed and random effects
DGLMs
Allows different dispersion parameters
GAMs
Allows non-linearity without manual intervention
MARS Models
Allows non-linearity without manual intervention
Elastic Net GLMs
Allows credibility in coeff estimates
Automatic variable selection
Why center continuous variables
Intercept more intuitive since meaningful base case
Makes signs of coefficients more intuitive
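A sketch of the intercept shift, with hypothetical coefficients for a log-link model on a continuous variable:

```python
import math

x = [10, 20, 30, 40]   # hypothetical continuous rating variable
b0, b1 = 1.0, 0.05     # hypothetical fitted coefficients (log link)
xbar = sum(x) / len(x)

# Uncentered: exp(b0) is the prediction at x = 0, which may describe
# a risk that doesn't exist in the book.
# Centered: fitting on (x - xbar) leaves the slope unchanged and
# shifts the intercept so exp(intercept) is the prediction for an
# average risk -- a meaningful base case.
b0_centered = b0 + b1 * xbar
pred_avg_risk = math.exp(b0 + b1 * xbar)
```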
Working Residual Advantages
Retain their desired properties (no pattern, homoscedastic) even after binning/averaging