GLM Flashcards
Model Specifications
Target Variable
Predictors
Link Function
Error Distribution
Weights
Why log continuous variables when using log link?
Otherwise positive coefficients give the variable an exponential effect on the target (logging it gives a power relationship instead)
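A minimal sketch of the difference, assuming one predictor x and made-up coefficients b0 and b1 (illustrative values, not from any fitted model). Under a log link, raw x gives mu = exp(b0 + b1*x), while ln(x) gives mu = exp(b0) * x^b1.

```python
import math

b0, b1 = 0.0, 0.5  # hypothetical coefficients, chosen only for illustration

def mu_raw(x):
    """Log link with x entered as-is: exponential in x."""
    return math.exp(b0 + b1 * x)

def mu_logged(x):
    """Log link with ln(x) as the predictor: equals exp(b0) * x**b1."""
    return math.exp(b0 + b1 * math.log(x))

for x in (1, 10, 100):
    print(x, mu_raw(x), mu_logged(x))
```

With these values the raw-x prediction explodes as x grows, while the logged version grows like the square root of x.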
When to use weights
If each row in the dataset represents an average over multiple data points
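A rough illustration with made-up numbers: if each row holds an average severity over several claims, weighting by the claim count recovers the overall average, while an unweighted mean of the rows does not.

```python
# Hypothetical rows: each is an average severity over claim_count claims.
avg_severity = [1000.0, 3000.0]
claim_count = [1, 3]

# Treats both rows as equally informative.
unweighted = sum(avg_severity) / len(avg_severity)

# Weights each row by the number of claims behind it.
weighted = sum(s * n for s, n in zip(avg_severity, claim_count)) / sum(claim_count)

print(unweighted, weighted)
```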
When to use offsets
- Variable modeled elsewhere (territory model, deductible, limits)
- Want to change only some variables in rating plan
- Target variable varies directly with exposure (e.g. modeling claim counts)
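A small sketch of the exposure case, assuming a Poisson claim-count model with a log link, an intercept only, and ln(exposure) as the offset. The data are made up. For this special case the maximum likelihood rate is total claims divided by total exposure, so no fitting library is needed.

```python
import math

claims = [0, 1, 0, 2]             # made-up claim counts per policy
exposure = [0.5, 2.0, 1.0, 2.5]   # made-up exposures (e.g. policy-years)

# Intercept-only Poisson with offset ln(exposure):
# ln(mu_i) = b0 + ln(exposure_i), and the MLE of exp(b0) is the overall rate.
rate = sum(claims) / sum(exposure)
b0 = math.log(rate)

# The offset enters the linear predictor with its coefficient fixed at 1.
expected = [math.exp(b0 + math.log(e)) for e in exposure]
print(rate, expected)
```

Because the offset coefficient is fixed at 1, expected counts scale directly with exposure.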
Correlated Vars (How to Identify and Adjust)
Identify with two-way correlation table
Can adjust with principal component analysis
Multicollinearity
When 2 or more predictors are strongly predictive of a third
Two-way tables may not show it
Identify with Variance Inflation Factor
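A sketch of the VIF check, assuming the usual definition VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on the other predictors. The data are simulated and the helper `vif` is written here for illustration, not taken from a library.

```python
import numpy as np

def vif(X):
    """VIF per column of X: 1 / (1 - R^2) from regressing it on the others."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add an intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return out

# Simulated predictors: x3 is nearly x1 + x2, but x1 and x2 are independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + 0.1 * rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)  # two-way correlation table
vifs = vif(X)
print(np.round(corr, 2))
print([round(v, 1) for v in vifs])
```

In this setup no pairwise correlation looks alarming, yet every VIF is large, which is the situation the two-way table can miss.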
Aliasing
Perfect correlation
Must remove one
Effects of highly correlated variables
Unstable Model (large std errs)
Unreasonable coefficients
Model may not converge
GLM Limitations
Assigns full credibility to the data
Alt: GLMMs or Elastic Net GLMs
Assumes randomness of outcomes is uncorrelated
(Untrue if multiple years of data or cat risk)
Freq/Sev Models vs PP
Freq/sev models are more stable
PP can overfit if a variable affects only frequency or only severity
Tweedie dist assumes freq/sev move in the same direction
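A sketch of how separate log-link frequency and severity models combine into pure premium, using hypothetical coefficients and a hypothetical `young_driver` indicator. Relativities multiply, so coefficients add. A variable that affects only frequency simply has a severity coefficient of zero here.

```python
import math

# Hypothetical fitted coefficients (log link in both models).
freq_coef = {"intercept": math.log(0.10), "young_driver": 0.40}
sev_coef = {"intercept": math.log(5000.0), "young_driver": 0.0}

# Pure premium = frequency * severity, so the log-scale coefficients add.
pp_coef = {k: freq_coef[k] + sev_coef[k] for k in freq_coef}

def predict(coef, young_driver):
    """Log-link prediction for a 0/1 young_driver indicator."""
    return math.exp(coef["intercept"] + coef["young_driver"] * young_driver)

print(predict(pp_coef, 0), predict(pp_coef, 1))
```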
Target Variable Considerations
Split coverages, perils
Capping
Remove CATs and model separately
Trend/Develop
On-Level Premium
Predictor Selection Criteria
Significance
Cost of collecting
IT constraints
Regulatory requirements
Partial Residual Plots
Detect non-linearity
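A minimal sketch of the quantity being plotted, assuming the usual partial residual definition r_i = (y_i - mu_i) * g'(mu_i) + beta_j * x_ij and a log link, where g'(mu) = 1/mu. The function name and inputs are illustrative.

```python
def partial_residuals_log_link(y, mu, beta_j, x_j):
    """Partial residuals for predictor j under a log link.

    y: observed values, mu: fitted means,
    beta_j: fitted coefficient for predictor j, x_j: values of predictor j.
    """
    return [(yi - mi) / mi + beta_j * xi for yi, mi, xi in zip(y, mu, x_j)]

# Made-up values: plot these residuals against x_j and compare
# with the line beta_j * x_j to look for curvature.
res = partial_residuals_log_link(
    y=[2.0, 1.0], mu=[1.0, 2.0], beta_j=0.5, x_j=[4.0, 6.0]
)
print(res)
```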
Correcting Non-Linearity
- Binning
- Polynomial terms
- Piecewise
- Splines
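The first three corrections amount to building new design columns from the original variable. A sketch with hypothetical helper names (splines are omitted, since they need a basis library):

```python
def bin_index(x, edges):
    """Binning: index of the bin containing x (edges sorted ascending)."""
    return sum(x >= e for e in edges)

def polynomial_terms(x, degree=2):
    """Polynomial terms: x, x^2, ..., x^degree as separate predictors."""
    return [x ** d for d in range(1, degree + 1)]

def hinge(x, knot):
    """Piecewise linear: extra slope that applies only above the knot."""
    return max(0.0, x - knot)

age = 45.0  # made-up value
print(bin_index(age, [25, 50]), polynomial_terms(age), hinge(age, 30.0))
```

Each function returns values that would enter the model as additional predictor columns alongside, or in place of, the raw variable.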
Loglikelihood
How well model explains data
Con: Comparing models requires the identical dataset
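A sketch of the calculation for a Poisson model, assuming the standard Poisson log-likelihood sum(y*ln(mu) - mu - ln(y!)). The observations and both sets of fitted means are made up. Both models are scored on the same observations, which is what makes the comparison valid.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood; lgamma(y + 1) equals ln(y!)."""
    return sum(
        yi * math.log(mi) - mi - math.lgamma(yi + 1)
        for yi, mi in zip(y, mu)
    )

y = [1, 2, 3]                # made-up observed counts
mu_a = [1.0, 2.0, 3.0]       # hypothetical model A fitted means
mu_b = [2.0, 2.0, 2.0]       # hypothetical model B fitted means

ll_a = poisson_loglik(y, mu_a)
ll_b = poisson_loglik(y, mu_b)
print(ll_a, ll_b)
```

The model whose fitted means sit closer to the data gets the higher (less negative) log-likelihood.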