A2. GLM Flashcards
3 advantages of using a log link function for ratemaking
- Simple and practical to implement
- Guarantees positive premiums
- Impact of risk characteristics is multiplicative, which is more intuitive (see the sketch below)
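A minimal sketch (statsmodels, synthetic data and column names assumed) of a log-link Gamma GLM: exponentiating the coefficients gives multiplicative rating relativities, so fitted premiums stay positive.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "severity": rng.gamma(shape=2.0, scale=500.0, size=1000),
    "driver_age": rng.integers(18, 80, size=1000),
    "vehicle_use": rng.choice(["pleasure", "commute"], size=1000),
})

result = smf.glm(
    "severity ~ driver_age + C(vehicle_use)",
    data=df,
    family=sm.families.Gamma(link=sm.families.links.Log()),
).fit()

# exp(beta) is the multiplicative factor per unit change (or per level),
# which keeps every fitted premium strictly positive
print(np.exp(result.params))
```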
2 uses of offset terms
- Incorporate pre-determined values for certain variables (e.g., factors fixed by law or by a separate analysis)
- When the target variable varies directly with a particular measure (e.g., claim counts with exposure); in a log-link model, include the log of that measure as an offset (see the sketch below)
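A minimal sketch of the second use, assuming a log-link Poisson frequency model on synthetic data: claim counts vary directly with exposure, so log(exposure) enters as an offset whose coefficient is fixed at 1 rather than estimated.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "exposure": rng.uniform(0.25, 1.0, size=1000),
    "territory": rng.choice(["A", "B"], size=1000),
})
df["claim_count"] = rng.poisson(0.10 * df["exposure"])

freq_model = smf.glm(
    "claim_count ~ C(territory)",
    data=df,
    family=sm.families.Poisson(),
    offset=np.log(df["exposure"]),  # pre-determined coefficient of 1
).fit()
print(freq_model.params)
```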
2 solutions for dealing with correlated variables
- Remove all but one of the correlated variables (but could lose unique information)
- Use a dimensionality reduction technique such as PCA or factor analysis (but takes additional time; see the sketch below)
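A minimal sketch of the PCA route (scikit-learn, synthetic correlated predictors with hypothetical names): standardize the correlated columns and replace them with a few principal components before fitting the GLM.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
base = rng.normal(size=1000)
X = pd.DataFrame({
    "credit_score":  base + rng.normal(scale=0.1, size=1000),
    "prior_claims": -base + rng.normal(scale=0.1, size=1000),
    "years_insured": base + rng.normal(scale=0.2, size=1000),
})

Z = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(Z)
components = pca.transform(Z)  # use these columns as predictors instead of X
print(pca.explained_variance_ratio_)
```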
Problem with correlations among variables
Can produce an unstable model with erratic coefficients that have high standard errors
2 uses of weights assigned to each observation
- When an observation contains grouped information (e.g., a record is an average over several claims or exposures; see the sketch below)
- When different observations represent different time periods
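A minimal sketch of the grouped-information case (statsmodels, synthetic data): each record holds the average severity over several claims, so the claim count is supplied as the observation weight (var_weights).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "avg_severity": rng.gamma(shape=2.0, scale=400.0, size=500),
    "claim_count": rng.integers(1, 10, size=500),
    "region": rng.choice(["north", "south"], size=500),
})

sev_model = smf.glm(
    "avg_severity ~ C(region)",
    data=df,
    family=sm.families.Gamma(link=sm.families.links.Log()),
    var_weights=df["claim_count"],  # more claims behind a record -> more weight
).fit()
print(sev_model.params)
```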
Define multicollinearity
A near-perfect linear dependency between two or more predictor variables
How to detect multicollinearity
Use the variance inflation factor (VIF); see the sketch below
A VIF >= 10 is considered high
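A minimal sketch of VIF detection with statsmodels (synthetic predictors, one pair made nearly collinear): compute the VIF of each column of the design matrix, including a constant.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
x1 = rng.normal(size=500)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.05, size=500),  # nearly collinear with x1
    "x3": rng.normal(size=500),
})

design = sm.add_constant(X)
for i, name in enumerate(design.columns):
    print(name, variance_inflation_factor(design.values, i))
# x1 and x2 should show VIF well above 10, flagging multicollinearity
```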
Define aliasing
A perfect linear dependency among predictor variables
The GLM will not converge
2 GLM limitations
- GLMs give the data full credibility, even when volume is low or volatility is high
- GLMs assume the randomness of outcomes is uncorrelated across records (violated by, e.g., renewals of the same policy or shared weather events)
4 advantages of modeling frequency/severity over pure premium
- Gains more insight and intuition about the impact of each predictor variable
- Frequency and severity are each more stable than pure premium
- Pure premium modeling can lead to overfitting if a predictor variable impacts only frequency or only severity, but not both
- The Tweedie distribution used for a pure premium model assumes frequency and severity move in the same direction, which may not be true (see the sketch below)
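A minimal sketch contrasting the two approaches on synthetic data (column names and the Tweedie variance power are illustrative assumptions): a Poisson frequency model plus a Gamma severity model, versus a single Tweedie pure premium model.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 2000
df = pd.DataFrame({"age": rng.integers(18, 80, size=n)})
df["claim_count"] = rng.poisson(0.10, size=n)
df["loss"] = df["claim_count"] * rng.gamma(2.0, 500.0, size=n)

claims = df[df["claim_count"] > 0].copy()
claims["severity"] = claims["loss"] / claims["claim_count"]

# Frequency/severity: two separate models
freq = smf.glm("claim_count ~ age", df, family=sm.families.Poisson()).fit()
sev = smf.glm(
    "severity ~ age", claims,
    family=sm.families.Gamma(link=sm.families.links.Log()),
    var_weights=claims["claim_count"],
).fit()

# Pure premium: one Tweedie model (variance power between 1 and 2 assumed)
pp = smf.glm(
    "loss ~ age", df,
    family=sm.families.Tweedie(var_power=1.6, link=sm.families.links.Log()),
).fit()
print(freq.params, sev.params, pp.params, sep="\n")
```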
2 disadvantages of modeling frequency/severity over pure premium
- Requires more data (e.g., claim counts, not just losses)
- Takes more time, since two models must be built
4 ways to transform variables in a GLM
- Binning the variable (increases degrees of freedom, so there is more to estimate and more risk of overfitting; may produce inconsistent or impractical patterns; variation within bins is ignored)
- Adding polynomial terms (loss of interpretability without a graph; higher-order polynomials can behave erratically at the edges of the data)
- Adding piecewise linear functions (add a hinge function max(0, Xj - C) at each breakpoint C; the breakpoints C must be chosen manually)
- Natural cubic splines (combine piecewise functions and polynomials into a continuous curve that fits the edges of the data better, but a graph is needed to interpret the model; see the sketch after this list for all four transforms)
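A minimal sketch of all four transforms on one continuous variable (statsmodels with patsy formulas; bin edges, the breakpoint, and the spline degrees of freedom are illustrative choices).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
df = pd.DataFrame({"age": rng.integers(18, 80, size=1000).astype(float)})
df["loss"] = 200 + 5 * np.abs(df["age"] - 45) + rng.normal(0, 20, size=1000)

df["age_bin"] = pd.cut(df["age"], bins=[17, 25, 45, 65, 80])  # 1) binning
df["hinge_45"] = np.maximum(0, df["age"] - 45)                # 3) hinge at C = 45

fam = sm.families.Gaussian()
binned = smf.glm("loss ~ C(age_bin)", df, family=fam).fit()
poly = smf.glm("loss ~ age + I(age**2)", df, family=fam).fit()  # 2) polynomial
piece = smf.glm("loss ~ age + hinge_45", df, family=fam).fit()  # 3) piecewise linear
spline = smf.glm("loss ~ cr(age, df=4)", df, family=fam).fit()  # 4) natural cubic spline
print(binned.aic, poly.aic, piece.aic, spline.aic)
```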
Why is model selection different from model refinement
- Some candidate models may be proprietary
- The decision on the final model may be a business decision, not a technical one
3 methods to test model stability
- Cook's distance for individual records (records with high Cook's distance should get additional scrutiny as to whether to include them)
- Cross-validation, comparing parameter estimates across folds
- Bootstrapping, comparing the mean and variance of parameter estimates across resamples (see the sketch below)
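A minimal sketch of the bootstrap check on synthetic data: refit the model on resamples drawn with replacement and look at the mean and spread of each coefficient (recent statsmodels versions also expose Cook's distance through the fitted model's influence diagnostics).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
df = pd.DataFrame({"exposure_size": rng.uniform(1, 10, size=800)})
df["claim_count"] = rng.poisson(0.05 * df["exposure_size"])

boot_params = []
for _ in range(200):
    sample = df.sample(n=len(df), replace=True)
    fit = smf.glm("claim_count ~ exposure_size", sample,
                  family=sm.families.Poisson()).fit()
    boot_params.append(fit.params)

boot = pd.DataFrame(boot_params)
print(boot.mean())  # a stable model keeps these close to the full-data estimates
print(boot.std())   # and keeps these small
```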
4 lift-based measures
- Simple quantile plot (e.g., by quintile or decile)
- Double lift chart
- Loss ratio charts
- Gini index (a quantile plot and a Gini computation are sketched below)
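A minimal sketch of two of these measures on synthetic holdout data: a simple decile (quantile) plot of actual versus predicted pure premium, and a Gini index computed from the Lorenz curve of actual losses ordered by prediction.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(8)
n = 5000
pred = rng.gamma(2.0, 100.0, size=n)             # model's predicted pure premium
actual = pred * rng.lognormal(0.0, 0.5, size=n)  # noisy actual losses
df = pd.DataFrame({"pred": pred, "actual": actual})

# Simple quantile plot: average actual vs. predicted by decile of prediction
df["decile"] = pd.qcut(df["pred"], 10, labels=False)
print(df.groupby("decile")[["pred", "actual"]].mean())

# Gini index: order by prediction, then take twice the area between the
# line of equality and the Lorenz curve of cumulative actual losses
ordered = df.sort_values("pred")
cum_loss = np.concatenate(
    [[0.0], ordered["actual"].cumsum().values / ordered["actual"].sum()]
)
area_under_lorenz = np.sum((cum_loss[1:] + cum_loss[:-1]) / 2) / n
gini = 1 - 2 * area_under_lorenz
print("Gini:", gini)
```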