A.2. Generalized Linear Models for Insurance Rating Flashcards
GLM random component
Each yi is assumed to be independent and to come from the exponential family of distributions with mean µi and variance Var(yi) = φV(µi)/ωi
- φ is called the dispersion parameter and is a constant used to scale the variance.
- V(µ) is called the variance function and is given for a selected distribution type. It describes the relationship between the variance and mean. Note that the same distribution type (e.g., Poisson) must be assumed for all observations.
- ωi are known as weights and assign a weight to each observation i.
GLM systematic component
g(µi) = β0 + β1xi1 + β2xi2 + · · · + βpxip + offset
• The right hand side is known as the linear predictor.
• The offset term is optional and allows you to manually
specify the estimates for certain variables (usually based on other analyses).
• The x predictor variables can be binary (as for levels
of categorical variables) or continuous, or even
transformations or combinations of other variables.
• g(µ) is called the link function, and allows for
transformations of the linear predictor.
• β0 is called the intercept term, and the other β’s are called the coefficients of the model.
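The systematic component above can be sketched numerically. The snippet below uses hypothetical coefficients and a log link, so the inverse link exp(·) maps the linear predictor back to the mean scale; the variable names and values are illustrative, not from the source.

```python
import math

def predict_mean(x, beta, intercept, offset=0.0):
    """Compute the GLM mean under a log link.

    The linear predictor is intercept + sum(beta_j * x_j) + offset;
    applying the inverse link exp(.) returns the predicted mean mu.
    """
    linear_predictor = intercept + sum(b * xi for b, xi in zip(beta, x)) + offset
    return math.exp(linear_predictor)

# Hypothetical two-variable model: g(mu) = 0.5 + 0.2*x1 - 0.1*x2
mu = predict_mean([1.0, 2.0], [0.2, -0.1], 0.5)
```

With a log link, each unit change in a predictor scales the mean multiplicatively, which is what makes log-link GLMs natural for multiplicative rating plans.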
Advantages of multiplicative rating plans
- Simple and practical to implement.
- They guarantee positive premiums (additive plans can produce negative premiums).
- Impact of risk characteristics is more intuitive.
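A small numeric illustration of the positivity advantage, using a hypothetical base rate and discount relativities (values are made up for illustration):

```python
# Hypothetical base rate and discount relativities for two rating variables.
base_rate = 500.0
relativities = {"territory": 0.70, "vehicle_age": 0.60}

# Multiplicative plan: the premium stays positive no matter how many
# relativities discount the base rate, since each factor is positive.
mult_premium = base_rate
for r in relativities.values():
    mult_premium *= r

# Additive plan: stacking flat-dollar credits can drive the premium negative.
add_premium = base_rate + (-150.0) + (-200.0) + (-250.0)
```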
Variance Functions for exponential family distributions

Distribution         Variance Function
Normal               V(µ) = 1
Poisson              V(µ) = µ
Gamma                V(µ) = µ^2
Inverse Gaussian     V(µ) = µ^3
Negative Binomial    V(µ) = µ(1 + κµ)
Binomial             V(µ) = µ(1 − µ)
Tweedie              V(µ) = µ^p
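The variance functions above can be written as a lookup of callables. The κ and p defaults below are placeholder values chosen only so the functions are callable; in practice they are estimated or selected.

```python
# Variance functions V(mu) for the exponential-family distributions above.
# kappa (Negative Binomial) and p (Tweedie) are illustrative defaults.
variance_functions = {
    "normal":            lambda mu: 1.0,
    "poisson":           lambda mu: mu,
    "gamma":             lambda mu: mu ** 2,
    "inverse_gaussian":  lambda mu: mu ** 3,
    "negative_binomial": lambda mu, kappa=0.5: mu * (1 + kappa * mu),
    "binomial":          lambda mu: mu * (1 - mu),
    "tweedie":           lambda mu, p=1.5: mu ** p,
}
```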
Choices for Severity distributions
In insurance data, claim severity distributions tend to be
right-skewed and have a lower bound at 0. Both the Gamma and Inverse Gaussian distributions exhibit these properties, and as such are common choices for modeling severity. The Gamma distribution is the most commonly used, but the Inverse Gaussian has a sharper peak and wider tail, so it is more appropriate for more skewed severity distributions.
Choices for Frequency distributions
Claim frequency is most often modeled using a Poisson
distribution. The GLM implementation of Poisson allows
for the distribution to be continuous instead of discrete.
Technically, the overdispersed Poisson is recommended, which allows φ to be different from 1, and thus allows the variance to be greater than the mean (instead of equal to it, as with the typical Poisson).
Another choice for frequency modeling is the Negative
Binomial distribution, which is really just a Poisson
distribution with a parameter that itself has a Gamma
distribution. With the Negative Binomial, φ is restricted to 1, but its variance function contains a dispersion parameter κ that allows the variance to exceed the mean.
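A quick check of the overdispersion point, with hypothetical values of µ and κ:

```python
mu, kappa = 0.25, 0.8  # hypothetical claim frequency and NB dispersion

poisson_var = mu                     # Poisson: variance equals the mean
neg_bin_var = mu * (1 + kappa * mu)  # Negative Binomial: exceeds the mean for kappa > 0
```

As κ → 0, the Negative Binomial variance collapses back to the Poisson variance µ.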
Relationship between Poisson, Gamma, and Tweedie parameters
• Poisson has parameter λ, which equals both its mean and its variance
• Gamma has mean αθ and variance αθ^2, and thus coefficient of variation 1/√α
• Tweedie has mean µ = λ × (αθ) and variance φµ^p
• p = (α + 2)/(α + 1), so it depends entirely on the Gamma coefficient of variation (through α)
• The Tweedie dispersion parameter is φ = [λ^(1−p) × (αθ)^(2−p)] / (2 − p)
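These relationships can be verified numerically: for a compound Poisson–Gamma, the variance should equal λ·E[X²] = λα(α+1)θ². The parameter values below are hypothetical.

```python
# Hypothetical frequency/severity parameters: Poisson lambda,
# Gamma shape alpha and scale theta.
lam, alpha, theta = 0.10, 2.0, 1500.0

p = (alpha + 2) / (alpha + 1)          # Tweedie power, between 1 and 2
mu = lam * (alpha * theta)             # Tweedie mean = frequency x severity
phi = lam ** (1 - p) * (alpha * theta) ** (2 - p) / (2 - p)
tweedie_var = phi * mu ** p            # should match lam * alpha * (alpha+1) * theta^2
```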
Logit and Logistic Functions
Logit: g(µ) = ln[µ/(1 − µ)]. The ratio µ/(1 − µ) is known as the odds (e.g., a thousand to one).
Logistic function (inverse of the logit): 1/(1 + e^(−x)).
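The two functions above are inverses, which a short sketch can confirm:

```python
import math

def logit(mu):
    """Log-odds link: ln(mu / (1 - mu)), defined for 0 < mu < 1."""
    return math.log(mu / (1 - mu))

def logistic(x):
    """Inverse of the logit: maps the real line back to (0, 1)."""
    return 1 / (1 + math.exp(-x))

# A probability of 0.8 corresponds to odds of 4 to 1.
odds = 0.8 / (1 - 0.8)
```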
Why continuous predictor variables should usually be logged and exceptions
Continuous variables should usually be logged when a log link function is used to allow GLMs flexibility in fitting
different curve shapes to the data (other than just exponential growth).
Exceptions to the general rule of logging a continuous
predictor variable exist such as using a year variable to pick up trend effects. Also, if the variable contains values of 0, an adjustment such as adding 1 to all observations must first be made since ln(0) is undefined.
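The zero-value adjustment looks like this in practice; the predictor values are hypothetical:

```python
import math

# Hypothetical continuous predictor (e.g., prior claim counts) that
# contains zeros. ln(0) is undefined, so add 1 to every observation
# before applying the log transform.
values = [0, 1, 4, 10]
logged = [math.log(v + 1) for v in values]
```

Note that ln(0 + 1) = 0, so zero observations map cleanly to zero on the logged scale.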
Impact of choosing a level with fewer observations as the base level of a categorical variable
This will still result in the same predicted relativities for that variable (re-based to the chosen base level), but there will be wider confidence intervals around the estimated coefficients.
Matrix form of a GLM
g(µ) = Xβ, where µ is the vector of µi values, β is the vector of β parameters, and X is called the design matrix.
Degrees of freedom for a model
The degrees of freedom of a model is the number of
parameters that need to be estimated for the model.
GLM outputs for each predicted coefficient
Standard error
p-value: the estimated probability of obtaining, by pure chance, a coefficient estimate at least as far from 0 (in absolute value) as the estimated β, given that the true value is 0
Confidence interval
How number of observations and dispersion parameter impact p-values
p-values (and standard errors and confidence intervals) will be smaller with larger datasets that have more observations. They will also be smaller with smaller values of φ.
Problem and options for GLMs with highly correlated
variables
This can result in an unstable model with erratic coefficients that have high standard errors. Two options for dealing with very high correlation include:
- Removing all highly correlated variables except one. This eliminates the high correlation in the model, but it also potentially loses some unique information contained in the eliminated variables.
- Use dimensionality-reduction techniques such as principal components analysis or factor analysis to create a new subset of variables from the correlated variables, and use this subset of variables in the GLM. The downside is the additional time required to do this extra analysis.
Define multicollinearity and give a way to detect it
Multicollinearity occurs when there is a near-perfect linear dependency among three or more predictor variables; for example, x1 + x2 ≈ x3. This is harder to detect than simple two-way correlation, since neither x1 nor x2 may be individually highly correlated with x3. When multicollinearity is present, the model may become unstable with erratic coefficients, and it may not converge to a solution.
One way to detect multicollinearity is the variance inflation factor (VIF) statistic, which is given for each predictor variable and measures the impact of collinearity on the squared standard error of that variable by seeing how well the other predictor variables can predict the variable in question. VIF values of 10 or greater are considered high.
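A simplified sketch of the VIF idea, using made-up data where x3 ≈ x1 + x2. Real VIF software regresses each predictor on all the others; here, for brevity, x3 is regressed on the single combination s = x1 + x2, and VIF = 1/(1 − R²).

```python
# Hypothetical data with a near-perfect dependency: x3 is roughly x1 + x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
x3 = [3.1, 2.9, 7.2, 6.8, 10.1]

s = [a + b for a, b in zip(x1, x2)]

def pearson_r(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu_u, mu_v = sum(u) / n, sum(v) / n
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

# For a single-predictor regression, R^2 is the squared correlation.
r_squared = pearson_r(s, x3) ** 2
vif = 1 / (1 - r_squared)  # blows up as the dependency becomes exact
```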
Define aliasing and how GLM software deals with it
When there is a perfect linear dependency among predictor variables, those variables are aliased. The GLM will not converge in this case, but most GLM software will detect this and automatically remove one of those variables from the model.
2 important limitations of GLMs
- GLMs give full credibility: The estimated coefficients are not credibility-weighted to recognize low volumes of data or high volatility. This concern can be partially addressed by looking at p-values or standard errors.
- GLMs assume that the randomness of outcomes is
uncorrelated: Two examples of violations of this are:
• Using a dataset with several renewals of the same policy, since the same insured over different renewals is likely to have correlated outcomes.
• When the data can be affected by weather, the same
weather events are likely to cause similar outcomes to
risks in the same areas
Components of model-building process
- Setting goals and objectives
- Communication with key stakeholders
- Collecting and processing the data
- Conducting exploratory data analysis
- Specifying the form of the model
- Evaluating the model output
- Validating the model
- Translating the model results into a product
- Maintaining and rebuilding the model