GLMs Flashcards
Advantages to GLMs
Adjusts for correlation between predictor variables, with less restrictive assumptions than classical linear models
Disadvantages to GLMs
Often difficult to explain results
Components of GLMs
Random component
Systematic component
Random component of GLM
Each yi is assumed to be independent and to come from the exponential family of distributions
Mean = µi
Variance = [ØV(µi)]/wi
Variance within a GLM, formula
Variance = [ØV(µi)]/wi
Ø
Dispersion/Scale factor
Same for all observations
V(µ)
Variance function, given for selected distribution type
Describes relationship between variance and mean
Same distribution type must be used for all observations
Systematic component of GLM
g(µi) = ß0 + ß1xi1 + … + ßpxip + offset
The offset allows you to manually specify the estimates of certain variables
Link functions, g(x)
Allows for transformations of linear predictor
Link function often used for binary target variables
Logit link:
g(µ) = ln [µ / (1 - µ)]
Link function used in rating plans
g(µ) = ln(µ)
Allows transformation of linear predictor into multiplicative structure
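A minimal sketch of why the log link gives a multiplicative structure: inverting g(µ) = ln(µ) turns a sum of coefficients into a product of factors. The coefficient values, variable names, and function names below are illustrative assumptions, not taken from any fitted rating plan.

```python
import math

# Hypothetical coefficients, for illustration only (not from a fitted model).
beta0 = math.log(500.0)  # intercept, corresponds to a base rate of 500
beta_young = 0.25        # coefficient for a young-driver indicator
beta_urban = 0.10        # coefficient for an urban-territory indicator

def predict_log_link(young: int, urban: int) -> float:
    """Invert the log link: mu = exp(linear predictor)."""
    eta = beta0 + beta_young * young + beta_urban * urban
    return math.exp(eta)

def predict_multiplicative(young: int, urban: int) -> float:
    """The same prediction written as base rate times relativities."""
    base_rate = math.exp(beta0)
    young_factor = math.exp(beta_young) ** young
    urban_factor = math.exp(beta_urban) ** urban
    return base_rate * young_factor * urban_factor
```

Both functions return the same value for any combination of indicators, which is the sense in which each exponentiated coefficient acts as a rating factor.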
Advantages of multiplicative rating plans
Simple/practical to implement
Guarantees positive premiums
Impact of risk characteristics more intuitive
Variance functions for exponential families
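For reference, the commonly cited variance functions can be written out as a short sketch: Normal 1, Poisson µ, Gamma µ^2, inverse Gaussian µ^3, binomial µ(1 - µ), negative binomial µ(1 + κµ), Tweedie µ^p. The dictionary and function names are illustrative assumptions; κ and p are extra parameters that must be supplied.

```python
# Standard variance functions V(mu) for common GLM distributions.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,
    "poisson": lambda mu: mu,
    "gamma": lambda mu: mu ** 2,
    "inverse_gaussian": lambda mu: mu ** 3,
    "binomial": lambda mu: mu * (1.0 - mu),
}

def negative_binomial_v(mu: float, kappa: float) -> float:
    """Negative binomial variance function: mu * (1 + kappa * mu)."""
    return mu * (1.0 + kappa * mu)

def tweedie_v(mu: float, p: float) -> float:
    """Tweedie variance function: mu ** p."""
    return mu ** p

def glm_variance(dist: str, mu: float, phi: float = 1.0, weight: float = 1.0) -> float:
    """Variance of one observation: phi * V(mu) / w."""
    return phi * VARIANCE_FUNCTIONS[dist](mu) / weight
```

For example, a Gamma observation with µ = 10, Ø = 0.5 and weight 2 has variance 0.5 × 100 / 2 = 25.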
When to use weights in a GLM
When a single observation contains grouped information
When different observations represent different time periods
(If neither apply, weights are all 1)
Weights are usually the denominator of the modeled quantity
Severity model distributions
Tend to be right-skewed
Lower bound at zero
Gamma and inverse Gaussian distributions exhibit these properties
Gamma vs. inverse Gaussian distributions
Gamma most commonly used for severity
Inverse Gaussian has sharper peak, wider tail (more appropriate for more skewed distributions)
Frequency model distributions
Poisson (technically, ODP) - most common, Ø can be > 1
Negative binomial - Ø = 1, but there is a κ in the variance function: V(µ) = µ(1 + κµ)
Pure premium distributions
Large point mass at zero (most policies have no claims)
Right-skewness due to severity distribution
Tweedie distribution most commonly used
Tweedie distribution
p = “Power parameter”
1 < p < 2: Poisson frequency and Gamma severity
Assumes frequency and severity move in same direction (not realistic)
Mean, Tweedie
Poisson mean x Gamma mean
= λ x αθ
Variance, Tweedie
Øµ^p
p, Tweedie
p = (α + 2) / (α + 1)
Dispersion factor, Tweedie
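The standard compound Poisson-gamma result for the Tweedie dispersion factor is Ø = λ^(1-p) × (αθ)^(2-p) / (2 - p). As a hedged sketch, the snippet below maps Poisson and Gamma parameters to the Tweedie mean, p and Ø, and includes a helper that computes the variance directly from the compound Poisson formula so the two can be compared. The function names and the parameterization (Gamma with shape α and scale θ) are assumptions made for illustration.

```python
def tweedie_from_poisson_gamma(lam: float, alpha: float, theta: float):
    """Map Poisson(lam) frequency and Gamma(alpha, theta) severity to
    the Tweedie mean, power parameter p, and dispersion factor phi."""
    mu = lam * alpha * theta                  # Poisson mean x Gamma mean
    p = (alpha + 2.0) / (alpha + 1.0)         # always between 1 and 2
    phi = lam ** (1.0 - p) * (alpha * theta) ** (2.0 - p) / (2.0 - p)
    return mu, p, phi

def compound_variance(lam: float, alpha: float, theta: float) -> float:
    """Variance of a compound Poisson sum: lam * E[X^2] for Gamma X,
    where E[X^2] = Var(X) + E[X]^2 = alpha*theta^2 + (alpha*theta)^2."""
    return lam * (alpha * theta ** 2 + (alpha * theta) ** 2)
```

With the made-up inputs λ = 0.1, α = 2 and θ = 1000, the Tweedie variance Ø × µ^p agrees with the directly computed compound variance up to floating-point error.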
Probability model distributions
Binomial distribution used
Typically use logit function
µ / (1 - µ) known as odds
Odds, logit function
µ / (1 - µ)
Each unit increase in a predictor variable with coefficient ß changes the odds by e^ß - 1 in percentage terms
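The odds interpretation can be checked numerically. This is a minimal sketch with a made-up starting probability and coefficient; the function names are illustrative assumptions.

```python
import math

def logit(mu: float) -> float:
    """Logit link: ln(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))

def inverse_logit(eta: float) -> float:
    """Inverse of the logit link, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def odds(mu: float) -> float:
    """Odds corresponding to probability mu."""
    return mu / (1.0 - mu)

def odds_change(mu_before: float, beta: float) -> float:
    """Relative change in odds after adding beta to the linear predictor,
    i.e. after a one-unit increase in a predictor with coefficient beta."""
    mu_after = inverse_logit(logit(mu_before) + beta)
    return odds(mu_after) / odds(mu_before) - 1.0
```

For any starting probability, `odds_change(mu, beta)` matches `math.exp(beta) - 1` up to floating-point error, so the percentage change in odds does not depend on the starting point.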
Offset term
Incorporates pre-determined values into the linear predictor so the GLM takes them as given
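As a sketch of how an offset behaves, assume a log link and exposure as the pre-determined quantity (a common use, though the card does not name one). The offset enters the linear predictor with its coefficient fixed at 1, so it is not estimated. The function name and inputs below are hypothetical.

```python
import math

def predicted_claim_count(exposure: float, x: float, beta0: float, beta1: float) -> float:
    """Log-link prediction with ln(exposure) as the offset term."""
    offset = math.log(exposure)        # taken as given, coefficient fixed at 1
    eta = beta0 + beta1 * x + offset   # linear predictor including the offset
    return math.exp(eta)               # equals exposure * exp(beta0 + beta1 * x)
```

Doubling the exposure doubles the prediction, whatever values the estimated coefficients take.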
Continuous predictor variables
If a log link function is used, continuous variables should be logged (gives flexibility in fitting curve shapes)
Exceptions: year variables for trends, variables containing values of 0
Categorical predictor variables
Have 2 or more levels, converted to binary variables
Level with the highest number of observations is usually deemed the base level
Using a level with fewer observations as the base results in wider confidence intervals
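A minimal sketch of converting a categorical variable to binary indicators, picking the most populated level as the base. The function name and the sample data are hypothetical.

```python
from collections import Counter

def dummy_encode(values):
    """Return (base level, non-base levels, indicator rows).

    The base level is the one with the most observations; it gets no
    indicator column, so a row of all zeros represents the base level.
    """
    counts = Counter(values)
    base = counts.most_common(1)[0][0]
    levels = sorted(level for level in counts if level != base)
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return base, levels, rows
```

For the input `["sedan", "suv", "sedan", "truck", "sedan"]`, the base level is `"sedan"` and each observation gets two indicators, one for `"suv"` and one for `"truck"`.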