A2. GLMs Flashcards

Question 1

Q

Var(Y_i)

Formula

Answer

A

ΦV(μ_i) / ω_i
* Φ ~ Dispersion parameter
* V(μ_i) ~ Variance Function
* ω_i ~ Weights
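
As a quick numeric illustration of the formula above, the sketch below evaluates Var(Y_i) for one hypothetical record. The function name glm_variance and all input values are invented for illustration; the variance function is passed in as a plain Python callable.

```python
def glm_variance(phi, variance_fn, mu, weight=1.0):
    """Var(Y_i) = phi * V(mu_i) / w_i."""
    return phi * variance_fn(mu) / weight

# Hypothetical inputs: variance function V(mu) = mu^2,
# dispersion 0.5, mean 1,000.
def squared(mu):
    return mu ** 2

var_weight_1 = glm_variance(0.5, squared, 1000.0, weight=1.0)
var_weight_4 = glm_variance(0.5, squared, 1000.0, weight=4.0)

print(var_weight_1)  # 500000.0
print(var_weight_4)  # 125000.0
```

A record with four times the weight has one quarter of the variance, which is why higher-weight records pull the fit harder.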

Question 2

Q

Variance Function V(μ)

Normal, Poisson, Gamma, Inverse Gaussian, Tweedie + common uses

Answer

A

Normal: V(μ) = 1 (rarely used, since insurance data is generally not normally distributed)
Poisson: V(μ) = μ (best for freq models given its discrete nature)
Gamma: V(μ) = μ^2 (severity models, right skewed/tailed)
Inverse Gaussian: V(μ) = μ^3 (severity model, better for more skewed)
Tweedie: V(μ) = μ^p (pure prem models, assumes freq/sev move in same direction)
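
All five variance functions above have the form V(μ) = μ^p, so they can be compared in one small sketch. The Tweedie power of 1.5 is an arbitrary illustrative choice (for pure premium the power is taken between 1 and 2), and the helper name power_variance is hypothetical.

```python
def power_variance(mu, p):
    """Power variance function V(mu) = mu ** p."""
    return mu ** p

# Power p for each distribution on this card.
# The Tweedie value is an assumed example, not a recommended setting.
POWERS = {
    "Normal": 0,
    "Poisson": 1,
    "Tweedie (p = 1.5, assumed)": 1.5,
    "Gamma": 2,
    "Inverse Gaussian": 3,
}

# How much V(mu) grows when the mean goes from 1 to 4.
growth = {name: power_variance(4.0, p) / power_variance(1.0, p)
          for name, p in POWERS.items()}

for name, ratio in growth.items():
    print(f"{name}: V(4) / V(1) = {ratio:g}")
```

The higher the power, the faster the variance grows with the mean, which matches the note that Inverse Gaussian suits more skewed severity than Gamma.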

Question 3

Q

Variance Function V(μ)

Binomial, Negative Binomial + common uses

Answer

A

Binomial: V(μ) = μ(1-μ), where μ is the probability of the event (logistic regression models); the variance of a binomial count is npq
Negative Binomial: V(μ) = μ(1+Kμ) (frequency models)

Question 4

Q

Deviance / F-Test

Formula for scaled and unscaled, F-Test formula, df, reject/accept Ho

Answer

A

Scaled D’ = 2(log-like saturated model - log-like model)
Unscaled D = ΦD’

F statistic = (Ds - Db) / (# of added parameters × Φ_b), using UNSCALED deviances (s = smaller model, b = bigger model)

df1 (columns) = # of added parameters
df2 (rows) = n - p_b

Reject Ho (use bigger model) if F-stat > table value
Fail to reject Ho (keep smaller model) otherwise
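
The F-test steps above can be sketched as follows. Every number here is invented for illustration, and the critical value is a placeholder standing in for a lookup in an F table at df1 and df2; it should be replaced with the value from your own table.

```python
def f_statistic(dev_small, dev_big, added_params, phi_big):
    """F = (Ds - Db) / (added_params * phi_b), using unscaled deviances."""
    return (dev_small - dev_big) / (added_params * phi_big)

# Hypothetical inputs.
dev_small = 1200.0   # unscaled deviance of the smaller model
dev_big = 1150.0     # unscaled deviance of the bigger model
added_params = 2     # parameters added by the bigger model
phi_big = 5.0        # estimated dispersion of the bigger model
n, p_big = 250, 10   # number of records, parameters in the bigger model

f_stat = f_statistic(dev_small, dev_big, added_params, phi_big)
df1 = added_params   # table column
df2 = n - p_big      # table row

# Assumed table value for (df1, df2) at the chosen significance level.
critical_value = 3.03

use_bigger_model = f_stat > critical_value
print(f_stat, df1, df2, use_bigger_model)
```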

Question 5

Q

AIC / BIC

Formula, which is more reasonable in insurance

Answer

A

AIC = -2 × log-like + 2p

BIC = -2 × log-like + p × ln(n)

AIC is more reasonable: insurance datasets have very large n, so the BIC penalty p × ln(n) becomes very large and tends to favor overly simple models
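
To see how the two penalties diverge as n grows, here is a minimal sketch comparing two hypothetical models. The log-likelihoods, parameter counts, and record count are invented for illustration.

```python
import math

def aic(log_like, p):
    """AIC = -2 * log-likelihood + 2p."""
    return -2.0 * log_like + 2.0 * p

def bic(log_like, p, n):
    """BIC = -2 * log-likelihood + p * ln(n)."""
    return -2.0 * log_like + p * math.log(n)

n = 1_000_000                 # hypothetical number of records
small = {"ll": -5000.0, "p": 10}
big = {"ll": -4990.0, "p": 15}

aic_small, aic_big = aic(small["ll"], small["p"]), aic(big["ll"], big["p"])
bic_small, bic_big = bic(small["ll"], small["p"], n), bic(big["ll"], big["p"], n)

# Lower is better for both criteria.
print("AIC prefers bigger model:", aic_big < aic_small)
print("BIC prefers bigger model:", bic_big < bic_small)
print("Per-parameter penalty: AIC 2.0, BIC", round(math.log(n), 2))
```

With a million records, each added parameter costs about 13.8 under BIC versus 2 under AIC, so in this made-up case the two criteria disagree.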

Question 6

Q

Offset Term

What is it, when do you add it to model, examples

Answer

A

An offset term lets you incorporate pre-determined values for variables in your model (e.g., deductible, policy term)

Add offsets BEFORE running the GLM so that all estimated coefficients (B0, B1, etc) for other predictors are optimal in the presence of the offset
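
A minimal sketch of how an offset enters a prediction, assuming a log link (the card does not state the link function). The coefficients and the deductible relativity of 0.90 are hypothetical, as is the function name.

```python
import math

def predict_with_offset(intercept, coefs, x, offset):
    """Log-link prediction: mu = exp(intercept + sum(b_j * x_j) + offset).

    The offset is added to the linear predictor with its coefficient
    fixed at 1, so the GLM does not estimate it.
    """
    linear_predictor = intercept + sum(b * v for b, v in zip(coefs, x))
    return math.exp(linear_predictor + offset)

# Hypothetical fitted values and a pre-determined deductible relativity.
intercept = math.log(500.0)
coefs = [0.10]
x = [1.0]
deductible_relativity = 0.90

mu_no_offset = predict_with_offset(intercept, coefs, x, offset=0.0)
mu_with_offset = predict_with_offset(
    intercept, coefs, x, offset=math.log(deductible_relativity)
)

# Under a log link, an offset of ln(r) scales the prediction by r.
print(mu_with_offset / mu_no_offset)
```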

Question 7

Q

GLM Limitations

Answer

A

GLMs give full credibility
* The estimated coefficients are not credibility weighted to recognize low volumes of data or high volatility. This can be partially addressed by looking at p-values and standard errors

GLMs assume that the randomness of outcomes is uncorrelated, which may not be true in practice
* Weather events can cause similar outcomes to risks in same area
* Using a dataset with several renewals, the same insured will have correlated/similar outcomes

Question 8

Q

Correlation Among Predictor Variables

What happens to the model? How to Check? Solutions?

Answer

A

What could happen:
* Model may not converge
* Unstable model, unstable coefficients w/ high standard errors

How to Check
* Variance Inflation Factor (VIF): a value above 10 is considered high

Solutions
* Remove all but one of the highly correlated variables
* Use principal components analysis or factor analysis to create a new subset of variables to use in the GLM
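
The VIF check can be sketched for the simplest case of exactly two predictors, where VIF = 1 / (1 - r^2) and r is their correlation. With more predictors, r^2 is replaced by the R^2 from regressing one predictor on all the others, which this sketch does not cover. The variable names and data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(xs, ys):
    """VIF for the two-predictor case: 1 / (1 - r^2)."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictors.
vehicle_age = [1, 2, 3, 4, 5, 6, 7, 8]
vehicle_value = [30, 29, 26, 25, 21, 20, 17, 15]   # closely tied to age
territory_code = [3, 1, 4, 1, 5, 9, 2, 6]          # weakly related to age

vif_high = vif_two_predictors(vehicle_age, vehicle_value)
vif_low = vif_two_predictors(vehicle_age, territory_code)

print("age vs value VIF > 10:", vif_high > 10)
print("age vs territory VIF > 10:", vif_low > 10)
```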

Question 9

Q

Key Stakeholders of Predictive Modeling Project

Answer

A

Regulators - need to check if variables are legal to use and this varies by state
IT - consider IT limitations of project and the cost of programming changes
Agents/UWs - these people sell the insurance, so it is important for them to understand the new rating structure

Question 10

Q

Deductible or Limits in GLMs?

Yes or no, explain

Answer

A

Coverage related variables in GLMs may produce unintuitive results such as lower rates for lower deductibles. This could be due to correlations with other variables outside the model.

Instead, use a loss elimination ratio (LER) analysis or an increased limits factor (ILF) analysis and incorporate deductibles/limits as an offset term
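
One way this might look for a deductible: compute the LER from ground-up losses, then convert it to an offset. The conversion offset = ln(1 - LER) assumes a log link, which the card does not state, and the losses and deductible are invented for illustration.

```python
import math

def loss_elimination_ratio(ground_up_losses, deductible):
    """Share of ground-up losses eliminated by the deductible."""
    eliminated = sum(min(loss, deductible) for loss in ground_up_losses)
    return eliminated / sum(ground_up_losses)

# Hypothetical ground-up losses and a hypothetical 500 deductible.
losses = [200.0, 800.0, 1500.0, 2500.0]
ler = loss_elimination_ratio(losses, 500.0)

# Assumed log-link conversion: the offset scales predictions by (1 - LER).
offset = math.log(1.0 - ler)

print(round(ler, 2))
print(round(math.exp(offset), 2))
```

Here 1,700 of the 5,000 in losses fall under the deductible, so the LER is 0.34 and the offset scales expected losses by 0.66.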

Question 11

Q

Model Stability

What is it? How to check?

Answer

A

A stable model is not very sensitive to changes in the modeling data (e.g., adding or removing a large loss)

Ways to Measure:
* The influence of an individual record can be measured using Cook’s distance. Records with a high Cook’s distance should be given extra thought as to whether they should be included in the dataset
* Cross Validation - compare parameter estimates across different model runs
* Bootstrapping - used to create new datasets from the original dataset by randomly sampling with replacement. Compare parameter estimates across different runs
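
The bootstrapping idea can be sketched with the standard library. To keep the example short, the sample mean stands in for a refitted GLM coefficient; in practice each resample would be used to refit the model. The severities, including the single large loss, are hypothetical.

```python
import random
import statistics

def bootstrap_estimates(data, estimator, runs, seed=0):
    """Resample with replacement and re-estimate on each new dataset."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(runs):
        resample = rng.choices(data, k=len(data))  # sampling with replacement
        estimates.append(estimator(resample))
    return estimates

# Hypothetical claim severities; the last record is one large loss.
severities = [1200.0, 900.0, 1500.0, 1100.0, 950.0, 1300.0, 1000.0, 250000.0]

with_large = bootstrap_estimates(severities, statistics.mean, runs=200)
without_large = bootstrap_estimates(severities[:-1], statistics.mean, runs=200)

# A wide spread of estimates across runs signals an unstable estimate.
spread_with = statistics.pstdev(with_large)
spread_without = statistics.pstdev(without_large)
print("less stable with the large loss:", spread_with > spread_without)
```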

Question 12

Q

ROC Curve / Evaluation of Model

Sensitivity / Specificity / Discrimination Threshold / How to Plot

Answer

A

Sensitivity = True Positives / Total Actual Positives
Specificity = True Negatives / Total Actual Negatives

Discrimination Threshold = x
If predicted prob ≥ x → assign True otherwise False

Plotting ROC Curve:
* x-axis: 1 - specificity
* y-axis: sensitivity
* line of equality (0%,0%)→(100%,100%)

AUROC (area under the ROC curve): the higher the better (0.5 matches the line of equality, 1.0 is a perfect model)
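
The plotting recipe above can be sketched without a plotting library: compute one (1 - specificity, sensitivity) point per discrimination threshold, then take the area with the trapezoid rule. The outcomes and predicted probabilities are hypothetical, and the function names are illustrative.

```python
def sensitivity_specificity(actuals, probs, threshold):
    """Assign True when predicted prob >= threshold, then score the result."""
    tp = sum(1 for a, p in zip(actuals, probs) if a and p >= threshold)
    tn = sum(1 for a, p in zip(actuals, probs) if not a and p < threshold)
    positives = sum(1 for a in actuals if a)
    negatives = len(actuals) - positives
    return tp / positives, tn / negatives

def roc_points(actuals, probs):
    """(x, y) = (1 - specificity, sensitivity) for each threshold."""
    # Start above every prediction so the curve begins at (0, 0).
    thresholds = [float("inf")] + sorted(set(probs), reverse=True)
    points = []
    for t in thresholds:
        sens, spec = sensitivity_specificity(actuals, probs, t)
        points.append((1.0 - spec, sens))
    return points

def auroc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical actual outcomes (1 = event) and predicted probabilities.
actuals = [1, 1, 0, 1, 0, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(actuals, probs)
area = auroc(points)
print(round(area, 4))  # 0.8889
```

The area of 8/9 matches the share of (positive, negative) pairs in which the positive record received the higher predicted probability: 8 of 9 here.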

Question 13

Q

Pro/Con of Modeling Frequency & Severity Separately

Answer

A

Advantage
* More insight and intuition about the impact of each predictor variable
* Tweedie distribution (most common distribution for modeling pure premium) assumes both frequency and severity move in the same direction, but this is often unrealistic
* Modeling pure premium can lead to overfitting if a predictor variable only impacts frequency or severity but not both
* Frequency and severity are each more stable when modeled separately than pure premium is

Disadvantage
* Takes more time
* Claim level data may not be available to model frequency and severity separately