GLM Flashcards
Model Specifications
Target Variable
Predictors
Link Function
Error Distribution
Weights
Why log continuous variables when using log link?
Otherwise each unit increase in the predictor multiplies the target by a constant factor (an exponential effect); logging the predictor gives a power relationship instead, so the effect scales with the size of the predictor
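A minimal sketch of the difference, using hypothetical coefficients for a log-link model:

```python
import math

# Hypothetical fitted coefficients for a log-link GLM
b0, b1 = 2.0, 0.3

def pred_unlogged(x):
    # exp(b0 + b1*x): each +1 on x multiplies the prediction by e**b1
    return math.exp(b0 + b1 * x)

def pred_logged(x):
    # exp(b0 + b1*ln(x)) = e**b0 * x**b1: a power curve, scale-invariant
    return math.exp(b0 + b1 * math.log(x))

# Unlogged: +1 on x has the same multiplicative effect at any level of x
r_low = pred_unlogged(11) / pred_unlogged(10)
r_high = pred_unlogged(101) / pred_unlogged(100)

# Logged: doubling x multiplies the prediction by 2**b1, at any level of x
s_low = pred_logged(20) / pred_logged(10)
s_high = pred_logged(200) / pred_logged(100)
```

The unlogged form compounds: moving x from 100 to 110 multiplies the prediction by the same factor ten times over, which is rarely a reasonable shape for rating variables.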
When to use weights
If each row in the dataset represents an average of multiple data points (e.g., average severity over several claims)
When to use offsets
- Variable modeled elsewhere (territory model, deductible, limits)
- Want to change only some variables in rating plan
- Target variable varies directly with exposure (e.g. modeling claim counts)
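The exposure case can be sketched directly; coefficients here are hypothetical. With a log link, including ln(exposure) as an offset (its coefficient fixed at 1, not estimated) makes predicted claim counts directly proportional to exposure:

```python
import math

b0, b1 = -2.0, 0.5   # hypothetical fitted coefficients
x = 1.0              # some rating variable

def predicted_counts(exposure):
    # ln(exposure) enters with a fixed coefficient of 1 (the offset),
    # so mu = exposure * exp(b0 + b1*x)
    return math.exp(b0 + b1 * x + math.log(exposure))
```

Doubling exposure doubles the expected count; the fixed coefficient of 1 is what distinguishes an offset from an ordinary predictor.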
Correlated Vars (How Identify and Adjust)
Identify with two-way correlation table
Can adjust with principal component analysis
Multicollinearity
When 2 or more predictors together are strongly predictive of a third (near-linear dependence)
Two-way correlation tables may not reveal it
Identify with Variance Inflation Factor
Aliasing
Perfect correlation (one predictor is an exact linear combination of others)
Must remove one of the aliased variables
Effects of highly correlated variables
Unstable Model (large std errs)
Unreasonable coefficients
Model may not converge
GLM Limitations
Assign full credibility to the data (no shrinkage for thinly populated segments)
Alt: GLMMs or Elastic Net GLMs
Assume randomness of outcomes uncorrelated
(Untrue if multiple years of data or cat risk)
Freq/Sev Models vs PP
Freq/sev models are more stable
PP can overfit if variable only affects one or other
Tweedie dist assumes freq/sev move in same direction
Target Variable Considerations
Split coverages, perils
Capping
Remove CATs and model separately
Trend/Develop
On-Level Premium
Predictor Selection Criteria
Significance
Cost of collecting
IT constraints
Regulatory requirements
Partial Residual Plots
Detect non-linearity
Correcting Non-Linearity
- Binning
- Polynomial terms
- Piecewise
- Splines
Loglikelihood
How well model explains data
Con: only comparable between models fit to the identical dataset
Deviance
How far model is from saturated model
Use unscaled deviance when comparing different models (fit to the same dataset)
Cons:
-Need to assume same error dist. in models
-Always decreases when add more params so overfits
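A sketch of the unscaled deviance for the Poisson case, using hypothetical fitted means; note the saturated model (mu_i = y_i) has deviance exactly 0:

```python
import math

def poisson_unscaled_deviance(y, mu):
    # D = 2 * sum[ y*ln(y/mu) - (y - mu) ],
    # with y*ln(y/mu) taken as 0 when y = 0
    d = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        d += term - (yi - mi)
    return 2.0 * d

y = [0, 1, 2, 4]            # observed counts
mu = [0.5, 1.2, 1.8, 3.5]   # hypothetical fitted means
```

The closer the fitted means sit to the observations, the smaller the deviance; the saturated model is the benchmark at zero.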
F-Test
Only for nested models
AIC and BIC
Can compare any models
BIC's penalty per parameter is ln(n), so it may over-penalize added parameters on large datasets
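The two formulas side by side (loglik and parameter counts here are hypothetical inputs, not fitted values):

```python
import math

def aic(loglik, n_params):
    # Penalty of 2 per parameter, regardless of dataset size
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    # Penalty of ln(n) per parameter: exceeds AIC's penalty once
    # n > e**2 (about 7.4), hence the over-penalization on big data
    return -2.0 * loglik + n_params * math.log(n_obs)
```

Both are only meaningful as comparisons between models fit to the same dataset; lower is better.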
Working Residual Plots
vs Linear Predictor (look for systemic over/under prediction)
vs One Predictor (look for non-linearity)
vs Weight (look for homoscedasticity)
3 Ways to Assess Model Stability
- Cook’s Distance (influential records)
- Cross-Validation (consistent param estimates - not good for insurance when more manual intervention needed)
- Bootstrapping (consistent param estimates)
Quantile Plot
How well model differentiates between best and worst risks
Sort records into quintiles by predicted value, then plot average actual vs predicted per quintile
Want:
Actual close to predicted
Actual increasing monotonically
Large lift (wide spread between the first and last quintiles' actuals)
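A sketch of the quintile calculation behind the plot, on small hypothetical data where the model ranks risks well:

```python
def quintile_lift(actual, predicted):
    # Sort records by model prediction, split into 5 equal buckets,
    # and compare average actual vs average predicted in each bucket
    pairs = sorted(zip(predicted, actual))
    k = len(pairs) // 5
    out = []
    for i in range(5):
        bucket = pairs[i * k:(i + 1) * k]
        avg_pred = sum(p for p, _ in bucket) / len(bucket)
        avg_act = sum(a for _, a in bucket) / len(bucket)
        out.append((avg_pred, avg_act))
    return out

predicted = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical predictions
actual    = [1, 3, 2, 5, 4, 7, 6, 9, 8, 12]   # hypothetical outcomes
buckets = quintile_lift(actual, predicted)
actuals = [a for _, a in buckets]
```

Here the bucket actuals increase monotonically with a wide first-to-last spread, which is what the card says to look for.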
Double Lift Chart
Compares two models directly
Sort into quintiles by the ratio of Model A's prediction to Model B's, then plot actual, A, and B
Want model that’s closer to actual
Harder to interpret since compares where models disagree most
Loss Ratio Chart
Whether model better at segmenting than current
Sort policies into buckets by predicted LR, then plot actual LR per bucket
Want more spread
Only tells if good at segmenting not predicting
Gini Index and Lorenz Curve
Sort risks by model prediction; plot cumulative % of exposures vs cumulative % of losses
Lorenz curve is formed by these points
Gini Index = 2 * area between the Lorenz curve and the line of equality (y = x)
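A sketch of the calculation, assuming the exposure and loss lists are already in model-sorted order (best risks first); the example data are hypothetical:

```python
def gini_index(exposures, losses):
    tot_e, tot_l = sum(exposures), sum(losses)
    # Build the Lorenz curve points, starting from the origin
    pts = [(0.0, 0.0)]
    ce = cl = 0.0
    for e, l in zip(exposures, losses):
        ce += e
        cl += l
        pts.append((ce / tot_e, cl / tot_l))
    # Area under the Lorenz curve via the trapezoid rule
    area = sum((x1 - x0) * (y1 + y0) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    # Gini = 2 * (area between y = x and the curve)
    return 2.0 * (0.5 - area)

# Extreme segmentation: all loss concentrated in the worst-ranked risk
g = gini_index([1, 1, 1, 1, 1], [0, 0, 0, 0, 10])
```

A model with no segmentation power puts the curve on y = x (Gini = 0); more concentration of losses in the worst-ranked risks bows the curve down and raises the index.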
ROC Curve
Used for logistic models
Plots sensitivity (y-axis) vs 1 - specificity (x-axis) as the classification threshold varies
Higher AUROC is better (a model with no predictive power lies on y = x, AUROC = 0.5)
Specificity
True Neg/Total Negative
Sensitivity
True Pos/Total Positive
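A sketch of both rates from a hypothetical set of scored records; sweeping the threshold over these scores is what traces out the ROC curve:

```python
def confusion_rates(actual, predicted_prob, threshold=0.5):
    # actual: 0/1 outcomes; classify as positive when prob >= threshold
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted_prob):
        pred = 1 if p >= threshold else 0
        if a == 1 and pred == 1:
            tp += 1
        elif a == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)   # true positives / total actual positives
    specificity = tn / (tn + fp)   # true negatives / total actual negatives
    return sensitivity, specificity

actual = [1, 1, 1, 0, 0, 0]            # hypothetical outcomes
probs  = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]  # hypothetical model scores
sens, spec = confusion_rates(actual, probs)
```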
Why shouldn’t model ILFs or Deds
Policy Options chosen by insured
May give counterintuitive results, e.g. indicating a higher rate for less coverage
Should never charge more for less coverage
The insured's choice is correlated with results but does not cause them (selection effect)
Ensembling
Averaging models together
Improves performance when errors uncorrelated
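A toy illustration with made-up errors: two equally accurate models whose errors are uncorrelated, so averaging their predictions averages (and partially cancels) the errors:

```python
def mse(errors):
    return sum(e * e for e in errors) / len(errors)

# Hypothetical prediction errors for two models of equal accuracy;
# these two error vectors have zero correlation
err_a = [10, -10, 10, -10]
err_b = [10, 10, -10, -10]

# Averaging the two models' predictions averages their errors
err_ens = [(a + b) / 2 for a, b in zip(err_a, err_b)]
```

The ensemble's mean squared error is half either model's; if the errors were perfectly correlated, averaging would gain nothing.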
GLM Shortcomings
- Predictions must be based on linear function of predictors
- Instability if data thin or highly correlated vars
- Full credibility for each predictor coefficient
- Assumes randomness uncorrelated
- Dispersion param must be constant
GLMM
Allows credibility in coeff estimates
Fixed and random effects
DGLMs
Allows different dispersion parameters
GAMs
Allows non-linearity without manual intervention
MARS Models
Allows non-linearity without manual intervention
Elastic Net GLMs
Allows credibility in coeff estimates
Automatic variable selection
Why center continuous variables
Intercept more intuitive since meaningful base case
Makes signs of coefficients more intuitive
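A sketch of the intercept shift, with hypothetical coefficients for a log-link model on a continuous variable:

```python
import math

x = [10, 20, 30, 40]   # hypothetical continuous rating variable
b0, b1 = 1.0, 0.05     # hypothetical fitted coefficients (log link)
xbar = sum(x) / len(x)

# Uncentered: exp(b0) is the prediction at x = 0, which may describe
# a risk that doesn't exist in the book.
# Centered: fitting on (x - xbar) leaves the slope unchanged and
# shifts the intercept so exp(intercept) is the prediction for an
# average risk -- a meaningful base case.
b0_centered = b0 + b1 * xbar
pred_avg_risk = math.exp(b0 + b1 * xbar)
```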
Working Residual Advantages
Retain their desired properties (no pattern, homoscedastic) even after binning/averaging