GLM Flashcards

1
Q

Model Specifications

A

Target Variable
Predictors
Link Function
Error Distribution
Weights

2
Q

Why log continuous variables when using log link?

A

Otherwise a positive coefficient gives the variable an exponential effect on the target (multiplier e^(βx)); with ln(x) as the predictor the effect is a power curve, x^β
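
A minimal sketch of the point above, assuming a log link and a hypothetical coefficient of 0.5. The helper names `factor_raw` and `factor_logged` are illustrative only. It compares the multiplier applied to the target when x enters unlogged against the multiplier when ln(x) is used.

```python
import math

def factor_raw(x, beta):
    """Log-link multiplier when x enters the linear predictor unlogged: exp(beta * x)."""
    return math.exp(beta * x)

def factor_logged(x, beta):
    """Log-link multiplier when ln(x) is the predictor: exp(beta * ln x) = x ** beta."""
    return math.exp(beta * math.log(x))

beta = 0.5  # hypothetical coefficient, not from any fitted model
for x in (1, 2, 4, 8):
    # The unlogged column grows exponentially; the logged column grows like sqrt(x).
    print(x, round(factor_raw(x, beta), 3), round(factor_logged(x, beta), 3))
```

With the logged predictor, doubling x always multiplies the target by the same factor (2^β), which is usually the more plausible shape for a rating variable.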

3
Q

When to use weights

A

If each row in the dataset represents an average over multiple data points (e.g. average severity over several claims)

4
Q

When to use offsets

A
  1. Variable modeled elsewhere (territory model, deductible, limits)
  2. Want to change only some variables in rating plan
  3. Target variable varies directly with exposure (e.g. modeling claim counts)
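
For case 3, a rough sketch of how an exposure offset behaves in a Poisson model with a log link. The function `expected_claims` and the coefficients `b0` and `b1` are hypothetical, chosen only for illustration. The offset ln(exposure) enters the linear predictor with its coefficient fixed at 1, so the predicted claim count scales directly with exposure.

```python
import math

def expected_claims(exposure, x, b0=-2.0, b1=0.3):
    """Poisson log-link mean with ln(exposure) as an offset (coefficient fixed at 1)."""
    linear_predictor = math.log(exposure) + b0 + b1 * x
    return math.exp(linear_predictor)

one_year = expected_claims(exposure=1.0, x=1.0)
two_years = expected_claims(exposure=2.0, x=1.0)

# Doubling exposure doubles the expected count; the fitted coefficients are untouched.
print(two_years / one_year)
```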
5
Q

Correlated Variables (How to Identify and Adjust)

A

Identify with two-way correlation table

Can adjust with principal component analysis

6
Q

Multicollinearity

A

When 2 or more predictors are strongly predictive of a third

Two-way correlation tables may not show it

Identify with Variance Inflation Factor
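
A small sketch of why two-way correlations can miss this, using made-up data in which `x3` is roughly `x1 + x2`. The function names are illustrative. VIF is computed as 1 / (1 - R^2), where R^2 comes from regressing one predictor on the others; with exactly two other predictors, R^2 has a closed form in the pairwise correlations, which keeps the example dependency-free. A VIF above about 10 is a commonly quoted rule of thumb for "high".

```python
import math

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_regressors(target, p1, p2):
    """VIF of `target` when regressed on p1 and p2: 1 / (1 - R^2)."""
    r1, r2, r12 = corr(target, p1), corr(target, p2), corr(p1, p2)
    r_squared = (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)
    return 1 / (1 - r_squared)

# Hypothetical predictors: x1 and x2 are uncorrelated, x3 is roughly x1 + x2.
x1 = [-1, -1, 1, 1, -1, -1, 1, 1]
x2 = [-1, 1, -1, 1, -1, 1, -1, 1]
x3 = [-1.9, -0.1, -0.1, 2.1, -2.1, 0.1, 0.1, 1.9]

# Pairwise correlations with x3 look moderate...
print(round(corr(x1, x3), 2), round(corr(x2, x3), 2))
# ...but the VIF for x3 is very large.
vif = vif_two_regressors(x3, x1, x2)
print(round(vif))
```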

7
Q

Aliasing

A

Perfect correlation
Must remove one

8
Q

Effects of highly correlated variables

A

Unstable Model (large std errs)
Unreasonable coefficients
Model may not converge

9
Q

GLM Limitations

A

Assign full credibility to data
Alt: GLMMs or Elastic Net GLMs

Assume the randomness of outcomes is uncorrelated across records
(Untrue if the data holds multiple years of the same risks, or with cat risk)

10
Q

Freq/Sev Models vs PP

A

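Frequency and severity models are each more stable than a pure premium (PP) model

A PP model can overfit if a variable affects only frequency or only severity

The Tweedie distribution (used for PP) assumes frequency and severity move in the same direction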

11
Q

Target Variable Considerations

A

Split coverages, perils
Capping
Remove CATs and model separately
Trend/Develop
On-Level Premium

12
Q

Predictor Selection Criteria

A

Significance
Cost of collecting
IT constraints
Regulatory requirements

13
Q

Partial Residual Plots

A

Detect non-linearity

14
Q

Correcting Non-Linearity

A
  1. Binning
  2. Polynomial terms
  3. Piecewise
  4. Splines
15
Q

Loglikelihood

A

How well model explains data

Con: Requires identical dataset

16
Q

Deviance

A

How far model is from saturated model

Want unscaled to compare different models

Cons:
-Models being compared must assume the same error distribution
-Always decreases as parameters are added, so selecting on deviance alone favors overfitting
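
As an illustration, a sketch of the unscaled deviance for a Poisson model, assuming the standard form 2 * sum(y * ln(y / mu) - (y - mu)). The data and the "fitted" values are made up, not the output of a real fit. The saturated model (mu = y for every record) has deviance 0, which is the reference point the card describes.

```python
import math

def poisson_deviance(actual, predicted):
    """Unscaled Poisson deviance: 2 * sum(y * ln(y / mu) - (y - mu)), with 0 * ln(0) = 0."""
    total = 0.0
    for y, mu in zip(actual, predicted):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2 * total

claims = [0, 1, 3, 2]              # hypothetical observed counts
fitted = [0.5, 1.2, 2.4, 1.9]      # hypothetical model predictions
intercept_only = [1.5] * 4         # predicts the overall mean for every record

print(poisson_deviance(claims, fitted))          # closer to the saturated model
print(poisson_deviance(claims, intercept_only))  # further from it
```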

17
Q

F-Test

A

Only for nested models

18
Q

AIC and BIC

A

Can compare any models

BIC's penalty per parameter is ln(n), so it over-penalizes added parameters on large datasets
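
A short sketch of the two penalties, using the standard definitions AIC = -2 * LL + 2p and BIC = -2 * LL + p * ln(n). The log-likelihood, parameter count, and record count are hypothetical.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: penalty of 2 per parameter."""
    return -2 * log_likelihood + 2 * n_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalty of ln(n_obs) per parameter."""
    return -2 * log_likelihood + n_params * math.log(n_obs)

ll, p, n = -1000.0, 10, 100_000  # hypothetical fit on a large dataset
print(aic(ll, p))
print(bic(ll, p, n))
```

At 100,000 records the BIC penalty per parameter is ln(100000), roughly 11.5, against 2 for AIC, which is why BIC tends to reject additional parameters on insurance-sized datasets.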

19
Q

Working Residual Plots

A

vs Linear Predictor (look for systematic over/under prediction)

vs One Predictor (look for non-linearity)

vs Weight (look for homoscedasticity)

20
Q

3 Ways to Assess Model Stability

A
  1. Cook’s Distance (influential records)
  2. Cross-Validation (check that param estimates are consistent across folds; less practical for insurance models, which need manual intervention in each fit)
  3. Bootstrapping (consistent param estimates)
21
Q

Quantile Plot

A

How well model differentiates between best and worst risks

Sort by prediction and bucket into quantiles (e.g. quintiles). Then plot average actual vs average predicted per bucket

Want:
Actual close to predicted
Actual increasing monotonically
Large lift (actual in the last quantile vs the first)
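
The table behind a quantile plot can be sketched as below. The data is made up, `quantile_table` is a hypothetical helper, and buckets are formed by equal record count for simplicity (in practice equal exposure per bucket is the more usual choice).

```python
def quantile_table(actual, predicted, n_buckets=5):
    """Sort records by prediction, split into equal-count buckets, and
    return (avg predicted, avg actual) per bucket."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    size = len(order) // n_buckets  # assumes len is a multiple of n_buckets
    rows = []
    for b in range(n_buckets):
        idx = order[b * size:(b + 1) * size]
        avg_pred = sum(predicted[i] for i in idx) / size
        avg_act = sum(actual[i] for i in idx) / size
        rows.append((avg_pred, avg_act))
    return rows

# Hypothetical loss costs per record.
predicted = [120, 80, 200, 150, 60, 300, 90, 250, 110, 180]
actual = [100, 90, 210, 160, 50, 330, 70, 240, 130, 170]

table = quantile_table(actual, predicted)
for avg_pred, avg_act in table:
    print(avg_pred, avg_act)

# Lift: actual in the worst bucket relative to the best bucket.
lift = table[-1][1] / table[0][1]
print(round(lift, 2))
```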

22
Q

Double Lift Chart

A

Compares two models directly

Sort by the ratio of model A's prediction to model B's and bucket into quintiles. Then plot actual, A, and B per bucket

Want model that’s closer to actual

Harder to interpret since compares where models disagree most
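
A sketch of the double lift calculation on made-up data. `double_lift` and `total_abs_error` are hypothetical helpers, and buckets are equal-count for simplicity. Records are sorted by the ratio of A's prediction to B's, so the end buckets are where the two models disagree most.

```python
def double_lift(actual, pred_a, pred_b, n_buckets=5):
    """Sort by pred_a / pred_b, bucket, and return per-bucket
    (avg actual, avg A, avg B)."""
    order = sorted(range(len(actual)), key=lambda i: pred_a[i] / pred_b[i])
    size = len(order) // n_buckets  # assumes len is a multiple of n_buckets
    rows = []
    for b in range(n_buckets):
        idx = order[b * size:(b + 1) * size]
        rows.append(tuple(sum(s[i] for i in idx) / size
                          for s in (actual, pred_a, pred_b)))
    return rows

def total_abs_error(rows, col):
    """Sum over buckets of |model average - actual average|."""
    return sum(abs(r[col] - r[0]) for r in rows)

# Hypothetical loss costs per record.
actual = [100, 90, 210, 160, 50, 330, 70, 240, 130, 170]
pred_a = [120, 80, 200, 150, 60, 300, 90, 250, 110, 180]
pred_b = [130, 100, 170, 150, 100, 220, 120, 200, 130, 160]

rows = double_lift(actual, pred_a, pred_b)
for row in rows:
    print(row)

# The model whose line tracks actual more closely is preferred.
print(total_abs_error(rows, 1), total_abs_error(rows, 2))
```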

23
Q

Loss Ratio Chart

A

Whether model better at segmenting than current

Sort by Predicted LR. Plot actual LR

Want more spread

Only tells if good at segmenting not predicting

24
Q

Gini Index and Lorenz Curve

A

Sort by predicted loss cost. Plot cumulative % of earned exposures (x) vs cumulative % of loss (y)

Lorenz curve formed by points

Gini Index = 2 * area between curve and y=x
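
A minimal sketch of this calculation, assuming records are sorted from lowest to highest predicted loss cost and the area under the Lorenz curve is approximated with the trapezoid rule. `gini_index` is a hypothetical helper and the inputs are toy values.

```python
def gini_index(exposure, loss, predicted):
    """Gini index from the Lorenz curve: records sorted by predicted loss cost,
    x = cumulative share of exposure, y = cumulative share of loss."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    total_exp, total_loss = sum(exposure), sum(loss)
    x_prev = y_prev = area_under = 0.0
    cum_exp = cum_loss = 0.0
    for i in order:
        cum_exp += exposure[i]
        cum_loss += loss[i]
        x, y = cum_exp / total_exp, cum_loss / total_loss
        area_under += (x - x_prev) * (y + y_prev) / 2  # trapezoid rule
        x_prev, y_prev = x, y
    # Area under y = x is 0.5; Gini is twice the gap between the line and the curve.
    return 2 * (0.5 - area_under)

# Toy case: the model ranks the only record with losses last.
print(gini_index([1, 1, 1, 1], [0, 0, 0, 4], [1, 2, 3, 4]))
# Toy case: losses are spread evenly, so the curve sits on y = x.
print(gini_index([1, 1, 1, 1], [1, 1, 1, 1], [1, 2, 3, 4]))
```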

25
Q

ROC Curve

A

Used for logistic models

Plot sensitivity (y) vs 1 - specificity (x) across thresholds

Higher AUROC is better (a model with no predictive power follows y = x, giving AUROC = 0.5)
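
One way to sketch AUROC without drawing the curve is the pairwise-ranking form: the share of (positive, negative) pairs in which the positive record receives the higher score, with ties counted as half. `auroc` is a hypothetical helper and the labels and scores are made up.

```python
def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical predicted probabilities

print(auroc(labels, scores))                 # 3 of 4 pairs ranked correctly
print(auroc(labels, [0.5, 0.5, 0.5, 0.5]))   # identical scores: no predictive power
```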

26
Q

Specificity

A

True Negatives / Total Actual Negatives

27
Q

Sensitivity

A

True Positives / Total Actual Positives
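
A sketch of both ratios (sensitivity here, specificity from the previous card) at a chosen threshold. The function name, labels, and scores are made up. A record is classed as positive when its score is at or above the threshold.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are predicted positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(sens, spec)

# Lowering the threshold catches more positives but flags more negatives.
sens_low, spec_low = sensitivity_specificity(labels, scores, threshold=0.25)
print(sens_low, spec_low)
```

Sweeping the threshold and plotting each (1 - specificity, sensitivity) pair traces out the ROC curve.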

28
Q

Why shouldn’t you model ILFs or deductibles in a GLM

A

Policy Options chosen by insured
May give counterintuitive results
Never charge more for less coverage
Correlation with results but not causation

29
Q

Ensembling

A

Averaging models together
Improves performance when errors uncorrelated
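
A toy illustration with made-up numbers: two hypothetical models with the same mean squared error whose errors are uncorrelated. Averaging their predictions halves the MSE in this constructed case.

```python
def mse(pred, truth):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)

truth = [10, 20, 30, 40]
model_a = [12, 18, 32, 38]   # errors: +2, -2, +2, -2
model_b = [12, 22, 28, 38]   # errors: +2, +2, -2, -2 (uncorrelated with A's)

# Simple ensemble: average the two predictions record by record.
ensemble = [(a + b) / 2 for a, b in zip(model_a, model_b)]

print(mse(model_a, truth), mse(model_b, truth), mse(ensemble, truth))
```

If the two models made the same errors on the same records, the average would inherit them and the MSE would not improve.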

30
Q

GLM Shortcomings

A
  1. Predictions must be based on linear function of predictors
  2. Instability if data thin or highly correlated vars
  3. Full credibility for each predictor coefficient
  4. Assumes randomness uncorrelated
  5. Dispersion param must be constant
31
Q

GLMM

A

Allows credibility in coeff estimates
Fixed and random effects

32
Q

DGLMs

A

Allows different dispersion parameters

33
Q

GAMs

A

Allows non-linearity without manual intervention

34
Q

MARS Models

A

Allows non-linearity without manual intervention

35
Q

Elastic Net GLMs

A

Allows credibility in coeff estimates
Automatic variable selection

36
Q

Why center continuous variables

A

Intercept more intuitive since meaningful base case
Makes signs of coefficients more intuitive

37
Q

Working Residual Advantages

A

Retain properties after binning (no pattern and homoscedastic)