A2. GLMs Flashcards

1
Q

Var(Y_i)

Formula

A

Var(Y_i) = ΦV(μ_i) / ω_i
* Φ ~ dispersion parameter
* V(μ_i) ~ variance function
* ω_i ~ weight for observation i
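A minimal numeric sketch of this formula (the distribution, parameter values, and weight below are hypothetical, not from the source):

    # Variance of one observation in a GLM: Var(Y_i) = phi * V(mu_i) / omega_i
    phi = 1.5                 # assumed dispersion parameter
    mu_i = 200.0              # fitted mean for observation i
    omega_i = 0.5             # weight (e.g., half a year of exposure)
    V = lambda mu: mu ** 2    # Gamma variance function, V(mu) = mu^2
    var_y_i = phi * V(mu_i) / omega_i
    print(var_y_i)            # 1.5 * 200^2 / 0.5 = 120000.0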

2
Q

Variance Function V(μ)

Normal, Poisson, Gamma, Inverse Gaussian, Tweedie + common uses

A

Normal: V(μ) = 1 (rarely appropriate since insurance data is generally not normally distributed)
Poisson: V(μ) = μ (best for frequency models given its discrete nature)
Gamma: V(μ) = μ^2 (severity models; right-skewed, heavy-tailed data)
Inverse Gaussian: V(μ) = μ^3 (severity models; better for even more heavily skewed data)
Tweedie: V(μ) = μ^p with 1 < p < 2 (pure premium models; assumes frequency and severity move in the same direction)
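These variance functions can be checked numerically with statsmodels. A sketch, assuming statsmodels is installed; the value mu = 100 is arbitrary:

    import statsmodels.api as sm

    mu = 100.0
    families = {
        "Normal": sm.families.Gaussian(),                        # V(mu) = 1
        "Poisson": sm.families.Poisson(),                        # V(mu) = mu
        "Gamma": sm.families.Gamma(),                            # V(mu) = mu^2
        "Inverse Gaussian": sm.families.InverseGaussian(),       # V(mu) = mu^3
        "Tweedie (p=1.5)": sm.families.Tweedie(var_power=1.5),   # V(mu) = mu^p
    }
    for name, fam in families.items():
        print(name, fam.variance(mu))   # each family object exposes its variance function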

3
Q

Variance Function V(μ)

Binomial, Negative Binomial + common uses

A

Binomial: V(μ) = μ(1 - μ) (logistic regression models, e.g., probability of retention or of a claim)
Negative Binomial: V(μ) = μ(1 + κμ) (frequency models; allows variance greater than the mean, i.e., overdispersion)

4
Q

Deviance / F-Test

Formula for scaled and unscaled, F-Test formula, df, reject/accept Ho

A

Scaled deviance D' = 2(log-likelihood of saturated model - log-likelihood of model)
Unscaled deviance D = ΦD'

F statistic = (D_S - D_B) / (# of added parameters × Φ_B), using UNSCALED deviances, where S = smaller model, B = bigger model, and Φ_B is the dispersion estimated from the bigger model (worked sketch below)

df1 (columns) = # of added parameters
df2 (rows) = n - p_B (observations minus parameters in the bigger model)

Reject Ho (use bigger model) if F-stat > table value
Fail to reject Ho (keep smaller model) otherwise
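A small worked sketch of this F-test (the deviances, parameter counts, and dispersion below are made-up numbers, not from the source; scipy is assumed to be available):

    from scipy import stats

    # Unscaled deviances from two nested GLMs (hypothetical values)
    D_small = 1450.0     # smaller model
    D_big = 1400.0       # bigger model
    added_params = 4     # parameters added in the bigger model
    phi_big = 2.5        # dispersion estimated from the bigger model
    n, p_big = 1000, 12  # observations and parameters in the bigger model

    f_stat = (D_small - D_big) / (added_params * phi_big)
    f_crit = stats.f.ppf(0.95, dfn=added_params, dfd=n - p_big)  # 5% significance level

    # Reject H0 (use the bigger model) if the statistic exceeds the table value
    print(f_stat, f_crit, f_stat > f_crit)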

5
Q

AIC / BIC

Formulas, which is more reasonable for insurance

A

AIC = -2(log-likelihood) + 2p

BIC = -2(log-likelihood) + p·ln(n)

AIC is generally more reasonable for insurance: n is very large for insurance datasets, so the BIC penalty per parameter, ln(n), becomes severe and BIC may reject predictive variables
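A minimal sketch of these two formulas (the log-likelihood, p, and n below are hypothetical):

    import math

    loglik = -5230.0   # maximized log-likelihood of the fitted model (assumed)
    p = 15             # number of estimated parameters
    n = 50000          # number of observations

    aic = -2 * loglik + 2 * p
    bic = -2 * loglik + p * math.log(n)
    print(aic, bic)    # BIC's penalty per parameter, ln(50000) ~ 10.8, vs. AIC's 2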

6
Q

Offset Term

What is it, when do you add it to model, examples

A

An offset term lets you incorporate a pre-determined (fixed) effect for a variable into the model rather than estimating it (e.g., deductible relativities, policy term, log of exposure)

Add offsets BEFORE running the GLM so that the estimated coefficients (β0, β1, etc.) for the other predictors are optimal in the presence of the offset
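A sketch of fitting a GLM with an offset in statsmodels; the data here is synthetic and the Poisson frequency setup is an illustrative assumption:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 1000
    X = rng.normal(size=(n, 2))                  # two synthetic rating variables
    exposure = rng.uniform(0.5, 1.0, size=n)     # earned exposure per record
    mu = exposure * np.exp(0.1 + 0.3 * X[:, 0] - 0.2 * X[:, 1])
    y = rng.poisson(mu)                          # synthetic claim counts

    # log(exposure) enters the linear predictor with its coefficient fixed at 1 (not estimated)
    freq_model = sm.GLM(y, sm.add_constant(X),
                        family=sm.families.Poisson(),
                        offset=np.log(exposure))
    print(freq_model.fit().summary())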

7
Q

GLM Limitations

A

GLMs give full credibility to the data
* The estimated coefficients are not credibility-weighted to recognize low volumes of data or high volatility. This can be partially addressed by looking at p-values and standard errors

GLMs assume the randomness of outcomes is uncorrelated, which may not be true in practice
* Weather events can cause similar outcomes for risks in the same area
* In a dataset with several renewals of the same policy, the same insured will have correlated/similar outcomes

8
Q

Correlation Among Predictor Variables

What happens to the model? How to Check? Solutions?

A

What could happen:
* Model may not converge
* Unstable model: unstable coefficients with high standard errors

How to check:
* Variance Inflation Factor (VIF) > 10 indicates high correlation with other predictors (see the sketch after this list)

Solutions:
* Remove all but one of the highly correlated variables
* Use principal components analysis or factor analysis to create a new subset of variables to use in the GLM
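A sketch of the VIF check on synthetic data, assuming statsmodels is available:

    import numpy as np
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(0)
    n = 500
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.1, size=n)            # nearly collinear with x1
    x3 = rng.normal(size=n)                            # independent predictor
    exog = np.column_stack([np.ones(n), x1, x2, x3])   # include an intercept column

    for i, name in enumerate(["const", "x1", "x2", "x3"]):
        print(name, variance_inflation_factor(exog, i))   # x1 and x2 show VIF >> 10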

9
Q

Key Stakeholders of Predictive Modeling Project

A
  1. Regulators - need to check whether variables are legal to use; this varies by state
  2. IT - consider the IT limitations of the project and the cost of programming changes
  3. Agents/UWs - these people sell the insurance, so it is important for them to understand the new rating structure

10
Q

Deductible or Limits in GLMs?

Yes or no, explain

A

Coverage-related variables in GLMs may produce unintuitive results, such as lower rates for lower deductibles. This could be due to correlations with other variables outside the model.

Instead, use an LER (loss elimination ratio) analysis or ILF analysis and incorporate deductibles/limits as an offset term.

11
Q

Model Stability

What is it? How to check?

A

A stable model is not overly sensitive to changes in the modeling data (e.g., adding/removing a large loss)

Ways to measure:
* The influence of an individual record can be measured using Cook's distance. Records with high Cook's distance should be given extra thought as to whether they should be included in the dataset
* Cross-validation - compare parameter estimates across different model runs
* Bootstrapping - create new datasets from the original dataset by randomly sampling with replacement; compare parameter estimates across the different runs
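A sketch of the bootstrapping check on synthetic data (statsmodels and the Poisson frequency setup are assumptions for illustration):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 2000
    X = sm.add_constant(rng.normal(size=(n, 2)))             # intercept + two predictors
    y = rng.poisson(np.exp(X @ np.array([0.0, 0.3, -0.2])))  # synthetic claim counts

    # Refit the same GLM on resampled datasets and compare the coefficient estimates
    for b in range(5):
        idx = rng.integers(0, n, size=n)                     # sample rows with replacement
        fit = sm.GLM(y[idx], X[idx], family=sm.families.Poisson()).fit()
        print(b, np.round(fit.params, 3))                    # stable model -> similar estimates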

12
Q

ROC Curve / Evaluation of Model

Sensitivity / Specificity / Discrimination Threshold / How to Plot

A

Sensitivity = True Positives / Total Actual Positives
Specificity = True Negatives / Total Actual Negatives

Discrimination threshold = x
If predicted probability ≥ x → classify as positive, otherwise negative (sketch below)

Plotting the ROC curve (one point per threshold x):
* x-axis: 1 - specificity
* y-axis: sensitivity
* line of equality from (0%, 0%) to (100%, 100%)

AUROC (area under the ROC curve): the higher, the better
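A small numeric sketch of these definitions (the actual labels and predicted probabilities below are made up):

    import numpy as np

    actual = np.array([1, 1, 0, 0, 1, 0, 0, 1])           # 1 = actual positive
    pred_prob = np.array([0.9, 0.6, 0.4, 0.2, 0.3, 0.7, 0.1, 0.8])

    x = 0.5                                                # discrimination threshold
    pred = pred_prob >= x                                  # positive if prob >= x

    sensitivity = np.sum(pred & (actual == 1)) / np.sum(actual == 1)   # TP / actual positives
    specificity = np.sum(~pred & (actual == 0)) / np.sum(actual == 0)  # TN / actual negatives
    print(sensitivity, specificity)   # one (1 - specificity, sensitivity) point on the ROC curve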

13
Q

Pro/Con of Modeling Frequency & Severity Separately

A

Advantages
* More insight and intuition about the impact of each predictor variable
* The Tweedie distribution (the most common distribution for modeling pure premium) assumes frequency and severity move in the same direction, which is often unrealistic
* Modeling pure premium can lead to overfitting if a predictor variable only impacts frequency or severity but not both
* Each of the frequency and severity models is more stable on its own

Disadvantages
* Takes more time
* Claim-level data may not be available to model frequency and severity separately
