GLMs Flashcards

1
Q

Advantages to GLMs

A

Adjusts for correlations between rating variables, with less restrictive assumptions than classical linear models

2
Q

Disadvantages to GLMs

A

Often difficult to explain results

3
Q

Components of GLMs

A

Random component

Systematic component

4
Q

Random component of GLM

A

Each yi is assumed to be independent and to come from the exponential family of distributions

Mean = µi

Variance = φV(µi)/wi

5
Q

Variance within a GLM, formula

A

Variance = φV(µi)/wi

6
Q

φ

A

Dispersion/Scale factor

Same for all observations

7
Q

V(µ)

A

Variance function, given for selected distribution type

Describes relationship between variance and mean

Same distribution type must be used for all observations

8
Q

Systematic component of GLM

A

g(µi) = ß0 + ß1xi1 +… + ßpxip + offset

The offset term allows you to manually specify the estimates of certain variables

9
Q

Link functions, g(µ)

A

Allows for transformations of linear predictor

10
Q

Link function often used for binary target variables

A

Logit link:

g(µ) = ln [µ / (1 - µ)]
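
A minimal sketch of the logit link and its inverse (the logistic function) in plain Python; the probability values are illustrative only:

```python
import math

def logit(mu):
    """Logit link: maps a probability in (0, 1) to the whole real line."""
    return math.log(mu / (1 - mu))

def inv_logit(eta):
    """Inverse logit (logistic function): maps a linear predictor back to (0, 1)."""
    return 1 / (1 + math.exp(-eta))

# Round trip: the inverse link recovers the original probability
assert abs(inv_logit(logit(0.73)) - 0.73) < 1e-9
```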

11
Q

Link function used in rating plans

A

g(µ) = ln(µ)

Allows transformation of the linear predictor into a multiplicative rating structure
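
A quick check of why the log link yields a multiplicative structure: exponentiating the additive linear predictor gives a base rate times one factor per variable (the coefficients below are made up for illustration):

```python
import math

# Hypothetical fitted coefficients: intercept plus two rating variables
b0, b1, b2 = 5.0, 0.18, -0.05
x1, x2 = 1.0, 1.0

# Additive on the log scale ...
mu_from_sum = math.exp(b0 + b1 * x1 + b2 * x2)

# ... multiplicative after inverting the link: base rate x factor1 x factor2
mu_from_product = math.exp(b0) * math.exp(b1 * x1) * math.exp(b2 * x2)

assert abs(mu_from_sum - mu_from_product) < 1e-9
```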

12
Q

Advantages of multiplicative rating plans

A

Simple/practical to implement

Guarantees positive premiums

Impact of risk characteristics more intuitive

13
Q

Variance functions for exponential families

A

Normal: V(µ) = 1

Poisson: V(µ) = µ

Gamma: V(µ) = µ²

Inverse Gaussian: V(µ) = µ³

Binomial: V(µ) = µ(1 - µ)

Negative binomial: V(µ) = µ(1 + κµ)

Tweedie: V(µ) = µ^p
14
Q

When to use weights in a GLM

A

When a single observation contains grouped information

Different observations represent different time periods

(If neither apply, weights are all 1)

Weights are usually the denominator of the modeled quantity

15
Q

Severity model distributions

A

Tend to be right-skewed

Lower bound at zero

Gamma and inverse Gaussian distributions exhibit these properties

16
Q

Gamma vs. inverse Gaussian distributions

A

Gamma most commonly used for severity

Inverse Gaussian has sharper peak, wider tail (more appropriate for more skewed distributions)

17
Q

Frequency model distributions

A

Poisson (technically, ODP) - most common, φ can be > 1

Negative binomial - φ = 1 but there is a κ in the variance function

18
Q

Pure premium distributions

A

Large point mass at zero (most policies have no claims)

Right-skewness due to severity distribution

Tweedie distribution most commonly used

19
Q

Tweedie distribution

A

p = “Power parameter”

1 < p < 2: Poisson frequency and Gamma severity

Assumes frequency and severity move in same direction (not realistic)

20
Q

Mean, Tweedie

A

Poisson mean x Gamma mean

= λ x αθ

21
Q

Variance, Tweedie

A

φµ^p

22
Q

p, Tweedie

A

p = (α + 2) / (α + 1)
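
A small sketch tying the Tweedie cards together; λ, α, θ below stand for hypothetical Poisson and Gamma parameters:

```python
def tweedie_power(alpha):
    """Power parameter p = (alpha + 2) / (alpha + 1), from the Gamma shape alpha."""
    return (alpha + 2) / (alpha + 1)

def tweedie_mean(lam, alpha, theta):
    """Tweedie mean = Poisson mean x Gamma mean = lambda x (alpha x theta)."""
    return lam * alpha * theta

# Any positive Gamma shape parameter gives 1 < p < 2
assert 1 < tweedie_power(2.0) < 2
```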

23
Q

Dispersion factor, Tweedie

A

φ = [λ^(1-p) (αθ)^(2-p)] / (2 - p)
24
Q

Probability model distributions

A

Binomial distribution used

Typically uses the logit link function

µ / (1 - µ) known as odds

25
Q

Odds, logit function

A

µ / (1 - µ)

For each unit increase in a predictor variable with coefficient ß, the odds increase by e^ß - 1 in percentage terms
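
A sketch of the odds interpretation (the coefficient value below is illustrative):

```python
import math

def odds(mu):
    """Odds: µ / (1 - µ)."""
    return mu / (1 - mu)

def odds_pct_change(beta):
    """Percentage change in the odds per unit increase in a predictor with coefficient beta."""
    return math.exp(beta) - 1

# A coefficient of ln(2) doubles the odds, i.e. a +100% change
assert abs(odds_pct_change(math.log(2)) - 1.0) < 1e-9
```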

26
Q

Offset term

A

Incorporates pre-determined values, which the GLM takes as given

27
Q

Continuous predictor variables

A

If a log link function is used, continuous variables should be logged (flexibility in fitting curve shapes)

Exceptions: year variables for trends, variables containing values of 0

28
Q

Categorical predictor variables

A

Have 2 or more levels, converted to binary variables

The level with the highest number of observations is usually deemed the base level

Using a different base level results in wider CIs

29
Q

Matrix notation of GLM

A

g(µ) = Xß

µ is the vector of µi values

ß is the vector of ß parameters (coefficients)

X is the design matrix (each row holds one observation's predictor values, ignoring coefficients)
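
A toy illustration of the matrix notation with made-up numbers (3 observations, intercept plus 2 predictors):

```python
# Hypothetical design matrix: first column is the intercept,
# remaining columns are the predictor values per observation
X = [
    [1.0, 0.5, 2.0],   # observation 1
    [1.0, 1.5, 0.0],   # observation 2
    [1.0, 3.0, 1.0],   # observation 3
]
beta = [0.2, 0.1, -0.3]  # [ß0, ß1, ß2]

# g(µ) = Xß: one linear-predictor value per observation
g_mu = [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]
```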

30
Q

Degrees of freedom

A

Number of parameters that need to be estimated for the model

31
Q

If a variable is not significant in a GLM

A

It should be removed, or grouped with the base level

32
Q

Options for dealing with very high correlation in GLMs

A

Remove all but one of the correlated variables

Pre-process with dimensionality-reduction techniques such as principal components analysis or factor analysis

33
Q

Multicollinearity

A

Near-perfect linear dependency among 3 or more predictor variables

Ex: x1 + x2 ≈ x3

Detected with the variance inflation factor (VIF) statistic; VIF > 10 is considered high

34
Q

Aliased variables

A

Perfect linear dependency among predictor variables

The GLM will not converge, but most software will detect the aliasing and remove one of the variables from the model

35
Q

GLM limitations

A

1. GLMs give full credibility to the data (partially addressed by p-values, SEs)
2. GLMs assume the randomness of outcomes is uncorrelated (violated if the dataset has several renewals of the same policy, or by weather events)

36
Q

Model-building process

A

1. Setting goals and objectives
2. Communication (IT, legal, UWs)
3. Collecting/processing data
4. Exploratory data analysis
5. Specifying the form of the model
6. Evaluating model output
7. Validation
8. Translation into a product
9. Maintenance and rebuild

37
Q

Splitting data for testing

A

Training set and test (holdout) set

38
Q

Model testing strategies

A

Train and test

Train, validate, and test

Cross-validation

39
Q

Train and test

A

Split into a single training set and a single test set (60/40 or 70/30)

Can split randomly or on time (if not done by time, could lead to over-optimistic validation results)

40
Q

Train, validate, and test

A

Validation set can be used to refine the model and make tweaks

Test set should be left until the model is final

Typically 40/30/30

41
Q

Cross-validation

A

Less common in insurance (variables are hand-picked)

Most common is k-fold:
1. Pick a k and split the data into k folds
2. For each fold, train the model using the other k - 1 folds, and test using the kth fold

Superior (more data for training and testing) but more time-consuming (models are built completely separately)
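
The k-fold procedure can be sketched in plain Python (index bookkeeping only; the model fitting itself is omitted):

```python
def k_fold_indices(n, k):
    """Split observation indices 0..n-1 into k roughly equal folds."""
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)
    return folds

def train_test_splits(n, k):
    """Yield (train, test) index lists: each fold serves as the test set once."""
    folds = k_fold_indices(n, k)
    for j in range(k):
        test = folds[j]
        train = [i for f in range(k) if f != j for i in folds[f]]
        yield train, test
```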

42
Q

When to use test dataset

A

Only when the model is complete

If too many decisions are made based on the test set, it effectively becomes a training set (leads to overfitting)

43
Q

Advantages of modeling freq/sev over PP

A

Gain more insight and intuition about each

Each is more stable separately

PP modeling can lead to overfitting if a predictor variable impacts only freq or sev but not both, since randomness of the other component may be mistaken for signal

Tweedie assumes freq and sev move in the same direction

44
Q

Handling perils in a GLM

A

1. Run each peril model separately
2. Aggregate the expected losses
3. Run a model using all-peril LC as the target variable and the union of all predictor variables as predictors (focus on a dataset more reflective of the future mix of business)

45
Q

Criteria for variable inclusion in GLM

A

p-values

Cost-effectiveness of collecting data

ASOPs/legal requirements

IT constraints

46
Q

Partial residuals for predictor variables

A

Plot r against x and see if the points match the line y = ßx

47
Q

Transformations if residuals do not match line

A

Binning (increases DOF; variation within bins is ignored)

Adding polynomial terms (loses interpretability without a graph)

Adding a piecewise linear function: hinge function max(0, x - c) at each breakpoint c
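
A minimal sketch of the hinge-function approach (the breakpoint and coefficients are made up):

```python
def hinge(x, c):
    """Hinge function max(0, x - c): zero below the breakpoint c, linear above it."""
    return max(0.0, x - c)

def piecewise(x, b0, b1, b2, c):
    """Piecewise linear fit: slope b1 below c, slope b1 + b2 above c, continuous at c."""
    return b0 + b1 * x + b2 * hinge(x, c)
```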

48
Q

Interaction term combinations

A

Two categorical: 1/0

Continuous and categorical: f(x)/0

Two continuous: product of the two

49
Q

Log-likelihood

A

Log of the product of the likelihoods of all observations under the model

50
Q

Deviance

A

Scaled deviance = 2 x (LL of the saturated model - LL of the model)

Unscaled deviance = φ x scaled deviance

51
Q

Comparing models using LL and deviance

A

Only valid if the datasets used are identical

Comparisons of deviance are only valid if the assumed distribution and dispersion are the same

52
Q

Nested models (F-test)

A

F = (deviance of smaller model - deviance of bigger model) / (number of added parameters x φ̂ of bigger model)

53
Q

Akaike Information Criterion (AIC)

A

AIC = -2LL + 2p

(p = number of parameters)

54
Q

Bayesian Information Criterion (BIC)

A

BIC = -2LL + p ln(n)

Considered less reasonable for insurance data (large n makes the per-parameter penalty large)
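
The two criteria side by side (the log-likelihood and counts below are illustrative):

```python
import math

def aic(log_likelihood, p):
    """AIC = -2LL + 2p, where p is the number of parameters."""
    return -2 * log_likelihood + 2 * p

def bic(log_likelihood, p, n):
    """BIC = -2LL + p ln(n), where n is the number of observations."""
    return -2 * log_likelihood + p * math.log(n)

# With large n (typical insurance datasets), BIC penalizes each parameter more
assert bic(-100.0, 3, 100000) > aic(-100.0, 3)
```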

55
Q

Q-Q plot

A

Sort deviance residuals in ascending order (y-axis)

x-coordinates: Φ⁻¹[(i - 0.5) / n] (inverse standard normal CDF)

If the model is well fit, the points will appear on a straight line
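
The x-coordinates can be computed with the standard normal inverse CDF from the Python standard library (n = 5 shown only as an example):

```python
from statistics import NormalDist

def qq_x_coords(n):
    """Theoretical quantiles for a Q-Q plot: inverse standard normal CDF at (i - 0.5) / n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
```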

56
Q

Model stability measures

A

Cook's distance

Cross-validation

Bootstrapping

57
Q

Model selection methods

A

Plotting actual vs. predicted

Simple quantile plots

Double lift charts

Loss ratio charts

Gini index

58
Q

Lift

A

Economic value of the model (ability to prevent adverse selection)

59
Q

Creating simple quantile plot

A

1. Sort the holdout dataset based on predicted LC
2. Bucket into quantiles by exposure
3. Calculate the average predicted LC and average actual LC for each bucket and plot (divide both values by the overall average predicted LC)
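
Step 2 (bucketing by exposure) can be sketched as follows; the records are hypothetical (predicted LC, actual LC, exposure) tuples:

```python
def quantile_buckets(records, n_buckets):
    """Sort records by predicted loss cost, then bucket them so each bucket
    holds roughly equal total exposure. records = (predicted_lc, actual_lc, exposure)."""
    recs = sorted(records, key=lambda r: r[0])
    total_exp = sum(r[2] for r in recs)
    target = total_exp / n_buckets
    buckets, current, acc = [], [], 0.0
    for r in recs:
        current.append(r)
        acc += r[2]
        if acc >= target and len(buckets) < n_buckets - 1:
            buckets.append(current)
            current, acc = [], 0.0
    buckets.append(current)
    return buckets
```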

60
Q

Winning model, simple quantile plots

A

1. Predictive accuracy
2. Monotonicity
3. Vertical distance of actual LC between the first and last quantiles

61
Q

Double lift chart

A

1. Calculate the sort ratio (model 1 predicted LC / model 2 predicted LC)
2. Sort by the sort ratio
3. Bucket into quantiles by exposure
4. Calculate the average predicted LC for each model and the average actual LC for each bucket; divide by the overall average LC

62
Q

Loss ratio chart

A

1. Sort the holdout dataset based on predicted LC
2. Bucket into quantiles by exposure
3. Calculate the actual loss ratio (using the current rating plan) for each bucket

The greater the distance between the lowest and highest loss ratios, the better the model does at identifying further segmentation opportunities

63
Q

Gini Index

A

1. Sort the holdout dataset based on predicted LC
2. Plot the Lorenz curve: x-axis is cumulative % of exposures, y-axis is cumulative % of actual losses

Gini index = twice the area between the Lorenz curve and the line of equality
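
A sketch of the Gini computation, assuming the "twice the area between the Lorenz curve and the line of equality" convention and a trapezoidal approximation of the area:

```python
def gini_index(predicted_lc, exposures, actual_losses):
    """Gini index from a Lorenz curve: sort by predicted loss cost, accumulate
    % exposure (x) and % actual losses (y), and take twice the area between
    the curve and the 45-degree line (trapezoidal rule)."""
    order = sorted(range(len(predicted_lc)), key=lambda i: predicted_lc[i])
    tot_exp = sum(exposures)
    tot_loss = sum(actual_losses)
    cum_exp = cum_loss = 0.0
    area = 0.0  # area under the Lorenz curve
    prev_x = prev_y = 0.0
    for i in order:
        cum_exp += exposures[i]
        cum_loss += actual_losses[i]
        x, y = cum_exp / tot_exp, cum_loss / tot_loss
        area += (x - prev_x) * (y + prev_y) / 2
        prev_x, prev_y = x, y
    return 2 * (0.5 - area)
```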

64
Q

True positive

A

Correct prediction that the event occurs

65
Q

False positive

A

Prediction that the event occurs, but it does not

66
Q

False negative

A

Prediction that the event does not occur, but it does

67
Q

True negative

A

Correct prediction that the event does not occur

68
Q

Sensitivity of a model

A

Ratio of true positives to total event occurrences

Sometimes called the true positive rate or hit rate

69
Q

Specificity

A

Ratio of true negatives to total event non-occurrences

70
Q

Receiver Operating Characteristic (ROC) Curve

A

Plots sensitivity as a function of 1 - specificity
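
The four confusion-matrix counts combine into the two rates, and each classification threshold yields one point on the ROC curve (the counts below are made up):

```python
def sensitivity(tp, fn):
    """True positive rate: true positives / total event occurrences."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: true negatives / total event non-occurrences."""
    return tn / (tn + fp)

def roc_point(tp, fp, fn, tn):
    """One point on the ROC curve: (1 - specificity, sensitivity)."""
    return (1 - specificity(tn, fp), sensitivity(tp, fn))
```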

71
Q

Ensemble models

A

If two or more models perform roughly equally well, they can be combined

Only really works when the model errors are as uncorrelated as possible