GLMs Flashcards

1
Q

Advantages to GLMs

A

Adjusts for correlation, with less restrictive assumptions than classical linear models

2
Q

Disadvantages to GLMs

A

Often difficult to explain results

3
Q

Components of GLMs

A

Random component

Systematic component

4
Q

Random component of GLM

A

Each yi is assumed to be independent and to come from the exponential family of distributions

Mean = µi

Variance = [φV(µi)]/wi

5
Q

Variance within a GLM, formula

A

Variance = [φV(µi)]/wi

6
Q

φ

A

Dispersion/Scale factor

Same for all observations

7
Q

V(µ)

A

Variance function, given for selected distribution type

Describes relationship between variance and mean

Same distribution type must be used for all observations

8
Q

Systematic component of GLM

A

g(µi) = β0 + β1xi1 + … + βpxip + offset

The offset term allows you to manually specify estimates for certain variables

9
Q

Link functions, g(µ)

A

Allow for transformations of the linear predictor

10
Q

Link function often used for binary target variables

A

Logit link:

g(µ) = ln [µ / (1 - µ)]
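As a quick numerical check (values hypothetical), the logit maps probabilities in (0, 1) onto the whole real line, and its inverse recovers the probability:

```python
import math

def logit(mu):
    # g(mu) = ln(mu / (1 - mu)), defined for 0 < mu < 1
    return math.log(mu / (1 - mu))

def inv_logit(eta):
    # inverse link: maps any linear predictor value back into (0, 1)
    return 1 / (1 + math.exp(-eta))

print(logit(0.5))             # 0.0
print(inv_logit(logit(0.8)))  # ~0.8 (round trip)
```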

11
Q

Link function used in rating plans

A

g(µ) = ln(µ)

Allows transformation of the linear predictor into a multiplicative structure
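A minimal sketch (coefficients hypothetical) of why the log link yields a multiplicative rating plan: exponentiating turns the additive linear predictor into a product of factors.

```python
import math

# Hypothetical log-link model: ln(mu) = b0 + b1*x1 + b2*x2
b0, b1, b2 = 5.0, 0.2, -0.1
x1, x2 = 1, 1

mu = math.exp(b0 + b1 * x1 + b2 * x2)

# Equivalent multiplicative form: base rate times one factor per characteristic
mu_mult = math.exp(b0) * math.exp(b1 * x1) * math.exp(b2 * x2)

print(abs(mu - mu_mult) < 1e-9)  # True: exp turns sums into products
```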

12
Q

Advantages of multiplicative rating plans

A

Simple/practical to implement

Guarantees positive premiums

Impact of risk characteristics more intuitive

13
Q

Variance functions for exponential families

A

Normal: V(µ) = 1

Poisson: V(µ) = µ

Gamma: V(µ) = µ²

Inverse Gaussian: V(µ) = µ³

Binomial: V(µ) = µ(1 − µ)

Tweedie: V(µ) = µᵖ
14
Q

When to use weights in a GLM

A

When a single observation contains grouped information

Different observations represent different time periods

(If neither apply, weights are all 1)

Weights are usually the denominator of the modeled quantity

15
Q

Severity model distributions

A

Tend to be right-skewed

Lower bound at zero

Gamma and inverse Gaussian distributions exhibit these properties

16
Q

Gamma vs. inverse Gaussian distributions

A

Gamma most commonly used for severity

Inverse Gaussian has sharper peak, wider tail (more appropriate for more skewed distributions)

17
Q

Frequency model distributions

A

Poisson (technically, ODP) - most common; φ can be > 1

Negative binomial - φ = 1, but the variance function includes an extra parameter κ: V(µ) = µ(1 + κµ)

18
Q

Pure premium distributions

A

Large point mass at zero (most policies have no claims)

Right-skewness due to severity distribution

Tweedie distribution most commonly used

19
Q

Tweedie distribution

A

p = “Power parameter”

1 < p < 2: Poisson frequency and Gamma severity

Assumes frequency and severity move in same direction (not realistic)

20
Q

Mean, Tweedie

A

Poisson mean x Gamma mean

= λ x αθ

21
Q

Variance, Tweedie

A

Variance = φµᵖ

22
Q

p, Tweedie

A

p = (α + 2) / (α + 1)
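The Tweedie cards can be tied together numerically; λ, α, and θ below are hypothetical Poisson/Gamma parameters.

```python
# Hypothetical compound Poisson-Gamma parameters
lam = 0.1       # Poisson mean (claim frequency)
alpha = 2.0     # Gamma shape
theta = 5000.0  # Gamma scale

mean = lam * alpha * theta     # Tweedie mean = Poisson mean x Gamma mean
p = (alpha + 2) / (alpha + 1)  # power parameter

print(mean, p)  # p always falls between 1 and 2 for alpha > 0
assert 1 < p < 2
```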

23
Q

Dispersion factor, Tweedie

A
24
Q

Probability model distributions

A

Binomial distribution used

Typically use logit function

µ / (1 - µ) known as odds

25
Q

Odds, logit function

A

µ / (1 - µ)

Each unit increase in a given predictor variable with coefficient β increases the odds by e^β − 1 in percentage terms
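A small check of the e^β − 1 rule (the coefficient and baseline probability are hypothetical):

```python
import math

beta = 0.4  # hypothetical coefficient in a logit model
mu = 0.3    # hypothetical baseline probability

odds = mu / (1 - mu)              # odds before the unit increase
new_odds = odds * math.exp(beta)  # logit link: odds scale by e^beta

pct_change = new_odds / odds - 1
print(pct_change)  # e^0.4 - 1, about a 49% increase in the odds
```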

26
Q

Offset term

A

Incorporates pre-determined values that the GLM takes as given (their coefficients are not estimated)

27
Q

Continuous predictor variables

A

If log link function is used, continuous variables should be logged (flexibility in fitting curve shapes)

Exceptions: year variables for trends, variables containing values of 0

28
Q

Categorical predictor variables

A

Have 2 or more levels; converted to binary variables

Level with the highest number of observations is usually deemed the base level

Using a sparsely populated level as the base results in wider CIs

29
Q

Matrix notation of GLM

A

g(µ) = Xβ + offset

µ is the vector of µi values

β is the vector of β parameters (coefficients)

X is the design matrix (one row per observation, one column per predictor)
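A pure-Python sketch of the matrix form with hypothetical numbers: each row of the design matrix is one observation, and Xβ gives the linear predictor.

```python
import math

# Hypothetical design matrix: intercept column plus two predictors
X = [[1.0, 0.0, 2.0],
     [1.0, 1.0, 0.0],
     [1.0, 1.0, 3.0]]
beta = [0.5, 0.2, -0.1]  # hypothetical coefficient vector

# Linear predictor: eta = X beta, computed row by row
eta = [sum(x * b for x, b in zip(row, beta)) for row in X]

# With a log link, the fitted means are mu = exp(eta)
mu = [math.exp(e) for e in eta]

print(eta)  # approximately [0.3, 0.7, 0.4]
```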

30
Q

Degrees of freedom

A

Number of parameters that need to be estimated for the model

31
Q

If a variable is not significant in a GLM

A

Should be removed, grouped with the base level

32
Q

Options for dealing with very high correlation in GLMs

A

Remove all but one of the correlated variables

Use dimensionality-reduction techniques (e.g., principal components analysis) to replace them with new, uncorrelated variables
33
Q

Multicollinearity

A

Near-perfect linear dependency among 3 or more predictor variables

Ex: x1 + x2 ~ x3

Detected with the variance inflation factor (VIF) statistic; VIF > 10 is considered high

34
Q

Aliased variables

A

Perfect linear dependency among predictor variables

GLM will not converge, but most will detect and remove one of the variables from the model

35
Q

GLM limitations

A
  1. GLMs give full credibility to the data (partially addressed by p-values, SEs)
  2. GLMs assume the randomness of outcomes is uncorrelated (violated if the dataset has several renewals of the same policy, or by weather events)
36
Q

Model-building process

A
  1. Setting goals and objectives
  2. Communication (IT, legal, UWs)
  3. Collecting/processing data
  4. Exploratory data analysis
  5. Specifying the form of the model
  6. Evaluating model output
  7. Validation
  8. Translation into a product
  9. Maintenance and rebuild
37
Q

Splitting data for testing

A

Training set and test (holdout) set

38
Q

Model testing strategies

A

Train and test

Train, validate, test

Cross-validation

39
Q

Train and test

A

Split into single training and single test sets (60/40 or 70/30)

Can split randomly or on time (if not done by time, could lead to over-optimistic validation results)

40
Q

Train, validate, and test

A

Validation set can be used to refine model and make tweaks

Test set should be left until model is final

Typically 40/30/30

41
Q

Cross-validation

A

Less common in insurance (hand-picked variables)

Most common is k-fold:

  1. Pick a k and split the data into k folds
  2. For each fold, train the model using the other k - 1 folds, and test using the kth fold

Superior (more data for training and testing) but more time-consuming (models built completely separately)
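A minimal sketch of the k-fold split (the model-fitting step itself is omitted):

```python
def kfold_indices(n, k):
    # Split row indices 0..n-1 into k contiguous folds of near-equal size
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = kfold_indices(10, 3)
for test_fold in folds:
    train = [j for f in folds if f is not test_fold for j in f]
    # fit the model on `train`, evaluate on `test_fold` (fitting omitted)
    assert set(train).isdisjoint(test_fold)

print([len(f) for f in folds])  # [4, 3, 3]
```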

42
Q

When to use test dataset

A

Only when model is complete

If too many decisions are made based on test set, it is effectively a training set (leads to overfitting)

43
Q

Advantages of modeling freq/sev over PP

A

Gain more insight and intuition about each

Each is more stable separately

PP modeling can lead to overfitting if predictor variable only impacts freq or sev but not both, since randomness of other component may be considered signal

Tweedie assumes both freq and sev move in same direction

44
Q

Handling perils in a GLM

A
  1. Run each peril model separately
  2. Aggregate expected losses
  3. Run model using all-peril LC as target variable and union of all predictor variables as predictors (focus on dataset more reflective of future mix of business)
45
Q

Criteria for variable inclusion in GLM

A

p-values

Cost-effectiveness of collecting data

ASOPs/legal requirements

IT constraints

46
Q

Partial residuals for predictor variables

A

Plot r against x and see if the points match the line y = βx

47
Q

Transformations if residuals do not match line

A

Binning (increases DOF; variation within bins is ignored)

Adding polynomial terms (loses interpretability without a graph)

Adding piecewise linear functions: hinge function max(0, x - c) at each break point c
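The hinge function above can be sketched as code; the break point and coefficients are hypothetical.

```python
def hinge(x, c):
    # hinge function max(0, x - c) at break point c
    return max(0.0, x - c)

# Hypothetical piecewise-linear fit with one break point at c = 40:
# the slope is b1 below the break and b1 + b2 above it
b0, b1, b2 = 1.0, 0.05, 0.02

def f(x):
    return b0 + b1 * x + b2 * hinge(x, 40)

print(f(30), f(50))  # ~2.5 and ~3.7
```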

48
Q

Interaction term combinations

A

Two categorical: 1/0

Continuous and categorical: f(x)/0

Two continuous: product of the two

49
Q

Log-likelihood

A

Log of the product of the likelihood for all observations using the model

50
Q

Deviance

A

Scaled deviance = 2 × (LL of saturated model − LL of model)

Measures how far the model's fit falls short of a perfect (saturated) fit
51
Q

Comparing models using LL and deviance

A

Only valid if datasets used are identical

Comparisons of deviance only valid if assumed distribution and dispersion are the same

52
Q

Nested models (F-test)

A

F ≈ (deviance of small model − deviance of big model) / (number of added parameters × φ̂ of big model)

Compare to the F-distribution critical value to decide whether the added variables are worthwhile
53
Q

Akaike Information Criterion (AIC)

A

AIC = -2LL + 2p

54
Q

Bayesian Information Criterion (BIC)

A

BIC = -2LL + p ln (n)

Less reasonable for insurance data (large n makes the ln(n) penalty very large)
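Both criteria penalize the same log-likelihood differently; a quick comparison with hypothetical fit statistics:

```python
import math

LL = -1234.5  # hypothetical log-likelihood
p = 12        # number of estimated parameters
n = 100_000   # number of observations (large, as in insurance data)

aic = -2 * LL + 2 * p
bic = -2 * LL + p * math.log(n)

print(aic)  # 2493.0
print(bic)  # larger: ln(100000) ~ 11.5, so BIC's penalty dominates
assert bic > aic
```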

55
Q

Q-Q plot

A

Sort deviance residuals in ascending order (y-axis)

Φ⁻¹[(i - 0.5) / n] for x-coordinates (standard normal quantiles)

If model is well-fit, points will appear on a straight line
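The x-coordinates are standard normal quantiles; `statistics.NormalDist` in the Python standard library gives Φ⁻¹ directly (residual values hypothetical):

```python
from statistics import NormalDist

# Hypothetical deviance residuals, sorted ascending (y-coordinates)
residuals = sorted([-1.8, -0.4, 0.1, 0.5, 1.6])
n = len(residuals)

# x-coordinates: Phi^-1((i - 0.5) / n) for i = 1..n
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# A well-fit model puts these (x, y) points close to a straight line
for x, y in zip(theoretical, residuals):
    print(round(x, 3), y)
```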

56
Q

Model stability measures

A

Cook’s distance

Cross-validation

Bootstrapping

57
Q

Model selection methods

A

Plotting actual vs. predicted

Simple quantile plots

Double lift charts

Loss ratio charts

Gini Index

58
Q

Lift

A

Economic value of the model (ability to prevent adverse selection)

59
Q

Creating simple quantile plot

A

Sort holdout dataset based on predicted LC

Bucket into quantiles by exposure

Calculate average predicted LC and average actual LC for each bucket and plot (divide both values by the overall average predicted LC)
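The three steps above, sketched with a toy holdout set (numbers hypothetical; one exposure unit per row, so equal-count buckets are equal-exposure buckets):

```python
# Toy holdout rows: (predicted LC, actual LC), one exposure unit each
rows = [(120, 100), (80, 90), (200, 210), (150, 130)]

# 1. sort by predicted loss cost
rows.sort(key=lambda r: r[0])

# 2. bucket into quantiles by exposure (two buckets of two rows here)
buckets = [rows[:2], rows[2:]]

# 3. average predicted and actual LC per bucket, scaled by the
#    overall average predicted LC
overall = sum(r[0] for r in rows) / len(rows)
points = []
for b in buckets:
    avg_pred = sum(r[0] for r in b) / len(b)
    avg_act = sum(r[1] for r in b) / len(b)
    points.append((avg_pred / overall, avg_act / overall))

print(points)
```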

60
Q

Winning model, simple quantile plots

A
  1. Predictive accuracy
  2. Monotonicity
  3. Vertical distance of actual LC between first and last quantiles
61
Q

Double lift chart

A
  1. Calculate sort ratio
  2. Sort by sort ratio
  3. Bucket into quantiles by exposure
  4. Calculate average predicted LC for each model and average actual LC for each bucket, divide by overall average LC
62
Q

Loss ratio chart

A
  1. Sort holdout dataset based on predicted LC
  2. Bucket into quantiles by exposure
  3. Calculate actual loss ratio (using current rating plan)

The greater the distance between the lowest and highest buckets, the better the model does at identifying further segmentation opportunities

63
Q

Gini Index

A
  1. Sort holdout dataset based on predicted LC
  2. Plot the graph with x-axis as cumulative % of exposures, y-axis as cumulative % of actual losses
  3. Gini index = twice the area between the line of equality and the Lorenz curve
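A sketch with toy numbers (one exposure unit per row), computing the Gini index as twice the area between the line of equality and the Lorenz curve:

```python
# Toy holdout rows: (predicted LC, actual losses), one exposure unit each
rows = [(50, 0), (80, 40), (120, 90), (300, 470)]
rows.sort(key=lambda r: r[0])  # 1. sort by predicted loss cost

total_loss = sum(r[1] for r in rows)
n = len(rows)

# 2. Lorenz curve: cumulative % of exposure vs cumulative % of actual losses
cum, lorenz = 0.0, [0.0]
for _, loss in rows:
    cum += loss
    lorenz.append(cum / total_loss)

# Area under the Lorenz curve by the trapezoid rule (equal exposure steps)
area = sum((lorenz[i] + lorenz[i + 1]) / 2 for i in range(n)) / n

gini = 1 - 2 * area  # twice the area between equality line and Lorenz curve
print(round(gini, 4))
```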
64
Q

True positive

A

Correct prediction that the event occurs

65
Q

False positive

A

Prediction that the event occurs, but it does not

66
Q

False negative

A

Prediction that the event does not occur, but it does

67
Q

True negative

A

Correct prediction that the event does not occur

68
Q

Sensitivity of a model

A

Ratio of true positives to total event occurrences

Sometimes called the true positive rate or hit rate

69
Q

Specificity

A

Ratio of true negatives to total event non-occurrences

70
Q

Receiver Operating Characteristic (ROC) Curve

A

Plots sensitivity as a function of 1 - specificity
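The sensitivity/specificity cards combine into a small confusion-matrix sketch; each classification threshold yields one ROC point (1 − specificity, sensitivity). Outcomes and probabilities below are hypothetical.

```python
# Hypothetical binary outcomes and model-predicted probabilities
actual = [1, 1, 0, 0, 1, 0]
prob   = [0.9, 0.4, 0.3, 0.8, 0.7, 0.1]

def roc_point(threshold):
    tp = sum(a == 1 and p >= threshold for a, p in zip(actual, prob))
    fn = sum(a == 1 and p < threshold for a, p in zip(actual, prob))
    tn = sum(a == 0 and p < threshold for a, p in zip(actual, prob))
    fp = sum(a == 0 and p >= threshold for a, p in zip(actual, prob))
    sensitivity = tp / (tp + fn)  # true positives / total occurrences
    specificity = tn / (tn + fp)  # true negatives / total non-occurrences
    return 1 - specificity, sensitivity

# One ROC point per threshold
for t in (0.2, 0.5, 0.8):
    print(t, roc_point(t))
```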

71
Q

Ensemble models

A

If two or more models perform roughly equally well, they can be combined; this only works well when the models' errors are as uncorrelated as possible