A.2. GLM Flashcards

1
Q

GLM random component

A
2
Q

GLM systematic component

A
3
Q

Advantages of multiplicative rating plans

A
  • Simple and practical to implement.
  • They guarantee positive premiums (not true for additive
    terms).
  • Impact of risk characteristics is more intuitive.
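The positive-premium point can be checked with a toy calculation. This is a minimal sketch under assumed, purely hypothetical numbers (BASE_RATE and the factor values are illustrative, not from the source): multiplying a positive base rate by positive relativities can never produce a negative premium, while adding dollar adjustments can.

```python
# Hypothetical rating plan: one base rate and one factor per rating variable.
BASE_RATE = 500.0

# Multiplicative plan: each factor is a relativity (> 0) applied to the base rate.
MULT_RELATIVITIES = {"territory": 0.70, "driver_age": 0.60, "vehicle": 0.80}

# Additive plan: each factor is a dollar amount added to the base rate.
ADD_ADJUSTMENTS = {"territory": -150.0, "driver_age": -250.0, "vehicle": -150.0}


def multiplicative_premium(base, relativities):
    """Premium = base x product of relativities; positive whenever every relativity is positive."""
    premium = base
    for rel in relativities.values():
        premium *= rel
    return premium


def additive_premium(base, adjustments):
    """Premium = base + sum of adjustments; nothing stops the total from going negative."""
    return base + sum(adjustments.values())


print(f"multiplicative: {multiplicative_premium(BASE_RATE, MULT_RELATIVITIES):.2f}")
print(f"additive:       {additive_premium(BASE_RATE, ADD_ADJUSTMENTS):.2f}")
```

The multiplicative form also reads intuitively: a relativity of 0.80 is simply a 20% discount, whatever the other characteristics are.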
4
Q

Variance functions for exponential family distributions

A
5
Q

Choices for frequency distributions

A

Claim frequency is most often modeled using a Poisson
distribution. The GLM implementation of Poisson allows
for the distribution to be continuous instead of discrete.
Technically, the overdispersed Poisson is recommended: it
allows the dispersion parameter φ to differ from 1, and thus
allows the variance to be greater than the mean (instead of
equal to it, as with the standard Poisson).
Another choice for frequency modeling is the Negative
Binomial distribution, which is a Poisson distribution whose
mean parameter itself follows a Gamma distribution. With the
Negative Binomial, φ is restricted to 1; instead, its variance
function, V(μ) = μ(1 + κμ), contains an overdispersion
parameter κ that allows the variance to exceed the mean.
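A quick simulation can make the Poisson-Gamma mixture concrete. This is a sketch using only Python's standard library; the Poisson sampler (Knuth's multiplication method) and the values MU = 2 and KAPPA = 0.5 are illustrative choices, not from the source. Drawing each Poisson mean from a Gamma with shape 1/κ and scale κμ gives counts with mean μ and variance μ(1 + κμ), so the Negative Binomial sample variance should land near 4, while the plain Poisson's stays near 2.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_negative_binomial(mu, kappa, rng):
    """Poisson draw whose mean is Gamma(shape=1/kappa, scale=kappa*mu): E = mu, Var = mu*(1 + kappa*mu)."""
    lam = rng.gammavariate(1.0 / kappa, kappa * mu)
    return sample_poisson(lam, rng)


def mean_and_variance(xs):
    """Sample mean and (n - 1)-denominator sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


rng = random.Random(2024)
MU, KAPPA, N = 2.0, 0.5, 50_000  # hypothetical mean, overdispersion, and sample size

poisson_draws = [sample_poisson(MU, rng) for _ in range(N)]
nb_draws = [sample_negative_binomial(MU, KAPPA, rng) for _ in range(N)]

print("Poisson  mean/var:", mean_and_variance(poisson_draws))  # variance close to the mean
print("NegBin   mean/var:", mean_and_variance(nb_draws))       # variance close to mu*(1 + kappa*mu)
```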

6
Q

Choices for severity distributions

A

In insurance data, claim severity distributions tend to be
right-skewed and have a lower bound at 0. Both the Gamma
and Inverse Gaussian distributions exhibit these properties,
and as such are common choices for modeling severity. The
Gamma distribution is the most commonly used, but the
Inverse Gaussian has a sharper peak and wider tail, so it is
more appropriate for more highly skewed severity distributions.
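One way to see the "more skewed" claim is to match both distributions to the same mean and variance and compare skewness. This sketch uses the standard moment formulas (Gamma skewness = 2·CV, Inverse Gaussian skewness = 3·CV, where CV is the coefficient of variation); the severity mean of 5,000 and standard deviation of 4,000 are hypothetical.

```python
import math


def gamma_moments(mean, variance):
    """Match a Gamma(shape, scale) to the given mean/variance; return (shape, scale, skewness)."""
    shape = mean ** 2 / variance
    scale = variance / mean
    return shape, scale, 2.0 / math.sqrt(shape)


def inverse_gaussian_moments(mean, variance):
    """Match an Inverse Gaussian(mu, lam) to the given mean/variance; return (mu, lam, skewness)."""
    lam = mean ** 3 / variance
    return mean, lam, 3.0 * math.sqrt(mean / lam)


MEAN, VARIANCE = 5_000.0, 4_000.0 ** 2   # hypothetical severity: mean 5,000, std dev 4,000
cv = math.sqrt(VARIANCE) / MEAN          # coefficient of variation

_, _, gamma_skew = gamma_moments(MEAN, VARIANCE)
_, _, ig_skew = inverse_gaussian_moments(MEAN, VARIANCE)
print(f"CV = {cv:.2f}")
print(f"Gamma skewness            = {gamma_skew:.2f}")  # 2 * CV
print(f"Inverse Gaussian skewness = {ig_skew:.2f}")     # 3 * CV
```

With identical first two moments, the Inverse Gaussian carries 50% more skewness, which is the extra peak-and-tail behavior described above.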

7
Q

Relationship between Poisson, Gamma, and Tweedie parameters

A
8
Q

Logit and logistic functions

A
9
Q

Why continuous predictor variables should usually be logged, and exceptions

A

Continuous variables should usually be logged when a log
link function is used, because this gives the GLM the
flexibility to fit different curve shapes to the data (not just
exponential growth or decay).
An exception to the general rule of logging a continuous
predictor variable is a year variable used to pick up trend
effects, which is usually left unlogged. Also, if the variable
contains values of 0, an adjustment such as adding 1 to all
observations must first be made, since ln(0) is undefined.
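The algebra behind this is short: with a log link, ln(μ) = β0 + β1·x gives μ = e^β0·e^(β1·x), which is always exponential in x, whereas ln(μ) = β0 + β1·ln(x) gives μ = e^β0·x^β1, a power curve whose shape depends on β1. The sketch below evaluates both forms; the coefficient values are hypothetical.

```python
import math


def mean_with_raw_predictor(x, b0, b1):
    """Log link, unlogged x: ln(mu) = b0 + b1*x  =>  mu = e^b0 * e^(b1*x) (exponential in x)."""
    return math.exp(b0 + b1 * x)


def mean_with_logged_predictor(x, b0, b1):
    """Log link, logged x: ln(mu) = b0 + b1*ln(x)  =>  mu = e^b0 * x^b1 (power curve)."""
    return math.exp(b0 + b1 * math.log(x))


xs = [1, 2, 4, 8]
for b1 in (0.5, 1.0, 2.0):
    curve = [round(mean_with_logged_predictor(x, 0.0, b1), 3) for x in xs]
    print(f"logged x, b1={b1}: {curve}")
# b1 < 1 bends toward diminishing growth, b1 = 1 is linear in x, b1 > 1 is convex.
# The unlogged form only ever multiplies mu by the same factor e^b1 per unit of x,
# which is exactly what a year variable should do for a constant annual trend.

# Zeros: ln(0) is undefined, so shift first (adding 1 is one common adjustment).
amounts = [0, 3, 9]
shifted_logs = [math.log(a + 1) for a in amounts]
```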

10
Q

Impact of choosing a level with fewer observations as the base level of a
categorical variable

A

This will still result in the same predicted relativities for that
variable (re-based to the chosen base level), but there will be
wider confidence intervals around the estimated coefficients.
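The "same predicted relativities" part is just a re-expression of the same fitted model. In this sketch of a hypothetical log-link fit (the base rate and level relativities are made up), switching the base level divides every relativity by the new base's relativity and multiplies the base rate by it, so every prediction is unchanged. The wider confidence intervals come from estimating each coefficient relative to a sparsely populated base, which this arithmetic sketch does not attempt to show.

```python
# Hypothetical fitted log-link model: base rate (exp of the intercept) and relativities
# for a three-level categorical variable, with level "A" as the original base level.
base_rate = 200.0
relativities = {"A": 1.00, "B": 1.25, "C": 0.80}


def rebase(base_rate, relativities, new_base):
    """Re-express the same model with `new_base` as the base level (its relativity becomes 1)."""
    factor = relativities[new_base]
    new_relativities = {level: rel / factor for level, rel in relativities.items()}
    return base_rate * factor, new_relativities


def predict(base_rate, relativities, level):
    """Predicted mean for a risk at `level`: base rate x level relativity."""
    return base_rate * relativities[level]


new_base_rate, new_relativities = rebase(base_rate, relativities, "C")
for level in relativities:
    old = predict(base_rate, relativities, level)
    new = predict(new_base_rate, new_relativities, level)
    print(level, round(old, 2), round(new, 2))  # same prediction under either base level
```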

11
Q

Matrix form of a GLM

A
12
Q

Degrees of freedom for a model

A

The degrees of freedom of a model is the number of
parameters that need to be estimated for the model.
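Counting parameters is simple bookkeeping. This sketch assumes an intercept, one coefficient per continuous predictor, and dummy coding for categorical variables (L levels contribute L − 1 coefficients because the base level is absorbed into the intercept); it does not count the dispersion parameter. The example model is hypothetical.

```python
def model_degrees_of_freedom(n_continuous, categorical_levels, intercept=True):
    """Count estimated coefficients: intercept + one per continuous term + (levels - 1) per categorical."""
    df = 1 if intercept else 0
    df += n_continuous
    df += sum(levels - 1 for levels in categorical_levels)
    return df


# Hypothetical model: intercept, 2 continuous predictors, territory (5 levels), vehicle class (3 levels)
print(model_degrees_of_freedom(2, [5, 3]))  # 1 + 2 + 4 + 2
```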

13
Q

GLM outputs for each predicted coefficient

A
14
Q

How number of observations and dispersion parameter impact p-values

A
15
Q

Model Building Process (10 Steps)

A
16
Q

What does a p-value represent?

A

An estimated probability that the absolute value of a particular β would be at least as different from 0 as the estimate by pure
chance (i.e., if the true value of β were 0)
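A common way to get this probability is a Wald-style test: divide the estimate by its standard error and ask how often a standard normal variable would be at least that far from 0. This is a sketch under that normal-approximation assumption (software that estimates the dispersion parameter may use a t distribution instead); the estimates and standard errors are hypothetical.

```python
import math


def two_sided_p_value(estimate, std_error):
    """P(|Z| >= |estimate / se|) for standard normal Z, computed as erfc(|z| / sqrt(2))."""
    z = abs(estimate / std_error)
    return math.erfc(z / math.sqrt(2.0))


# Hypothetical (coefficient estimate, standard error) pairs
for est, se in [(0.196, 0.10), (0.20, 0.05), (0.20, 0.15)]:
    print(est, se, round(two_sided_p_value(est, se), 4))
# The first pair has z = 1.96, giving a p-value of about 0.05; a smaller
# standard error for the same estimate pushes the p-value toward 0.
```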

17
Q

Solutions for addressing correlation amongst variables (2)

A
  1. Remove all but one of the highly correlated variables; this can cause a loss of important signal.
  2. Use dimensionality-reduction techniques such as principal component analysis or factor analysis to create a new, smaller set of variables from the correlated variables.
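For two correlated predictors, principal component analysis has a closed form, which makes option 2 easy to see. After standardizing, the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 − r with eigenvectors (1, 1)/√2 and (1, −1)/√2, so the first component is the scaled sum of the standardized variables. This sketch covers only that two-variable case; the data are hypothetical.

```python
import math


def standardize(xs):
    """Center and scale to unit (population) variance."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]


def correlation(zs1, zs2):
    """Pearson correlation of two already-standardized lists."""
    return sum(a * b for a, b in zip(zs1, zs2)) / len(zs1)


# Hypothetical, highly correlated predictors (e.g., two measures of vehicle size)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.2, 1.9, 3.3, 3.8, 5.1, 6.2]

z1, z2 = standardize(x1), standardize(x2)
r = correlation(z1, z2)

# Principal component scores from the eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2)
pc1 = [(a + b) / math.sqrt(2) for a, b in zip(z1, z2)]
pc2 = [(a - b) / math.sqrt(2) for a, b in zip(z1, z2)]
share_pc1 = (1 + r) / 2  # fraction of total variance carried by the first component
print(f"r = {r:.3f}, first component explains {share_pc1:.1%} of the variance")
```

A GLM could then use pc1 (and pc2 only if it adds signal) in place of x1 and x2; the two components are uncorrelated by construction.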
18
Q

What is aliasing and what happens to the model?

A

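When two or more predictor variables are perfectly correlated (a perfect linear dependency among predictors).
The model does not converge, because no unique solution exists; most GLM software detects this and removes one of the aliased variables.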
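The reason there is no unique solution can be shown directly. In this sketch the hypothetical predictor x2 is exactly 2 × x1; moving weight from x1 to x2 (b1 becomes 0.3 − 2c, b2 becomes c) leaves every linear predictor unchanged, so the data cannot tell these coefficient vectors apart and the fitting algorithm has no single optimum to converge to.

```python
# Hypothetical predictors: x2 is exactly 2 * x1, so the two columns are aliased.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0 * v for v in x1]


def linear_predictor(b0, b1, b2):
    """eta_i = b0 + b1 * x1_i + b2 * x2_i for every row."""
    return [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]


def same(etas_a, etas_b, tol=1e-12):
    """True when two linear-predictor vectors agree to within floating-point tolerance."""
    return all(abs(a - b) <= tol for a, b in zip(etas_a, etas_b))


reference = linear_predictor(0.5, 0.3, 0.0)
for c in (0.1, 0.15, -2.0):
    # Different coefficients, identical fitted values: the model is not identifiable.
    print(c, same(reference, linear_predictor(0.5, 0.3 - 2 * c, c)))
```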

19
Q

2 GLM Limitations

A
  1. GLMs give full credibility to the data.
  2. GLMs assume that the randomness of outcomes is uncorrelated; this is violated when, e.g., the same driver appears across multiple years.
20
Q

Explain cross-validation

A

Pick a number of folds k. For each fold, train the model on the other k-1 folds of data and test it on the held-out fold.
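The loop described above can be sketched in a few lines of standard-library Python. The fold assignment (shuffle, then deal indices round-robin), the stand-in model that just predicts the training mean, and the squared-error loss are all illustrative assumptions; in practice the fit step would be a GLM.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, k, fit, predict, loss):
    """For each fold: fit on the other k-1 folds, score on the held-out fold; return the k scores."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [predict(model, xs[i]) for i in test_idx]
        scores.append(loss([ys[i] for i in test_idx], preds))
    return scores


def fit_mean(xs, ys):
    """Hypothetical stand-in model: remember the training mean."""
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


xs = list(range(20))
ys = [0.5 * x + (x % 3) for x in xs]  # hypothetical data
scores = cross_validate(xs, ys, k=5, fit=fit_mean, predict=predict_mean, loss=mse)
print([round(s, 2) for s in scores], "average:", round(sum(scores) / len(scores), 2))
```

Every row is used for testing exactly once, and the average of the k held-out scores estimates out-of-sample performance.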

21
Q

Advantages of modelling frequency and severity separately.

A
  1. Gain more insight and intuition about the impact of each predictor.
  2. Frequency and severity are each more stable when modeled separately.
  3. Pure premium modelling can lead to overfitting if a variable impacts only frequency or only severity, not both.
  4. A Tweedie model assumes that frequency and severity move in the same direction, which may not be the case.
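Point 4 follows from the compound Poisson-Gamma form of the Tweedie distribution. Using the parameterization λ = μ^(2−p)/(φ(2−p)) for the Poisson frequency and α = (2−p)/(p−1), θ = φ(p−1)μ^(p−1) for the Gamma severity (so that λ·αθ = μ), holding φ and p fixed means any increase in the pure premium μ raises both expected frequency and expected severity. This is a sketch; the φ, p, and μ values are hypothetical.

```python
def tweedie_components(mu, phi, p):
    """Split a Tweedie mean into expected claim count and expected severity (1 < p < 2).

    lambda = mu^(2-p) / (phi*(2-p)),  alpha = (2-p)/(p-1),  theta = phi*(p-1)*mu^(p-1),
    so lambda * alpha * theta = mu.
    """
    if not 1.0 < p < 2.0:
        raise ValueError("compound Poisson-Gamma Tweedie requires 1 < p < 2")
    lam = mu ** (2 - p) / (phi * (2 - p))
    alpha = (2 - p) / (p - 1)
    theta = phi * (p - 1) * mu ** (p - 1)
    return lam, alpha * theta  # (expected frequency, expected severity)


PHI, P = 100.0, 1.6  # hypothetical dispersion and power, held fixed across all risks
for mu in (50.0, 100.0, 200.0):  # hypothetical pure premiums
    freq, sev = tweedie_components(mu, PHI, P)
    print(f"pure premium {mu:6.1f}: frequency {freq:.4f}, severity {sev:.2f}")
# Frequency grows like mu^(2-p) and severity like mu^(p-1): both rise together.
```

A variable that truly raises frequency but lowers severity cannot be represented this way, which is one argument for separate frequency and severity models.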
22
Q
A