GLM1 Flashcards
2 main components of a Generalized Linear Model.
random component
systematic component
random component
each yi is assumed to be independent and to come from the exponential family of distributions, with mean µi and variance Var(yi) = φV(µi)/ωi.
φ is called the dispersion parameter and is a constant used to scale the variance.
V(µ) is called the variance function and it describes the relationship between the variance and mean for a selected distribution type.
ωi are known as the weights and allow each observation i to be given its own weight.
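As an illustration (not part of the original card), the sketch below uses Python's statsmodels to evaluate the variance function V(µ) for a few common exponential-family choices; the value of µ is arbitrary.

    import statsmodels.api as sm

    mu = 100.0
    # fam.variance(mu) evaluates V(mu): mu for Poisson, mu**2 for Gamma,
    # and mu**3 for Inverse Gaussian
    for fam in [sm.families.Poisson(), sm.families.Gamma(), sm.families.InverseGaussian()]:
        print(type(fam).__name__, fam.variance(mu))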
systematic component
which is of the form g(µi) = β0 + β1xi1 + β2xi2 + ··· + βpxip + offset
right hand side is known as the linear predictor
offset term is optional and allows you to manually specify fixed estimates for certain variables rather than having the model fit them
x’s are the predictor variables
g(µ) is called the link function
β0 is called the intercept term, and the other β’s are called the coefficients of the model. These are what we want to estimate. Once we know the β’s, we can plug in known values for the xi variables and calculate the predicted values of the yi variables (i.e., µi).
link function
allows for transformations of the linear predictor
For rating plans, the log link function g(µ) = ln(µ) is typically used since it transforms the linear predictor into a multiplicative structure.
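A minimal sketch (with made-up data and column names) of fitting a log-link GLM in Python's statsmodels:

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # toy severity data (hypothetical)
    df = pd.DataFrame({
        "severity": [500.0, 1200.0, 800.0, 3000.0, 650.0, 2100.0],
        "age":      [25, 40, 33, 55, 29, 47],
        "gender":   ["F", "M", "F", "M", "F", "M"],
    })

    # Gamma random component with a log link (the typical rating-plan setup)
    result = smf.glm("severity ~ age + gender", data=df,
                     family=sm.families.Gamma(link=sm.families.links.Log())).fit()
    print(result.params)  # fitted betas, on the log scale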
3 advantages of using a log link function when building a pure premium model for use in creating a rating plan
A log link allows for a multiplicative rating plan, which has the following advantages:
- Simple and practical to implement.
- It guarantees positive premiums.
- Impact of risk characteristics is more intuitive.
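To see the multiplicative structure concretely, here is a tiny sketch with hypothetical coefficient values: exponentiating each β turns the additive linear predictor into a product of positive rating factors.

    import numpy as np

    beta0, beta_male, beta_terrA = 6.0, 0.10, -0.25  # hypothetical fitted values
    # base rate * gender relativity * territory relativity; every factor is
    # positive, so the predicted pure premium is guaranteed positive
    mu = np.exp(beta0) * np.exp(beta_male) * np.exp(beta_terrA)
    print(np.exp(beta0), np.exp(beta_male), np.exp(beta_terrA), mu)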
choice of the base level for the Gender variable
Since Female has more observations than Male, Female should be the base level for Gender.
Choosing a level with fewer observations as the base level will still result in the same predicted relativities for that variable (re-based to the chosen base level), but there will be wider confidence intervals around the estimated coefficients
Both the standard error and p-value will increase, since a level with fewer observations is being made the new base level and the model will have less confidence in the base level estimate (in addition to its confidence in the difference between the base level and each other level).
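In statsmodels/patsy formula syntax (an assumed tool choice, with made-up data), the base level of a categorical variable can be set explicitly with Treatment coding:

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    df = pd.DataFrame({"loss":   [500.0, 900.0, 650.0, 1200.0, 700.0, 1000.0],
                       "gender": ["F", "M", "F", "M", "F", "M"]})

    # Treatment('F') makes Female the base level; the fitted gender coefficient
    # is then the (log-scale) difference of Male relative to Female
    result = smf.glm("loss ~ C(gender, Treatment('F'))", data=df,
                     family=sm.families.Gamma(link=sm.families.links.Log())).fit()
    print(result.params)
    print(result.bse)  # standard errors widen if a sparse level is made the base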
g(µ) = β0 + β1·ln(InsuredAge) + β2·Male + β3·TerrA + β4·TerrC + β5·Male·TerrA + β6·Male·TerrC
explain the meaning of each of the β parameters in the model.
β0 is the intercept (the base level: female in territory B with InsuredAge = 1, so that ln(InsuredAge) = 0)
β1 is the change in g(µ) for a unit change in the natural log of insured age
β2 is the change in g(µ) for being male instead of female
β3 is the change in g(µ) for being in territory A instead of B
β4 is the change in g(µ) for being in territory C instead of B
β5 is the additional interaction effect on g(µ) for being male and in territory A
β6 is the additional interaction effect on g(µ) for being male and in territory C
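The same model can be written in statsmodels formula syntax; everything below (data frame, column names) is hypothetical, and the * operator expands to the main effects plus the interaction terms (β2 through β6).

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # hypothetical data, named to match the model above
    df = pd.DataFrame({
        "loss":       [900.0, 1500.0, 700.0, 2400.0, 1100.0, 1900.0,
                       800.0, 2100.0, 1300.0, 1000.0, 1700.0, 950.0],
        "InsuredAge": [25, 40, 33, 55, 29, 47, 36, 52, 44, 31, 60, 27],
        "Gender":     ["F", "M", "F", "M", "F", "M", "F", "M", "M", "F", "M", "F"],
        "Territory":  ["A", "A", "B", "B", "C", "C", "A", "B", "C", "B", "A", "C"],
    })

    # Treatment coding fixes the base levels (Female, territory B)
    formula = ("loss ~ np.log(InsuredAge)"
               " + C(Gender, Treatment('F')) * C(Territory, Treatment('B'))")
    result = smf.glm(formula, data=df,
                     family=sm.families.Gamma(link=sm.families.links.Log())).fit()
    print(result.params)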
2 common choices for modeling claim severity
and why
Gamma and Inverse Gaussian distributions
Claim severity distributions tend to be right-skewed and have a lower bound at 0. Both Gamma and Inverse Gaussian distributions exhibit these properties.
Inverse Gaussian has a sharper peak and wider tail than Gamma (for the same mean and variance), so the Inverse Gaussian is more appropriate for severity distributions that are more skewed.
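The sketch below matches a Gamma and an Inverse Gaussian to the same mean and variance with scipy; the parameter algebra assumes scipy's invgauss parameterization (mean = µ·scale, variance = µ³·scale²).

    import numpy as np
    from scipy import stats

    a, theta = 2.0, 50.0                  # Gamma: mean = 100, variance = 5000
    gamma = stats.gamma(a, scale=theta)
    ig = stats.invgauss(1.0 / a, scale=a**2 * theta)  # same mean and variance

    x = np.array([10.0, 50.0, 100.0, 200.0, 400.0])
    print(gamma.pdf(x))
    print(ig.pdf(x))  # sharper peak (near x = 50 here) and a heavier right tail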
When creating a GLM with a log link function, it is generally recommended that continuous predictor variables
& 2 exceptions
be logged
Logging continuous variables allows for more flexibility in fitting different curve shapes to the data: with a log link, an unlogged variable forces an exponential relationship (µ grows like exp(βx)), while a logged variable fits a power curve (µ grows like x^β)
2 exceptions: using a time variable (e.g., year) to pick up time effects, and when the variable contains values of 0, since ln(0) is undefined
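A small numeric sketch of the difference, with hypothetical coefficients:

    import numpy as np

    b0, b1 = 1.0, 0.5                 # hypothetical coefficients
    x = np.array([1.0, 2.0, 4.0, 8.0])
    mu_unlogged = np.exp(b0 + b1 * x)          # exponential growth in x
    mu_logged = np.exp(b0 + b1 * np.log(x))    # = exp(b0) * x**b1, a power curve
    print(mu_unlogged)
    print(mu_logged)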
main benefit of GLMs over univariate analysis
able to account for exposure correlation between predictor variables (a univariate analysis double-counts effects when exposures are correlated across variables, while a GLM estimates each variable's effect with the others held constant)
GLMs also run into problems when predictor variables are very highly correlated.
what can happen in GLMs with highly correlated predictor variables
This can result in an unstable model with erratic coefficients that have high standard errors.
two options for dealing with highly correlated predictor variables
i. Removing all highly correlated variables except one. This eliminates the high correlation in the model, but it also potentially loses some unique information contained in the eliminated variables.
ii. Use dimensionality-reduction techniques such as principal components analysis or factor analysis to create a new subset of variables from the correlated variables, and use this subset of variables in the GLM. The downside is the additional time required to do this extra analysis.
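A minimal sketch of option ii using scikit-learn's PCA (an assumed tool choice, not from the card): the correlated columns are replaced by a smaller set of uncorrelated component scores before fitting the GLM.

    import numpy as np
    import pandas as pd
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=200)
    X = pd.DataFrame({"x1": x1,
                      "x2": x1 + 0.05 * rng.normal(size=200),   # highly correlated
                      "x3": x1 + 0.05 * rng.normal(size=200)})

    # keep enough components to explain most of the variance, then use the
    # component scores as GLM predictors in place of x1, x2, x3
    pca = PCA(n_components=1)
    scores = pca.fit_transform(X)
    print(pca.explained_variance_ratio_)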
multicollinearity
Multicollinearity occurs when there is a near-perfect linear dependency among 3 or more predictor variables.
For example, suppose x1 + x2 ≈ x3.
When multicollinearity is present in a model, the model may become unstable with erratic coefficients, and it may not converge to a solution
one way to detect multicollinearity in a model.
Use the variance inflation factor (VIF) statistic, which is given for each predictor variable. It measures the impact of collinearity with the other predictors on the squared standard error of that variable's coefficient, and is computed by seeing how well the other predictor variables can predict the variable in question.
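A sketch of computing VIFs with statsmodels on made-up data; a common rule of thumb treats VIFs above 10 as severe, and the near-dependency x1 + x2 ≈ x3 below is constructed to trigger it.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(0)
    X = pd.DataFrame({"x1": rng.random(100), "x2": rng.random(100)})
    X["x3"] = X["x1"] + X["x2"] + 0.01 * rng.random(100)  # x1 + x2 ~ x3

    Xc = sm.add_constant(X)
    for i, col in enumerate(Xc.columns):
        print(col, variance_inflation_factor(Xc.values, i))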
aliasing
When there is a perfect linear dependency among predictor variables, those variables are aliased. The model then has no unique solution, so one of the aliased variables must be removed (many GLM software packages detect this and remove a variable automatically).
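As a quick illustration, a perfectly aliased design matrix is rank-deficient, which is why the coefficients cannot be uniquely estimated:

    import numpy as np

    x1 = np.arange(5.0)
    X = np.column_stack([np.ones(5), x1, 2.0 * x1])  # third column = 2 * second
    print(np.linalg.matrix_rank(X))  # 2 instead of 3: the columns are aliased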