Chapter 18: GLM Flashcards

Question 1

Q

List assumptions of the classical linear model

Answer

A

response variable modelled as a linear combination of explanatory variables
error terms have normal distribution
error terms have constant variance
error terms are independent

Question 2

Q

Describe drawbacks of the classical linear model

Answer

A

model assumes a normal distribution with constant variance, which may not be appropriate, e.g. for skewed claim amounts whose variance increases with the mean
adds together the effects of different explanatory variables, whereas in reality effects are often multiplicative
may become long-winded with more than 2 explanatory variables

Question 3

Q

When do intrinsic and extrinsic aliasing occur?

Answer

A

Intrinsic aliasing occurs:
because of dependencies inherent in the definition of the explanatory variables
this is dealt with automatically by modelling software, e.g. by choosing a base level for each categorical variable

Extrinsic aliasing occurs:
when two or more explanatory variables contain levels that are perfectly correlated
“Near aliasing” occurs when this correlation is almost, but not quite, perfect
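Intrinsic aliasing can be seen concretely in a design matrix. The sketch below uses a single hypothetical three-level rating factor (the region labels and the helper names `design_matrix` and `matrix_rank` are illustrative assumptions, not from any library). With an intercept plus one indicator column per level, the indicator columns always sum to the intercept column, so the matrix is rank deficient; dropping a base level removes the dependency. The same rank check would also reveal extrinsic aliasing if two different variables produced identical columns.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix by Gaussian elimination using exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(n_rows):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_matrix(levels, observed, drop_base):
    """Intercept column plus 0/1 indicator columns for one categorical factor.

    If drop_base is True, the first level is the base level and gets no column.
    """
    indicator_levels = levels[1:] if drop_base else levels
    return [[1] + [1 if obs == lvl else 0 for lvl in indicator_levels]
            for obs in observed]


# Hypothetical rating factor with three levels.
levels = ["North", "South", "West"]
observed = ["North", "South", "West", "North", "South"]

full = design_matrix(levels, observed, drop_base=False)     # 4 columns
reduced = design_matrix(levels, observed, drop_base=True)   # 3 columns
print(len(full[0]), matrix_rank(full))        # 4 columns, rank 3: aliased
print(len(reduced[0]), matrix_rank(reduced))  # 3 columns, rank 3: full rank
```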

Question 4

Q

Why use a GLM over one-way analysis?

Answer

A

one-way analysis ignores correlation and interaction effects:

for example, the effect of smoker status on claim amount may be higher for males than females
for example, the effect of smoker status on claim amount may be higher for older ages compared to younger ages
as a result, the one-way analysis may underestimate the effect of smoker status on claim amount when considering older ages

a GLM appropriately accounts for correlations and, where interaction terms are included, interactions:

by simultaneously modelling the effects of explanatory variables on the response variable
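The correlation point can be illustrated with a small worked example. The portfolio below is entirely hypothetical and is generated exactly from a multiplicative model (base claim rate 0.1, smoker factor 1.5, older-age factor 2.0), with smokers concentrated in the older age band. A one-way analysis of smoker status picks up part of the age effect, while fitting both variables simultaneously recovers the true factors. As a simplifying assumption, the Poisson GLM (log link, log-exposure offset, main effects only) is fitted here by coordinate ascent on the likelihood equations rather than the iteratively reweighted least squares that GLM software typically uses; an interaction such as smoker by sex would need an extra interaction term.

```python
# Hypothetical portfolio: keys are (smoker, older), values are (exposure, claims).
cells = {
    (0, 0): (800, 80.0),
    (1, 0): (200, 30.0),
    (0, 1): (200, 40.0),
    (1, 1): (800, 240.0),
}


def one_way_smoker_relativity(cells):
    """Smoker claim rate divided by non-smoker claim rate, ignoring age."""
    rate = {}
    for s in (0, 1):
        exposure = sum(e for (sm, _age), (e, _c) in cells.items() if sm == s)
        claims = sum(c for (sm, _age), (_e, c) in cells.items() if sm == s)
        rate[s] = claims / exposure
    return rate[1] / rate[0]


def fit_poisson_main_effects(cells, iterations=500):
    """Poisson GLM with mean = exposure * f[smoker] * g_old**older.

    Each update sets one parameter so that fitted claims equal observed claims
    over that parameter's margin (the Poisson log-link likelihood equations).
    """
    f = {0: 1.0, 1: 1.0}
    g_old = 1.0
    for _ in range(iterations):
        for s in (0, 1):
            claims = sum(cells[(s, o)][1] for o in (0, 1))
            weighted_exposure = sum(cells[(s, o)][0] * (g_old if o else 1.0) for o in (0, 1))
            f[s] = claims / weighted_exposure
        claims_old = sum(cells[(s, 1)][1] for s in (0, 1))
        g_old = claims_old / sum(cells[(s, 1)][0] * f[s] for s in (0, 1))
    return f[1] / f[0], g_old


print(one_way_smoker_relativity(cells))  # about 2.25: overstated by the age effect
smoker_rel, age_rel = fit_poisson_main_effects(cells)
print(smoker_rel, age_rel)               # about 1.5 and 2.0
```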

Question 5

Q

Why use a GLM over the classical linear model?

Answer

A

model not limited to the normal distribution:
the response can follow any distribution from the exponential family, for example Poisson or gamma

model not limited to the additive effects of explanatory variables:
can model the multiplicative effects of explanatory variables through a link function, e.g. a log link makes multiplicative effects additive on the scale of the linear predictor

variance of the response variable is a function of its mean, and can increase with the mean:
for example Poisson, where the variance equals the mean
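Both points can be sketched in a few lines. The coefficients below are hypothetical, chosen so that the baseline expected claim is 100 with a smoker relativity of 1.5 and a male relativity of 1.2. Under a log link the linear predictor is additive, so the fitted mean is a product of factors. The dictionary of variance functions V(μ), with Var(Y) = φ V(μ), shows how the variance depends on the mean for some exponential-family members; the names `beta`, `expected_claim` and `VARIANCE_FUNCTION` are illustrative only.

```python
import math

# Hypothetical log-link coefficients for expected claim amount.
beta = {
    "intercept": math.log(100.0),
    "smoker": math.log(1.5),
    "male": math.log(1.2),
}


def expected_claim(smoker, male):
    """Inverse log link: additive effects on the log scale become
    multiplicative effects on the scale of the mean."""
    eta = beta["intercept"] + beta["smoker"] * smoker + beta["male"] * male
    return math.exp(eta)


# Variance functions V(mu), with Var(Y) = phi * V(mu).
VARIANCE_FUNCTION = {
    "normal": lambda mu: 1.0,     # constant variance, as in the classical linear model
    "poisson": lambda mu: mu,     # phi = 1, so Var(Y) = mu
    "gamma": lambda mu: mu ** 2,  # standard deviation proportional to the mean
}

print(expected_claim(1, 1))  # 100 * 1.5 * 1.2, i.e. about 180
for family, v in VARIANCE_FUNCTION.items():
    print(family, v(10.0), v(100.0))
```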

Question 6

Q

Define the total deviance and the scaled deviance

Answer

A

total deviance:

deviance is a measure of the distance between each observed value $Y_i$ and its fitted value $\mu_i$
with allowance for prior weights $w_i$, giving higher importance to errors on observations whose variance should be small
the sum of each observation’s contribution to the deviance, $d(Y_i, \mu_i)$, is the total deviance for a model
$D = \sum_{i=1}^{n} d(Y_i, \mu_i)$

scaled deviance:

the total deviance divided by the scale parameter $\phi$
$D^* = D / \phi$
this standardises the deviance so that it can be used when comparing different models
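As a worked illustration, the sketch below computes the total and scaled deviance for a Poisson model, using the standard Poisson unit deviance d(y, μ) = 2[y log(y/μ) − (y − μ)]. The Poisson case, the observed and fitted values, and the function names are all assumptions for illustration; here the prior weights are applied explicitly as multipliers on each unit deviance. For the Poisson the scale parameter is 1, so D* = D.

```python
import math


def poisson_unit_deviance(y, mu):
    """d(y, mu) = 2 * [y * log(y / mu) - (y - mu)], with y * log(y / mu) = 0 when y = 0."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (term - (y - mu))


def total_deviance(ys, mus, weights=None):
    """D = sum over i of w_i * d(y_i, mu_i); prior weights default to 1."""
    if weights is None:
        weights = [1.0] * len(ys)
    return sum(w * poisson_unit_deviance(y, mu) for y, mu, w in zip(ys, mus, weights))


def scaled_deviance(total_dev, phi):
    """D* = D / phi."""
    return total_dev / phi


# Hypothetical observed counts and fitted means.
ys = [2, 0, 5]
mus = [2.0, 1.0, 4.0]
D = total_deviance(ys, mus)
print(D, scaled_deviance(D, phi=1.0))  # phi = 1 for the Poisson
```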

Question 7

Q

List 3 goodness of fit tests

Answer

A

chi-squared statistic:
used when comparing nested models and where the scale parameter is known
test statistic = $D_1^* - D_2^*$, where model 1 is nested within (is a simpler version of) model 2
which approximately has the chi-squared distribution with degrees of freedom = $df_1 - df_2$
degrees of freedom is the number of observations less the number of parameters

F-statistic:
used when comparing nested models and where the scale parameter is unknown
test statistic = $\frac{(D_1 - D_2)/(df_1 - df_2)}{D_2/df_2}$
which approximately has the F distribution with $(df_1 - df_2, df_2)$ degrees of freedom

AIC:
can be used when models are not necessarily nested
$\text{AIC} = -2\ell + 2p$, where $\ell$ is the maximised log-likelihood and $p$ is the number of parameters
the lower the AIC, the better the model
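The three formulas can be checked with a short sketch. All deviances, degrees of freedom and log-likelihoods below are invented for illustration, and the function names are hypothetical. In each nested comparison, model 1 is the simpler model; the resulting statistic would then be compared with the upper percentage point of the stated distribution from tables.

```python
def chi_squared_test_statistic(scaled_dev_1, scaled_dev_2, df_1, df_2):
    """Scale parameter known; model 1 nested in model 2.
    Returns D1* - D2* and its chi-squared degrees of freedom df1 - df2."""
    return scaled_dev_1 - scaled_dev_2, df_1 - df_2


def f_test_statistic(dev_1, dev_2, df_1, df_2):
    """Scale parameter unknown; model 1 nested in model 2.
    Returns [(D1 - D2) / (df1 - df2)] / (D2 / df2) and its F degrees of freedom."""
    stat = ((dev_1 - dev_2) / (df_1 - df_2)) / (dev_2 / df_2)
    return stat, (df_1 - df_2, df_2)


def aic(log_likelihood, n_params):
    """AIC = -2 * (maximised log-likelihood) + 2 * (number of parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical fitted models.
chi_stat, chi_df = chi_squared_test_statistic(58.4, 51.2, 96, 94)
f_stat, f_df = f_test_statistic(120.0, 100.0, 50, 48)
aic_small = aic(log_likelihood=-200.0, n_params=5)
aic_large = aic(log_likelihood=-198.0, n_params=8)
print(chi_stat, chi_df)  # about 7.2 on 2 degrees of freedom
print(f_stat, f_df)      # about 4.8 on (2, 48) degrees of freedom
print("prefer smaller model" if aic_small < aic_large else "prefer larger model")
```

Here the larger model improves the log-likelihood by only 2 while adding 3 parameters, so its AIC is higher and the smaller model is preferred.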

Question 8

Q

Explain 5 ways to test the appropriateness of a model

Answer

A

hat matrix:
$H$ such that $\hat{y} = H y$
the diagonal entries $h_{ii}$ are called leverages; each measures the influence an observed value $y_i$ has on its own fitted value $\hat{y}_i$

deviance residuals:
measure the distance between the observed and fitted values; any large deviations may indicate that the distributional assumptions are being violated

standardised pearson residuals:
measure the distance between the observed value and fitted value, adjusted for the leverage of the observation and for the variance function evaluated at the fitted value

Cook’s distance:
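an alternative to the leverages (the diagonal entries of the hat matrix) for measuring influence; it measures how much the fitted values change when an observation is omitted, and a Cook’s distance > 1 may be cause for concern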

residual plot:
where residuals are plotted against the fitted values. Residuals should be scattered symmetrically about the horizontal axis, with an average of zero and no systematic pattern
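Several of these diagnostics can be computed by hand in the simplest case. The sketch below assumes a straight-line model with normal errors and identity link (the classical linear model, a special case of a GLM), where the leverages have the closed form $h_{ii} = 1/n + (x_i - \bar{x})^2 / S_{xx}$, the variance function is 1 so the standardised Pearson residual is $e_i / \sqrt{s^2 (1 - h_{ii})}$, and the deviance residual equals the raw residual. The data and the function name `simple_ols_diagnostics` are hypothetical; `fitted` and `std_resid` are what a residual plot would show.

```python
import math


def simple_ols_diagnostics(xs, ys):
    """Fitted values, leverages, standardised residuals and Cook's distances
    for the straight-line model y = a + b*x (normal errors, identity link)."""
    n, p = len(xs), 2  # p = number of parameters (intercept and slope)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    # Diagonal of the hat matrix H = X (X'X)^(-1) X' for a straight-line fit.
    lev = [1.0 / n + (x - x_bar) ** 2 / sxx for x in xs]
    # Estimate of the scale parameter phi (the error variance).
    s2 = sum(e ** 2 for e in resid) / (n - p)
    # Standardised Pearson residuals: V(mu) = 1 for the normal distribution.
    std_resid = [e / math.sqrt(s2 * (1.0 - h)) for e, h in zip(resid, lev)]
    # Cook's distance written in terms of the standardised residual and leverage.
    cooks = [r ** 2 * h / (p * (1.0 - h)) for r, h in zip(std_resid, lev)]
    return fitted, lev, std_resid, cooks


# Hypothetical data: the last point sits far from the others in x.
xs = [1, 2, 3, 4, 5, 10]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 4.0]
fitted, lev, std_resid, cooks = simple_ols_diagnostics(xs, ys)
for x, h, r, d in zip(xs, lev, std_resid, cooks):
    flag = "  <- Cook's distance > 1" if d > 1 else ""
    print(f"x={x:>2}  leverage={h:.3f}  std resid={r:+.2f}  Cook's D={d:.2f}{flag}")
```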

Question 9

Q

List examples of GLMs

Answer

A

gamma model:
may be a good model for claim amounts, log link

Poisson model:
may be a good model for claim frequency, log link

logistic regression model:
may be a good model for binary outcomes, logit link
consider the odds ratios, with their p-values, to assess significance
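Under a logit link, exponentiating a coefficient gives an odds ratio. The sketch below uses hypothetical coefficients (an intercept of −2.0 and a smoker coefficient of 0.7) to show that the ratio of the fitted odds for smokers and non-smokers equals exp(0.7); the function names are illustrative.

```python
import math


def inverse_logit(eta):
    """Inverse of the logit link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p):
    return p / (1.0 - p)


# Hypothetical fitted coefficients: intercept and a 0/1 smoker indicator.
b0, b_smoker = -2.0, 0.7

p_non_smoker = inverse_logit(b0)
p_smoker = inverse_logit(b0 + b_smoker)
odds_ratio = odds(p_smoker) / odds(p_non_smoker)
print(p_non_smoker, p_smoker)
print(odds_ratio, math.exp(b_smoker))  # the odds ratio equals exp(coefficient)
```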

Question 10

Q

What can a GLM be used to model?

Answer

A

cost plpm, i.e. the cost per life per month

Question 11

Q

List key items to mention when suggesting a model

Answer

A

specify explanatory variables, response variable
specify model, link function
consider interactions
consider the significance of coefficients (p-values, 95% CIs)
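One common way to report significance is a Wald interval and p-value based on the normal approximation to the coefficient estimator; GLM software may instead or additionally offer likelihood-based intervals, so treat this as a sketch. The estimate and standard error below are hypothetical, and `wald_summary` is an illustrative name. With a log link, exponentiating the interval end points gives an interval for the multiplicative relativity.

```python
import math


def wald_summary(estimate, std_error, z_crit=1.96):
    """Approximate 95% confidence interval and two-sided p-value for a GLM
    coefficient, using the normal approximation to the estimator."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    lower, upper = estimate - z_crit * std_error, estimate + z_crit * std_error
    return z, p_value, (lower, upper)


# Hypothetical log-link coefficient for smoker status.
z, p, (lo, hi) = wald_summary(estimate=0.4, std_error=0.1)
print(f"z = {z:.1f}, p = {p:.2g}, 95% CI = ({lo:.3f}, {hi:.3f})")
print(f"relativity {math.exp(0.4):.3f}, CI ({math.exp(lo):.3f}, {math.exp(hi):.3f})")
```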

Question 12

Q

Outline the advantages of the Tweedie distribution for modelling PMI claims

Answer

A

The Tweedie distribution is a special member of the exponential family that has a point mass (large spike) at zero and corresponds to the compound distribution of a Poisson claim number process and a gamma claim size distribution. It can therefore model total claim costs directly in a single GLM, including the many lives with no claims.
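The compound Poisson-gamma structure can be sketched directly. The parameters below are hypothetical (0.3 claims per life, gamma claim sizes with shape 2 and scale 1000). The function `tweedie_parameters` uses the standard correspondence with a Tweedie distribution with power $p = (\alpha + 2)/(\alpha + 1)$, which lies between 1 and 2, and variance $\phi \mu^p$. The simulation shows the spike at zero, whose probability is $e^{-\lambda}$. Claim counts are drawn with Knuth's multiplication method, an assumption that is adequate for small claim rates; the function names are illustrative.

```python
import math
import random


def tweedie_parameters(lam, alpha, theta):
    """Tweedie (mu, p, phi) matching a Poisson(lam) claim count with
    gamma(shape=alpha, scale=theta) claim sizes, so that Var = phi * mu**p."""
    mu = lam * alpha * theta
    p = (alpha + 2.0) / (alpha + 1.0)  # always strictly between 1 and 2
    phi = mu ** (2.0 - p) / (lam * (2.0 - p))
    return mu, p, phi


def simulate_total_claims(lam, alpha, theta, rng):
    """One life's total claims: Poisson count, then a sum of gamma claim sizes."""
    # Poisson draw by Knuth's multiplication method (adequate for small lam).
    limit, count, product = math.exp(-lam), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return sum(rng.gammavariate(alpha, theta) for _ in range(count))


# Hypothetical parameters: 0.3 claims per life, mean claim size 2000.
lam, alpha, theta = 0.3, 2.0, 1000.0
mu, p, phi = tweedie_parameters(lam, alpha, theta)
rng = random.Random(2024)
totals = [simulate_total_claims(lam, alpha, theta, rng) for _ in range(20000)]
zero_share = sum(t == 0 for t in totals) / len(totals)

print(f"Tweedie mean {mu:.0f}, power p = {p:.3f}, dispersion phi = {phi:.2f}")
print(f"share of exact zeros: {zero_share:.3f} (theory: {math.exp(-lam):.3f})")
```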

Question 13

Q

List two properties of the exponential family of distributions

Answer

A

The distribution is completely specified in terms of its mean and variance

The variance of the response is a function of its mean: $\text{Var}(Y) = \phi V(\mu)$

(13 cards)