Anderson: A Practitioner's Guide to GLMs (Flashcards)
Failings of one-way analysis, with examples
- can be distorted by correlations between rating variables
e.g. younger drivers are more likely to drive older cars, so the worse loss ratio (LR) on older cars may be partly driven by younger drivers
- do not consider interdependencies (interactions) between rating variables
e.g. the rate differential between male and female drivers varies by age
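A small illustration of the first failing, using invented numbers: in the hypothetical exposures below, claim frequency depends only on driver age (assumed 0.2 for young, 0.1 for mature drivers) and not on car age, yet young drivers are concentrated in old cars. A one-way analysis by car age still makes old cars look worse.

```python
# Hypothetical exposures by (driver age, car age); numbers are invented.
exposure = {
    ("young", "new"): 100, ("young", "old"): 300,
    ("mature", "new"): 300, ("mature", "old"): 100,
}
# Assumed true frequency depends on driver age only (no car-age effect).
true_freq = {"young": 0.2, "mature": 0.1}

# Expected claim counts implied by the assumed true frequencies.
claims = {k: e * true_freq[k[0]] for k, e in exposure.items()}

def one_way_frequency(car_age):
    """One-way claim frequency for a car-age level, ignoring driver age."""
    cells = [k for k in exposure if k[1] == car_age]
    return sum(claims[k] for k in cells) / sum(exposure[k] for k in cells)

freq_new = one_way_frequency("new")
freq_old = one_way_frequency("old")
relativity_old = freq_old / freq_new
print(f"new: {freq_new:.3f}, old: {freq_old:.3f}, old/new: {relativity_old:.2f}")
# The true car-age relativity is 1.0, but the one-way view suggests about 1.4.
```

A multivariate model with both driver age and car age would attribute the difference to driver age instead.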
Failings of minimum bias procedures
- provide no credible range for the parameter estimates (a GLM, by contrast, can provide confidence intervals)
- lack a statistical framework for assessing the quality of the modelling work
Limitations of the classical linear model
- assumes the response variable is normally distributed with constant variance, which may not hold
- values of the response variable may be restricted to be positive, but the normal distribution allows negative values
- the additive assumption is not realistic for many insurance applications (risk tends to vary multiplicatively with rating factors)
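Under a log link the linear predictor is additive on the log scale, so the fitted mean is a base value times one relativity per rating factor. A sketch with invented coefficient values (not taken from any real rating plan):

```python
import math

# Hypothetical log-link coefficients; values are invented for illustration.
intercept = math.log(0.10)      # base frequency of 0.10
beta_young = math.log(1.50)     # young driver relativity of 1.50
beta_old_car = math.log(1.20)   # old car relativity of 1.20

# The linear predictor adds the effects on the log scale...
eta = intercept + beta_young + beta_old_car
fitted = math.exp(eta)

# ...which is the same as multiplying the base by each relativity.
product = 0.10 * 1.50 * 1.20
print(round(fitted, 6), round(product, 6))  # 0.18 0.18
```

An additive (identity-link) model would instead add a fixed amount for each factor, regardless of the base level.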
How the choice of error function affects the results of a GLM (normal vs. Poisson vs. gamma)
- GLM with normal variance function (constant variance)
[] attracted to all data points with equal weight
- GLM with Poisson variance function
[] assumes the variance increases in proportion to the expected value
[] observations with smaller expected values => smaller variance => more credibility
[] fitted values are more influenced by observations on the left (those with smaller values)
- GLM with gamma variance function
[] even more strongly influenced by points on the left (the variance increases with the square of the expected value)
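One way to see this weighting: in the GLM score equations each residual (y - mu) is divided by the variance function V(mu) and multiplied by dmu/deta, which is 1 under an identity link. The sketch below assumes an identity link and compares the relative weight 1/V(mu) for a small and a large expected value under each variance function (dispersion omitted, as it cancels in the comparison).

```python
# Variance functions V(mu) for the three error structures.
VARIANCE = {
    "normal": lambda mu: 1.0,
    "poisson": lambda mu: mu,
    "gamma": lambda mu: mu ** 2,
}

def relative_weights(mus, family):
    """Weight 1/V(mu) per observation, scaled so the first one has weight 1.

    Assumes an identity link, so dmu/deta = 1 and only 1/V(mu) differs.
    """
    v = VARIANCE[family]
    raw = [1.0 / v(mu) for mu in mus]
    return [w / raw[0] for w in raw]

mus = [1.0, 10.0]  # a small and a large expected value
for family in VARIANCE:
    print(family, relative_weights(mus, family))
# normal [1.0, 1.0]
# poisson [1.0, 0.1]
# gamma [1.0, 0.01]
```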
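Typical model for:
- claim count/frequency
- claim severity
- retention & new business
- Count/frequency: multiplicative Poisson (log link, Poisson error)
[] claim frequency: prior weight is exposure
[] claim count: offset term is log(exposure)
[] invariant to the measure of time
- Severity: multiplicative gamma (log link, gamma error)
[] invariant to the measure of currency
- Retention & new business: logit link and binomial error term
[] invariant to the measure of success/failure (modelling failure instead of success only flips the sign of the linear predictor)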
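A minimal sketch of the claim count model with a log(exposure) offset, fitted by Newton-Raphson on invented data (two levels of a hypothetical 0/1 rating factor). With a Poisson log link, the score equations for frequency = count / exposure with prior weight exposure are the same as those for the count with a log(exposure) offset, so both give the same betas. Expressing exposure in months instead of years only shifts the intercept by log(12); the relativity is unchanged.

```python
import math

def fit_poisson_offset(x, counts, exposure, iters=25):
    """Newton-Raphson for log E[count] = log(exposure) + b0 + b1 * x.

    x is a hypothetical 0/1 rating-factor indicator; returns (b0, b1).
    """
    # Start the intercept at the overall frequency for a stable first step.
    b0, b1 = math.log(sum(counts) / sum(exposure)), 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ei in zip(x, counts, exposure):
            mu = ei * math.exp(b0 + b1 * xi)  # offset enters via exposure
            g0 += yi - mu                     # score for b0
            g1 += (yi - mu) * xi              # score for b1
            h00 += mu                         # Fisher information entries
            h01 += mu * xi
            h11 += mu * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^-1 * score (2x2 inverse written out).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Invented data: level 0 has frequency 0.1, level 1 has frequency 0.3.
x = [0, 1]
counts = [30, 45]
years = [300.0, 150.0]
months = [12 * e for e in years]

b0_y, b1_y = fit_poisson_offset(x, counts, years)
b0_m, b1_m = fit_poisson_offset(x, counts, months)
print(round(math.exp(b0_y), 4), round(math.exp(b1_y), 4))  # 0.1 3.0
print(round(math.exp(b1_m), 4), round(b0_y - b0_m, 4))      # 3.0 2.4849
```

The second line shows the relativity is still 3.0 and the intercepts differ by log(12).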
What is aliasing?
What are the two types of aliasing? Give an example of each.
Aliasing: a linear dependency among the covariates in the model.
Two types of aliasing:
- Intrinsic aliasing: a dependency among covariates that arises from their definition
e.g. every car is black, red or gold, so gold = total - black - red
- Extrinsic aliasing: a dependency that results from the nature of the data
e.g. one level of a particular factor is perfectly correlated with a level of another factor
e.g. with black, red and gold cars, all red cars are sports cars and all sports cars are red, so # of sports cars = # of red cars
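Both types show up as a design matrix without full column rank. A sketch with a hypothetical six-car dataset in which every red car is a sports car and every sports car is red; `rank` is a small exact Gaussian elimination written for this example, not a library function, and choosing gold as the base level is an arbitrary choice.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a small matrix via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier columns
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == n_rows:
            break
    return r

# Hypothetical data: (colour, is_sports_car); red <=> sports car.
cars = [("black", 0), ("red", 1), ("gold", 0),
        ("black", 0), ("red", 1), ("gold", 0)]

def row(colour, sports, levels, include_sports):
    """Intercept plus one indicator per listed colour (plus sports flag)."""
    x = [1] + [int(colour == lvl) for lvl in levels]
    return x + [sports] if include_sports else x

all_levels = [row(c, s, ["black", "red", "gold"], False) for c, s in cars]
base_gold = [row(c, s, ["black", "red"], False) for c, s in cars]
with_sports = [row(c, s, ["black", "red"], True) for c, s in cars]

print(len(all_levels[0]), rank(all_levels))    # 4 3  intrinsic aliasing
print(len(base_gold[0]), rank(base_gold))      # 3 3  gold as base level
print(len(with_sports[0]), rank(with_sports))  # 4 3  extrinsic aliasing
```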
Once the link function and error structure have been selected, describe the process to determine the final beta parameters.
- using the probability density function of the chosen error distribution, set up the likelihood of the observations as a function of the betas
- take the log of the likelihood and differentiate with respect to each beta; set the partial derivatives to zero and solve for the betas (usually numerically, e.g. by Newton-Raphson)
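As a worked instance of these steps, assume a Poisson error and log link (one common choice, used here only as an example), with observations y_i, covariates x_ij and mu_i = exp(eta_i):

```latex
\begin{aligned}
L(\beta) &= \prod_{i=1}^{n} \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!},
\qquad \mu_i = e^{\eta_i}, \quad \eta_i = \sum_{j=1}^{p} x_{ij}\beta_j \\
\ell(\beta) = \ln L(\beta) &= \sum_{i=1}^{n} \left[ y_i \eta_i - e^{\eta_i} - \ln(y_i!) \right] \\
\frac{\partial \ell}{\partial \beta_j} &= \sum_{i=1}^{n} \left( y_i - \mu_i \right) x_{ij} = 0,
\qquad j = 1, \dots, p
\end{aligned}
```

For an intercept-only model (x_i1 = 1 for all i) the last line gives sum(y_i) = n * exp(beta_1), i.e. beta_1 = ln(mean of y); with more covariates the equations generally have no closed form and are solved numerically.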
Comparison between the linear model and GLM for: random component, systematic component, link function, error term
Random component:
- Linear Model:
[] observations are independent and normally distributed
[] their means may differ, but they share a common variance
- GLM:
[] observations are independent and come from an exponential family distribution
Systematic component:
Linear Model & GLM are the same:
covariates are combined to produce a linear predictor
Link function:
- Linear Model: identity link, E[Y] = mu = X*beta
- GLM: any differentiable, monotonic link function g, with E[Y] = mu = g^-1(X*beta)
Error term:
- Linear Model: Normally distributed
- GLM: any exponential family distribution (e.g. Poisson, gamma, binomial)
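A sketch of the link functions mentioned in these cards (identity for the linear model, log and logit as common GLM choices), each paired with its inverse so that E[Y] = mu = g^-1(X*beta). The names `LINKS` and `mean_from_predictor` are hypothetical, chosen for this example.

```python
import math

# Each entry is (g, g_inv): g maps mu to eta, g_inv maps eta back to mu.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),                   # requires mu > 0
    "logit": (lambda mu: math.log(mu / (1 - mu)),  # requires 0 < mu < 1
              lambda eta: 1 / (1 + math.exp(-eta))),
}

def mean_from_predictor(x, beta, link):
    """E[Y] = g^-1(x . beta) for one observation's covariate vector x."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return LINKS[link][1](eta)

# Round trip: g_inv(g(mu)) recovers mu for a value valid under every link.
for name, (g, g_inv) in LINKS.items():
    print(name, round(g_inv(g(0.25)), 12))
```

All three links are monotonic and differentiable on their domains, which is what lets g^-1 be applied to the linear predictor.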
What is a holdout sample?
What is its purpose?
A training dataset is used to fit a model.
A holdout sample is a separate sample of data not used to fit the model.
Purpose: compare the predictions of the model fitted on the training data with the actual outcomes in the holdout sample, to test:
- how well the model generalizes to other data
- whether the model is overfitting the training data
Selection criteria for a holdout sample
- unbiased sample from the same population as the training data
- large enough to be credible
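A minimal sketch of a random holdout split, one simple way to get an unbiased sample from the same population as the training data. The 30% holdout fraction and the seed are arbitrary assumptions, and the Poisson deviance helper is just one possible measure for comparing predicted and actual claim counts on the holdout.

```python
import math
import random

def split_holdout(records, holdout_fraction=0.3, seed=2024):
    """Randomly split records into (training, holdout) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

def poisson_deviance(actual, predicted):
    """Total Poisson deviance; lower means predictions closer to actuals."""
    total = 0.0
    for y, mu in zip(actual, predicted):
        term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2 * (term - (y - mu))
    return total

# Hypothetical policy ids 0..99; fit on `train` only, then score `holdout`.
train, holdout = split_holdout(range(100))
print(len(train), len(holdout))  # 70 30
```

The model would be fitted on `train` only; its predictions for the `holdout` records are then compared with their actual claim counts.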
How to resolve near-aliasing (two factor levels that would be aliased except for a few rogue records)?
- delete/exclude the rogue records, or
- reclassify the rogue records into another, more appropriate factor level
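Rogue records can be found by checking which records break the near-equivalence between the two levels. A sketch on hypothetical records in which red and sports car almost always go together; the record layout is an assumption, and only the exclusion option is shown (reclassifying would be the alternative).

```python
# Hypothetical records: (policy_id, colour, is_sports_car).
records = [
    (1, "red", True), (2, "red", True), (3, "black", False),
    (4, "red", False),  # rogue: red but not a sports car
    (5, "gold", False), (6, "red", True), (7, "black", False),
]

def rogue_records(rows):
    """Records that break the near-equivalence red <=> sports car."""
    return [r for r in rows if (r[1] == "red") != r[2]]

rogues = rogue_records(records)
cleaned = [r for r in records if r not in rogues]  # option: exclude them
print([r[0] for r in rogues])  # [4]
# After exclusion the two levels are exactly aliased (extrinsic aliasing).
print(all((r[1] == "red") == r[2] for r in cleaned))  # True
```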