Chapter 18 - GLM Flashcards
Explanatory variables
- inputs into the model that are expected to influence the response variable
- choice of explanatory variables depends on the purpose of the model
Response variables
- outputs from the model that are likely to be affected by explanatory variables
Categorical variables
- a type of explanatory variable
- aka factors
- the values of each level are distinct
- often cannot be given a natural ordering or score
- continuous numerical variables (e.g. age) are often grouped into bands and treated as categorical
Non-categorical variables
- take numerical values on a scale, rather than one of a set of distinct levels
Interaction terms
- included where the pattern in the response variable is better modelled by including parameters for each combination of the levels of two or more factors
What does a GLM do?
A GLM unpicks the relationships between the explanatory variables and the response and produces estimates of the true values of the relativities. It does this by taking account of correlations between the explanatory variables and by allowing investigation of any interactions between variables in the model.
Assumptions of classic linear model
- the error terms are independent and come from a normal distribution
- the mean is a linear combination of the explanatory variables
- the error terms have constant variance
The parameters β0, β1, β2 can be estimated using the method of maximum likelihood; under normal errors this gives the same estimates as least squares (see pg. 635)
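A minimal sketch of the maximum likelihood fit for the classic linear model, assuming NumPy, normal errors and a design matrix whose first column is the intercept. Under these assumptions maximising the likelihood in β is the same as minimising the residual sum of squares, so β is found from the normal equations X^T X β = X^T y. The function name and data are hypothetical; the data are constructed so that y = 1 + 2x1 + 0.5x2 exactly.

```python
import numpy as np

def fit_normal_linear_model(X, y):
    """MLE of beta and sigma^2 for y = X beta + e, with e ~ N(0, sigma^2 I)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Maximising the normal log-likelihood in beta minimises the residual
    # sum of squares, which gives the normal equations X^T X beta = X^T y.
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    # The MLE of sigma^2 divides by n (not n - p), so it is biased downwards.
    sigma2 = resid @ resid / len(y)
    return beta, sigma2

# Columns: intercept, x1, x2 (hypothetical data)
X = [[1, 0, 1], [1, 1, 0], [1, 2, 1], [1, 3, 0], [1, 4, 1]]
y = [1.5, 3.0, 5.5, 7.0, 9.5]
beta, sigma2 = fit_normal_linear_model(X, y)
print(beta)    # approximately [1.0, 2.0, 0.5]
```

With three or more explanatory variables the same call works unchanged, which is the practical answer to the manual solution becoming long-winded.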
Drawbacks of the normal model for multiple linear regression
- assumes that the response variable has a normal distribution which may not be appropriate for the variable being modelled
- the normal distribution has a constant variance which may not be appropriate for the variable being modelled
- adds together the effects of different explanatory variables, but this is seldom what is observed in practice
- with more than two explanatory variables, a manual solution becomes increasingly long-winded
How do GLMs address these problems?
- the response variable can take any distribution from the exponential family
- a link function is introduced which acts to remove the assumption that the effects of different variables must simply be added together
- an offset term can be included within the linear predictor
GLM form
- see pg. 639
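For reference, a GLM is commonly written as below. This is a sketch of the usual notation rather than a copy of pg. 639: the symbols ξ_i (offset) and ω_i (prior weight) are assumptions, g is the link function, φ the scale parameter and V the variance function.

```latex
Y_i \text{ independent, from the exponential family}, \qquad \mu_i = E[Y_i] \\
g(\mu_i) = \eta_i, \qquad \eta_i = \sum_{j} x_{ij}\beta_j + \xi_i
  \quad (\text{i.e. } g(\boldsymbol\mu) = X\boldsymbol\beta + \boldsymbol\xi) \\
\operatorname{Var}(Y_i) = \frac{\phi\, V(\mu_i)}{\omega_i}
```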
Properties of members of the exponential family
- the distribution is completely specified in terms of its mean and variance
- the variance of Yi is a function of its mean
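One common parameterisation of the exponential family shows why the variance is a function of the mean: both come from derivatives of the same function b(θ). The symbols θ, b, a and c are assumed notation and may differ from the source.

```latex
f(y;\theta,\phi) = \exp\left[\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)\right] \\
E[Y] = \mu = b'(\theta), \qquad \operatorname{Var}(Y) = a(\phi)\, b''(\theta) = a(\phi)\, V(\mu) \\
\text{Poisson example: } \theta = \ln\mu,\; b(\theta) = e^{\theta},\; a(\phi) = 1
  \;\Rightarrow\; \operatorname{Var}(Y) = \mu
```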
Requirements for link function
- differentiable
- monotonic
Obtaining the predicted values from a simple GLM
- specify the design matrix X and the vector of parameters β
- choose the distribution for the response variable and the link function
- write down the likelihood function
- take the log to convert the product into a sum
- maximise the log-likelihood by taking partial derivatives with respect to each parameter, setting them equal to zero and solving the resulting system of equations
- compute the predicted values
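The steps above can be sketched for a Poisson response with a log link, assuming NumPy. Rather than solving the score equations by hand, the sketch uses iteratively reweighted least squares (IRLS), a standard numerical method that, for this canonical link, performs Newton-Raphson steps on the score equations X^T(y − μ) = 0. The function name, starting values (μ = y + 0.1) and data are hypothetical.

```python
import numpy as np

def fit_poisson_log_glm(X, y, offset=None, tol=1e-10, max_iter=100):
    """Fit a Poisson GLM with log link; returns (beta, fitted values)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xi = np.zeros(len(y)) if offset is None else np.asarray(offset, dtype=float)
    mu = y + 0.1                      # starting fitted values (avoids log(0))
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        # Working response and weights for the log link: g'(mu) = 1/mu, V(mu) = mu
        z = (eta - xi) + (y - mu) / mu
        w = mu
        XtW = X.T * w                 # X^T W with W = diag(w)
        # For the canonical log link this weighted least-squares solve is a
        # Newton-Raphson step on the score equations X^T (y - mu) = 0.
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new + xi
        mu = np.exp(eta)              # predicted values via the inverse link
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, mu

# Two-level factor: second column is 1 for level B, 0 for base level A (hypothetical data)
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
y = [2, 4, 6, 10]
beta, fitted = fit_poisson_log_glm(X, y)
print(fitted)   # approximately [3, 3, 8, 8]: the mean of each level
```

With a one-factor model the fitted values reproduce the mean of each level, so exp(beta[1]) = 8/3 is the relativity of level B to level A.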
Degrees of freedom
number of observations less the number of parameters
Deviance formula
Compares each observed value y_i with its fitted value μ_i, with allowance for weights (see pg. 649)
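As a reference sketch (standard definitions, with assumed notation that may differ from pg. 649), the scaled deviance D* compares the log-likelihood of the saturated model, where each fitted value equals its observation y_i, with that of the fitted model; ω_i are weights and μ̂_i the fitted values.

```latex
D^* = 2\left[\ell(\mathbf y;\mathbf y) - \ell(\hat{\boldsymbol\mu};\mathbf y)\right], \qquad D = \phi\, D^* \\
\text{Normal: } D = \sum_i \omega_i (y_i - \hat\mu_i)^2 \\
\text{Poisson: } D = 2\sum_i \omega_i\left[y_i\ln\frac{y_i}{\hat\mu_i} - (y_i - \hat\mu_i)\right]
  \quad (y_i\ln(y_i/\hat\mu_i) := 0 \text{ when } y_i = 0) \\
\text{Gamma: } D = 2\sum_i \omega_i\left[-\ln\frac{y_i}{\hat\mu_i} + \frac{y_i - \hat\mu_i}{\hat\mu_i}\right]
```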
Nested models
Two models are nested if one model contains explanatory variables that are a subset of the explanatory variables in the other model.
How to compare two nested models
- chi-square test for the change in scaled deviance
- this measures whether the inclusion of one or more additional explanatory variables in a model improves the model fit significantly
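A minimal pure-Python sketch of the test, assuming the scaled deviances and residual degrees of freedom (observations less parameters) of both models are already known. The chi-squared upper tail is computed from the series for the regularised incomplete gamma function so that no statistics library is needed; in practice a library routine such as scipy.stats.chi2.sf would normally be used. Function names and the deviance figures are hypothetical.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared(df), from the series for the
    regularised lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def nested_model_test(dev_small, df_small, dev_large, df_large, alpha=0.05):
    """Chi-squared test on the change in scaled deviance between nested models.
    The small model's explanatory variables are a subset of the large model's."""
    change = dev_small - dev_large
    extra = df_small - df_large       # number of additional parameters
    p_value = chi2_sf(change, extra)
    return change, extra, p_value, p_value < alpha

# Hypothetical scaled deviances: 120.4 on 95 df versus 112.1 on 93 df
change, extra, p_value, significant = nested_model_test(120.4, 95, 112.1, 93)
print(round(change, 1), extra, round(p_value, 4), significant)  # 8.3 2 0.0158 True
```

Here the two extra parameters reduce the scaled deviance by 8.3, which is significant at the 5% level, so the larger model is preferred.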
F statistics
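- where the scale parameter for the model is unknown (e.g. gamma or normal) it has to be estimated
- the scale parameter is estimated from the deviance of the larger model; the deviance divided by the true scale parameter is approximately chi-squared distributed
- the ratio of the change in deviance (per additional parameter) to the estimated scale parameter has an approximate F distribution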
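In symbols (a sketch with assumed notation): model 1 is nested in model 2, D_1 and D_2 are their unscaled deviances and df_1, df_2 their residual degrees of freedom. Estimating φ by D_2/df_2 is one common choice.

```latex
F = \frac{(D_1 - D_2)/(\mathrm{df}_1 - \mathrm{df}_2)}{D_2/\mathrm{df}_2}
  \;\sim\; F_{\mathrm{df}_1 - \mathrm{df}_2,\ \mathrm{df}_2} \quad\text{(approximately)},
  \qquad \hat\phi = \frac{D_2}{\mathrm{df}_2}
```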
How to compare models that are not nested?
AIC = -2 × log-likelihood + 2 × number of parameters
Trades off the likelihood of a model against the number of parameters; the lower the AIC, the better the model. If two models fit the data equally well in terms of the log-likelihood, the model with fewer parameters is better.
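A minimal sketch of the comparison; the model names, log-likelihoods and parameter counts are hypothetical. Model B has the higher log-likelihood, but its three extra parameters add 6 to the AIC while the likelihood gain only removes 5, so model A is preferred.

```python
def aic(log_likelihood, n_params):
    """AIC = -2 * log-likelihood + 2 * number of parameters (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Hypothetical non-nested models: (maximised log-likelihood, number of parameters)
models = {"model_A": (-512.3, 6), "model_B": (-509.8, 9)}
scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)   # model with the lowest AIC
print(scores, best)
```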
Use of CRLB
- can be used to measure the uncertainty in the parameter estimators used in a GLM. A poorly defined parameter will have a large standard error.
- standard errors can be found from the Hessian matrix (matrix of second derivatives of the log-likelihood)
- the variances of the parameter estimates are the diagonal entries of -G^(-1), where G is the Hessian; the standard errors are their square roots
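A minimal sketch for a Poisson GLM with log link, assuming NumPy and that mu holds the fitted values at the maximum likelihood estimates. For this model the Hessian is G = -X^T diag(μ) X. The function name and the fitted values (from a hypothetical two-level factor model) are illustrative only.

```python
import numpy as np

def poisson_log_standard_errors(X, mu):
    """Approximate standard errors of beta for a Poisson GLM with log link,
    evaluated at the fitted values mu."""
    X = np.asarray(X, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Hessian of the log-likelihood: G = -X^T diag(mu) X
    G = -(X.T * mu) @ X
    # CRLB: approximate covariance matrix is -G^(-1); variances on the diagonal
    cov = -np.linalg.inv(G)
    return np.sqrt(np.diag(cov))

# Fitted values from a hypothetical two-level factor model (base level A, then B)
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
mu = [3.0, 3.0, 8.0, 8.0]
se = poisson_log_standard_errors(X, mu)
print(se)   # approximately [0.408, 0.479]
```

A standard error that is large relative to its parameter estimate flags a poorly defined parameter.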
Other ways to test significance
- consider spread of relativity values for each level, combined with the standard errors at each level
- comparison over time - analysis of claims frequency by factor by year will indicate whether claims frequencies have been stable over time. Can fit a model that includes interaction of a single factor with measure of time.
- consistency checks with other factors e.g. age and region
Hat matrix
maps the vector of observed values to the vector of fitted values (for a linear model, ŷ = Hy)
ith leverage
- ith diagonal element of the hat matrix (lies between 0 and 1)
- measure of how much influence the ith observation has over its own fitted value
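A minimal sketch of the hat matrix and leverages, assuming NumPy. For a GLM the hat matrix is commonly defined with working weights W as H = W^(1/2) X (X^T W X)^(-1) X^T W^(1/2); with all weights equal to 1 this reduces to the linear-model form X (X^T X)^(-1) X^T. The function name and data are hypothetical.

```python
import numpy as np

def hat_matrix(X, w=None):
    """H = W^(1/2) X (X^T W X)^(-1) X^T W^(1/2); with w=None (all weights 1)
    this is the linear-model hat matrix X (X^T X)^(-1) X^T."""
    X = np.asarray(X, dtype=float)
    w = np.ones(X.shape[0]) if w is None else np.asarray(w, dtype=float)
    Xw = X * np.sqrt(w)[:, None]                  # W^(1/2) X
    return Xw @ np.linalg.solve(Xw.T @ Xw, Xw.T)

x = np.array([0.0, 1.0, 2.0, 3.0, 10.0])          # hypothetical; last point is remote
X = np.column_stack([np.ones_like(x), x])          # intercept + slope
H = hat_matrix(X)
leverage = np.diag(H)
print(leverage.round(3), leverage.sum())           # leverages sum to p = 2
```

The remote point x = 10 gets by far the largest leverage, and the leverages sum to the number of parameters.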
deviance residual
- measure of distance between actual observation and fitted value
- see pg. 656
standardised pearson residual
- difference between the observed response and the predicted value, adjusted for the estimated standard deviation of the response and the leverage of the observation
- standardised Pearson residuals are on a common scale, so they can be compared across observations
- does not adjust for the shape (skewness) of the distribution, unlike the deviance residual
- see pg. 657
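A minimal sketch of both residuals for a Poisson GLM (V(μ) = μ, φ = 1), assuming NumPy and the standard definitions r_D = sign(y − μ)·sqrt(d_i), where d_i is the observation's contribution to the deviance, and r_P = (y − μ)/sqrt(φ V(μ)(1 − h)). The source may instead standardise the deviance residual as well. The function name and data are hypothetical; for a one-factor model each leverage equals 1/(number of observations at that level), here 0.5.

```python
import numpy as np

def poisson_residuals(y, mu, h, phi=1.0):
    """Deviance and standardised Pearson residuals for a Poisson GLM (V(mu) = mu)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    h = np.asarray(h, dtype=float)
    # y * ln(y / mu) is taken to be 0 when y == 0
    safe_y = np.where(y > 0, y, 1.0)
    ylogy = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    d = 2.0 * (ylogy - (y - mu))                       # contribution to the deviance
    r_dev = np.sign(y - mu) * np.sqrt(np.maximum(d, 0.0))
    # (y - mu) scaled by the estimated standard deviation sqrt(phi * V(mu)) and by sqrt(1 - h)
    r_pearson = (y - mu) / np.sqrt(phi * mu * (1.0 - h))
    return r_dev, r_pearson

y = [2, 4, 6, 10]
mu = [3.0, 3.0, 8.0, 8.0]     # fitted values from a two-level factor model
h = [0.5, 0.5, 0.5, 0.5]      # leverages for this model
r_dev, r_pearson = poisson_residuals(y, mu, h)
print(r_dev.round(3), r_pearson.round(3))
```

The squared deviance residuals add up to the model deviance, which is why they are the natural residual to plot when the deviance is used to judge fit.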
Testing appropriateness of models
- deviance residual
- standardised Pearson residuals
- residual plots
- Cook’s distance and leverage
Residual plots
If the distribution chosen for the response variable is appropriate, then the residual chart should show residuals that:
- are symmetrical about the x-axis
- have an average of zero
- have a fairly constant spread across the range of fitted values
Cook’s distance and leverage
- data points with large residuals and/or high leverage may distort the outcome/accuracy of a regression model
- Cook’s distance is used to estimate the influence of a data point on the model results
- a Cook’s distance greater than 1 merits closer analysis
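A minimal sketch for a linear regression fitted by least squares, assuming NumPy and the usual formula D_i = e_i^2 h_i / (p s^2 (1 − h_i)^2), where e_i is the raw residual, h_i the leverage, p the number of parameters and s^2 the estimated error variance. The function name and data are hypothetical; the last point is both remote (high leverage) and off the trend of the others.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each observation of a linear regression fit by least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix
    h = np.diag(H)                             # leverages
    e = y - H @ y                              # raw residuals
    s2 = e @ e / (n - p)                       # estimate of error variance
    return (e ** 2 / (p * s2)) * h / (1.0 - h) ** 2

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 10.0])
y = np.array([0.1, 1.0, 2.1, 2.9, 4.0, 4.0])   # hypothetical; last point is off-trend
X = np.column_stack([np.ones_like(x), x])
D = cooks_distance(X, y)
print(D.round(2), np.flatnonzero(D > 1))       # only the last observation exceeds 1
```

The last observation's residual is not the largest, but its high leverage makes it by far the most influential point.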
Model refinement
- interactions
- aliasing
- restrictions
- smoothing
Interactions
- interactions relate to the effect that factors have upon the risk. An interaction is necessary where the effect of one factor depends on the level of another
- Complete interactions: one way of representing an interaction is to consider a single factor representing every combination of the levels of the two factors
- Marginal interactions: consider the single-factor effects of factors 1 and 2, plus the additional effect of an interaction term over and above the single-factor effects
How to decide which interaction terms to test for inclusion in a GLM
- analyse every possible combination of pairs and test each for statistical significance and reasonableness
- look at structure of existing rating algorithm and see what interactions can be included without need for IT support
- use experience of the product and market
- speak to underwriters and other experts to see whether there are any parts of the account where your rates might be out of line with the market
Aliasing
Aliasing occurs when there is a linear dependency among observed covariates (one covariate may be identical to some linear combination of other covariates)
Intrinsic aliasing
- dependencies inherent in the definition of the covariates (most common when categorical variables are included)
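A minimal sketch of intrinsic aliasing, assuming NumPy and a hypothetical three-level factor. With an intercept plus one indicator column per level, the indicator columns sum to the intercept column, so the design matrix has a linear dependency and is not of full rank. The usual remedy is to drop one level and treat it as the base level.

```python
import numpy as np

levels = ["A", "B", "C", "A", "B", "C"]   # hypothetical factor, e.g. region
intercept = np.ones(len(levels))
dummies = np.array([[lvl == l for l in ("A", "B", "C")] for lvl in levels], dtype=float)

X_aliased = np.column_stack([intercept, dummies])        # intercept + one column per level
X_base = np.column_stack([intercept, dummies[:, 1:]])    # drop base level "A"

print(np.linalg.matrix_rank(X_aliased), X_aliased.shape[1])  # rank 3 < 4 columns
print(np.linalg.matrix_rank(X_base), X_base.shape[1])        # rank 3 == 3 columns
```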
Extrinsic aliasing
- dependency results from the nature of the data, rather than as a result of the inherent properties of the covariates
- e.g. see pg. 662
Near aliasing
- two or more factors contain levels that are almost, but not quite, perfectly correlated
- convergence problems can arise
How can factors be simplified (reduce granularity)?
- grouping and summarising the data prior to loading
  - carried out in order to clean the data and prevent anomalies, rather than to smooth results
  - requires knowledge of the expected pattern
- grouping in the modelling package
  - simply assigns a single parameter to represent the relativity for multiple levels of a factor
  - a factor where two or more levels have been grouped together is called a custom factor; the GLM calculates parameter estimates only for the grouped levels
  - a factor whose levels have not been grouped is a simple factor; a parameter estimate is calculated for each level
Restrictions
- when the use of certain factors is restricted, the model may be able to compensate by adjusting the fitted relativities for correlated factors. This is achieved using the offset term
- the known, predetermined values of the parameters corresponding to this factor are added to the offset term.
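A minimal sketch of building such an offset under a log link, assuming NumPy. The policies, exposures, factor levels and fixed relativities are all hypothetical; log(exposure) is included as the usual exposure offset for claim frequency. Because the restricted factor enters only through the offset, no parameter is estimated for it, and when the GLM is refitted the estimates for correlated factors adjust around the fixed values.

```python
import numpy as np

# Hypothetical policies: exposure in years and a restricted factor whose
# relativities are fixed in advance rather than estimated.
exposure = np.array([1.0, 0.5, 1.0, 0.25])
restricted_level = ["young", "old", "old", "young"]
fixed_relativity = {"young": 1.5, "old": 1.0}   # predetermined, assumed values

# Under a log link the known log-relativities are added to the offset,
# alongside log(exposure), so no parameter is estimated for this factor.
offset = np.log(exposure) + np.log([fixed_relativity[lvl] for lvl in restricted_level])

def predict(X, beta, offset):
    """Fitted means mu = exp(X beta + offset) under a log link."""
    return np.exp(np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float) + offset)

X = np.ones((4, 1))              # intercept-only model for illustration
beta = np.array([np.log(0.2)])   # hypothetical base claim frequency of 0.2 per year
print(predict(X, beta, offset))  # base frequency x exposure x fixed relativity
```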