Chapter 16 Generalised Linear Modelling Flashcards
Describe the principal modelling techniques appropriate to health and care insurance:
What is an explanatory variable?
- an input to a model that is expected to influence the response variable
- e.g. a rating factor
- it is important that explanatory variables make intuitive sense
What is a response variable?
- the output variable of a model, which is expected to be influenced by the explanatory variables
- e.g. price
What is a categorical variable?
- an explanatory variable that takes discrete, distinct values (levels), which often cannot be given any natural ordering or score
- e.g. gender
What is a non-categorical variable?
- an explanatory variable that takes numerical values, e.g. age
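A minimal sketch of how the two kinds of variable might enter a GLM design matrix, assuming made-up policy data and a hypothetical helper `design_row`: categorical gender becomes a 0/1 indicator relative to a chosen base level ("F" here, an arbitrary choice), while non-categorical age is used directly as a number.

```python
# Hypothetical rating factors for three policyholders (illustrative data only).
policies = [
    {"gender": "F", "age": 34},
    {"gender": "M", "age": 51},
    {"gender": "F", "age": 27},
]

def design_row(policy, base_level="F"):
    """Build one design-matrix row: [intercept, gender indicator, age].

    Categorical gender is encoded as 1 if it differs from the base level, else 0;
    non-categorical age enters the row unchanged.
    """
    gender_indicator = 0 if policy["gender"] == base_level else 1
    return [1, gender_indicator, policy["age"]]

X = [design_row(p) for p in policies]
print(X)  # [[1, 0, 34], [1, 1, 51], [1, 0, 27]]
```

The base level gets no column of its own; its effect is absorbed into the intercept.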
What is an interaction term?
- used where the pattern in the response variable is better modelled by including an extra parameter for each combination of the levels of two or more factors
- e.g. where the effect of smoking status differs between males and females
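A small sketch of what an interaction term adds to the design matrix, assuming two hypothetical two-level factors (gender with base "F", smoker status with base "N"). The interaction column is the product of the two indicators, so it is 1 only for male smokers and gives that combination its own extra parameter.

```python
from itertools import product

gender_levels = ["F", "M"]
smoker_levels = ["N", "Y"]

def design_row(gender, smoker_status):
    """Intercept, two main-effect indicators and a gender x smoker interaction column."""
    male = 1 if gender == "M" else 0
    smokes = 1 if smoker_status == "Y" else 0
    return [1, male, smokes, male * smokes]

# One row per combination of levels.
rows = {(g, s): design_row(g, s) for g, s in product(gender_levels, smoker_levels)}
```

With two factors of two levels each, the intercept, two main effects and one interaction give four parameters, one per combination, so the model can fit any pattern across the four cells rather than forcing the smoker effect to be the same for both genders.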
One-way analysis merits
- before GLMs were used, the effect of each rating factor on claim frequency and on claim severity was considered separately, one factor at a time
- this one-way analysis ignores correlations and interaction effects between variables, so it may underestimate or double count the effects of variables
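A toy illustration of the double counting, using entirely made-up aggregated data in which claim frequency depends only on age (young 0.2, old 0.1) and region has no effect, but urban areas hold more young policyholders. `one_way_frequency` is a hypothetical helper.

```python
# Hypothetical cells: (age band, region, exposure, claims).
cells = [
    ("young", "urban", 800, 160),
    ("young", "rural", 200, 40),
    ("old", "urban", 200, 20),
    ("old", "rural", 800, 80),
]

def one_way_frequency(index, level):
    """Claim frequency for one level of one factor, ignoring every other factor."""
    exposure = sum(c[2] for c in cells if c[index] == level)
    claims = sum(c[3] for c in cells if c[index] == level)
    return claims / exposure

age_relativity = one_way_frequency(0, "young") / one_way_frequency(0, "old")       # approx 2.0
region_relativity = one_way_frequency(1, "urban") / one_way_frequency(1, "rural")  # approx 1.5

# Multiplying the one-way relativities implies young urban policyholders are about
# 3.0x as risky as old rural ones, whereas the true ratio in this data is 2.0:
# region picks up part of the age effect, which is then counted twice.
combined = age_relativity * region_relativity
```

A GLM fitting age and region together would attribute the difference to age and give region a relativity of about 1.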
Uses of GLMs
- a GLM can be used to model the behaviour of a random variable that is believed to depend on the values of several other characteristics, e.g. age, sex, chronic condition
- it is a generalisation of the normal model for multiple linear regression
What are the drawbacks of the normal model for multiple linear regression?
- it assumes the response variable is normally distributed
- it assumes constant variance, which may not be appropriate (e.g. for claim amounts whose variance increases with the mean)
- it adds together the effects of different explanatory variables, but this is often not what is observed (effects are often multiplicative)
- it becomes long-winded with more than two explanatory variables
Assumptions of classical linear models
- the error terms are independent and normally distributed with mean zero
- the mean of the response is a linear combination of the explanatory variables
- the error terms have constant variance (homoscedasticity)
What are the two properties of any member of the exponential family?
- the distribution is completely specified by its mean and variance
- the variance is a function of the mean: Var(Y) = φV(μ), where V is the variance function and φ is a scale parameter
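A short sketch of the second property, listing the standard variance functions V(μ) for some common exponential-family members so that Var(Y) = φV(μ). The names `VARIANCE_FUNCTIONS` and `variance` are hypothetical; the scale parameter φ is supplied by the caller.

```python
# Variance functions V(mu): Var(Y) = phi * V(mu), phi being the scale parameter.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,              # constant variance; phi = sigma^2
    "poisson": lambda mu: mu,              # phi = 1
    "gamma": lambda mu: mu ** 2,           # phi = 1 / shape parameter
    "binomial": lambda mu: mu * (1 - mu),  # proportion of n trials; phi = 1 / n
}

def variance(family, mu, phi=1.0):
    """Variance implied by the mean mu and scale parameter phi."""
    return phi * VARIANCE_FUNCTIONS[family](mu)
```

Only the normal case has a variance that does not change with the mean, which is why it cannot reflect, say, Poisson claim counts whose variance equals their mean.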
What is the link function?
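- the link function g relates the mean of the response to the linear predictor: g(μ) = Xβ
- it removes the assumption that the effects of different variables must simply be added together (e.g. a log link makes them multiplicative)
- it must be both differentiable and monotonic
- examples include the log, logit and identity functions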
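A minimal sketch of the three links named above, each stored as a pair (g, inverse of g). The parameter values are hypothetical; the point is that with a log link, effects that add on the linear-predictor scale multiply on the mean scale.

```python
import math

# Each entry is (g, g_inverse): g maps the mean mu to the linear predictor eta.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
}

# Hypothetical parameters: base frequency 0.1, age relativity 1.5, smoker relativity 2.0.
_, inverse_log = LINKS["log"]
base, age_effect, smoker_effect = math.log(0.1), math.log(1.5), math.log(2.0)

# Adding on the log scale is the same as multiplying: 0.1 * 1.5 * 2.0, approx 0.3.
mu = inverse_log(base + age_effect + smoker_effect)
```

The logit link maps a probability in (0, 1) onto the whole real line, which is why it suits binary responses such as whether a claim occurs.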
Steps for obtaining predicted values from a single GLM
- specify the design matrix X and the vector of parameters β
- choose a distribution for the response variable and the link function
- identify the likelihood function
- take logarithms to convert the product of many terms into a sum (the log-likelihood)
- maximise the log-likelihood by setting its partial derivatives with respect to each parameter equal to zero and solving (in practice, numerically)
- compute predicted values
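A minimal sketch of these steps, assuming a Poisson response with a log link and a made-up six-policy dataset. The maximisation uses Newton-Raphson; for a canonical link such as this one it coincides with the iteratively reweighted least squares used by most GLM software. `fit_poisson_glm` is a hypothetical helper, not a library function.

```python
import numpy as np

# Step 1: hypothetical design matrix (intercept + one 0/1 rating factor) and claim counts.
X = np.array([[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]], dtype=float)
y = np.array([1, 2, 3, 4, 5, 6], dtype=float)

def fit_poisson_glm(X, y, n_iter=25):
    """Steps 2-5: Poisson distribution, log link, maximise the log-likelihood.

    Dropping the constant -log(y!), the log-likelihood is sum(y * eta - exp(eta))
    with eta = X @ beta. Its gradient is X^T (y - mu) and its Hessian is
    -X^T W X with W = diag(mu); Newton-Raphson drives the gradient to zero.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)            # inverse link gives the fitted means
        score = X.T @ (y - mu)           # partial derivatives of the log-likelihood
        info = X.T @ (X * mu[:, None])   # X^T W X
        beta = beta + np.linalg.solve(info, score)
    return beta

beta_hat = fit_poisson_glm(X, y)
fitted = np.exp(X @ beta_hat)            # Step 6: predicted values (group means 2 and 5)
```

Here exp(beta_hat[1]) is the multiplicative relativity for the rating factor, 5 / 2 = 2.5.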
What techniques are used to analyse significance of explanatory variables?
- chi-squared test (on the change in deviance): models need to be nested for this to work
- F-statistic: models need to be nested for this to work
- Akaike Information Criterion (AIC): appropriate where models are not nested
- other methods
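A sketch of the chi-squared test for two nested Poisson models, assuming made-up claim counts and a 0/1 rating factor. With a log link and a single categorical factor, the maximum likelihood fitted means are simply the group averages, so no iterative fit is needed here. The helper names are hypothetical; 3.841 is the 95th percentile of the chi-squared distribution with 1 degree of freedom (the models differ by one parameter).

```python
import math

# Hypothetical claim counts and a 0/1 rating factor.
y = [1, 2, 3, 4, 5, 6]
group = [0, 0, 0, 1, 1, 1]

def poisson_deviance(y, mu):
    """Scaled deviance 2 * sum[y*log(y/mu) - (y - mu)] for a Poisson model (phi = 1)."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2 * total

# Smaller model (intercept only): every fitted mean is the overall average.
mu_null = [sum(y) / len(y)] * len(y)
# Larger model (intercept + factor): fitted means are the group averages.
group_mean = {g: sum(yi for yi, gi in zip(y, group) if gi == g) / group.count(g)
              for g in (0, 1)}
mu_full = [group_mean[g] for g in group]

delta = poisson_deviance(y, mu_null) - poisson_deviance(y, mu_full)  # approx 3.98
CHI2_1DF_95 = 3.841
significant = delta > CHI2_1DF_95  # True: the factor is significant at the 5% level
```

The test only makes sense because the intercept-only model is a special case of the larger model; for non-nested models a criterion such as AIC is used instead.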
Define degrees of freedom
- number of observations minus number of parameters estimated
AIC formula
AIC = -2 × log-likelihood + 2 × number of parameters
- the lower the AIC, the better the model
- the 2 × number of parameters term penalises complexity, so fewer parameters (a parsimonious model) is preferred
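A minimal sketch applying the formula to two Poisson models on made-up claim counts: an intercept-only model (1 parameter) and one with a 0/1 rating factor (2 parameters). The fitted means are taken as given (the overall average and the group averages). `poisson_loglik` and `aic` are hypothetical helpers; the residual degrees of freedom are also shown.

```python
import math

y = [1, 2, 3, 4, 5, 6]            # hypothetical claim counts
mu_null = [3.5] * 6               # intercept-only fit: overall mean
mu_full = [2, 2, 2, 5, 5, 5]      # intercept + factor fit: group means

def poisson_loglik(y, mu):
    """Full Poisson log-likelihood: sum of y*log(mu) - mu - log(y!)."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))

def aic(loglik, n_params):
    """AIC = -2 * log-likelihood + 2 * number of parameters; lower is better."""
    return -2 * loglik + 2 * n_params

aic_null = aic(poisson_loglik(y, mu_null), n_params=1)
aic_full = aic(poisson_loglik(y, mu_full), n_params=2)
residual_df_full = len(y) - 2     # observations minus parameters = 4
```

Here the larger model's AIC is lower by about 2, so its better fit more than pays for the extra parameter's penalty.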