Chapter 18 - GLM Flashcards

1
Q

What are explanatory variables?

A

They are inputs into a model that are expected to influence the response variable.

2
Q

What are response variables?

A

They are outputs from the model that are likely to be affected by the explanatory variables.

3
Q

What are categorical and non-categorical variables?

A

Categorical variables are explanatory variables whose values (levels) are distinct and often cannot be given any natural ordering or score, e.g. gender, which takes the value male or female.

Non-categorical variables take numerical values, e.g. age.

4
Q

What are the drawbacks of using a normal model for linear regression?

A

CLAN

  • The normal model assumes a CONSTANT variance, which may not be appropriate for the variable being modelled
  • With more than two explanatory variables, the model takes a LONG time to compute
  • The normal model ADDS together the effects of the different explanatory variables, but this is seldom what is observed in practice
  • It assumes that the response variable, Y, has a NORMAL distribution, which may not be appropriate for the variable being modelled
5
Q

How do GLMs address the problems with the normal model for linear regression?

A
  • The response variable can take any distribution from the exponential family - it no longer has to take the normal distribution
  • A link function is introduced - this acts to remove the assumption that the effects of different variables must simply be added together
6
Q

What is the purpose of the link function?

A
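  • The link function connects the mean of the response variable to the linear predictor (the linear combination of the explanatory variables). This removes the assumption that the effects of different variables must simply be added together on the scale of the response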
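
As a minimal sketch of this idea, assume a log link and two made-up rating factors (the names `young_driver` and `urban` and all coefficient values are hypothetical, not from these cards). Effects that are added on the linear-predictor scale become multiplicative on the scale of the fitted mean:

```python
import math

# Hypothetical coefficients on the linear-predictor scale (illustrative only).
beta = {
    "intercept": math.log(100.0),    # base level mean of 100
    "young_driver": math.log(1.5),   # relativity of 1.5
    "urban": math.log(1.2),          # relativity of 1.2
}

def fitted_mean(young_driver: int, urban: int) -> float:
    """Invert the log link: mu = exp(eta), where eta is the linear predictor."""
    eta = (beta["intercept"]
           + beta["young_driver"] * young_driver
           + beta["urban"] * urban)
    return math.exp(eta)

base = fitted_mean(0, 0)   # neither factor present
both = fitted_mean(1, 1)   # both factors present

# The combined effect is the product of the two relativities, not their sum.
print(base, both, both / base)
```

With an identity link the same coefficients would simply be added to the mean, which is the behaviour of the normal linear model.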
7
Q

What does the deviance of a model compare?

A

It compares the observed value Y_i with the fitted value \mu_i.

In essence, the deviance is a measure of how much the fitted values differ from the observations.
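
To make this concrete, here is a small sketch that assumes a Poisson response (an assumption for illustration; the cards do not fix a distribution). For the Poisson case the deviance is 2 times the sum of y*log(y/mu) - (y - mu) over the observations. The data and function names are made up:

```python
import math

def poisson_unit_deviance(y: float, mu: float) -> float:
    """Contribution of a single observation to the Poisson deviance."""
    # y * log(y / mu) is taken as 0 when y == 0 (its limiting value).
    log_term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (log_term - (y - mu))

def poisson_deviance(observed, fitted) -> float:
    """Total deviance: the sum of the unit deviances."""
    return sum(poisson_unit_deviance(y, mu) for y, mu in zip(observed, fitted))

observed = [1, 2, 3, 4]            # made-up claim counts
close_fit = [1.2, 1.9, 3.1, 3.8]   # fitted values near the observations
poor_fit = [2.5, 2.5, 2.5, 2.5]    # a constant fit that ignores the pattern

print(poisson_deviance(observed, observed))   # fitted = observed gives zero
print(poisson_deviance(observed, close_fit))
print(poisson_deviance(observed, poor_fit))
```

The closer the fitted values are to the observations, the smaller the deviance.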

8
Q

What does the chi-squared statistic measure?

A

It tests whether the inclusion of one or more additional explanatory variables in a model improves the fit significantly.
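
One common way to carry out such a test, sketched here under stated assumptions: the two models are nested, the scale parameter is known (as for a Poisson model), and exactly one extra parameter is added, so the drop in deviance is compared with the 5% critical value of a chi-squared distribution with 1 degree of freedom. The deviance figures are made up:

```python
# 95th percentile of the chi-squared distribution with 1 degree of freedom.
CHI2_CRIT_1DF_5PCT = 3.841

def extra_parameter_is_significant(deviance_smaller_model: float,
                                   deviance_larger_model: float) -> bool:
    """Compare the drop in deviance with the chi-squared critical value."""
    drop = deviance_smaller_model - deviance_larger_model
    return drop > CHI2_CRIT_1DF_5PCT

# Hypothetical deviances before and after adding one explanatory variable.
print(extra_parameter_is_significant(120.4, 112.1))  # drop of 8.3
print(extra_parameter_is_significant(120.4, 118.9))  # drop of 1.5
```

Adding several parameters at once would need the critical value for the corresponding number of degrees of freedom.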

9
Q

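What can be used to measure the uncertainty in the parameter estimators used in a GLM?

A

The Cramér-Rao lower bound, which gives a lower bound on the variance of an unbiased estimator. Maximum likelihood estimators attain this bound asymptotically, so it can be used to approximate their variance in large samples.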

10
Q

What are deviance residuals?

A

A deviance residual is a measure of the distance between the actual observation and the fitted value.

11
Q

What are standardised Pearson residuals?

A

A standardised Pearson residual is the difference between the observed response and the predicted value, adjusted for the standard deviation of the predicted value and the leverage of the observed response.
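
In symbols, one common way of writing this is shown below. The notation is an assumption and does not come from these cards: \hat{\mu}_i is the fitted value, V is the variance function, \hat{\phi} is the estimated scale parameter and h_i is the leverage of observation i.

```latex
r_i = \frac{y_i - \hat{\mu}_i}{\sqrt{\hat{\phi}\, V(\hat{\mu}_i)\,(1 - h_i)}}
```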

12
Q

What is Cook’s distance used for?

A

It is used to estimate the influence of a data point on the model results.

A Cook’s distance of 1 or more is usually considered to merit closer examination in the analysis.
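
The rule of thumb can be illustrated with a small sketch. It assumes the simplest setting, a normal linear model with one explanatory variable fitted by least squares, where Cook's distance has a closed form in terms of the residual and the leverage. The data are made up, with the last point placed far from the others and off the trend:

```python
def cooks_distances(x, y):
    """Cook's distance for each point of a simple least-squares line fit."""
    n, p = len(x), 2                       # p = intercept + slope
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)   # residual variance estimate
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e * e / (p * s2) * h / (1.0 - h) ** 2
            for e, h in zip(residuals, leverage)]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]   # last point breaks the upward trend

distances = cooks_distances(x, y)
flagged = [i for i, d in enumerate(distances) if d >= 1.0]
print(distances)
print(flagged)   # indices of points that merit a closer look
```

For a GLM the same idea is applied using the model's own residuals and leverages rather than this closed form.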

13
Q

What is aliasing?

A

Aliasing occurs when there is a dependency among the observed covariates, i.e. one covariate may be identical to some linear combination of other covariates.

14
Q

What is intrinsic aliasing?

A

This occurs because of dependencies inherent in the definition of the covariates.

These intrinsic dependencies arise most commonly whenever categorical variables are included in the model.
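
A minimal sketch of how a categorical variable produces this dependency, using made-up data and hypothetical column names: if the model has an intercept and one indicator column for every level, the indicator columns always sum to the intercept column.

```python
genders = ["male", "female", "female", "male", "female"]  # made-up data

# Full dummy coding: an intercept plus one indicator column per level.
design = [(1, int(g == "male"), int(g == "female")) for g in genders]

# In every row the two indicators sum to the intercept column, so the
# three columns are linearly dependent: this is intrinsic aliasing.
is_aliased = all(male + female == intercept
                 for intercept, male, female in design)

# Usual remedy: treat one level ("male" here) as the base level and
# drop its indicator column.
reduced_design = [(intercept, female) for intercept, _male, female in design]

print(is_aliased)
print(reduced_design)
```

The dependency holds whatever the data are, which is what distinguishes it from a dependency caused by the data themselves.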

15
Q

What is extrinsic aliasing?

A

It arises when the dependency among the covariates results from the nature of the data itself, rather than from the inherent properties of the covariates.

16
Q

What is an interaction term?

A

It is used where the pattern in the response variable is better modelled by including extra parameters for each combination of two or more factors

An interaction exists when the effect of one factor varies depending on the value of another factor
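
As a sketch of the second sentence, take two hypothetical binary factors, `young` and `urban`, with made-up coefficients. Without an interaction term the effect of `young` on the linear predictor is the same whatever the value of `urban`; with one, it changes:

```python
def linear_predictor(young: int, urban: int, beta: dict) -> float:
    """Main effects plus an optional young-by-urban interaction parameter."""
    return (beta["intercept"]
            + beta["young"] * young
            + beta["urban"] * urban
            + beta.get("young:urban", 0.0) * young * urban)

def effect_of_young(urban: int, beta: dict) -> float:
    """Change in the linear predictor when young goes from 0 to 1."""
    return linear_predictor(1, urban, beta) - linear_predictor(0, urban, beta)

# Hypothetical coefficients, chosen only for illustration.
main_effects_only = {"intercept": 1.0, "young": 0.5, "urban": 0.25}
with_interaction = {**main_effects_only, "young:urban": 0.125}

print(effect_of_young(0, main_effects_only), effect_of_young(1, main_effects_only))
print(effect_of_young(0, with_interaction), effect_of_young(1, with_interaction))
```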

17
Q

What are the assumptions of the classical normal model for linear regression?

A
  • The mean is a linear combination of the explanatory variables
  • The error terms are independent and come from a normal distribution
  • The error terms have constant variance
18
Q

What two properties do the members of the exponential family have?

A
  • The distribution is completely specified in terms of its mean and variance
  • The variance of Yi is a function of its mean
19
Q

What is special about the Tweedie distribution?

A
  • The Tweedie distribution is a special member of the exponential family that has a variance function proportional to \mu^{p}, with p being an additional parameter
  • In the case of 1<p<2, the Tweedie distribution has a point mass at zero and corresponds to the compound distribution of a Poisson claim number process and a gamma claim size distribution
  • The distribution can be Poisson-like (as p tends to 1) or gamma-like (as p tends to 2)
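
The compound Poisson-gamma construction and its point mass at zero can be sketched with a short simulation using only the standard library. The claim rate, gamma shape and gamma scale below are made-up values, and the helper names are hypothetical:

```python
import random

def poisson_count(rng: random.Random, rate: float) -> int:
    """Number of arrivals in one time unit of a Poisson process."""
    count = 0
    elapsed = rng.expovariate(rate)
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(rate)
    return count

def aggregate_claims(rng, claim_rate, shape, scale) -> float:
    """Total of a Poisson number of gamma-distributed claim sizes."""
    n_claims = poisson_count(rng, claim_rate)
    return sum(rng.gammavariate(shape, scale) for _ in range(n_claims))

rng = random.Random(0)  # fixed seed so the run is repeatable
totals = [aggregate_claims(rng, 0.5, 2.0, 500.0) for _ in range(20000)]

# Policies with no claims have a total of exactly zero: the point mass.
share_of_zeros = sum(t == 0 for t in totals) / len(totals)
mean_total = sum(totals) / len(totals)
print(share_of_zeros, mean_total)
```

The share of exact zeros should be close to exp(-0.5), the probability of no claims, while the positive totals are continuous.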
20
Q

How is the number of degrees of freedom calculated?

A

It is the number of observations less the number of parameters

21
Q

What are nested models?

A

Two models are nested if one model contains explanatory variables that are a subset of the explanatory variables in the other model

22
Q

What is the formula for the Akaike Information Criterion (AIC), and what does it measure?

A

AIC = -2 × log-likelihood + 2 × number of parameters

It looks at the trade-off between the likelihood of a model and its number of parameters: the lower the AIC, the better the model.
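
A short sketch of the formula in use, with made-up log-likelihoods and parameter counts for two hypothetical models:

```python
def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike Information Criterion: -2 * log-likelihood + 2 * parameters."""
    return -2.0 * log_likelihood + 2.0 * n_parameters

# Hypothetical fitted models: B fits slightly better but uses more parameters.
aic_a = aic(log_likelihood=-520.3, n_parameters=4)
aic_b = aic(log_likelihood=-519.9, n_parameters=7)

preferred = "A" if aic_a < aic_b else "B"
print(aic_a, aic_b, preferred)
```

Here the small gain in likelihood for model B does not outweigh the penalty for its three extra parameters.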

23
Q

What does the deviance residual measure?

A

It measures the distance between the actual observation and fitted value

24
Q

What is the difference between correlations and interactions?

A
  • Correlations occur when there is a relationship between the distributions of exposure across the levels of two or more factors
  • GLMs automatically take account of correlations (unlike one-way tables)
  • Interactions relate to the effect that factors have upon the risk
  • To define the risk accurately, an interaction is necessary where the effects of two (or more) factors depend on each other. GLMs can be specified to include interactions.
25
Q

What are GLMs used for?

A
  • They are used to assess and quantify the relationship between a response variable and a set of possible explanatory variables
26
Q

What does the leverage measure?

A

It measures how much influence each observed value has on the fitted value for that observation
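
As an illustration, here is a sketch for the simplest case: a least-squares line with one explanatory variable, where the leverage of point i has the closed form 1/n + (x_i - mean of x)^2 / Sxx. The data are made up, and a GLM would use the corresponding weighted version:

```python
def leverages(x):
    """Leverage of each point in a simple least-squares line fit."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

x = [1.0, 2.0, 3.0, 4.0, 20.0]   # last value lies far from the others
h = leverages(x)

# The leverages sum to the number of parameters (intercept and slope).
print(h, sum(h))
```

The point with the unusual x value has by far the highest leverage, so its own observed response largely determines its fitted value.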