Chapter 18 - GLM Flashcards by Uthmaan Ebrahim

What are explanatory variables

They are inputs into a model that are expected to influence the response variable.

What are response variables

They are outputs from the model that are likely to be affected by the explanatory variables

What are categorical and non-categorical variables

Categorical variables are explanatory variables whose values fall into distinct levels that often cannot be given any natural ordering or score, e.g. gender, which takes the value male or female.

Non-categorical variables take numerical values, e.g. age.

What are the drawbacks of using a normal model for linear regression

CLAN

The normal distribution has a CONSTANT variance, which may not be appropriate for the variable being modelled
With more than two explanatory variables, the model takes a LONG time to compute
The normal model ADDS together the effects of the different explanatory variables, but this is seldom what is observed in practice
It assumes that the response variable, Y, has a NORMAL distribution, which may not be appropriate for the variable being modelled

How do GLMs address the problems of the normal model for linear regression

The response variable can follow any distribution from the exponential family; it no longer has to be normally distributed
A link function is introduced: this removes the assumption that the effects of different variables must simply be added together

What is the purpose of the link function

The link function acts to remove the assumption that the effects of different variables must simply be added together
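
A minimal sketch of this idea, assuming a log link and made-up coefficients (the names `beta` and `mean_response` are illustrative, not from the source notes). The linear predictor still adds the effects, but inverting the log link turns them into multiplicative factors on the mean.

```python
import math

# Hypothetical coefficients on the log scale (illustrative values only).
beta = {
    "intercept": math.log(100.0),
    "young_driver": math.log(1.5),
    "urban": math.log(1.2),
}

def mean_response(young_driver: int, urban: int) -> float:
    """Mean under a log link: log(mu) equals the linear predictor."""
    eta = (beta["intercept"]
           + beta["young_driver"] * young_driver
           + beta["urban"] * urban)
    return math.exp(eta)  # inverse of the log link

base = mean_response(0, 0)   # baseline mean
both = mean_response(1, 1)   # baseline times both factors
print(base, both)
```

With an identity link the same linear predictor would give purely additive effects, which is the classical normal model.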

What does the deviance of a model compare

It compares each observed value Yi to the corresponding fitted value μi

In essence, the deviance is a measure of how much the fitted values differ from the observations
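
As an illustration, here is a sketch of the unscaled deviance for a Poisson response. The Poisson choice and the sample numbers are assumptions made for the example; other exponential-family distributions have their own deviance formulas.

```python
import math

def poisson_deviance(observed, fitted):
    """Unscaled Poisson deviance: 2 * sum(y*log(y/mu) - (y - mu))."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        # y*log(y/mu) is taken as 0 when y == 0 (its limiting value).
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total

y = [0, 2, 5, 3]                      # made-up observations
close_fit = [0.5, 2.1, 4.6, 2.8]      # fitted values near the data
flat_fit = [2.5, 2.5, 2.5, 2.5]       # one common mean for every point

print(poisson_deviance(y, close_fit), poisson_deviance(y, flat_fit))
```

The closer fit gives the smaller deviance, and fitted values equal to the observations give a deviance of zero.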

What does the chi-squared statistic measure

It tests whether including one or more additional explanatory variables in a model improves the fit significantly
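
One common way to apply this is to compare the drop in deviance between two nested models against a chi-squared distribution. The sketch below assumes scaled deviances and exactly one extra parameter, so that the 1-degree-of-freedom tail probability can be written with `math.erfc`; the function names and deviance values are illustrative.

```python
import math

def chi2_sf_1df(x: float) -> float:
    """P(X > x) for a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def extra_variable_is_significant(deviance_small, deviance_big, alpha=0.05):
    """Nested models differing by ONE parameter (assumption of this sketch)."""
    drop = deviance_small - deviance_big   # improvement in fit
    return chi2_sf_1df(drop) < alpha

print(extra_variable_is_significant(120.0, 110.0))  # large drop in deviance
print(extra_variable_is_significant(120.0, 119.0))  # small drop in deviance
```

For more than one extra parameter the degrees of freedom equal the number of added parameters, and a general chi-squared tail function would be needed.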

What can be used to measure the uncertainty in the parameter estimators used in a GLM

The Cramér-Rao lower bound

What are deviance residuals

They are a measure of the distance between the actual observations and the fitted values
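
A sketch for a single observation, assuming a Poisson response (an assumption for the example). The residual is the signed square root of that observation's contribution to the deviance.

```python
import math

def poisson_deviance_residual(y: float, mu: float) -> float:
    """sign(y - mu) * sqrt(d), with d the observation's deviance contribution."""
    log_term = y * math.log(y / mu) if y > 0 else 0.0
    d = 2.0 * (log_term - (y - mu))
    sign = 1.0 if y >= mu else -1.0
    # max() guards against a tiny negative value caused by rounding.
    return sign * math.sqrt(max(d, 0.0))

print(poisson_deviance_residual(5, 2.5))   # observation above the fitted value
print(poisson_deviance_residual(0, 2.5))   # observation below the fitted value
```

Squaring and summing these residuals over all observations recovers the deviance.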

What are standardised Pearson residuals

They are the difference between the observed response and the predicted value, adjusted for the standard deviation of the predicted value and the leverage of the observed response
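
A sketch of one common form, assuming the adjustment uses the estimated variance of the response at the fitted mean and the leverage h. The function name and the numbers are illustrative.

```python
import math

def standardised_pearson_residual(y, mu, variance, leverage):
    """(y - mu) / sqrt(variance * (1 - leverage))."""
    return (y - mu) / math.sqrt(variance * (1.0 - leverage))

# Poisson-style example: the variance of the response equals the fitted mean.
r = standardised_pearson_residual(y=7.0, mu=4.0, variance=4.0, leverage=0.25)
print(r)
```

A high-leverage point has a larger residual after standardisation than the raw difference alone would suggest.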

What is Cook’s distance used for

It is used to estimate the influence of a data point on the model results

A Cook’s distance of 1 or more is generally considered to merit closer examination
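
A sketch using the usual linear-model expression in terms of the standardised residual r, the leverage h and the number of parameters p. Treating this form as applicable to a GLM is an approximation assumed here, and the data points are made up.

```python
def cooks_distance(std_residual, leverage, n_params):
    """D = r^2 * h / (p * (1 - h))."""
    return std_residual ** 2 * leverage / (n_params * (1.0 - leverage))

# (standardised residual, leverage) for two hypothetical data points
points = {"A": (0.8, 0.10), "B": (2.0, 0.50)}

flagged = [name for name, (r, h) in points.items()
           if cooks_distance(r, h, n_params=2) >= 1.0]
print(flagged)
```

Point B combines a large residual with high leverage, so it is the one flagged for closer examination.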

What is aliasing

Aliasing occurs when there is a linear dependency among the observed covariates, i.e. one covariate is identical to some linear combination of the other covariates

What is intrinsic aliasing

This occurs because of dependencies inherent in the definition of the covariates.

These intrinsic dependencies arise most commonly whenever categorical variables are included in the model.
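
A small sketch of why categorical variables cause this, using a hypothetical three-level factor. With an intercept and one indicator column per level, the indicators always add up to the intercept column, so one parameter is redundant.

```python
levels = ["small", "medium", "large"]          # hypothetical vehicle-size factor
data = ["small", "large", "medium", "small"]   # made-up observations

def design_row(value, drop_base=False):
    """Intercept plus one indicator per level (optionally dropping the base level)."""
    used = levels[1:] if drop_base else levels
    return [1] + [1 if value == lvl else 0 for lvl in used]

full = [design_row(v) for v in data]
# In every row the indicator columns sum to the intercept column.
aliased = all(sum(row[1:]) == row[0] for row in full)

# Dropping the base level removes the dependency.
reduced = [design_row(v, drop_base=True) for v in data]
print(aliased, full[0], reduced[0])
```

Treating one level as the base level, as in `reduced`, is the usual way to resolve the redundancy.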

What is extrinsic aliasing

It arises when a dependency among the covariates results from the nature of the data itself, rather than from the inherent properties of the covariates.

What is an interaction term

It is used where the pattern in the response variable is better modelled by including extra parameters for each combination of two or more factors

An interaction exists when the effect of one factor varies depending on the value of another factor
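
A sketch with a log link and made-up parameters (all names and values are hypothetical). The interaction parameter applies only to the combination of the two factors, so the effect of being young differs between males and females.

```python
import math

# Hypothetical log-scale parameters for a log-link model.
base = math.log(200.0)
age_young = math.log(1.4)
sex_male = math.log(1.1)
young_male_extra = math.log(1.3)   # the interaction parameter

def expected_cost(young, male, with_interaction):
    eta = base + age_young * young + sex_male * male
    if with_interaction:
        # Only non-zero when both indicators equal 1.
        eta += young_male_extra * young * male
    return math.exp(eta)

print(expected_cost(1, 1, False), expected_cost(1, 1, True))
```

Without the interaction term, the young-driver factor is 1.4 for both sexes; with it, young males carry an extra factor of 1.3.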

What are the assumptions of the classical normal model for linear regression

The mean is a linear combination of the explanatory variables
The error terms are independent and come from a normal distribution
The error terms have constant variance

What 2 properties do the members of the exponential family have

The distribution is completely specified in terms of its mean and variance
The variance of Yi is a function of its mean

What is special about the Tweedie distribution

The Tweedie distribution is a special member of the exponential family that has a variance function proportional to \mu^{p}, with p being an additional parameter
In the case of 1<p<2, the Tweedie distribution has a point mass at zero and corresponds to the compound distribution of a Poisson claim number process and a gamma claim size distribution
The distribution can be Poisson-like (as p tends to 1) or gamma-like (as p tends to 2)
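
The point mass at zero can be seen by simulating the compound distribution directly. This is a sketch with arbitrary parameter values (`lam`, `shape`, `scale` are assumptions for the example); it does not fit a Tweedie model or map these values to p.

```python
import math
import random

def poisson_sample(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def aggregate_claims(rng, lam, shape, scale):
    """Sum of N gamma claim sizes, where N is Poisson(lam)."""
    n = poisson_sample(rng, lam)
    return sum(rng.gammavariate(shape, scale) for _ in range(n))

rng = random.Random(0)
sample = [aggregate_claims(rng, lam=1.0, shape=2.0, scale=500.0)
          for _ in range(20000)]
zero_share = sum(1 for s in sample if s == 0) / len(sample)
print(zero_share)
```

The share of exact zeros should be close to exp(-lam), the probability of no claims, while every non-zero value is a positive, continuous amount.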

How is the number of degrees of freedom calculated

It is the number of observations less the number of parameters

What are nested models

Two models are nested if one model contains explanatory variables that are a subset of the explanatory variables in the other model

What is the equation of the Akaike Information Criterion and what does it measure

AIC = -2 × log-likelihood + 2 × number of parameters

It looks at the trade-off of the likelihood of a model against the number of parameters: the lower the AIC, the better the model
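
A short sketch of the comparison, using made-up log-likelihoods and parameter counts for two hypothetical models.

```python
def aic(log_likelihood, n_params):
    """AIC = -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * log_likelihood + 2.0 * n_params

simple = aic(log_likelihood=-520.0, n_params=3)     # illustrative values
complex_ = aic(log_likelihood=-519.5, n_params=6)   # slightly better fit, more parameters

preferred = "simple" if simple < complex_ else "complex"
print(simple, complex_, preferred)
```

Here the small gain in likelihood does not justify three extra parameters, so the simpler model has the lower AIC.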

What does the deviance residual measure

It measures the distance between the actual observation and fitted value

What is the difference between correlations and interactions

Correlations occur when the distribution of exposure across the levels of one factor is related to the levels of one or more other factors
GLMs automatically take account of correlations (unlike one-way tables)
Interactions relate to the effect that factors have upon the risk
To define the risk accurately, an interaction is necessary where the effects of two (or more) factors depend on each other. GLMs can be specified to include interactions.

What are GLMs used for

They are used to assess and quantify the relationship between a response variable and a set of possible explanatory variables

What does the leverage measure

It measures how much influence each observed value has on the fitted value for that observation
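
A sketch for the simplest case, a straight-line fit with an intercept under the normal model, where the leverage has a closed form. In a general GLM the leverages come from a weighted hat matrix instead; the x values below are made up.

```python
def leverages(x):
    """h_i = 1/n + (x_i - mean)^2 / Sxx for a straight-line fit with intercept."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((xi - mean) ** 2 for xi in x)
    return [1.0 / n + (xi - mean) ** 2 / sxx for xi in x]

h = leverages([1.0, 2.0, 3.0, 4.0, 20.0])
print(h)
```

The observation at x = 20 lies far from the others and has by far the highest leverage; the leverages sum to the number of parameters (two here).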

Chapter 18 - GLM Flashcards

(26 cards)