Chapter 4: GLMs Flashcards
what is a GLM?
Comparatively, a GLM is more flexible than a linear model.
GLMs provide flexibility in two aspects:
- Distribution of the target variable:
- the target variable is not confined to the normal distribution; it only needs to belong to the linear exponential family (which contains both continuous and discrete distributions)
- GLMs thus provide a unifying approach to modelling binary, discrete, and continuous target variables with different mean-variance relationships
- Relationship between the target mean and the linear predictor:
- instead of equating the target mean directly with the linear combination of predictors, a GLM sets a function of the target mean (the link function) to be linearly related to the predictors (see the sketch below)
- the link function can be any monotonic function (monotonic because it needs to be invertible)
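In symbols (standard GLM notation, not specific to these cards): for target mean mu = E[Y] and link function g,
g(mu) = beta0 + beta1*x1 + ... + betap*xp
and predictions are recovered by inverting the link: mu = g^(-1)(linear predictor).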
can we use all feature generation (binarization, polynomial terms, interaction terms) and all feature selection techniques for GLM models?
yes
what does it mean to say "transformations are applied internally vs. externally"?
GLMs: internally transforming the data
- the target variable is not transformed and the transformation plays its role only within the GLM itself
Linear models: externally transforming the target variable
what target distribution would we choose for a positive, continuous, right-skewed target variable?
gamma and inverse gaussian capture the skewness of the target variable directly, without the use of transformations
inv. gaussian is more highly skewed than gamma
gamma is the #1 choice here
what target distribution would we choose for a binary target variable?
binomial
the mean of the target variable is the probability that the event of interest occurs
what target distribution would we choose for a count variable?
count variable = the number of times a certain event of interest happens over a reference time period
these variables only have non-negative integer values.
Poisson!
what target distribution would we choose for aggregate data?
Tweedie. it is a (compound) Poisson-gamma mixture.
discrete probability mass at zero and pdf on the positive real line
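In the standard Tweedie parameterization (a known fact about the family, not from these cards): the variance is Var(Y) = phi * mu^p, and power parameters 1 < p < 2 give exactly this compound Poisson-gamma case with a point mass at zero.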
when is it good to use a log link?
for poisson, gamma and inverse gaussian
the target mean is positive and unbounded from above
g(mu) = ln(mu)
so the inverse is mu = exp(linear predictor), which is always positive and unbounded from above
when is it a good idea to use a logit link?
binary variables
logit link = ln(odds)
the logit link ensures that the target mean is between 0 and 1 (needs to be for a binary variable)
while the linear predictor itself can take any value from -inf to +inf
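Unwinding the link (standard logistic algebra): if ln(mu / (1 - mu)) = eta, then
mu = exp(eta) / (1 + exp(eta)) = 1 / (1 + exp(-eta)),
which lies strictly between 0 and 1 for any real value of eta.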
what is a logistic regression model?
a GLM with a binary target variable and a logit link function
two factors need to be considered when choosing a link function:
- whether the predictions provided by the link align with the characteristics of the target variable
- whether the resulting GLM is easy to interpret (ex. logit is easier to interpret than probit)
T/F: the link function is to transform the target variable of a GLM so that the resulting distribution more closely resembles a normal distribution
false.
the link function is applied to the mean of the target variable; the target variable itself is left untransformed.
T/F: the main reason for using the log link in a GLM is to reduce the skewness of a non-negative, right-skewed target variable.
false. The log link is chosen because it ensures appropriate predictions and eases model interpretation. The skewness can be accommodated by an appropriate target distribution.
T/F: if some of the observations of the target variable are 0, then the log link cannot be used because ln(0) is not defined.
False.
the log link is applied to the target mean, not to the individual observations, so zero values in the data pose no problem (the mean itself is positive)
what two link functions are easiest to interpret ?
logit and log link
how to interpret GLM coefficients with log link for numeric predictors?
Multiplicative changes:
when all other variables are held fixed, a unit increase in X is associated with a multiplicative increase in the target mean by a factor of exp(beta)
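As a one-line check (standard algebra for a log link): if ln(mu) = beta0 + beta1*x, then
mu(x + 1) / mu(x) = exp(beta1),
so each unit increase in x multiplies the target mean by a factor of exp(beta1).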
how to interpret GLM coefficients with log link for categorical predictors?
if X is a dummy variable, then:
- at the baseline level, X = 0, and at the non-baseline level, X = 1.
so, comparing the means, the target mean when the categorical predictor lies in the non-baseline level is exp(beta) times the target mean when it lies in the baseline level, holding all other predictors fixed.
how to interpret GLM coefficients with logit link?
the logit link is almost always used with binary data.
ln(odds) = f(x) or odds = exp( f(x) )
the interpretations are just phrased in terms of multiplicative changes in the odds of the event of interest.
a unit increase in a numeric predictor with coefficient beta is associated with a multiplicative change of exp(beta) in the odds.
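The same algebra on the odds scale (standard logistic regression identity): if ln(odds) = beta0 + beta1*x, then
odds(x + 1) / odds(x) = exp(beta1),
i.e. a unit increase in x multiplies the odds of the event by exp(beta1).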
what are weights and offsets?
modeling tools that are commonly used with GLMs.
they are designed to incorporate a measure of exposure into a GLM to improve the fitting
what is the idea behind using weights in a GLM?
to take advantage of the fact that different observations in the data may have different exposures and thus different degrees of precision, we can attach a higher weight to the observations with a larger exposure.
So that the more credible observations carry more weight in the estimation of the model coefficients
what is the idea behind using offsets in a GLM?
usually used with (not limited to) count data
we make the assumption that the target mean is directly proportional to the exposure.
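In formula form (the standard offset setup for a log-link count model): with exposure E_i for observation i,
ln(mu_i) = ln(E_i) + beta0 + beta1*x_i1 + ...
so mu_i = E_i * exp(linear predictor) and the target mean is directly proportional to the exposure. the offset ln(E_i) acts as a predictor whose coefficient is fixed at 1.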
how do we determine if an exposure variable should be used as an offset or a weight?
to use weights properly:
- the observations of the target variable should be averaged by exposure.
- due to the averaging, the variance of each observation is inversely related to the size of the exposure.
- the weights do not affect the mean of the target variable
offsets:
- observations are values aggregated over the exposure units.
- the exposure, when serving as an offset, is in direct proportion to the mean of the target variable
- the variance of the target variable is unaffected
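One way to remember the contrast (standard GLM conventions): a weight w_i scales the variance, Var(Y_i) = phi * V(mu_i) / w_i, and leaves the mean alone; an offset shifts the mean, g(mu_i) = offset_i + linear predictor_i, and leaves the variance function alone.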
what is the technique used to estimate coefficients in a GLM?
maximum likelihood estimation (MLE), instead of ordinary least squares (OLS) as in linear models
what is the goodness of fit measure used in GLMs?
deviance.
why cant we use r^2 in glms to measure goodness of fit?
because r^2 operates on the assumption that the underlying distribution behind the target variable is normal.
what does deviance measure?
the extent to which the GLM departs from the most elaborate GLM (the saturated model)
the saturated model has as many model parameters as the number of training observations; it perfectly fits every training observation and is the most flexible GLM.
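In symbols (the standard definition, stated up to the dispersion parameter): the deviance of a fitted GLM is
D = 2 * [ loglik(saturated model) - loglik(fitted model) ],
so a perfect fit has D = 0 and poorer fits have larger D.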
do we want a low or high deviance?
we want a lower deviance.
the lower the deviance, the closer the GLM is to the model with a perfect fit, and the better its goodness of fit on the training set.
what is a drawback of using deviance as a goodness of fit measure for a GLM?
it can only be used to compare GLMs having the same target distribution (so that they share the same maximized log-likelihood of the saturated model).
why are raw residuals not useful in a GLM?
because they are no longer normally distributed, nor do they have constant variance (their variance varies with the target mean, which differs across observations)
what type of residuals do we use in GLMs?
deviance residuals
deviance residuals satisfy the following properties which are parallel to those of raw residuals in a linear model (3)
- they are approximately normally distributed (not for binomial)
- they have no systematic patterns when considered on their own and with respect to the predictors
- they have approx. constant variance upon standardization
why is it important for deviance residuals to be approximately normal?
because it provides the basis for comparing the distribution of the deviance residuals with the normal dist (qq plots)
for GLMs, a regularized model results from minimizing the penalized objective function given by:
deviance (goodness of fit) + regularization penalty (complexity)
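Written out (a glmnet-style elastic net penalty; lambda and alpha are the usual tuning parameters, named here as an assumption):
minimize over beta: deviance(beta) + lambda * [ (1 - alpha)/2 * sum(beta_j^2) + alpha * sum(|beta_j|) ]
where lambda >= 0 controls the strength of the penalty and alpha in [0, 1] mixes ridge (alpha = 0) and lasso (alpha = 1).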
what is a performance metric used for numeric target variables in GLMs?
Test RMSE
What is used to measure the performance of a binary classifier (GLM)?
we could use the classification error rate on its own, but more commonly used is the confusion matrix
confusion matrices: how do we translate the predicted probabilities into predicted classes?
using a pre-specified cutoff.
- if the predicted probability of the event for an observation is higher than the cutoff, then the event is predicted to occur
- if the predicted probability < cutoff, then the event is not predicted to occur
what kinds of performance metrics can be calculated from confusion matrices? 4
- classification error rate
- accuracy
- sensitivity
- specificity
confusion matrices: explain what the classification error rate is and how to calculate it.
= (FP + FN) / n
this is the proportion of misclassifications
confusion matrices: explain what the accuracy measure is and how to calculate it.
= (TN + TP) / n
the proportion of correctly classified observations
confusion matrices: explain what the sensitivity measure is and how to calculate it.
= TP / ( TP + FN )
relative frequency of correctly predicting the event occurring when the event does happen
how sensitive a classifier is at identifying positive cases
confusion matrices: explain what the specificity measure is and how to calculate it.
= TN / ( TN + FP )
opposite of sensitivity
relative frequency of correctly predicting an event not to occur when it actually did not
larger specificity - better the classifier is at confirming negative cases
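A quick numeric check (hypothetical counts, purely for illustration): suppose TP = 40, FN = 10, TN = 30, FP = 20, so n = 100. then:
- classification error rate = (20 + 10) / 100 = 0.30
- accuracy = (30 + 40) / 100 = 0.70
- sensitivity = 40 / (40 + 10) = 0.80
- specificity = 30 / (30 + 20) = 0.60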
T/F: accuracy is a weighted average of sensitivity and specificity
(confusion matrices)
true
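To see why (direct algebra from the definitions): with n+ = TP + FN actual positives and n- = TN + FP actual negatives,
accuracy = (TP + TN) / n = (n+ / n) * sensitivity + (n- / n) * specificity,
so the weights are the proportions of actual positives and negatives. in the numeric example above: (50/100)(0.80) + (50/100)(0.60) = 0.70.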
confusion matrices: does changing the cutoff involve a trade-off?
yes, a trade off between specificity and sensitivity.
we want them both to be as close to 1 as possible.
confusion matrices: what happens if the cutoff is set to 0? what are the sensitivity and specificity values?
all predicted probabilities will exceed the cutoff
every observation is predicted to be positive
sensitivity = 1, specificity = 0 (there are no true negatives, since nothing is predicted negative)
draw what the confusion matrix will look like
confusion matrices: what happens if the cutoff increases from 0? what are the sensitivity and specificity values?
more and more observations will be classified as negative and the entries in the matrix will move to the first row
sensitivity decreases
specificity increases
draw the arrows in the confusion matrix
confusion matrices: what happens if the cutoff is set to 1? what are the sensitivity and specificity values?
all predicted probabilities will be less than the cutoff, they will all be predicted to be negative.
sensitivity = 0 specificity = 1
draw the matrix
confusion matrices: how do we choose a cutoff value? explain
using a ROC curve.
it is a graphical tool plotting the sensitivity against the specificity of a given classifier as the cutoff ranges from 0 to 1.
ROC curve: how can the predictive performance of a classifier be summarized?
by computing the AUC. the higher the better
ROC curve: what happens with AUC = 1?
the highest possible value of AUC is 1.
this classifier has perfect discriminatory power. specificity and sens both equal 1.
ROC curve: what happens with AUC = 0.5?
this is a useful baseline comparison. this is the naive classifier that classifies the observations purely randomly without using the information contained in the predictors.
what is the problem with unbalanced data in the context of a classifier?
the classifier will place more weight on the majority class and try to match the training observations in that class, without paying enough attention to the minority class
what are two solutions to imbalanced data?
- undersampling (keeps the minority class and draws only a subset of the majority class to reduce the imbalance)
- oversampling (samples the minority class with replacement to reduce the imbalance)
how to calculate the RMSE of a model?
use the RMSE() function from the caret package:
predictions <- predict(model, newdata = data.test, type = "response")
RMSE(predictions, data.test$target)
how to look at model diagnostics, how do we output the residuals vs. fitted values plot ,etc. ?
plot( model )
how to fit a GLM using the glm() function?
glm(target ~ . + interaction, family = distribution(link = "link"), data = dataset)
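A minimal concrete sketch (hypothetical variable names, just one possibility — a gamma severity model with a log link):
model <- glm(claim_size ~ age + region + age:region,
             family = Gamma(link = "log"),
             data = data.train)
summary(model)
note that the gamma family is spelled Gamma() with a capital G in R.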
when you are asked to interpret the results of a GLM, what 3-part structure could you use?
- interpret the precise values of the estimated coefficients (ex. every unit increase in a continuous predictor is associated with a multiplicative change of exp(beta) in the expected value of the target variable, holding everything else constant)
- comment on whether the sign of the estimated coefficients makes sense. (common knowledge)
- relate the findings to the business problem, how can these results help the clients?
when we use a variable in a GLM as an exposure variable, can we keep it in the original GLM?
no, we have to take it out of the model
glm(target ~ . - exposure.var, family = dist(link = "link"), data = dataset)
how to include an offset in a GLM model in r?
glm(target ~ . - exposure.var, data = data.train, offset = log(exposure.var), family = dist(link = "link"))
the offset has to be entered on the scale of the link function (here the log, to match the log link).
how do you construct a confusion matrix in r? what package?
library(caret)
1. pre-specify the cutoff value (the mean of the target variable is a common starting point)
2. generate predicted probabilities using the predict() function
3. convert the predicted probabilities into predicted classes, assigning 1 if above the cutoff and 0 otherwise:
class <- ifelse(predictions > cutoff, 1, 0)
4. create the confusion matrix:
confusionMatrix(factor(class), factor(data.test$target), positive = "1")
why do the two first arguments of confusionMatrix need to be factors?
because confusionMatrix() cross-tabulates classes, not numbers: it expects the predicted classes and the actual classes as factors with matching levels, and errors out if given numeric vectors.
what package has to be installed for ROC and AUC? What function is used to create the ROC curve?
pROC
roc(data.train$target, predicted probabilities from the training set), or the same on the test set
how to calculate the AUC?
first make the ROC curve
then,
auc(roc curve)
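Putting both steps together (hypothetical names; assumes a fitted binary-target GLM called model):
library(pROC)
probs <- predict(model, newdata = data.test, type = "response")
roc_obj <- roc(data.test$target, probs)
plot(roc_obj)  # draws the ROC curve
auc(roc_obj)   # area under the curve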
how do we add binarized variables to the original dataset? What needs to be done following this?
using the function cbind()
delete the old variables from the dataset by setting them to NULL
do the test/train split again
refit the model
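A sketch of that workflow (hypothetical names; caret's dummyVars() is one common way to binarize, used here as an assumption):
library(caret)
binarizer <- dummyVars(~ region, data = dat, fullRank = TRUE)
binarized <- predict(binarizer, newdata = dat)
dat <- cbind(dat, binarized)  # add the binarized columns
dat$region <- NULL            # delete the original variable
# then redo the train/test split and refit the model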
how to model weight in a GLM?
glm(target ~ ., family = dist(link = "link"), data = data.train, weights = exposure.var)