GLMs Flashcards
Explain how a linear model can be viewed as a generalized linear model
A linear model is a special case of a GLM when the target variable is normally distributed and the link function is the identity function
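A minimal sketch of this equivalence in Python (statsmodels, simulated data): an OLS fit and a Gaussian-family GLM with its default identity link recover the same coefficient estimates.

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: y is a linear function of two predictors plus normal noise
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=100)

ols_fit = sm.OLS(y, X).fit()
glm_fit = sm.GLM(y, X, family=sm.families.Gaussian()).fit()  # identity link is the default

print(ols_fit.params)
print(glm_fit.params)  # same estimates as OLS
```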
Explain the difference between weights and offsets when applied to a GLM
Weights and offsets both take exposure into account to improve fitting, however, the key differences are:
* Weights: the observations of the target variable should be averaged by exposure -> the variance of each observation is inversely related to the size of the exposure -> weights do not affect the mean of the target variable
* Offsets: the observations of the target variable are aggregated over the exposure -> the mean of the target variable is in direct proportion to the exposure, but its variance is unaffected
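A sketch of the two approaches in Python (statsmodels, simulated data with hypothetical columns `claims`, `exposure`, and a single predictor `x`): the offset model for aggregate counts and the exposure-weighted model for average frequencies produce the same coefficient estimates.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical policy data: 'claims' = aggregate claim counts, 'exposure' = earned exposure
rng = np.random.default_rng(1)
df = pd.DataFrame({"exposure": rng.uniform(0.5, 2.0, 500),
                   "x": rng.normal(size=500)})
df["claims"] = rng.poisson(df["exposure"] * np.exp(0.3 + 0.5 * df["x"]))

# Offset: model the aggregate counts directly, with ln(exposure) as an offset
offset_fit = smf.glm("claims ~ x", data=df, family=sm.families.Poisson(),
                     offset=np.log(df["exposure"])).fit()

# Weights: model the average frequency (claims per unit of exposure), weighted by exposure
df["freq"] = df["claims"] / df["exposure"]
weight_fit = smf.glm("freq ~ x", data=df, family=sm.families.Poisson(),
                     var_weights=df["exposure"]).fit()

print(offset_fit.params)
print(weight_fit.params)  # coefficient estimates agree with the offset model
```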
State the statistical method typically used to estimate the parameters of a GLM
Maximum Likelihood Estimation (MLE)
Explain the problem with deviance as a model selection criterion
Deviance is merely a goodness-of-fit measure on the training set and always decreases (never increases) when new predictors are added. Selecting the GLM with the smallest deviance therefore always picks the most elaborate GLM, which has the lowest training error but not necessarily the lowest test error, and is likely overfitted.
Explain the limitations of the likelihood ratio test as a model selection method
- It can only be used to compare one pair of GLMs at a time
- The simpler GLM must be a special case/nested within the more complex GLM in order to use LRT
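A sketch of a likelihood ratio test between two nested Poisson GLMs (Python/statsmodels, simulated data; the reduced GLM drops two predictors from the full GLM):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
X_full = sm.add_constant(rng.normal(size=(200, 3)))       # constant + 3 predictors
y = rng.poisson(np.exp(0.2 + 0.4 * X_full[:, 1]))         # only the first predictor matters

full = sm.GLM(y, X_full, family=sm.families.Poisson()).fit()
reduced = sm.GLM(y, X_full[:, :2], family=sm.families.Poisson()).fit()  # nested in the full GLM

lr_stat = 2 * (full.llf - reduced.llf)        # equals the drop in deviance
df_diff = full.df_model - reduced.df_model    # number of extra parameters in the full GLM
p_value = stats.chi2.sf(lr_stat, df_diff)
print(lr_stat, p_value)
```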
Explain the importance of setting a cutoff for a binary classifier
It’s important to set a cutoff for a binary classifier to translate the predicted probabilities into predicted classes. For example, we want to know whether we test positive or negative for COVID, not the predicted probability of being infected.
Explain the relationship between accuracy, sensitivity, and specificity
Accuracy is a weighted average of specificity and sensitivity, where the weights are the proportions of observations belonging to the two classes
* Sensitivity = proportion of correctly classified positive observations
* Specificity = proportion of correctly classified negative observations
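A small sketch (plain Python/NumPy) computing the three measures from a confusion matrix and verifying the weighted-average relationship:

```python
import numpy as np

def confusion_rates(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for 0/1 labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sensitivity = tp / (tp + fn)   # correctly classified positives
    specificity = tn / (tn + fp)   # correctly classified negatives
    n_pos, n_neg = tp + fn, tn + fp
    # Accuracy as the class-proportion-weighted average of sensitivity and specificity
    accuracy = (n_pos * sensitivity + n_neg * specificity) / (n_pos + n_neg)
    return accuracy, sensitivity, specificity

print(confusion_rates([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))  # (0.6, 0.5, 0.667)
```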
Explain how the cutoff of a binary classifier affects sensitivity and specificity
The selection of the cutoff for a binary classifier involves a trade-off between having high sensitivity and having high specificity
* cutoff = 0 -> every observation is classified as positive, so sensitivity = 1 and specificity = 0
* as the cutoff increases -> more observations are classified as negative, which means more true negatives (and false negatives) and fewer true positives (and false positives), so sensitivity decreases and specificity increases
* cutoff = 1 -> every observation is classified as negative, so sensitivity = 0 and specificity = 1
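A sketch of the trade-off (Python/NumPy, simulated probabilities): sweeping the cutoff from 0 to 1 moves sensitivity from 1 down to 0 and specificity from 0 up to 1.

```python
import numpy as np

rng = np.random.default_rng(3)
p_hat = rng.uniform(size=1000)        # predicted probabilities (hypothetical classifier output)
y = rng.binomial(1, p_hat)            # simulated true classes

for cutoff in [0.0, 0.25, 0.5, 0.75, 1.0]:
    pred = (p_hat >= cutoff).astype(int)        # cutoff = 0 labels everything positive
    sens = np.mean(pred[y == 1] == 1)
    spec = np.mean(pred[y == 0] == 0)
    print(f"cutoff={cutoff:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```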
Explain the problem with unbalanced data
A classifier implicitly places more weight on the majority class without paying enough attention to the minority class. As a result, a high accuracy can be deceptive.
Explain how undersampling and oversampling work to make unbalanced data more balanced
Undersampling produces roughly balanced data by drawing fewer observations from the negative class and retaining all of the positive observations. However, less data means the training becomes less robust and the classifier becomes more prone to overfitting
Oversampling produces roughly balanced data by retaining all observations in the dataset and sampling the positive class with replacement. However, more data means a heavier computational burden
Explain why oversampling must be performed after splitting the full data into training and test data
Oversampling keeps all the original data, but resamples the positive class with replacement to reduce the imbalance between the two classes. If oversampling is not performed after splitting the data, some positive-class observations may appear in both the training and test sets, and the test set will not be truly unseen to the trained classifier.
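A sketch of both resampling schemes from the last two cards (Python, pandas + scikit-learn, hypothetical column names `x` and `target`), with the rebalancing applied only to the training set after the split:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Hypothetical unbalanced data: target = 1 is the rare positive class
rng = np.random.default_rng(4)
df = pd.DataFrame({"x": rng.normal(size=1000),
                   "target": rng.binomial(1, 0.05, size=1000)})

# Split FIRST, then rebalance only the training set
train, test = train_test_split(df, test_size=0.3, stratify=df["target"], random_state=0)
pos, neg = train[train["target"] == 1], train[train["target"] == 0]

# Oversampling: keep everything, resample the positive class with replacement
over = pd.concat([neg, resample(pos, replace=True, n_samples=len(neg), random_state=0)])

# Undersampling: keep all positives, draw fewer negatives without replacement
under = pd.concat([pos, resample(neg, replace=False, n_samples=len(pos), random_state=0)])

print(over["target"].value_counts())
print(under["target"].value_counts())
```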
Explain one reason for using oversampling over undersampling, and one reason for using undersampling over oversampling
Oversampling can be used to retain the information about the positive class.
Undersampling can be used to ease the computational burden and reduce run time when the training data is excessively large
Explain the Tweedie distribution
The Tweedie distribution has a mixture of discrete and continuous components.
* Tweedie is an “in-between” distribution of Poisson and gamma (variance power parameter between 1 and 2); it is a Poisson sum of gamma random variables
* Tweedie has a discrete probability mass at zero and a probability density function on the positive real line.
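A sketch in Python (statsmodels, simulated data): simulate aggregate losses as a Poisson sum of gamma severities, then fit a Tweedie GLM with variance power 1.5 (between the Poisson value of 1 and the gamma value of 2). The simulated target shows the point mass at zero.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 2000
x = rng.normal(size=n)
mu = np.exp(0.5 + 0.3 * x)
counts = rng.poisson(mu)                                # claim counts
# Aggregate losses: a Poisson number of gamma severities (zero when there are no claims)
y = np.array([rng.gamma(shape=2.0, scale=0.5, size=c).sum() for c in counts])

X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Tweedie(var_power=1.5)).fit()
print(fit.params)
print(np.mean(y == 0))   # a nonzero proportion of exact zeros, as expected for Tweedie
```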
What are canonical links?
A canonical link is the link function that equates the linear predictor with the natural parameter of the target distribution. Canonical links have the advantage of simplifying the mathematics of the estimation process and making it more likely to converge, but they shouldn’t always be used.
* Normal: identity, g(μ) = μ
* Binomial: logit, g(π) = ln[π / (1 − π)]
* Poisson: log, g(μ) = ln(μ)
* Gamma: inverse, g(μ) = 1/μ
* Inverse Gaussian: squared inverse, g(μ) = 1/μ²
For example, the canonical link for the gamma distribution (the inverse link) does not guarantee positive predictions, nor is it easy to interpret, so the log link is more commonly used
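A sketch contrasting the two links for a gamma GLM (Python/statsmodels, simulated data); the log link gives multiplicative, easy-to-interpret effects and guaranteed positive predictions:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(0.0, 1.0, size=300)
X = sm.add_constant(x)
mu = 1.0 / (1.0 + 0.3 * x)                      # mean kept safely positive
y = rng.gamma(shape=2.0, scale=mu / 2.0)        # gamma target with mean mu

# Canonical (inverse) link: simpler estimation mathematics, but fitted means can
# go negative in general and the coefficients are hard to interpret
canonical = sm.GLM(y, X, family=sm.families.Gamma()).fit()

# Log link (non-canonical): positive predictions and multiplicative effects,
# so it is often preferred in practice
loglink = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log())).fit()

print(canonical.params)
print(loglink.params)
```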
Debunk a common misconception about link functions and GLMs
Link functions are applied to the mean of the target variable; the target variable itself is left untransformed. For example, if the log link is used, it is fine for some observations of the target variable to be zero, because the log link is not applied to the individual target observations
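A sketch of the point (Python/statsmodels, simulated counts): a Poisson GLM with the log link handles data containing exact zeros without any issue, because the log is applied to the mean rather than to the observations (log-transforming the target itself would produce -inf at the zeros).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
X = sm.add_constant(rng.normal(size=(200, 1)))
y = rng.poisson(np.exp(0.1 + 0.6 * X[:, 1]))     # count data containing exact zeros
print((y == 0).sum(), "zeros in the target")

# Log LINK: applied to the mean only, so zero observations are no problem
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)

# A log TRANSFORM of the target would fail: np.log(y) is -inf wherever y == 0
```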
Why might we apply weights and offsets when fitting GLMs?
To take advantage of the fact that different observations in the data may have different exposures and thus different degrees of precision. The goal is to improve the reliability of the fitting procedure
What happens when we use logged exposure as an offset?
We are assuming that the target mean varies in direct proportion to the exposure E (not to ln E)
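A quick sketch of the algebra (writing x'β for the linear predictor): with a log link and ln(E) as the offset,
ln(μ) = ln(E) + x'β  =>  μ = E · exp(x'β),
so the fitted mean scales in direct proportion to E, not to ln E.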
Explain the pros and cons of using MLE
Pros: MLE produces estimates with desirable statistical properties such as asymptotic unbiasedness, efficiency, and normality
Cons: The optimization algorithm for MLE is occasionally plagued by convergence issues (which may happen when a non-canonical link is used), which means no estimates may be produced and the GLM cannot be fitted or applied as a result
What are some facts about deviance?
- Deviance reduces to RSS for linear models, which means deviance can be seen as a generalization of RSS that works for non-normal target variables in the exponential family (sketched after this list)
- Deviance should only be used to compare GLMs having the same target distribution (so they have the same maximum loglikelihood of the saturated model)
- Deviance provides the foundation of an important model diagnostic tool for GLMs, the deviance residual
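A quick sketch of the RSS connection, under the common convention that the deviance is D = 2φ[l_SAT − l(β̂)], where l_SAT is the maximized loglikelihood of the saturated model and φ is the dispersion parameter: for a normal target, 2[l_SAT − l(β̂)] = Σ(y_i − μ̂_i)²/σ² and φ = σ², so D = Σ(y_i − μ̂_i)² = RSS.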
What are the properties of deviance residuals (that parallel those of the raw residuals of a linear model)?
- They are approximately normally distributed for most target distributions in the linear exponential family (except binomial) -> q-q plots of deviance residuals are valid even if the target distribution is not normal
- They have no systematic patterns when considered on their own and with respect to the predictors
- They have approximately constant variance upon standardization (using the standard error implied by the GLM)
What does it mean when observed points on the qq plot for deviance residuals are far from the reference straight line?
The target distribution, link function, and/or the form of the model equation may not be appropriate
Explain what standardized deviance residuals are
They are deviance residuals scaled by their standard error implied by the GLM, and should be approximately homoscedastic if a correct model is used
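A sketch tying the last few cards together (Python, statsmodels + matplotlib, simulated gamma data): extract the deviance residuals, standardize them crudely, and draw a q-q plot against the normal reference line.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
X = sm.add_constant(rng.normal(size=(400, 2)))
mu = np.exp(X @ np.array([0.2, 0.5, -0.3]))
y = rng.gamma(shape=5.0, scale=mu / 5.0)            # gamma target with mean mu

fit = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log())).fit()

dev_resid = fit.resid_deviance                      # deviance residuals
std_dev_resid = dev_resid / np.std(dev_resid)       # crude standardization, for illustration only

# Points should fall close to the 45-degree reference line if the model is adequate
sm.qqplot(std_dev_resid, line="45")
plt.show()
```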
Explain what an AUC of almost 1, 0.5, and near 0 indicates
A perfect model that predicts the correct class for new data each time will have a ROC plot showing the curve approaching the top left corner with an AUC near 1.0 (perfect classifier)
When a model has an AUC of 0.5, it classifies the observations purely randomly without using the information contained in the predictors (naive classifier). Any model having an AUC less than 0.5 means it is providing predictions that are worse than random selection, with a near 0 AUC indicating that the model makes the wrong classification almost every time.
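A sketch of the three regimes (Python/scikit-learn, simulated scores):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(8)
y = rng.binomial(1, 0.5, size=1000)                  # true classes

informative = np.clip(y * 0.7 + rng.normal(0, 0.2, size=1000), 0, 1)  # strong classifier
random_guess = rng.uniform(size=1000)                                  # naive classifier

print(roc_auc_score(y, informative))      # close to 1
print(roc_auc_score(y, random_guess))     # close to 0.5
print(roc_auc_score(y, 1 - informative))  # close to 0: systematically wrong predictions
```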
What is overdispersion?
When the variance of the target variable exceeds its mean. For example, the Poisson distribution can be used to model count variables, but it’s vulnerable to the problem of overdispersion (Poisson requires its mean and variance be equal)
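A sketch of an informal overdispersion check (Python/statsmodels, simulated overdispersed counts): after a Poisson fit, a Pearson chi-square statistic well above the residual degrees of freedom suggests the variance exceeds what the Poisson assumption allows.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
X = sm.add_constant(rng.normal(size=(500, 1)))
mu = np.exp(0.3 + 0.5 * X[:, 1])
y = rng.negative_binomial(n=2, p=2 / (2 + mu))      # counts with variance > mean (overdispersed)

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.pearson_chi2 / fit.df_resid)              # well above 1 signals overdispersion
```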