GLMs Flashcards

1
Q

Explain how a linear model can be viewed as a generalized linear model

A

A linear model is a special case of a GLM when the target variable is normally distributed and the link function is the identity function

2
Q

Explain the difference between weights and offsets when applied to a GLM

A

Weights and offsets both take exposure into account to improve fitting, however, the key differences are:
* Weights: the observations of the target variable should be averaged by exposure → the variance of each observation is inversely related to the size of the exposure → weights do not affect the mean of the target variable
* Offsets: the observations of the target variable are aggregated over the exposure → the mean of the target variable is in direct proportion to the exposure, but its variance is unaffected
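The weight mechanics above can be sketched with a small simulation (the parameter values and function names are made-up assumptions, not from the source): averaging Poisson claim counts by exposure leaves the target mean unchanged while shrinking its variance in inverse proportion to the exposure.

```python
import random
import math

random.seed(42)
LAM = 2.0  # assumed per-unit-exposure Poisson mean

def poisson(lam):
    # Knuth's algorithm for a Poisson draw
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

def averaged_stats(exposure, n_sim=20000):
    # averaged target: total count over `exposure` units, divided by exposure
    ys = [poisson(LAM * exposure) / exposure for _ in range(n_sim)]
    mean = sum(ys) / n_sim
    var = sum((y - mean) ** 2 for y in ys) / n_sim
    return mean, var

mean_small, var_small = averaged_stats(exposure=1)
mean_large, var_large = averaged_stats(exposure=10)

# mean is unaffected by exposure; variance is inversely related to it
print(mean_small, mean_large)   # both close to 2.0
print(var_small, var_large)     # var_large is roughly var_small / 10
```

This is why the exposure is supplied as a weight: observations with more exposure are more precise and deserve more influence in the fit.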

3
Q

State the statistical method typically used to estimate the parameters of a GLM

A

Maximum Likelihood Estimation (MLE)
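A minimal sketch of MLE in action (the data and grid are made up for illustration): for i.i.d. Poisson counts, the λ that maximizes the log-likelihood is the sample mean, recovered here by brute-force search.

```python
import math

data = [2, 3, 1, 4, 2, 3, 3, 2]  # hypothetical claim counts

def loglik(lam):
    # Poisson log-likelihood: sum of y*ln(lam) - lam - ln(y!)
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in data)

grid = [0.01 * k for k in range(1, 1001)]  # candidate lambdas 0.01..10.00
mle = max(grid, key=loglik)

print(mle)                    # ~2.5
print(sum(data) / len(data))  # sample mean = 2.5
```

In practice GLM software maximizes the log-likelihood with an iterative algorithm (e.g. iteratively reweighted least squares) rather than a grid, but the objective is the same.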

4
Q

Explain the problem with deviance as a model selection criterion

A

Deviance is merely a goodness-of-fit measure on the training set and never increases when new predictors are added. Choosing the GLM with the smallest deviance therefore selects the most elaborate GLM, which has the lowest training error but not necessarily the lowest test error, and is likely overfitted.

5
Q

Explain the limitations of the likelihood ratio test as a model selection method

A
  • It can only be used to compare one pair of GLMs at a time
  • The simpler GLM must be a special case/nested within the more complex GLM in order to use LRT
6
Q

Explain the importance of setting a cutoff for a binary classifier

A

It’s important to set a cutoff for a binary classifier to translate the predicted probabilities into predicted classes. For example, we want to know whether we test positive or negative for COVID, not the predicted probability of being infected.

7
Q

Explain the relationship between accuracy, sensitivity, and specificity

A

Accuracy is a weighted average of specificity and sensitivity, where the weights are the proportions of observations belonging to the two classes
* Sensitivity = proportion of correctly classified positive observations
* Specificity = proportion of correctly classified negative observations
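The identity above can be checked with toy labels and predictions (both made up): accuracy equals the class-proportion-weighted average of sensitivity and specificity.

```python
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
n_pos = actual.count(1)
n_neg = actual.count(0)
n = len(actual)

sensitivity = tp / n_pos   # 3/4
specificity = tn / n_neg   # 4/6
accuracy = (tp + tn) / n   # 7/10

weighted = (n_pos / n) * sensitivity + (n_neg / n) * specificity
print(accuracy, weighted)  # both 0.7
```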

8
Q

Explain how the cutoff of a binary classifier affects sensitivity and specificity

A

The selection of the cutoff for a binary classifier involves a trade-off between having high sensitivity and having high specificity
* cutoff = 0 → all observations are classified as positive: sensitivity = 1 and specificity = 0
* as the cutoff increases → more observations are classified as negative, meaning more true negatives and fewer false positives: sensitivity decreases and specificity increases
* cutoff = 1 → all observations are classified as negative: sensitivity = 0 and specificity = 1
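The trade-off can be traced on a handful of illustrative probabilities (all values made up): sweeping the cutoff upward trades sensitivity for specificity.

```python
actual = [1, 1, 1, 0, 0, 0]
probs  = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]  # hypothetical predicted probabilities

def sens_spec(cutoff):
    # classify as positive when the predicted probability exceeds the cutoff
    pred = [1 if p > cutoff else 0 for p in probs]
    tp = sum(a == 1 and q == 1 for a, q in zip(actual, pred))
    tn = sum(a == 0 and q == 0 for a, q in zip(actual, pred))
    return tp / actual.count(1), tn / actual.count(0)

print(sens_spec(0.0))   # (1.0, 0.0)  low cutoff: everything positive
print(sens_spec(0.5))   # (0.666..., 0.666...)
print(sens_spec(1.0))   # (0.0, 1.0)  high cutoff: everything negative
```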

9
Q

Explain the problem with unbalanced data

A

A classifier implicitly places more weight on the majority class without paying enough attention to the minority class. The problem with this is the fact that a high accuracy might be deceptive

10
Q

Explain how undersampling and oversampling work to make unbalanced data more balanced

A

Undersampling produces roughly balanced data by drawing fewer observations from the negative class and retaining all of the positive observations. However, less data means the training becomes less robust and the classifier becomes more prone to overfitting

Oversampling produces roughly balanced data by retaining all observations in the dataset, but oversampling from the positive class. However, more data means a heavier computational burden
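Both techniques can be sketched on label counts alone (the 10/90 split and seed are made-up assumptions): undersampling shrinks the data to balance it, oversampling grows it.

```python
import random

random.seed(0)
labels = [1] * 10 + [0] * 90   # unbalanced: 10 positives, 90 negatives

pos = [y for y in labels if y == 1]
neg = [y for y in labels if y == 0]

# undersampling: keep all positives, draw fewer negatives (without replacement)
under = pos + random.sample(neg, len(pos))

# oversampling: keep everything, re-draw extra positives with replacement
over = labels + random.choices(pos, k=len(neg) - len(pos))

print(len(under), under.count(1) / len(under))   # 20 observations, 50% positive
print(len(over), over.count(1) / len(over))      # 180 observations, 50% positive
```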

11
Q

Explain why oversampling must be performed after splitting the full data into training and test data

A

Oversampling keeps all the original data but samples the positive class with replacement to reduce the imbalance between the two classes. If oversampling is performed before splitting the data, some of the positive-class observations may appear in both the training and test sets, and the test set will not be truly unseen to the trained classifier.

12
Q

Explain one reason for using oversampling over undersampling, and one reason for using undersampling over oversampling

A

Oversampling can be used to retain the information about the positive class.
Undersampling can be used to ease the computational burden and reduce run time when the training data is excessively large

13
Q

Explain the Tweedie distribution

A

The Tweedie distribution has a mixture of discrete and continuous components.
* Tweedie is an “in-between” distribution of Poisson and gamma; a Poisson sum of gamma random variables
* Tweedie has a discrete probability mass at zero and a probability density function on the positive real line.
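The Poisson-sum-of-gammas construction above can be simulated directly (all parameter values are made-up assumptions): the discrete mass at zero equals P(N = 0) = e^(−λ), and everything else lands on the positive real line.

```python
import random
import math

random.seed(1)
LAM, SHAPE, SCALE = 1.5, 2.0, 100.0  # assumed frequency/severity parameters

def poisson(lam):
    # Knuth's algorithm for a Poisson draw
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

def tweedie_draw():
    n = poisson(LAM)  # claim count
    return sum(random.gammavariate(SHAPE, SCALE) for _ in range(n))

draws = [tweedie_draw() for _ in range(20000)]
zero_frac = sum(d == 0 for d in draws) / len(draws)

print(zero_frac, math.exp(-LAM))   # both near 0.223
```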

14
Q

What are canonical links?

A

Canonical links have the advantage of simplifying the mathematics of the estimation process and making it more likely to converge, but they shouldn’t always be used.

Normal: Identity (μ)
Binomial: Logit (ln[π/(1−π)])
Poisson: Log (ln μ)
Gamma: Inverse (1/μ)
Inverse Gaussian: Squared inverse (1/μ²)

For example, the canonical link for the gamma distribution, the inverse link, does not guarantee positive predictions, nor is it easy to interpret, so the log link is more commonly used.
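The gamma example can be made concrete with a few toy linear-predictor values (made up for illustration): the canonical inverse link can produce a negative fitted mean, while the log link cannot.

```python
import math

etas = [2.0, 0.5, -0.25]   # possible values of the linear predictor x'beta

inverse_link_means = [1 / eta for eta in etas]    # canonical: mu = 1/eta
log_link_means = [math.exp(eta) for eta in etas]  # log link: mu = exp(eta)

print(inverse_link_means)   # [0.5, 2.0, -4.0]  <- a negative mean for a gamma target!
print(log_link_means)       # all strictly positive
```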

15
Q

Debunk a common misconception about link functions and GLMs

A

Link functions are applied to the mean of the target variable and leave the target variable itself untransformed. For example, if the log link is used, it is fine for some observations of the target variable to be zero, because the log link is not applied to the target observations.

16
Q

Why might we apply weights and offsets when fitting GLMs?

A

To take advantage of the fact that different observations in the data may have different exposures and thus different degrees of precision. The goal is to improve the reliability of the fitting procedure

17
Q

What happens when we use logged exposure as an offset?

A

We are assuming that the target mean varies in direct proportion to E (not lnE)
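A quick sketch of this claim (the coefficient and predictor values are made-up assumptions): with a log link and offset ln(E), we have ln(μ) = ln(E) + x′β, so μ = E·exp(x′β) and doubling the exposure exactly doubles the target mean.

```python
import math

beta0, beta1, x = -1.0, 0.3, 2.0   # hypothetical fitted coefficients and predictor

def mu(exposure):
    # linear predictor plus the logged-exposure offset, then invert the log link
    eta = math.log(exposure) + beta0 + beta1 * x
    return math.exp(eta)

print(mu(1.0), mu(2.0))   # the second is exactly twice the first
```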

18
Q

Explain the pros and cons of using MLE

A

Pros: MLE produces estimates with desirable statistical properties such as asymptotic unbiasedness, efficiency, and normality
Cons: The optimization algorithm for MLE is occasionally plagued by convergence issues (which may happen when a non-canonical link is used), which means no estimates may be produced and the GLM cannot be fitted or applied as a result

19
Q

What are some facts about deviance?

A
  • Deviance reduces to RSS for linear models, which means deviance can be seen as a generalization of RSS that works for non-normal target variables in the exponential family
  • Deviance should only be used to compare GLMs having the same target distribution (so they share the same maximum log-likelihood of the saturated model)
  • Deviance provides the foundation of an important model diagnostic tool for GLMs, the deviance residual
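The first fact can be verified on a toy normal GLM with unit variance (all values made up): the deviance 2·(ℓ_saturated − ℓ_model) reduces exactly to the RSS.

```python
import math

y      = [1.2, 0.7, 2.5, 1.9]   # hypothetical observations
fitted = [1.0, 1.0, 2.0, 2.0]   # hypothetical fitted means

def loglik(obs, means):
    # normal log-likelihood with sigma^2 = 1
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (o - m) ** 2
               for o, m in zip(obs, means))

# saturated model fits each observation perfectly: mu_i = y_i
deviance = 2 * (loglik(y, y) - loglik(y, fitted))
rss = sum((o - m) ** 2 for o, m in zip(y, fitted))

print(deviance, rss)   # identical
```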
20
Q

What are the properties of deviance residuals (that parallel those of the raw residuals of a linear model)?

A
  • They are approximately normally distributed for most target distributions in the linear exponential family (except binomial), so QQ plots of deviance residuals are valid even if the target distribution is not normal
  • They have no systematic patterns when considered on their own and with respect to the predictors
  • They have approximately constant variance upon standardization (using the standard error implied by the GLM)
21
Q

What does it mean when observed points on the QQ plot for deviance residuals are far from the reference straight line?

A

The target distribution, link function, and/or the form of the model equation may not be appropriate

22
Q

Explain what standardized deviance residuals are

A

They are deviance residuals scaled by their standard error implied by the GLM, and should be approximately homoscedastic if a correct model is used

23
Q

Explain what an AUC of almost 1, 0.5, and near 0 indicates

A

A perfect model that predicts the correct class for new data each time will have a ROC plot showing the curve approaching the top left corner with an AUC near 1.0 (perfect classifier)

When a model has an AUC of 0.5, it classifies the observations purely randomly without using the information contained in the predictors (naive classifier). Any model having an AUC less than 0.5 means it is providing predictions that are worse than random selection, with a near 0 AUC indicating that the model makes the wrong classification almost every time.
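The AUC interpretations above follow from its rank-based definition, P(score of a random positive > score of a random negative), which a small sketch can compute directly (the toy scores are made up):

```python
def auc(pos_scores, neg_scores):
    # count positive/negative pairs where the positive scores higher;
    # ties count as half a win
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

perfect = auc([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])   # positives always score higher
bad     = auc([0.1, 0.2], [0.8, 0.9])             # positives always score lower

print(perfect)  # 1.0
print(bad)      # 0.0
```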

24
Q

What is overdispersion?

A

Overdispersion occurs when the variance of the target variable exceeds its mean. For example, the Poisson distribution can be used to model count variables, but it is vulnerable to overdispersion because it requires its mean and variance to be equal.
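A quick diagnostic sketch (the counts are made up): compare the sample mean and variance of the count data; a variance well above the mean suggests overdispersion, so a plain Poisson fit may be inappropriate.

```python
counts = [0, 0, 1, 0, 2, 9, 0, 1, 12, 0, 3, 0]  # hypothetical claim counts

mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)  # sample variance

print(mean, var)        # variance is several times the mean
overdispersed = var > mean
print(overdispersed)    # True
```

In such cases, alternatives such as a negative binomial or quasi-Poisson model, which allow the variance to exceed the mean, are commonly considered.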