GLMs Flashcards
Explain how a linear model can be viewed as a generalized linear model
A linear model is a special case of a GLM when the target variable is normally distributed and the link function is the identity function
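A minimal sketch of this equivalence, assuming simulated data and Python's statsmodels (neither is part of the original card): an ordinary least squares fit and a Gaussian GLM with the identity link recover the same coefficients.

```python
# OLS vs. Gaussian-family GLM with the identity link (the default):
# both produce essentially identical coefficient estimates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()
glm = sm.GLM(y, X, family=sm.families.Gaussian()).fit()

print(ols.params)  # approximately [1, 2]
print(glm.params)  # essentially identical estimates
```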
Explain the difference between weights and offsets when applied to a GLM
Weights and offsets both take exposure into account to improve fitting; the key differences (sketched in code after this list) are:
* Weights: the observations of the target variable are averaged over exposure → the variance of each observation is inversely related to the size of its exposure → weights do not affect the mean of the target variable
* Offsets: the observations of the target variable are aggregated over the exposure → the mean of the target variable is in direct proportion to the exposure, but its variance is unaffected
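A minimal sketch of the distinction, assuming Python's statsmodels and hypothetical policy-level data (column names are illustrative): an offset models aggregate counts whose mean scales with exposure, while weights model the exposure-averaged frequency. For a Poisson GLM the two fits recover the same coefficient estimates, which is a useful sanity check.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# hypothetical policy-level data
df = pd.DataFrame({"claims": [0, 1, 2, 0, 3],
                   "exposure": [0.5, 1.0, 2.0, 0.25, 1.5],
                   "age": [25, 40, 35, 22, 50]})

# Offset: model aggregate counts; the mean is proportional to exposure
offset_fit = smf.glm("claims ~ age", data=df,
                     family=sm.families.Poisson(),
                     offset=np.log(df["exposure"])).fit()

# Weights: model exposure-averaged frequency; variance shrinks with exposure
df["freq"] = df["claims"] / df["exposure"]
weight_fit = smf.glm("freq ~ age", data=df,
                     family=sm.families.Poisson(),
                     var_weights=df["exposure"]).fit()

print(offset_fit.params)
print(weight_fit.params)  # same estimates as the offset fit
```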
State the statistical method typically used to estimate the parameters of a GLM
Maximum Likelihood Estimation (MLE)
Explain the problem with deviance as a model selection criterion
Deviance is merely a goodness-of-fit measure on the training set and never increases (it typically decreases) when new predictors are added. Selecting the GLM with the smallest deviance therefore yields the most elaborate GLM, which has the lowest training error but not necessarily the lowest test error, and is likely overfitted.
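A small simulated check (Python's statsmodels assumed; the extra predictor is deliberately pure noise): deviance still drops when the noise predictor is added, while a penalized measure such as AIC usually does not.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
noise = rng.normal(size=n)           # predictor unrelated to the target
y = rng.poisson(np.exp(0.3 * x1))

small = sm.GLM(y, sm.add_constant(x1),
               family=sm.families.Poisson()).fit()
big = sm.GLM(y, sm.add_constant(np.column_stack([x1, noise])),
             family=sm.families.Poisson()).fit()

print(small.deviance, big.deviance)  # deviance never increases
print(small.aic, big.aic)            # AIC usually rises for the noise term
```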
Explain the limitations of the likelihood ratio test as a model selection method
- It can only be used to compare one pair of GLMs at a time
- The simpler GLM must be a special case of (i.e., nested within) the more complex GLM in order to use the LRT
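A minimal sketch of an LRT between two nested GLMs, assuming simulated data and Python's statsmodels/scipy: the test statistic is the drop in deviance, compared against a chi-square distribution.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=(2, n))
y = rng.poisson(np.exp(0.2 + 0.5 * x1))   # x2 is truly irrelevant

reduced = sm.GLM(y, sm.add_constant(x1),
                 family=sm.families.Poisson()).fit()
full = sm.GLM(y, sm.add_constant(np.column_stack([x1, x2])),
              family=sm.families.Poisson()).fit()

# LRT statistic = drop in deviance; df = difference in parameter counts
lr_stat = reduced.deviance - full.deviance
p_value = stats.chi2.sf(lr_stat, df=1)
print(lr_stat, p_value)  # large p-value -> keep the simpler (nested) GLM
```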
Explain the importance of setting a cutoff for a binary classifier
It’s important to set a cutoff for a binary classifier to translate the predicted probabilities into predicted classes. For example, we want to know whether someone tests positive or negative for COVID, not the predicted probability of infection.
Explain the relationship between accuracy, sensitivity, and specificity
Accuracy is a weighted average of specificity and sensitivity, where the weights are the proportions of observations belonging to the two classes
* Sensitivity = proportion of correctly classified positive observations
* Specificity = proportion of correctly classified negative observations
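A quick numeric check of this identity, using hypothetical confusion-matrix counts (plain Python, no libraries needed):

```python
# Hypothetical confusion-matrix counts
tp, fn = 40, 10     # 50 actual positives
tn, fp = 80, 20     # 100 actual negatives

sensitivity = tp / (tp + fn)                 # 0.8
specificity = tn / (tn + fp)                 # 0.8
accuracy = (tp + tn) / (tp + fn + tn + fp)   # 0.8

n_pos, n_neg = tp + fn, tn + fp
weighted = (n_pos * sensitivity + n_neg * specificity) / (n_pos + n_neg)
print(accuracy, weighted)  # identical: accuracy is the weighted average
```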
Explain how the cutoff of a binary classifier affects sensitivity and specificity
The selection of the cutoff for a binary classifier involves a trade-off between having high sensitivity and having high specificity
* cutoff = 0 → all observations are classified as positive: sensitivity = 1 and specificity = 0
* cutoff increases → more observations are classified as negative, which means more true negatives (and fewer false positives) but also more false negatives → sensitivity decreases and specificity increases
* cutoff = 1 → all observations are classified as negative: sensitivity = 0 and specificity = 1
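A sketch of the trade-off on simulated scores (numpy assumed; the score construction is purely illustrative): sweeping the cutoff from 0 to 1 drives sensitivity from 1 down to 0 and specificity from 0 up to 1.

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.integers(0, 2, size=1000)          # true classes
# noisy scores that are higher, on average, for the positive class
p = np.clip(0.35 * y + rng.uniform(0, 0.65, size=1000), 0, 1)

for cutoff in [0.0, 0.25, 0.5, 0.75, 1.0]:
    pred = (p >= cutoff).astype(int)
    sens = np.mean(pred[y == 1] == 1)      # fraction of positives caught
    spec = np.mean(pred[y == 0] == 0)      # fraction of negatives caught
    print(f"cutoff={cutoff:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```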
Explain the problem with unbalanced data
A classifier implicitly places more weight on the majority class without paying enough attention to the minority class, so a high accuracy can be deceptive.
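A small illustration of the deception (numpy assumed, hypothetical 95/5 class split): always predicting the majority class scores 95% accuracy while never identifying a single positive.

```python
import numpy as np

y = np.array([1] * 50 + [0] * 950)         # 5% positives
pred = np.zeros_like(y)                    # always predicts the majority class

accuracy = np.mean(pred == y)              # 0.95, deceptively high
sensitivity = np.mean(pred[y == 1] == 1)   # 0.0, every positive is missed
print(accuracy, sensitivity)
```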
Explain how undersampling and oversampling work to make unbalanced data more balanced
Undersampling produces roughly balanced data by drawing fewer observations from the negative class and retaining all of the positive observations. However, discarding data makes the training less robust and the classifier more prone to overfitting
Oversampling produces roughly balanced data by retaining all observations in the dataset, but oversampling from the positive class. However, more data means a heavier computational burden
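A minimal sketch of both techniques using pandas on hypothetical unbalanced data (the 50/950 split and column name are illustrative):

```python
import pandas as pd

# hypothetical unbalanced data: 50 positives, 950 negatives
df = pd.DataFrame({"y": [1] * 50 + [0] * 950})
pos, neg = df[df["y"] == 1], df[df["y"] == 0]

# Undersampling: keep all positives, draw fewer negatives
under = pd.concat([pos, neg.sample(len(pos), random_state=0)])

# Oversampling: keep everything, resample positives with replacement
extra = pos.sample(len(neg) - len(pos), replace=True, random_state=0)
over = pd.concat([df, extra])

print(under["y"].value_counts())  # 50 / 50
print(over["y"].value_counts())   # 950 / 950
```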
Explain why oversampling must be performed after splitting the full data into training and test data
Oversampling keeps all of the original data but resamples the positive class with replacement to reduce the imbalance between the two classes. If oversampling is performed before the split, copies of the same positive observations may appear in both the training and test sets, so the test set is no longer truly unseen by the trained classifier.
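A sketch of the correct order, assuming pandas and scikit-learn's train_test_split (data and column names are illustrative): split first, then oversample only within the training set.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"y": [1] * 50 + [0] * 950, "x": range(1000)})

# 1) split first, so the test set is fixed before any resampling
train, test = train_test_split(df, test_size=0.25,
                               stratify=df["y"], random_state=0)

# 2) oversample the positive class within the training set only
pos, neg = train[train["y"] == 1], train[train["y"] == 0]
extra = pos.sample(len(neg) - len(pos), replace=True, random_state=0)
train_balanced = pd.concat([train, extra])

# the test set contains no duplicates of training records
```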
Explain one reason for using oversampling over undersampling, and one reason for using undersampling over oversampling
Oversampling can be used to retain all of the original information; unlike undersampling, no observations from the negative class are discarded.
Undersampling can be used to ease the computational burden and reduce run time when the training data is excessively large
Explain the Tweedie distribution
The Tweedie distribution has a mixture of discrete and continuous components.
* Tweedie is an “in-between” distribution of Poisson and gamma: a Poisson sum of gamma random variables
* Tweedie has a discrete probability mass at zero and a probability density function on the positive real line.
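A sketch of fitting a Tweedie GLM in Python's statsmodels on simulated compound Poisson-gamma losses (all parameter values are illustrative); a variance power between 1 and 2 corresponds to the Poisson-sum-of-gammas case with a point mass at zero.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)

# simulate a Poisson count of gamma severities per observation;
# observations with a zero count give exactly zero total loss
counts = rng.poisson(np.exp(0.1 + 0.3 * x))
y = np.array([rng.gamma(2.0, 500.0, size=c).sum() for c in counts])

fit = sm.GLM(y, sm.add_constant(x),
             family=sm.families.Tweedie(var_power=1.5,
                                        link=sm.families.links.Log())).fit()
print(fit.params)
```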
What are canonical links?
A canonical link equates the linear predictor with the natural parameter of the response distribution. Canonical links have the advantage of simplifying the mathematics of the estimation process and making it more likely to converge, but they shouldn’t always be used.
* Normal: identity (μ)
* Binomial: logit (ln[π/(1−π)])
* Poisson: log (ln μ)
* Gamma: inverse (1/μ)
* Inverse Gaussian: squared inverse (1/μ²)
For example, the canonical link for gamma, the inverse link, neither guarantees positive predictions nor is easy to interpret, so the log link is more commonly used
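A sketch of that common choice, assuming simulated data and Python's statsmodels: a gamma GLM with the non-canonical log link, whose fitted means are guaranteed positive.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=200)
# gamma target with mean exp(0.5 * x)
y = rng.gamma(shape=2.0, scale=np.exp(0.5 * x) / 2.0)

fit = sm.GLM(y, sm.add_constant(x),
             family=sm.families.Gamma(link=sm.families.links.Log())).fit()
print(np.all(fit.fittedvalues > 0))  # True: log link keeps predictions positive
```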
Debunk a common misconception about link functions and GLMs
Link functions are applied to the mean of the target variable and leave the target variable itself untransformed. For example, if the log link is used, it’s fine for some observations of the target variable to be zero, because the log link is applied to the mean, not to the target observations
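A quick demonstration of the point (simulated data, Python's statsmodels assumed): a Poisson GLM with its canonical log link fits without issue even though many target observations are exactly zero, whereas log-transforming the target itself would fail.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=100)
y = rng.poisson(np.exp(0.2 * x))     # many observed zeros
print((y == 0).sum())                # confirms zeros are present

# the log link transforms the mean, not the observations, so this fits fine;
# np.log(y) would blow up at the zeros
fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()
print(fit.params)
```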