Chapter 4: GLMs Flashcards
what is a GLM?
Comparatively, a GLM is more flexible than a linear model.
GLMs provide flexibility in two aspects:
- Distribution of the target variable:
- target variable is not confined to normal distribution. it only needs to be a part of the linear exponential family (contains both continuous and discrete)
- GLMs provide a unifying approach to modelling binary, discrete and continuous target variables with different mean-variance relationships
- Relationship between target mean and the linear predictors:
- instead of equating the mean of the target variable directly with the linear combination of predictors, a GLM sets a function of the target mean (the link function) to be linearly related to the predictors.
- the link function can be any monotonic function (monotonic bc needs to be invertible)
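A minimal sketch of the idea above, assuming made-up coefficients and a single predictor (none of these numbers come from the chapter). The same linear predictor is turned into a target mean through the inverse of two different links; the identity link gives back the ordinary linear model.

```python
import math

# Hypothetical coefficients and predictor value, for illustration only.
beta0, beta1 = -2.0, 0.5
x = 1.0
eta = beta0 + beta1 * x  # linear predictor: -1.5

# Each entry is (link g, inverse link g^-1); g(mu) = eta, so mu = g^-1(eta).
links = {
    "identity": (lambda mu: mu, lambda e: e),  # ordinary linear model
    "log": (math.log, math.exp),               # mean forced to be positive
}

means = {name: inverse(eta) for name, (g, inverse) in links.items()}
for name, mu in means.items():
    print(name, mu)
```

With the identity link the mean can be negative; with the log link the same linear predictor yields a positive mean.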
can we use all feature generation (binarization, polynomial terms, interaction terms) and all feature selection techniques for GLM models?
yes
what does it mean to say “transformations are applied internally vs. externally”?
GLMs: internally transforming the data
- the target variable is not transformed and the transformation plays its role only within the GLM itself
Linear models: externally transforming the target variable
what target distribution would we choose for a positive, continuous, right-skewed target variable?
gamma and inverse gaussian capture the skewness of the target variable directly without the use of transformations
inv. gaussian is more highly skewed than gamma
gamma is #1 choice here
what target distribution would we choose for a binary target variable?
binomial
the mean of the target variable is the probability that the event of interest occurs
what target distribution would we choose for a count variable?
count variable = represents the number of times a certain event of interest happens over a reference time period
these variables only have non-negative integer values.
poisson!
what target distribution would we choose for aggregate data?
tweedie. it is a compound poisson-gamma distribution (a poisson number of gamma-distributed amounts, summed).
discrete probability mass at zero and pdf on the positive real line
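The point mass at zero can be seen in a small simulation. This is only a sketch: the parameter values are invented, the helper names are hypothetical, and the Poisson sampler is a simple textbook method (Knuth's) rather than anything the chapter prescribes.

```python
import math
import random

def sample_poisson(lam, rng):
    # Knuth's multiplication method; fine for small lam.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_aggregate(lam, shape, scale, rng):
    # Poisson number of events, each with a gamma-distributed amount.
    n = sample_poisson(lam, rng)
    return sum(rng.gammavariate(shape, scale) for _ in range(n))

rng = random.Random(0)
# Hypothetical parameters: 0.5 events on average, gamma(2, 100) amounts.
losses = [sample_aggregate(0.5, 2.0, 100.0, rng) for _ in range(10_000)]
share_zero = sum(1 for v in losses if v == 0) / len(losses)
print(share_zero, math.exp(-0.5))  # simulated vs. theoretical P(total = 0)
```

The share of exact zeros should sit close to exp(-0.5), while every non-zero total is a positive continuous amount.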
when is it good to use a log link?
for poisson, gamma and inverse gaussian
the target mean is positive and unbounded from above
g(mu) = ln(mu)
so then the inverse is exp(predictors) - which results in a positive value for target mean unbounded from above
when is it a good idea to use a logit link?
binary variables
logit link = ln(odds)
the logit link ensures that the target mean is between 0 and 1 (needs to be for a binary variable)
but the linear predictor can take any value from -inf to +inf
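A short sketch of the logit link and its inverse, with hypothetical function names. It checks that linear predictor values anywhere on the real line, negative or positive, map to a mean strictly between 0 and 1.

```python
import math

def inverse_logit(eta):
    # Numerically stable form of 1 / (1 + exp(-eta)).
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def logit(p):
    # ln(odds) = ln(p / (1 - p))
    return math.log(p / (1.0 - p))

etas = [-30.0, -2.0, 0.0, 2.0, 30.0]  # arbitrary linear predictor values
probs = [inverse_logit(e) for e in etas]
for e, p in zip(etas, probs):
    print(e, p)
```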
what is a logistic regression model?
a GLM with a binary target variable and a logit link function
two factors need to be considered when choosing a link function:
- whether the predictions provided by the link align with the characteristics of the target variable
- whether the resulting GLM is easy to interpret (ex. logit is easier to interpret than probit)
T/F: the link function is to transform the target variable of a GLM so that the resulting distribution more closely resembles a normal distribution
false.
the link function is applied to the mean of the target variable; the target variable itself is left untransformed.
T/F: the main reason for using the log link in a GLM is to reduce the skewness of a non-negative, right skewed target variable.
false. The log link is chosen because it ensures appropriate predictions and eases model interpretation. The skewness can be accommodated by an appropriate target distribution
T/F: if some of the observations of the target variable are 0, then the log link cannot be used because ln(0) is not defined.
False.
the log link is not applied directly to the target variable itself
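To see why a zero observation is harmless, here is a sketch of one Poisson log-likelihood contribution with a log link. The linear predictor value is made up. The logarithm is taken of the fitted mean, which is positive, never of the observation itself.

```python
import math

eta = -1.2          # hypothetical linear predictor for one observation
mu = math.exp(eta)  # fitted mean under the log link, always > 0
y = 0               # observed count of zero

# Poisson log-likelihood term: y*ln(mu) - mu - ln(y!)
loglik_contribution = y * math.log(mu) - mu - math.lgamma(y + 1)
print(loglik_contribution)  # finite; equals -mu when y = 0
```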
what two link functions are easiest to interpret ?
logit and log link
how to interpret GLM coefficients with log link for numeric predictors?
Multiplicative changes:
when all other variables are held fixed, a unit increase in X is associated with a multiplicative change in the target mean by a factor of exp(beta) (an increase if beta > 0, a decrease if beta < 0)
how to interpret GLM coefficients with log link for categorical predictors?
if X is a dummy variable, then:
- at the baseline level, X = 0, and at the non-baseline level, X = 1.
SO
comparing the means, we see that the target mean when the categorical predictor is at the non-baseline level is exp(beta) times that when the categorical predictor is at the baseline level, holding all other predictors fixed.
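Both log-link interpretations can be checked numerically. The sketch below assumes an invented model with one numeric predictor x and one dummy d; the coefficients are hypothetical.

```python
import math

# Hypothetical coefficients: intercept, numeric predictor, dummy variable.
beta0, beta_x, beta_d = 1.0, 0.3, -0.4

def target_mean(x, d):
    # Log link: ln(mu) = beta0 + beta_x * x + beta_d * d
    return math.exp(beta0 + beta_x * x + beta_d * d)

# Unit increase in x, dummy held fixed.
ratio_numeric = target_mean(x=5.0, d=0) / target_mean(x=4.0, d=0)
# Non-baseline level (d = 1) vs. baseline (d = 0), x held fixed.
ratio_dummy = target_mean(x=4.0, d=1) / target_mean(x=4.0, d=0)

print(ratio_numeric, math.exp(beta_x))
print(ratio_dummy, math.exp(beta_d))
```

The ratios do not depend on the starting value of x, which is what makes the log link easy to interpret. A negative coefficient gives a factor below 1.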
how to interpret GLM coefficients with logit link?
the logit link is almost always used with binary data.
ln(odds) = f(x) or odds = exp( f(x) )
the interpretations are just phrased in terms of multiplicative changes in the odds of the event of interest.
a unit increase in a numeric predictor with coefficient beta is associated with a multiplicative change of exp(beta) in the odds.
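A small sketch of the odds interpretation, using made-up coefficients. The odds ratio for a unit increase equals exp(beta) wherever the increase starts, while the change in probability does not stay constant.

```python
import math

beta0, beta1 = -1.0, 0.7  # hypothetical coefficients

def probability(x):
    # Inverse logit of the linear predictor.
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def odds(x):
    p = probability(x)
    return p / (1.0 - p)

odds_ratio = odds(3.0) / odds(2.0)
print(odds_ratio, math.exp(beta1))

# The probability change for the same unit increase depends on where we start.
print(probability(1.0) - probability(0.0), probability(3.0) - probability(2.0))
```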
what are weights and offsets?
modeling tools that are commonly used with GLMs.
they are designed to incorporate a measure of exposure into a GLM to improve the fitting
what is the idea behind using weights in a GLM?
to take advantage of the fact that different observations in the data may have different exposures and thus different degrees of precision, we can attach a higher weight to the observations with a larger exposure.
So that the more credible observations carry more weight in the estimation of the model coefficients
what is the idea behind using offsets in a GLM?
usually used with (not limited to) count data
we make the assumption that the target mean is directly proportional to the exposure.
how do we determine if an exposure variable should be used as an offset or a weight?
to use weights properly:
- the observations of the target variable should be averaged by exposure.
- due to the averaging, the variance of each observation is inversely related to the size of the exposure.
- the weights do not affect the mean of the target variable
offsets:
- observations are values aggregated over the exposure units.
- the exposure, when serving as an offset, is in direct proportion to the mean of the target variable
- the variance of the target variable is unaffected
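The two roles of exposure can be sketched as follows, assuming a log link, invented coefficients, and hypothetical function names. As an offset, ln(exposure) enters the linear predictor with its coefficient fixed at 1, so the mean scales in proportion to exposure. As a weight, exposure leaves the mean alone and shrinks the variance of an averaged observation.

```python
import math

beta0, beta1 = -3.0, 0.4  # hypothetical coefficients

def expected_count(exposure, x):
    # Offset: ln(mu) = ln(exposure) + beta0 + beta1 * x
    eta = math.log(exposure) + beta0 + beta1 * x
    return math.exp(eta)

def variance_of_average(unit_variance, exposure):
    # Weight: averaging over `exposure` units divides the variance by it.
    return unit_variance / exposure

# Doubling the exposure doubles the expected aggregate count.
print(expected_count(200.0, 1.0) / expected_count(100.0, 1.0))
# An average over 200 units is more precise than one over 100 units.
print(variance_of_average(1.5, 100.0), variance_of_average(1.5, 200.0))
```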
what is the technique used to estimate coefficients in a GLM?
MLE instead of OLS (linear models)
what is the goodness of fit measure used in GLMs?
deviance.
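As an illustration, the sketch below computes the deviance for a Poisson GLM in two ways: as twice the gap between the saturated and fitted log-likelihoods, and with the closed-form Poisson expression. The observed counts and fitted means are made up.

```python
import math

def poisson_loglik(y, mu):
    total = 0.0
    for yi, mi in zip(y, mu):
        term = -mi - math.lgamma(yi + 1)
        if yi > 0:  # y * ln(mu) is taken as 0 when y = 0
            term += yi * math.log(mi)
        total += term
    return total

y = [0, 2, 1, 4]           # hypothetical observed counts
mu = [0.5, 1.5, 1.2, 3.1]  # hypothetical fitted means

# Saturated model: each fitted mean equals its own observation.
deviance = 2.0 * (poisson_loglik(y, y) - poisson_loglik(y, mu))

# Closed form for Poisson: 2 * sum(y * ln(y / mu) - (y - mu)).
direct = 2.0 * sum(
    (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
    for yi, mi in zip(y, mu)
)
print(deviance, direct)
```

A smaller deviance means the fitted model sits closer to the saturated model; a perfect fit gives a deviance of zero.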
why can't we use r^2 in GLMs to measure goodness of fit?
because r^2 operates on the assumption that the underlying distribution behind the target variable is normal.