Generalized Linear Models (25-35%) Flashcards
What are the ordinary least squares (OLS) linear model assumptions?
- The mean of the target variable is a linear function of the predictor variables
- The variance of the target variable is constant, regardless of the values of the predictor variables
- Given the predictor variables, the target variable has a normal distribution
- Given the predictor variables, the observations are independent
What are some common situations where the linear model assumptions do not hold?
- The target variable takes only positive (or non-negative) values, as is generally the case with insurance claim severity and claim counts. The normal distribution allows for negative values, and hence the model may predict negative outcomes.
- The variance of the target variable depends on the mean. This violates the constant variance assumption. For example, those for whom larger claims are predicted may also have a larger variance of those claims.
- The target variable is binary. The restriction to 0 and 1 responses does not fit the normal distribution, as predicted values can easily go out of this range.
- The relationship of the predictor variables to the target may not be linear. A common example of non-linearity is a multiplicative relationship.
What are the Generalized Linear Models assumptions?
- Given the predictor variable values, the target variables are independent (this is unchanged).
- Given the predictor variable values, the target variable’s distribution is a member of the exponential family.
- Given the predictor variable values, the expected value of the target variable is μ = g^-1(η), where η = Xβ is the linear predictor, g is called the link function, and g^-1 is its inverse.
Note: if the conditional distribution of the target variable is normal (which is a member of the exponential family) and the link function is simply the identity, g(μ) = μ, we have the ordinary regression model.
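As a minimal sketch of the third assumption, the mean can be computed by forming the linear predictor η and passing it through the inverse link. This is plain Python rather than R, chosen only for illustration; the function name glm_mean and the coefficient values are hypothetical.

```python
import math

def glm_mean(x, beta, inverse_link):
    """Return mu = g^-1(eta), where eta = x . beta is the linear predictor."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return inverse_link(eta)

# Hypothetical coefficients: an intercept and one slope.
beta = [0.5, -2.0]
x = [1.0, 1.0]  # the leading 1.0 multiplies the intercept

# Identity link: mu equals eta, as in ordinary regression, so mu can be negative.
mu_identity = glm_mean(x, beta, lambda eta: eta)

# Log link: mu = exp(eta), which is positive for any eta.
mu_log = glm_mean(x, beta, math.exp)

print(mu_identity, mu_log)
```

With the identity link the predicted mean here is negative, which is the problem noted for positive targets; the log link keeps it positive.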
What are the commonly used link functions?
Identity: g(μ) = μ, g^-1(η) = η
Log: g(μ) = log(μ), g^-1(η) = exp(η)
Reciprocal: g(μ) = 1 / μ, g^-1(η) = 1 / η
Logit: g(μ) = log[μ / (1 − μ)], g^-1(η) = e^η / (1 + e^η)
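The four links can be written as pairs of functions and checked by a round trip, since applying g and then g^-1 should return the original mean. This is a Python sketch for illustration only; the dictionary name LINKS and the test value 0.3 are assumptions.

```python
import math

# Each entry is (g, g_inverse): g maps the mean mu to the linear predictor eta.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
    "reciprocal": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: math.exp(eta) / (1.0 + math.exp(eta)),
    ),
}

mu = 0.3  # hypothetical mean, valid for all four links
round_trip = {name: g_inv(g(mu)) for name, (g, g_inv) in LINKS.items()}

for name, value in round_trip.items():
    print(name, value)
```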
What is a key assumption of ordinary regression?
The conditional (given the values of the predictors) distribution of the response variable is normal.
What are the properties of observations from a normal distribution?
- Symmetric about the mean
- Continuous
- Can assume all positive and negative values
Exponential family distributions for GLMs
Binary - when the response variable is binary (zero or one). Solution is to use logistic regression.
Count Data - if the data is a whole number of occurrences, such as claim counts, it may be best to use Poisson regression because it places positive probability on 0, 1, 2, … Here, the mean is a positive number, so a link function that forces a positive prediction is best. The log link is commonly used.
NOTE: Poisson assumes the mean and variance are equal.
Continuous positive value data - regression models include gamma, lognormal, and inverse Gaussian distributions. These would generally be used with a log link function to ensure positive predictions (with a log link, the linear predictor is exponentiated, and exponentiation always produces a positive value)
Data with positive and negative values - normal distribution with a log link function. Individual observations can be negative, but the predictions of the mean value for new observations will always be positive.
Tweedie distribution - an in-between distribution of Poisson and gamma where the variance power is between 1 and 2. An important feature of this distribution is that it has discrete probability at zero (no claims) and then continuous probability on positive claim values.
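Two of the points above can be checked numerically: that the Poisson mean and variance are equal, and that the Tweedie variance power sits between the Poisson and gamma cases. The Python sketch below assumes the standard GLM variance function V(μ) = μ^p (p = 1 for Poisson, p = 2 for gamma); the function names and the values 2.5 and 4.0 are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the Poisson mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 2.5            # hypothetical claim frequency
support = range(60)  # the tail beyond 59 is negligible for this mean

mean = sum(k * poisson_pmf(k, lam) for k in support)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in support)
print(mean, variance)  # both are approximately lam

def variance_function(mu, power):
    """V(mu) = mu ** power; the variance is the dispersion times V(mu)."""
    return mu ** power

mu = 4.0
v_poisson = variance_function(mu, 1.0)
v_tweedie = variance_function(mu, 1.5)  # a power between 1 and 2
v_gamma = variance_function(mu, 2.0)
print(v_poisson, v_tweedie, v_gamma)
```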
Define Overdispersion.
When the variance of the response is greater than the mean, contrary to the Poisson assumption that the two are equal.
One simple fix is to use the quasi-Poisson family GLM instead of the Poisson. Note: the coefficient estimates are the same. The standard errors of the estimates are different, though, which affects any hypothesis tests to be done. In particular, when the data are overdispersed, the p-values for the hypothesis tests regarding the coefficients are all larger.
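A rough Python sketch of why the standard errors change: the quasi-Poisson approach estimates a dispersion parameter as the Pearson chi-square statistic divided by the residual degrees of freedom, then scales each Poisson standard error by its square root. The counts below are made up, and the model is intercept-only so that everything can be computed by hand.

```python
import math

y = [0, 0, 1, 0, 7, 2, 0, 9, 1, 0]  # hypothetical claim counts
n, n_params = len(y), 1              # intercept-only model
mu = sum(y) / n                      # fitted mean, the same for every observation

# Pearson chi-square divided by residual degrees of freedom.
pearson_chi2 = sum((yi - mu) ** 2 / mu for yi in y)
dispersion = pearson_chi2 / (n - n_params)

# For an intercept-only log-link Poisson model, the Fisher information for the
# intercept is the sum of the fitted means, so its variance is 1 / (n * mu).
poisson_se = math.sqrt(1.0 / (n * mu))
quasi_se = poisson_se * math.sqrt(dispersion)

print(dispersion, poisson_se, quasi_se)
```

A dispersion well above 1, as here, signals overdispersion; the coefficient estimate is unchanged, but its standard error grows, so the p-value grows too.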
Explain what the Hypothesis test is.
The null hypothesis (H0) is that the coefficient of the corresponding predictor is equal to 0. The alternative hypothesis (H1) is that the coefficient is not equal to 0.
The test statistic for this test follows a t distribution and is provided for each predictor in the column labeled “t value.” The corresponding p-value is in the column labeled “Pr(>|t|).” This can be interpreted as the probability of observing a test statistic at least as extreme as the observed test statistic, given the null hypothesis is true. As with most hypothesis tests, when the p-value is low, the null hypothesis is rejected, and when it is high, it is said that we fail to reject the null hypothesis.
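The t value is the coefficient estimate divided by its standard error, and the two-sided p-value is the probability mass in both tails beyond it. The Python sketch below integrates the t density numerically with Simpson's rule because the standard library has no t distribution; the estimate, standard error, and degrees of freedom are hypothetical.

```python
import math

def t_density(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|) for T ~ t(df), using Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_mass = total * h / 3  # integral of the density from 0 to |t_stat|
    return 1 - 2 * central_mass

# Hypothetical row of a coefficient table.
estimate, std_error, residual_df = 0.42, 0.15, 96
t_stat = estimate / std_error
p_value = two_sided_p_value(t_stat, residual_df)
print(t_stat, p_value)
```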
Define R^2 and adjusted R^2 values.
They are measures of goodness of fit. A higher value typically suggests a model that follows the data points more closely. One problem with R^2 is that adding a predictor never decreases its value (and almost always increases it). This violates the idea that a simpler model that performs almost as well is better. Adjusted R^2 adds a penalty for more parameters.
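Both measures can be sketched in a few lines of Python using the usual formulas R^2 = 1 − SSE/SST and adjusted R^2 = 1 − (1 − R^2)(n − 1)/(n − p − 1), where p is the number of predictors. The data and fitted values below are invented for illustration and are not the output of an actual fit.

```python
def r_squared(y, fitted):
    """1 - SSE / SST, where SST uses the sample mean as the prediction."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst

def adjusted_r_squared(r2, n, n_predictors):
    """Penalize R^2 for the number of predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

y = [3.0, 5.0, 7.0, 10.0, 12.0]      # hypothetical observations
fitted = [3.2, 5.1, 7.3, 9.4, 12.0]  # hypothetical fitted values

r2 = r_squared(y, fitted)
adj_one = adjusted_r_squared(r2, len(y), 1)
adj_two = adjusted_r_squared(r2, len(y), 2)  # same fit, one extra predictor
print(r2, adj_one, adj_two)
```

Holding R^2 fixed, the version with two predictors scores lower, which is the penalty at work.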
Define Akaike Information Criterion (AIC).
It is helpful in comparing models.
A lower AIC suggests the model is a better fit for the data than a higher AIC.
Describe deviance.
Deviance is a measure of goodness of fit of a generalized linear model. It is similar to the sum of squared errors. The baseline value is called the null deviance and is the deviance measure when the target is predicted using its sample mean (so it is similar to the total sum of squares).
Deviance Summary table on pdf page 31
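For a concrete case, the Poisson deviance is 2 Σ [y log(y/μ) − (y − μ)], with the log term taken as 0 when y = 0. The Python sketch below compares the null deviance (every prediction equal to the sample mean) with the deviance of a set of fitted means; the counts and fitted values are hypothetical.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu))."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2 * total

y = [0, 1, 2, 4, 8]                 # hypothetical counts
fitted = [0.5, 1.0, 2.0, 4.5, 7.0]  # hypothetical fitted means from some model

sample_mean = sum(y) / len(y)
null_deviance = poisson_deviance(y, [sample_mean] * len(y))
residual_deviance = poisson_deviance(y, fitted)
print(null_deviance, residual_deviance)
```

A residual deviance far below the null deviance suggests the predictors explain much of the variation, in the same way a small SSE relative to SST does.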
Define hypothesis tests.
A likelihood ratio test can be conducted for each variable. For the sample model, the very small p-value indicates that this variable is highly significant. There is a simple command that conducts this test on each of the current variables, e.g., drop1(glm.freq, test = "LRT")
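The arithmetic behind a single likelihood ratio test can be sketched in Python. It assumes a model with dispersion fixed at 1 (such as Poisson), where the test statistic is the drop in deviance, and a variable that uses one degree of freedom, where the chi-square tail probability reduces to erfc(sqrt(x / 2)). The two deviance values are hypothetical.

```python
import math

def lrt_p_value_1df(statistic):
    """Upper tail of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

# Hypothetical deviances with and without the variable being tested.
deviance_without = 120.4
deviance_with = 112.1

statistic = deviance_without - deviance_with
p_value = lrt_p_value_1df(statistic)
print(statistic, p_value)
```

A variable with more than one level needs a chi-square distribution with more degrees of freedom, which this shortcut does not cover.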
Define Fisher Scoring.
Fisher’s Scoring algorithm is related to Newton’s method for solving maximum likelihood problems numerically.
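As an illustration, here is a Python sketch of Fisher scoring for a Poisson GLM with a log link, an intercept, and one predictor. Each step adds the inverse Fisher information times the score vector to the current coefficients; for this canonical link the update coincides with Newton's method. The data and the function name are hypothetical, and the 2x2 system is solved by hand to avoid any dependencies.

```python
import math

def fisher_scoring_poisson(x, y, iterations=25):
    """Fit log(mu) = b0 + b1 * x by Fisher scoring; returns (b0, b1)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: derivative of the log-likelihood for each coefficient.
        s0 = sum(yi - mi for yi, mi in zip(y, mu))
        s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix X'WX with weights W = diag(mu).
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Update: beta <- beta + inverse(information) * score.
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    return b0, b1

x = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical predictor
y = [1, 2, 4, 7, 12]           # hypothetical counts
b0, b1 = fisher_scoring_poisson(x, y)
print(b0, b1)
```

At the maximum likelihood estimate the score vector is zero, which gives a simple check that the iterations have converged.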
Define Akaike Information Criterion (AIC).
AIC, as well as other information criteria, provides a way to assess the quality of your model through comparison to related models. It’s based on the deviance, but penalizes it for making the model more complicated. However, the value of the AIC on its own is not meaningful; it needs to be used in comparison with the AIC of another model, where you would select the model with the smallest AIC.
Example of AIC summary on pdf page 31.
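The comparison can be sketched in Python using the standard definition AIC = 2k − 2 log L, where k is the number of estimated parameters and log L is the maximized log-likelihood. The two candidate models and their log-likelihoods below are hypothetical.

```python
def aic(log_likelihood, n_params):
    """AIC = 2 * k - 2 * log-likelihood; lower is preferred."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical candidates: (maximized log-likelihood, number of parameters).
candidates = {
    "A": (-250.3, 3),
    "B": (-249.9, 5),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Model B fits slightly better, but its two extra parameters cost more than the improvement in log-likelihood is worth, so model A has the smaller AIC.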
Drop variables that do not add predictive value, but it is advised not to drop more than one at a time. Two variables may appear to lack predictive power, but when one is dropped, the other’s value may increase. Keep in mind that the test does not answer the question “is this variable valuable?” but rather “in the presence of the other variables, does this variable provide additional value?”.