Module 6 - GLM Flashcards
Overdispersion: definition? How to fix?
1) Variance of the response is greater than its mean, violating the Poisson GLM assumption that variance equals the mean
2) Use quasi-likelihood (quasi-Poisson), as sketched below
- Coefficient estimates stay the same, but standard errors are scaled by the estimated dispersion parameter
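A minimal R sketch of the quasi-Poisson fix (data frame dat and variable names are illustrative):

    fit_pois  <- glm(claims ~ age + region, family = poisson, data = dat)
    fit_quasi <- glm(claims ~ age + region, family = quasipoisson, data = dat)
    # Same coefficient estimates; standard errors are scaled by the
    # estimated dispersion parameter:
    summary(fit_quasi)$dispersion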
Interaction terms: when do they occur?
1) Occur when the effect of one feature on the response depends on the value of another feature
Interaction terms: why include the underlying (main-effect) variables?
Hierarchy principle: if we include an interaction in a model, we should also include the main effects, even if they have insignificant p-values (see the R sketch below)
1) Interactions are hard to interpret in a model without main effects
2) Interaction terms also contain main-effect information if the model has no main-effect terms
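In R, the hierarchy principle is easy to respect because x1 * x2 expands to the main effects plus the interaction (a sketch with illustrative names):

    fit <- glm(y ~ x1 * x2, data = dat)  # equivalent to y ~ x1 + x2 + x1:x2
    # glm(y ~ x1:x2, ...) would fit the interaction without main effects -- avoid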
R^2: what does it mean? Problem with the measure and solution?
1) Fraction of the variance in the response explained by the model (equivalently, the fraction by which residual variance is reduced)
2) Adding a predictor never decreases its value, so it rewards bloated models
- Fix: adjusted R^2 adds a penalty for the number of parameters
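Both measures can be read off a fitted linear model in R (illustrative names):

    fit <- lm(y ~ x1 + x2, data = dat)
    summary(fit)$r.squared      # never decreases as predictors are added
    summary(fit)$adj.r.squared  # penalized for the number of parameters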
Collinearity: definition? How to deal with it? (2)
- 2 or more predictor variables are closely related to each other, making it hard to separate their individual effects and inflating standard errors
Solutions:
1) Drop one of the problematic variables
2) Combine the collinear variables into a single predictor
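A quick R sketch of detecting and handling collinearity (variable names are illustrative):

    cor(dat$height_cm, dat$weight_kg)  # high correlation flags collinearity
    # Option 1: drop one of the two variables from the formula
    # Option 2: combine them into a single predictor
    dat$size <- as.numeric(scale(dat$height_cm) + scale(dat$weight_kg))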
Offsets: definition? Why are they used?
- Variable whose effect on the response is known in advance, so its coefficient is fixed at 1 rather than estimated
- The GLM still needs to be made aware of the offset variable so that the estimated coefficients for the OTHER variables are optimal in its presence
- Used to adjust for exposure, e.g., log(exposure) in a Poisson claim-count model (see the R sketch below)
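A minimal R sketch of an offset in a Poisson claim-count model (names illustrative); log(exposure) enters with its coefficient fixed at 1:

    fit <- glm(claims ~ age + region + offset(log(exposure)),
               family = poisson, data = dat)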
Prior weights: definition? Why used?
- Give information about the credibility of each observation in the model
- Assign greater credibility to rows that represent a greater number of risks when estimating the model coefficients
- The weight variable specifies the weight given to each record in the estimation process
ex: 1 year of exposure vs 1 month of exposure
- Observations with HIGHER EXPOSURE are deemed to have LOWER VARIANCE
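A sketch in R, assuming each row averages claims over `exposure` years of risk (illustrative names):

    fit <- glm(avg_claim ~ age + region, family = Gamma(link = "log"),
               weights = exposure, data = dat)
    # Rows with more exposure get more weight, i.e., more credibility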
Deviance: definition? How does it work?
- Measure of goodness of fit of a GLM
- Compares the log-likelihood of the fitted model with that of the saturated model: D = 2 * (loglik_saturated - loglik_fitted)
- Smaller deviance = better fit
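In R, both deviances are stored on the fitted object, and nested GLMs can be compared with a deviance (likelihood-ratio) test (fit names illustrative):

    fit$null.deviance  # intercept-only model
    fit$deviance       # fitted model (smaller = better fit)
    anova(fit_small, fit_big, test = "Chisq")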
Homoscedasticity, definition?
Error terms have constant variance:
e ~ N(0, sigma^2)
Graph to use for:
1) Observations that have too large an impact on coefficients?
2) Normality of the distribution of residuals?
3) Homogeneity of the variance and linearity of relationship? (2)
1) Residuals vs Leverage
2) Normal Q-Q
3) Residuals vs Fitted, Scale-Location
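All four diagnostics come from base R's plot method for fitted models (illustrative fit):

    fit <- lm(y ~ x1 + x2, data = dat)
    plot(fit, which = 1)  # Residuals vs Fitted
    plot(fit, which = 2)  # Normal Q-Q
    plot(fit, which = 3)  # Scale-Location
    plot(fit, which = 5)  # Residuals vs Leverage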
Alpha parameters for:
1) Lasso?
2) Elastic Net?
3) Ridge regression?
1) Lasso: Alpha = 1
2) Elastic Net: 0 < Alpha < 1
3) Ridge: Alpha = 0
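The alpha argument of glmnet selects among the three (x a model matrix and y the response; illustrative):

    library(glmnet)
    fit_lasso <- glmnet(x, y, alpha = 1)    # lasso
    fit_enet  <- glmnet(x, y, alpha = 0.5)  # elastic net
    fit_ridge <- glmnet(x, y, alpha = 0)    # ridge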
Difference between lasso and ridge? What is each better at?
1) With lasso, the optimal solution can reduce a coefficient to exactly 0
- This cannot happen with ridge
- Thus, lasso can completely remove a feature
2) Lasso: better at feature selection
- Ridge: tends to give better predictions when many predictors have coefficients of similar size
With lasso, as lambda increases…?
- More of the features are eliminated
- All coefficients shrink toward zero, and smaller coefficients are typically driven to exactly zero first
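A sketch of choosing lambda by cross-validation with glmnet (x a model matrix, y the response, as in the previous sketch):

    cv_fit <- cv.glmnet(x, y, alpha = 1)
    coef(cv_fit, s = "lambda.min")  # lambda minimizing CV error
    coef(cv_fit, s = "lambda.1se")  # larger lambda -> sparser model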
Limitation of feature selection using regularization techniques?
The selection is automatic, so the retained set of features is not always the most interpretable one
Cross-validation: explanation, steps?
Repeats the validation step with different training/validation splits (see the R sketch below)
1) Split the data into k folds; train the model on k-1 folds, then predict and record the error on the held-out fold
2) Repeat k times, so each fold serves once as the validation set
3) The CV error is the average (or weighted average) of the k fold errors
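A hand-rolled k-fold CV sketch in R (illustrative names; cv.glm from the boot package automates this):

    k <- 5
    folds <- sample(rep(1:k, length.out = nrow(dat)))
    errors <- numeric(k)
    for (i in 1:k) {
      fit <- glm(y ~ x1 + x2, data = dat[folds != i, ])  # train on k-1 folds
      pred <- predict(fit, newdata = dat[folds == i, ])  # predict held-out fold
      errors[i] <- mean((dat$y[folds == i] - pred)^2)    # record fold error
    }
    cv_error <- mean(errors)                             # average over the k folds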
GLM to use for: (Family and link function)
Probability or binary?
Family = binomial
Link function = Logit
GLM to use for: (Family and link function)
Count?
Family = poisson or quasipoisson
Link function = Log
GLM to use for: (Family and link function)
Continuous positive?
Family = Gamma, Inverse Gaussian
Link function = Log (for both)
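The three cards above translate directly into glm() calls (illustrative formulas):

    glm(has_claim ~ age, family = binomial(link = "logit"), data = dat)
    glm(n_claims ~ age, family = poisson(link = "log"), data = dat)
    glm(severity ~ age, family = Gamma(link = "log"), data = dat)
    glm(severity ~ age, family = inverse.gaussian(link = "log"), data = dat)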
Advantages of using an alternative fitting procedure, such as subset selection and shrinkage, instead of least squares? (3)
1) Simpler model
2) Improved prediction accuracy
3) Results easier to interpret
Lasso Regularized regression
1) Advantages? (2)
1) Binarization of factors is done automatically (through the model matrix), so each factor level is treated as a separate feature
2) Variable selection is automatic, using CV to minimize prediction error rather than a proxy such as AIC or hypothesis tests
Lasso Regularized regression
1) Disadvantages? (1)
1) Because predictors are standardized and coefficients are shrunk toward zero, the estimated coefficients are difficult to interpret
Classification tree using Cost-Complexity Pruning
1) Advantages? (4)
1) Easy to explain and present due to if/else nature
2) Automatically removes variables (by not showing up in the tree), allowing interpretation to focus on the most significant factors
3) More easily adapts to non-linear relationships
4) Automatically captures interaction effects
Classification tree using Cost-Complexity Pruning
1) Disadvantages? (3)
1) Danger of overfitting
2) Resulting tree can be highly dependent on the training set (high variance)
3) Generally lower predictive accuracy than regression-based approaches
Random Forest
1) Advantages? (2)
1) Reduces overfitting and variance by allowing results from multiple trees to be combined
2) Uses CV to set the tuning parameters
Random Forest
1) Disadvantages? (3)
1) Difficult to interpret
2) Longer runtime
3) Difficult to implement
Disadvantage of stepAIC algorithm for factor variables?
How to solve the issue?
1) stepAIC treats a factor variable as a single feature
- As such, it either retains the variable with ALL of its levels or removes the variable entirely
- Does not allow for the possibility that individual factor levels may be insignificant relative to the base level, or insignificantly different from other levels
2) Solution: binarize the factor levels into separate 0/1 dummy variables (see the R sketch below)
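A sketch of the binarization fix in R (factor `region` and other names illustrative):

    library(MASS)
    dummies <- model.matrix(~ region, data = dat)[, -1]  # one 0/1 column per non-base level
    dat2 <- cbind(dat[, setdiff(names(dat), "region")], dummies)
    fit <- glm(y ~ ., data = dat2)
    stepAIC(fit)  # can now retain or drop each region level individually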