Module 12 Flashcards
example of overfitting
A nonlinear model has been trained too closely to the training dataset, so it performs noticeably worse on the test dataset
Some ways to overfit a linear regression model
Irrelevant explanatory variables
Collinear explanatory variables
Some ways to underfit a linear regression model
Leaving out an important explanatory variable
A parsimonious model
Aims to strike a balance between overfitting and underfitting the model to the training dataset
Does this by having a low enough number of explanatory variables to avoid overfitting, while keeping a high enough model fit to avoid underfitting
adjusted R^2
Used to measure how parsimonious a model is
Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), where n is the number of observations and p is the number of explanatory variables
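A minimal sketch of how this formula could be computed in Python (the function name and the example numbers are illustrative assumptions, not from the flashcards):

def adjusted_r_squared(r_squared, n, p):
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    # where n = number of observations, p = number of explanatory variables.
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Same R^2, but the model with more explanatory variables is penalized more:
print(adjusted_r_squared(0.80, n=100, p=3))   # ~0.794
print(adjusted_r_squared(0.80, n=100, p=20))  # ~0.749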
Interpreting the adjusted R^2
The higher the adjusted R^2 of a model, the more parsimonious we say the model is, and therefore the less likely the model is to be overfit to the training dataset
number of every possible model
2^p possible models
where p is the number of candidate explanatory variables
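For example, with p = 3 candidate explanatory variables there are 2^3 = 8 possible models (including the model with no explanatory variables). A small Python sketch that enumerates them (the variable names are hypothetical):

from itertools import combinations

variables = ["x1", "x2", "x3"]  # p = 3 candidate explanatory variables
subsets = [combo for r in range(len(variables) + 1)
           for combo in combinations(variables, r)]
print(len(subsets))  # 2^3 = 8 possible models, from () up to (x1, x2, x3)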
Heuristic techniques
Backwards Elimination Algorithm
Forward Selection Algorithm
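A rough sketch of the Backwards Elimination Algorithm, using adjusted R^2 as the selection criterion (this assumes a pandas DataFrame X of candidate explanatory variables and uses statsmodels; the exact criterion and stopping rule used in the course may differ):

import statsmodels.api as sm

def backwards_elimination(X, y):
    # Greedy backwards elimination: start with all explanatory variables and
    # repeatedly drop the one whose removal most improves adjusted R^2.
    cols = list(X.columns)
    best_adj = sm.OLS(y, sm.add_constant(X[cols])).fit().rsquared_adj
    while len(cols) > 1:
        # Adjusted R^2 of the model obtained by dropping each variable in turn.
        trials = {}
        for c in cols:
            remaining = [v for v in cols if v != c]
            trials[c] = sm.OLS(y, sm.add_constant(X[remaining])).fit().rsquared_adj
        drop, adj = max(trials.items(), key=lambda item: item[1])
        if adj <= best_adj:
            break  # no single removal improves adjusted R^2, so stop
        cols.remove(drop)
        best_adj = adj
    return cols, best_adj

The Forward Selection Algorithm works in the opposite direction: start with no explanatory variables and repeatedly add the one that most improves the criterion, stopping when no addition helps.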
Regularized linear regression model
Take the objective function from the basic linear regression model and add a penalty term to it
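As a rough sketch of that idea (the symbols and names here are generic illustrations, not from the flashcards), the regularized objective is the ordinary sum of squared errors plus a weighted penalty on the coefficients:

import numpy as np

def regularized_objective(beta, X, y, alpha, penalty):
    # Basic linear regression objective (sum of squared errors)
    # plus a penalty term weighted by the tuning parameter alpha.
    residuals = y - X @ beta
    return np.sum(residuals ** 2) + alpha * penalty(beta)

def l1_penalty(beta):
    # LASSO-style penalty: sum of absolute coefficient values
    return np.sum(np.abs(beta))

def l2_penalty(beta):
    # Ridge-style penalty: sum of squared coefficient values
    return np.sum(beta ** 2)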
Penalty term
penalizes models that have too many explanatory variables that don’t bring enough predictive power to the model
Goals of the penalty term
Goal 1: give a clearer interpretation of which explanatory variables can be left out of the model
Goal 2: leave out the variables that lead to overfitting
LASSO Regression (L1 Penalty Term)
(stands for least absolute shrinkage and selection operator)
The penalty term is the sum of the absolute values of all coefficients
Clearer slope interpretation with LASSO regression
If a slope is set to 0, the LASSO regression model is suggesting that the corresponding explanatory variable can be left out of the model.
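A small illustration of this behaviour with scikit-learn's Lasso (the synthetic data and the alpha value are arbitrary choices for the sketch):

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two columns actually drive y; the other three are irrelevant.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)
# The slopes for the three irrelevant columns are driven to (or very near) 0,
# suggesting those explanatory variables can be left out of the model.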
Ridge Regression (L2 Penalty Term)
The penalty term is the sum of the squared values of all coefficients
Less clear slope interpretation with ridge regression
The slopes found with ridge regression are shrunk toward 0 but rarely set exactly to 0, so they provide much less of a clear indication as to which explanatory variables should be left out of the model
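For contrast, a ridge fit on the same kind of synthetic data shrinks the irrelevant slopes toward 0 but typically leaves them nonzero, which is why the indication is less clear (again, the data and the alpha value are illustrative assumptions):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=10.0).fit(X, y)
print(ridge.coef_)
# All five slopes are shrunk but usually remain nonzero, so ridge regression
# does not directly point to explanatory variables that could be left out.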