Week 3 Flashcards
When is a model underfitting?
When training error and cross-validation error are both high.
When is a model overfitting?
When training error is low but cross-validation error is high (see the sketch below).
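A minimal sketch of this diagnosis, assuming scikit-learn and a synthetic dataset (the data and model choice here are illustrative, not from the cards):

```python
# Illustrative only: synthetic data and a deliberately too-simple model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # noisy nonlinear target

model = LinearRegression().fit(X, y)
train_mse = mean_squared_error(y, model.predict(X))
cv_mse = -cross_val_score(LinearRegression(), X, y,
                          scoring="neg_mean_squared_error", cv=5).mean()

# Both errors high          -> underfitting (high bias)
# Train low, CV much higher -> overfitting (high variance)
print(f"train MSE: {train_mse:.3f}, CV MSE: {cv_mse:.3f}")
```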
What is bias?
Being systematically wrong.
The model does not capture the relationship between the feature variables and the outcome variable.
Predictions are consistent, but poor model choices lead to systematically wrong predictions.
What is variance?
Being unstable.
The model fits the relationship between the features and the outcome variable too closely on the training data,
incorporating random noise besides the underlying function, so predictions vary with the training sample.
What is reducible error?
The part of the error that comes from bias and variance, which better modelling choices can reduce.
The goal is to find a model that captures the underlying relationship while avoiding incorporating random noise.
What causes high bias?
The model misrepresents the data because it is missing relevant information.
An overly simple model (a bias toward the simplicity of the model).
ASSOCIATED WITH UNDERFITTING
What causes high variance?
Caused by an overly complex or poorly fitted model (e.g. an order-14 polynomial).
ASSOCIATED WITH OVERFITTING
What is irreducible error?
- Due to intrinsic uncertainty or randomness: real-world data will always contain some randomness in the data points.
- It is impossible to perfectly model the majority of real-world data, so we have to be comfortable with some measure of error.
- The error is present even in the best model.
Summary of bias-variance tradeoff
Model adjustments that decrease bias often increase variance, and vice versa.
This is essentially a complexity tradeoff: simpler models carry more bias, more complex models carry more variance.
Choosing the right level of complexity
We want a model complex enough not to underfit, but not so complex that it overfits (see the sketch below).
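One way to pick that level in practice is to sweep a complexity parameter and watch the cross-validation error. A minimal sketch, assuming scikit-learn's validation_curve and the same kind of synthetic data as above:

```python
# Illustrative only: sweep polynomial degree and compare train vs CV error.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

pipe = make_pipeline(PolynomialFeatures(), LinearRegression())
degrees = np.arange(1, 15)
train_scores, cv_scores = validation_curve(
    pipe, X, y,
    param_name="polynomialfeatures__degree", param_range=degrees,
    scoring="neg_mean_squared_error", cv=5)

# Pick the degree with the best (least negative) mean CV score:
# complex enough to fit the signal, not so complex that it fits the noise.
print("chosen degree:", degrees[np.argmax(cv_scores.mean(axis=1))])
```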
What is linear model regularisation (or shrinkage)?
Adds an adjustable regularisation strength parameter (lambda) directly into the cost function.
Adds a penalty proportional to the size of the estimated model parameters.
When lambda is large, large parameter values are penalised more strongly (a sketch of the cost function follows).
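A minimal sketch of the idea, written with the squared (ridge-style) penalty; the function name, `lam`, and the 1/2 scaling convention are illustrative assumptions, not a fixed definition:

```python
# Illustrative only: a ridge-style regularised cost function written out.
import numpy as np

def regularised_cost(theta, X, y, lam):
    """Mean squared error plus a penalty on the parameter sizes."""
    residuals = X @ theta - y
    fit_term = (residuals ** 2).mean() / 2     # how well the model fits
    penalty = lam * (theta[1:] ** 2).sum()     # intercept left unpenalised
    return fit_term + penalty                  # larger lam -> stronger shrinkage
```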
What does more regularisation do?
More regularisation yields a simpler model, i.e. more bias.
Less regularisation yields a more complex model and increases variance.
If the model overfits (variance is too high), regularisation can reduce variance and improve the generalisation error.
What are the two regularisation methods?
L1 (LASSO)
L2 (Ridge)
What is Ridge and what does it do?
Imposes bias on the model and reduces variance by applying a penalty proportional to the squared coefficient values. The best value for lambda can be selected by cross-validation. (Features should be scaled first; see the sketch below.)
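A minimal sketch, assuming scikit-learn's RidgeCV with scaled features (the data and the alpha grid are illustrative):

```python
# Illustrative only: scale features, then let cross-validation pick alpha
# (scikit-learn's name for the regularisation strength lambda).
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=200)

model = make_pipeline(StandardScaler(),
                      RidgeCV(alphas=np.logspace(-3, 3, 13)))
model.fit(X, y)
print("alpha chosen by CV:", model[-1].alpha_)
```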
What is Lasso?
The penalty is applied proportionally to the absolute coefficient values.
How does regularisation perform feature selection?
L1 (Lasso) regularisation drives some coefficients exactly to zero, effectively removing those features from the model (see the sketch below).
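A minimal sketch of this effect, assuming scikit-learn's Lasso on synthetic data where two of the five features are irrelevant:

```python
# Illustrative only: two of the five true coefficients are zero; lasso
# typically recovers that by zeroing the corresponding fitted coefficients.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(StandardScaler().fit_transform(X), y)
print(lasso.coef_)  # coefficients for the irrelevant features land at exactly 0
```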