Lecture 7: Overfitting and the bias/variance trade-off (flashcards)
review: what is the main aim of regression?
given feature(s) x, we want to predict the target y (note: x can be 1-D or multi-dimensional, y is 1-D)
what is the number one rule for train and test sets
they should never overlap; the test set should always be unseen data
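In practice the split is usually done with a library helper. A minimal sketch, assuming scikit-learn and a synthetic sin(3x) dataset (the variable names and data are illustrative, not from the lecture):

```python
# Hold out unseen test data; the train and test sets never overlap.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))                 # features (toy data)
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=100)  # noisy targets

# 20% of the samples are kept aside and only touched for final evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```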
what is overfitting
the fit is very good on the training set but very bad on the test set; this usually means the model is too complex
what is underfitting
the fit is very bad on both the training and test sets; this usually means the model is too simple
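Both failure modes show up when comparing train and test error across model complexities. A hedged sketch using toy sin(3x) data and numpy polynomial fits (none of this is from the lecture):

```python
# Compare a too-simple, a reasonable, and a too-complex polynomial model.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + 0.1 * rng.normal(size=60)
x_train, y_train, x_test, y_test = x[:40], y[:40], x[40:], y[40:]

for deg in (1, 3, 15):
    coeffs = np.polyfit(x_train, y_train, deg)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {deg:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
# Typically: degree 1 underfits (both errors high), degree 15 overfits
# (low train error, noticeably higher test error), degree 3 sits in between.
```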
what are the reasons for overfitting
the model is too complex, there are too many features, or there are not enough training samples
what are the solutions for overfitting
use a simpler model (e.g. a lower-order polynomial) or use regularisation
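Both fixes can be tried on the same toy data; a sketch assuming scikit-learn, where Ridge's alpha plays the role of 𝜆 from the lecture:

```python
# Fix 1: a simpler model (degree 3). Fix 2: keep the complex model (degree 15)
# but add an L2 penalty via ridge regression.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (60, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=60)
X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]

models = {
    "degree 15, unregularised": make_pipeline(PolynomialFeatures(15), LinearRegression()),
    "degree 3,  unregularised": make_pipeline(PolynomialFeatures(3), LinearRegression()),
    "degree 15, ridge (L2)":    make_pipeline(PolynomialFeatures(15), Ridge(alpha=0.1)),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_mse = np.mean((model.predict(X_test) - y_test) ** 2)
    print(f"{name}: test MSE {test_mse:.3f}")
```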
what are the reasons for underfitting
the model is too simple or the features are not informative enough
what is regularisation
it is an umbrella term that includes methods that force learning algorithms to build less complex models
recall 𝜆 from the previous lecture: what does adding the regularisation term 𝜆·reg(w) to the loss do?
it encourages w to be small; this is called weight decay (L2 regularisation) and penalises more complex models
visually, it makes complex models flatter
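As a sketch in the lecture's 𝜆·reg(w) notation (assuming a squared-error data loss and the L2 penalty), the regularised objective and the gradient-descent update that gives weight decay its name:

```latex
% Regularised least-squares objective: data loss plus lambda times the L2 penalty.
\[
  L(\mathbf{w}) \;=\; \underbrace{\sum_{i=1}^{N}\bigl(y_i - \mathbf{w}^{\top}\mathbf{x}_i\bigr)^2}_{\text{data loss}}
  \;+\; \lambda\,\mathrm{reg}(\mathbf{w}),
  \qquad \mathrm{reg}(\mathbf{w}) = \lVert \mathbf{w}\rVert_2^{2}.
\]
% Gradient-descent update with learning rate eta: the penalty term shrinks
% ("decays") every weight by a factor (1 - 2*eta*lambda) at each step.
\[
  \mathbf{w} \;\leftarrow\; \mathbf{w} - \eta\,\nabla_{\mathbf{w}} L(\mathbf{w})
  \;=\; (1 - 2\eta\lambda)\,\mathbf{w} \;-\; \eta\,\nabla_{\mathbf{w}}\,(\text{data loss}).
\]
```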
what does 𝜆 signify?
it controls the trade-off between the data loss and the regularisation term
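The trade-off can be seen numerically with the closed-form ridge solution w = (XᵀX + 𝜆I)⁻¹Xᵀy; a sketch on assumed toy polynomial features (not the lecture's data):

```python
# Sweep lambda: larger values accept a higher data loss in exchange for
# smaller weights, i.e. a less complex (flatter) model.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40)
y = np.sin(3 * x) + 0.1 * rng.normal(size=40)
X = np.vander(x, 10)                        # degree-9 polynomial features

for lam in (0.0, 0.01, 0.1, 1.0, 10.0):
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    data_loss = np.sum((X @ w - y) ** 2)
    print(f"lambda={lam:5.2f}: data loss {data_loss:7.3f}, ||w|| {np.linalg.norm(w):9.2f}")
```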
what is the difference between bias and variance
low bias means the predictions are close to the target on average, while low variance means the spread of the predictions is small
in general, very simple models exhibit what bias and variance
high bias and low variance
in general, very complex models exhibit what bias and variance
low bias and high variance
according to the bias-variance trade-off theorem, the expected MSE on a new test sample x is given by
test error = bias² + variance + irreducible noise:
E[(y − f̂(x))²] = Bias(f̂(x))² + Var(f̂(x)) + σ²
Bias(f̂(x)) = f_avg(x) − f(x), where f_avg(x) = E[f̂(x)] is the prediction averaged over training sets and f(x) is the true function
Var(f̂(x)) = E[(f̂(x) − f_avg(x))²]
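The decomposition can be checked empirically by refitting the same model on many freshly drawn training sets and measuring bias² and variance at one test point. A sketch with an assumed true function sin(3x), noise level σ = 0.1, and a high-variance degree-9 polynomial (none of these choices come from the lecture):

```python
# Estimate bias^2 and variance of f_hat at a fixed point x0 over many training sets.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)          # assumed true function
sigma, x0, degree = 0.1, 0.5, 9

preds = []
for _ in range(500):
    x = rng.uniform(-1, 1, 30)                 # fresh training set each round
    y = f(x) + sigma * rng.normal(size=30)
    coeffs = np.polyfit(x, y, degree)
    preds.append(np.polyval(coeffs, x0))
preds = np.array(preds)

f_avg = preds.mean()                           # f_avg(x0): average prediction
bias_sq = (f_avg - f(x0)) ** 2                 # Bias(f_hat(x0))^2
variance = np.mean((preds - f_avg) ** 2)       # Var(f_hat(x0))
print(f"bias^2 = {bias_sq:.4f}, variance = {variance:.4f}, noise = {sigma**2:.4f}")
print(f"predicted test MSE at x0 ≈ {bias_sq + variance + sigma**2:.4f}")
```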