Regularisation Flashcards
Regularisation
Try to make sure that the model does not overfit to the training data - i.e. don't trust the training data too much. The more iterations we run, the more we fit the model to the training data while minimising the loss.
Regularisation does not depend on the features. It looks at the model's weights and tries to keep them small overall.
Why regularisation?
We try to generalise. The model does not need to be perfect on the training data - but it should be useful for unseen data.
A simple analogy for overfitting
If you learn English by speaking only to a teenager, you'll pick up slang so well that you might end up unable to speak English with anyone else. Regularisation stops the learning from going too far, so that you stay able to speak English in general.
How to do regularisation
We still minimise the loss, but we add a penalty for a complex model: Loss(data|model) + complexity(model)
A strategy to model complexity: complexity(model)
One popular strategy is to prefer smaller weights - that is, make the parameters as small as we can get away with while still getting the training examples right.
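A minimal sketch of what this objective looks like in code (not from the original flashcards) - it assumes a linear model with a squared-error data loss, and all names here are just illustrative; complexity is whatever penalty we choose and lam stands in for lambda:

```python
import numpy as np

def regularised_loss(w, X, y, complexity, lam=0.1):
    """Loss(data|model) + lambda * complexity(model) for a linear model."""
    data_loss = np.mean((X @ w - y) ** 2)   # how well the model fits the training data
    return data_loss + lam * complexity(w)  # plus the penalty for a complex model

# Example: "prefer smaller weights" by penalising their squared size.
w = np.array([0.5, -2.0, 1.5])
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])
y = np.array([3.0, 0.5])
print(regularised_loss(w, X, y, complexity=lambda w: np.sum(w ** 2)))
```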
L2 regularisation
also called ridge regularisation
sum of the squared weights
(for linear models)
penalise big weights
weights should be centred around zero
weights should be normally distributed
L(w|D) + lambda*sum(square(w))
(lambda is a weight indicating how much we care about keeping the weights small vs. fitting the training data)
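A small sketch of the L2 penalty in action (not from the original flashcards) - it assumes a linear model fitted by gradient descent on synthetic data, with lam playing the role of lambda in L(w|D) + lambda*sum(square(w)); all names and values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

def fit(lam, steps=2000, lr=0.01):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad_data = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the squared data loss
        grad_penalty = 2 * lam * w                  # gradient of lambda * sum(w^2)
        w -= lr * (grad_data + grad_penalty)
    return w

print(fit(lam=0.0))   # no penalty: weights fit the training data as closely as possible
print(fit(lam=10.0))  # strong L2 penalty: weights are pulled towards zero
```

Running both calls shows the trade-off: with a large lambda the fitted weights shrink towards zero, which is exactly the "penalise big weights" behaviour listed above.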