Optimisation & Regularisation Flashcards
How can learning be viewed as Optimisation?
- Training
- Model fitting
- Parameter estimation
How to decompose errors into bias and variance?
Error = bias^2 + variance + noise
What is bias?
Bias measures systematic error: how far the model's average prediction (over different training sets) is from the true values. High bias means the model is too simple to capture the underlying pattern (underfitting).
What is variance?
Variance measures how much the model's predictions change when it is trained on different samples of the data. High variance means the model is fitting noise in the training set (overfitting).
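As a rough illustration (my own toy example, not from the cards), the bias/variance trade-off can be simulated with two estimators of a population mean: the plain sample mean (unbiased, higher variance) versus a shrunken estimate (biased, lower variance).

```python
import numpy as np

# Hypothetical illustration: estimate a population mean (true value 2.0)
# with two estimators, repeated over many resampled training sets.
rng = np.random.default_rng(0)
true_mean, n, trials = 2.0, 20, 10_000

sample_means = np.array(
    [rng.normal(true_mean, 1.0, n).mean() for _ in range(trials)]
)
shrunk = 0.5 * sample_means  # shrinking towards zero: adds bias, cuts variance

bias_mean = sample_means.mean() - true_mean    # ~0 (unbiased)
bias_shrunk = shrunk.mean() - true_mean        # ~-1.0 (biased)
var_mean = sample_means.var()                  # ~1/n
var_shrunk = shrunk.var()                      # 0.25 * var_mean (smaller)
```

The shrunken estimator trades a large bias for a quarter of the variance, which is the same trade regularisation makes with model weights.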
How to reduce overfitting?
Dampen the model's complexity to smooth out the fitted function.
- Regularisation
- Restricting the degrees of freedom (the effective number of parameters) in the model
- We sacrifice some training error in exchange for lower test error
e.g. in SVMs, the slack variables provide the regularisation
What is L1 Regularisation?
L1 weight regularisation penalises weight values by adding the sum of their absolute values to the error term
L1 regularisation encourages solutions where many parameters are zero
e.g. Lasso algorithm
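A small sketch (my own illustration, not from the cards) of why L1 drives many parameters to exactly zero: the proximal update for an L1 penalty is soft-thresholding, which snaps any weight with magnitude below the penalty strength to 0.

```python
import numpy as np

def soft_threshold(w, lam):
    """Proximal operator of the L1 penalty lam * |w|.

    Weights with |w| <= lam are set exactly to zero; larger
    weights are shrunk towards zero by lam.
    """
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

weights = np.array([0.3, -0.1, 1.5, -2.0, 0.05])
print(soft_threshold(weights, lam=0.5))
# the small weights (0.3, -0.1, 0.05) become exactly 0.0
```

This is the update used inside Lasso solvers based on coordinate or proximal gradient descent, and it is the mechanism behind the "many parameters are zero" behaviour above.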
What is L2 Regularisation?
L2 weight regularisation penalises weight values by adding the sum of their squared values to the error term
L2 regularisation encourages solutions where most parameter values are small.
e.g. Ridge Regression (L2-regularised linear regression)
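A sketch (toy data of my own) of L2-regularised least squares in closed form, w = (XᵀX + λI)⁻¹Xᵀy: increasing λ shrinks the norm of the weight vector without forcing individual weights to zero.

```python
import numpy as np

def ridge_weights(X, y, lam):
    """Closed-form solution of L2-regularised least squares."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + 0.1 * rng.normal(size=50)

w_ols = ridge_weights(X, y, lam=0.0)     # ordinary least squares
w_ridge = ridge_weights(X, y, lam=10.0)  # penalised: smaller weights overall
print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

With λ = 0 this reduces to ordinary least squares; the penalty term λI also makes the matrix inversion better conditioned.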
Batch vs Stochastic Gradient Descent
Batch: The gradient (D) is computed over the entire data set at each iteration
- Can be slow for large data sets
- Cannot be used in incremental settings
- Guaranteed to converge to the global minimum for convex error surfaces
Stochastic: Update is performed for each training instance
- Order of training instances must be random
- Updates are noisy; the per-instance gradient estimate jumps around
- “random walk” avoids getting stuck
- Often only requires a small number of iterations through the full data set
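The contrast can be sketched on a one-parameter regression (a toy example of my own, not from the cards): batch GD computes the gradient over the full data set each step, while SGD updates after every randomly ordered instance.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x            # true weight is 2.0
lr = 0.1

# Batch gradient descent: one gradient over the entire data set per iteration.
w_batch = 0.0
for _ in range(100):
    grad = 2.0 * np.mean(x * (w_batch * x - y))
    w_batch -= lr * grad

# Stochastic gradient descent: one update per (randomly ordered) instance.
rng = np.random.default_rng(0)
w_sgd = 0.0
for _ in range(100):                      # epochs
    for i in rng.permutation(len(x)):     # random instance order each epoch
        grad_i = 2.0 * x[i] * (w_sgd * x[i] - y[i])
        w_sgd -= lr * grad_i
```

Both reach the same minimum on this convex problem; SGD does so with cheaper, noisier steps, which is why it scales to large data sets and incremental settings.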
How to find the minimum error for regularisation/optimisation?
Use Gradient Descent to iteratively approximate the minimum, rather than solving for it analytically (a closed-form solution may not exist or may be too expensive to compute).
Why is it that parameter tuning might lead to overfitting?
- Repeatedly tuning parameters to maximise performance on a held-out (validation) set effectively fits the model to that set, so the measured performance no longer reflects generalisation to genuinely unseen data
What is the Gradient Descent method, and why is it important?
Gradient Descent is a mechanism for finding the minimum of a (convex) multivariate function where we can find its partial derivatives.
This is important because it allows us to determine the regression weights which minimise an error function over some training data set.
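A minimal sketch of the card above (toy data assumed for illustration): gradient descent finds the regression weight and intercept by repeatedly stepping against the partial derivatives of the mean squared error.

```python
import numpy as np

# Toy data generated from y = 1.5*x + 0.5 (assumed for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.5 * x + 0.5

w, b, lr = 0.0, 0.0, 0.05
for _ in range(5000):
    pred = w * x + b
    # Partial derivatives of the mean squared error wrt w and b.
    dw = 2.0 * np.mean(x * (pred - y))
    db = 2.0 * np.mean(pred - y)
    w -= lr * dw
    b -= lr * db
```

Because the squared-error surface is convex in (w, b), these steps converge to the unique minimising weights.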