Optimisation Flashcards
What is the gradient of the error?
The vector of partial derivatives of the error with respect to each weight
What is the error surface in weight space?
The error on a fixed training set, viewed as a function of the weight settings
Where does the gradient of the error point?
In the direction of steepest error ascent in weight space (so descent moves in the opposite direction)
What are the two common inputs to a numerical optimization algorithm?
- A procedure that computes the error E(w)
- A procedure that computes the partial derivative of E with respect to each weight (a sketch of both follows below)
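A minimal sketch of these two procedures, assuming a least-squares error on a small made-up training set (the data and model are illustrative, not from the cards):

```python
import numpy as np

# Assumed example: least-squares error E(w) = 0.5 * ||Xw - y||^2
# over a fixed training set (X, y); the data here are made up.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])   # one row per instance, first column is a bias
y = np.array([1.0, 2.0, 3.0])

def E(w):
    """Input 1: a procedure that computes the error E(w)."""
    r = X @ w - y
    return 0.5 * r @ r

def grad_E(w):
    """Input 2: a procedure that computes dE/dw_i for each weight."""
    return X.T @ (X @ w - y)
```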
How does gradient descent work?
- Calculate the gradient of the error
- Move in the opposite (negative gradient) direction by a fixed step size η (eta); see the sketch below
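A sketch of the loop, reusing E and grad_E from the sketch above; the step size and iteration count are arbitrary choices:

```python
def gradient_descent(w, eta=0.1, steps=100):
    """Fixed-step gradient descent using E/grad_E from the sketch above."""
    for _ in range(steps):
        w = w - eta * grad_E(w)   # step against the gradient: steepest descent
    return w

w_fit = gradient_descent(np.zeros(2))
print(w_fit, E(w_fit))   # approaches the least-squares weights [1, 1]
```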
What is the step size in gradient descent also known as?
Learning rate
What happens if the step size in gradient descent is too small?
Too slow to optimize
What happens if the step size in gradient descent is too large?
Instability (it jumps over the minimum region)
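A quick illustration of both failure modes on the example above; the two eta values are assumptions picked to trigger each one:

```python
# Assumed eta values chosen to show both failure modes on the data above.
for eta in (0.001, 0.3):
    w = np.zeros(2)
    for _ in range(50):
        w = w - eta * grad_E(w)
    print(f"eta={eta}: E(w) = {E(w):.3g}")
# eta=0.001 barely reduces the error (too slow);
# eta=0.3 overshoots repeatedly and the error grows (instability)
```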
What happens when the next step increases the error rate in “bold driver” gradient descent?
Don't take the step; aggressively reduce the learning rate (multiply η by 0.5)
What happens when the next step decreases the error rate in “bold driver” gradient descent?
Cautiously increase the learning rate (multiply η by 1.01); see the sketch below
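A sketch of the bold-driver rule as stated on these cards, again reusing E and grad_E from above:

```python
def bold_driver(w, eta=0.1, steps=100):
    """Bold-driver gradient descent: adapt eta by how each trial step goes."""
    err = E(w)
    for _ in range(steps):
        w_try = w - eta * grad_E(w)
        err_try = E(w_try)
        if err_try < err:            # error decreased:
            w, err = w_try, err_try  #   keep the step and
            eta *= 1.01              #   cautiously increase the learning rate
        else:                        # error increased:
            eta *= 0.5               #   don't step; aggressively reduce it
    return w
```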
What is batch learning?
Use all the instances in the training set, updating the weights using the gradient of the total error over the whole set
What is online learning?
Adapt the weights after each instance, using the gradient of the error on that instance alone; see the sketch below
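A sketch contrasting the two schemes on the example data; the per-instance gradient form is the single-instance analogue of grad_E (an assumption consistent with the least-squares setup above):

```python
def batch_epoch(w, eta):
    """Batch: one update per pass, from the gradient over all instances."""
    return w - eta * grad_E(w)       # grad_E sums over the whole training set

def online_epoch(w, eta):
    """Online: one update per instance, from that instance's gradient alone."""
    for x_i, y_i in zip(X, y):
        w = w - eta * x_i * (x_i @ w - y_i)   # gradient of 0.5*(x_i.w - y_i)^2
    return w
```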
Which (batch/online) has the more powerful optimization methods?
Batch
Which (batch/online) is easier to analyze?
Batch
Which (batch/online) is more feasible for large datasets?
Online