Optimisation Flashcards
What is the gradient of the error?
The vector of partial derivatives of the error E(w) with respect to each weight
What is the error surface in weight space?
The error on a fixed training set viewed as a function of the weights: each point in weight space gives the error for that setting of the weights
Where does the gradient of the error point?
In the direction of steepest error increase (ascent) in weight space; the negative gradient points in the direction of steepest descent
What are the two common inputs to a numerical optimization algorithm?
- A procedure that computes E(w)
- A procedure that computes the partial derivative of E(w) with respect to each weight
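A minimal sketch of the two procedures, assuming a hypothetical linear model with sum-of-squares error on a small synthetic training set (the data, the model and the names `error`, `error_grad` and `numerical_grad` are illustrative, not from these cards). It also checks the analytic gradient against finite differences, and that a small step against the gradient lowers E(w) while a step along it raises it.

```python
import numpy as np

# Hypothetical toy problem: linear model y ≈ X @ w, sum-of-squares error
# on a fixed (synthetic) training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

def error(w):
    """Procedure 1: computes E(w) on the fixed training set."""
    r = X @ w - y
    return 0.5 * np.sum(r ** 2)

def error_grad(w):
    """Procedure 2: computes dE/dw_i for every weight w_i."""
    return X.T @ (X @ w - y)

def numerical_grad(f, w, h=1e-6):
    """Central finite differences, used only to sanity-check error_grad."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (f(w + e) - f(w - e)) / (2 * h)
    return g

w0 = np.zeros(3)
g = error_grad(w0)
print(np.allclose(g, numerical_grad(error, w0), rtol=1e-5, atol=1e-4))
# The gradient points uphill: stepping against it lowers E, along it raises E.
print(error(w0 - 1e-3 * g) < error(w0) < error(w0 + 1e-3 * g))
```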
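How does gradient descent work?
- Calculate the gradient of the error
- Move in the opposite direction (downhill) by a step of eta times the gradient: $w \leftarrow w - \eta \nabla E(w)$
- Repeat until the error stops decreasing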
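The loop can be sketched as follows, assuming a hypothetical two-weight quadratic error whose gradient is known in closed form; `gradient_descent` and `grad_E` are illustrative names, and a fixed number of steps stands in for a proper convergence test.

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.1, n_steps=100):
    """Plain gradient descent: repeatedly step against the gradient, scaled by eta."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_steps):
        w = w - eta * grad(w)  # move downhill, i.e. opposite to the gradient
    return w

# Hypothetical error E(w) = (w0 - 3)**2 + (w1 + 1)**2, minimum at (3, -1).
grad_E = lambda w: np.array([2 * (w[0] - 3), 2 * (w[1] + 1)])
print(gradient_descent(grad_E, [0.0, 0.0]))  # close to [3, -1]
```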
What is the step size in gradient descent also known as?
Learning rate
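What happens if the step size in gradient descent is too small?
Too slow to optimize (many tiny steps are needed to reach the minimum)
What happens if the step size in gradient descent is too large?
Instability (steps jump over the minimum region, so the error can oscillate or even grow)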
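Both failure modes show up in a sketch that runs gradient descent on the hypothetical error E(w) = w**2 (gradient 2w) from w = 1 with three learning rates. Each step multiplies w by (1 - 2*eta), so eta = 0.001 barely moves, eta = 0.4 converges quickly, and eta = 1.1 overshoots the minimum on every step and diverges. The function name `run` is illustrative.

```python
def run(eta, w=1.0, n_steps=50):
    """Gradient descent on the toy error E(w) = w**2 (gradient 2w), starting at w = 1."""
    for _ in range(n_steps):
        w = w - eta * 2 * w
    return w

for eta in (0.001, 0.4, 1.1):
    print(f"eta={eta}: w after 50 steps = {run(eta):.3g}")
```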
What happens when the next step increases the error in “bold driver” gradient descent?
Don't take the step; aggressively reduce the learning rate ($\eta \leftarrow 0.5\eta$)
What happens when the next step decreases the error in “bold driver” gradient descent?
Take the step and cautiously increase the learning rate ($\eta \leftarrow 1.01\eta$)
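A minimal sketch of the bold driver rule, using the 0.5 and 1.01 factors from these cards and a hypothetical error E(w) = sum of squared weights; `bold_driver` and its parameter names are illustrative. Accepting a step that leaves the error exactly unchanged is an assumption made here, not something the cards specify.

```python
import numpy as np

def bold_driver(error, grad, w0, eta=0.1, n_steps=200, down=0.5, up=1.01):
    """Gradient descent with a bold driver learning rate (sketch)."""
    w = np.asarray(w0, dtype=float)
    e = error(w)
    for _ in range(n_steps):
        w_new = w - eta * grad(w)
        e_new = error(w_new)
        if e_new > e:
            eta *= down              # error would rise: reject the step, cut eta sharply
        else:
            w, e = w_new, e_new      # error fell (or stayed equal): accept the step
            eta *= up                # and grow eta cautiously
    return w, eta

# Hypothetical error E(w) = sum(w**2), started with a deliberately too-large eta.
w, eta = bold_driver(lambda w: float(np.sum(w ** 2)), lambda w: 2 * w,
                     [1.0, -2.0], eta=3.0)
print(w, eta)
```

Starting from eta = 3.0, the first steps are rejected until eta has been halved to a stable value, after which accepted steps let it creep back up.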
What is batch learning?
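Uses all the instances in the training set, updating the weights using the gradient of the total error: $w \leftarrow w - \eta \nabla E(w)$, where $E(w) = \sum_{i=1}^{n} E_i(w)$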
What is online learning?
Adapt weights after each instance using that instance's error gradient alone: $w \leftarrow w - \eta \nabla E_i(w)$
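The difference can be sketched on a hypothetical one-weight model y = w * x with noise-free targets y = 2x and per-instance error E_i(w) = 0.5 * (w * x_i - y_i)**2 (the data and the names `grad_i`, `batch_epoch`, `online_epoch` are illustrative). Batch makes one update per pass using the summed gradient; online updates after every instance.

```python
import numpy as np

# Hypothetical toy data: targets y_i = 2 * x_i, model y = w * x.
x = np.array([0.5, 1.0, 1.5, 2.0])
y = 2.0 * x

def grad_i(w, i):
    """Gradient of the error on instance i alone."""
    return (w * x[i] - y[i]) * x[i]

def batch_epoch(w, eta):
    """Batch: one update per pass, using the gradient summed over all instances."""
    return w - eta * sum(grad_i(w, i) for i in range(len(x)))

def online_epoch(w, eta):
    """Online: one update after each instance, using only that instance's gradient."""
    for i in range(len(x)):
        w = w - eta * grad_i(w, i)
    return w

w_b = w_o = 0.0
for _ in range(100):
    w_b = batch_epoch(w_b, eta=0.05)
    w_o = online_epoch(w_o, eta=0.05)
print(w_b, w_o)  # both approach 2.0
```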
Which (batch/online) has the more powerful optimization methods?
batch
Which (batch/online) is easier to analyze?
batch
Which (batch/online) is more feasible for large datasets?
online
Which (batch/online) may have the ability to jump over local optima?
online
What type (batch/online) is stochastic gradient ascent?
online
How is a training instance picked in stochastic gradient ascent?
Its index is drawn uniformly at random from 1 … n
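A minimal sketch of the instance-picking step, assuming the same kind of hypothetical one-weight model as above with the objective taken to be the negative squared error, so that ascending it lowers the error. Python indexes from 0, so the uniform draw is over 0 … n-1 rather than 1 … n; `stochastic_gradient_ascent` and `objective_grad_i` are illustrative names.

```python
import random

# Hypothetical toy data: y_i = 2 * x_i, model y = w * x.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x for x in xs]

def objective_grad_i(w, i):
    """Gradient on instance i of the objective -0.5 * (w * x_i - y_i)**2."""
    return -(w * xs[i] - ys[i]) * xs[i]

def stochastic_gradient_ascent(w, eta=0.05, n_steps=500, seed=0):
    rng = random.Random(seed)
    for _ in range(n_steps):
        i = rng.randrange(len(xs))             # index drawn uniformly from 0 .. n-1
        w = w + eta * objective_grad_i(w, i)   # step up the objective's gradient
    return w

print(stochastic_gradient_ascent(0.0))  # approaches 2.0
```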
What's the problem with using gradient descent on an error surface with a shallow valley?
Gradient descent becomes very slow once in the shallow valley, because the gradient there is small
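The slowdown can be sketched on a hypothetical "shallow valley" E(w) = 0.5 * (w0**2 + 0.01 * w1**2): steep along w0, nearly flat along w1. The sketch also compares momentum, which the following cards cover, assuming the common update form dw = -eta * grad + alpha * previous dw; `descend`, `curv` and the chosen eta and alpha are illustrative.

```python
import numpy as np

# Hypothetical shallow valley: E(w) = 0.5 * (w0**2 + 0.01 * w1**2).
curv = np.array([1.0, 0.01])
grad_E = lambda w: curv * w

def descend(eta, alpha, n_steps=200):
    """Gradient descent with momentum: dw = -eta * grad + alpha * previous dw.
    alpha = 0 gives plain gradient descent."""
    w = np.array([1.0, 1.0])
    dw = np.zeros_like(w)
    for _ in range(n_steps):
        dw = -eta * grad_E(w) + alpha * dw
        w = w + dw
    return w

print("plain   :", descend(eta=1.0, alpha=0.0))  # w1 still far from 0
print("momentum:", descend(eta=1.0, alpha=0.9))  # both coordinates near 0
```

The learning rate cannot simply be raised to speed up the flat direction: for plain descent, eta above 2 (the inverse of half the steep curvature) would make the steep direction diverge.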
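What is the definition of momentum?
Each weight update adds a fraction alpha of the previous update to the usual gradient step: $\Delta w_t = -\eta \nabla E(w_t) + \alpha \Delta w_{t-1}$, so the weights keep moving along directions that are consistently downhill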
What's the problem with using gradient descent on an error surface with several local minima?
It won't necessarily find the global minimum; it can get stuck in a local minimum
How can we try to find a global minimum rather than a local minimum?
- Rerun the optimizer from several random starting points and keep the best result
- Momentum
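A minimal sketch of random restarts, assuming a hypothetical 1-D error E(w) = w**4 - 3*w**2 + w with a local minimum near w = 1.13 and the global minimum near w = -1.30; `descend`, `random_restarts` and the start interval [-2, 2] are illustrative choices.

```python
import numpy as np

# Hypothetical 1-D error with two minima: local near 1.13, global near -1.30.
E = lambda w: w**4 - 3 * w**2 + w
dE = lambda w: 4 * w**3 - 6 * w + 1

def descend(w, eta=0.01, n_steps=1000):
    """Plain gradient descent from a single starting point."""
    for _ in range(n_steps):
        w -= eta * dE(w)
    return w

def random_restarts(n_restarts=20, low=-2.0, high=2.0, seed=0):
    """Run gradient descent from several random starts; keep the lowest-error result."""
    rng = np.random.default_rng(seed)
    finals = [descend(w0) for w0 in rng.uniform(low, high, size=n_restarts)]
    return min(finals, key=E)

print(descend(1.0))        # stuck in the local minimum near 1.13
print(random_restarts())   # very likely the global minimum near -1.30
```

Restarts only improve the odds: each start lands in whichever basin it begins in, so enough starts must fall into the global minimum's basin.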
What's the problem with momentum?
It adds another parameter to pick, with even fewer heuristics to help choose it