SGD Flashcards
1
Q
What optimizer is used for a nonlinear function of θ?
A
A gradient-based optimizer
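A minimal sketch of what a gradient-based optimizer does, assuming plain gradient descent on an arbitrary nonlinear scalar function of θ (the function, learning rate, and step count below are illustrative, not from the cards):

```python
import numpy as np

# Illustrative nonlinear function of theta and its derivative (assumed example).
def f(theta):
    return (theta - 3.0) ** 2 + np.sin(theta)

def grad_f(theta):
    return 2.0 * (theta - 3.0) + np.cos(theta)

theta = 0.0   # initial parameter value
lr = 0.1      # learning rate (assumed value)
for _ in range(100):
    theta -= lr * grad_f(theta)   # gradient step: theta <- theta - lr * dL/dtheta

print(theta, f(theta))            # theta near the minimizer of f
```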
2
Q
How is the gradient of a NN computed?
A
Compute the partial derivatives of the loss with respect to all parameters θ_k (i.e., the weights and biases of all layers): ∇_θ L = (∂L/∂θ_1, …, ∂L/∂θ_K).
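As a concrete illustration of computing these partial derivatives by hand, here is a forward and backward pass for a one-hidden-layer MLP with tanh activation and squared-error loss; the shapes and the names w1, b1, w2, b2 are assumptions for the sketch, not from the cards:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))        # minibatch of 8 inputs, 4 features (assumed sizes)
y = rng.normal(size=(8, 1))        # targets

w1, b1 = rng.normal(size=(4, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

# forward pass
h = np.tanh(x @ w1 + b1)           # hidden activations
pred = h @ w2 + b2                 # network output
loss = ((pred - y) ** 2).mean()    # squared-error loss

# backward pass: partial derivatives of the loss w.r.t. every parameter
dpred = 2.0 * (pred - y) / y.size  # dL/dpred
dw2 = h.T @ dpred                  # dL/dw2
db2 = dpred.sum(axis=0)            # dL/db2
dh = dpred @ w2.T                  # dL/dh
dz = dh * (1.0 - h ** 2)           # back through tanh
dw1 = x.T @ dz                     # dL/dw1
db1 = dz.sum(axis=0)               # dL/db1
```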
3
Q
What does splitting the training set into B minibatches do?
A
- reduces the computation cost of one gradient by a factor of B
- increases the standard deviation of the gradient estimate by only a factor of √B
- requires more iterations, but fewer epochs (hence a smaller total computation cost); see the sketch below
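A minimal sketch of one epoch of minibatch SGD, assuming a simple linear least-squares model (N, D, B, the learning rate, and the use of np.array_split are illustrative choices, not the course's code): each of the B minibatch gradients is B times cheaper to compute than the full-batch gradient, but noisier.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, B = 1024, 10, 16                     # N samples, D features, B minibatches (assumed)
X = rng.normal(size=(N, D))
y = X @ rng.normal(size=D) + 0.1 * rng.normal(size=N)

theta = np.zeros(D)
lr = 0.05                                  # learning rate (assumed value)

perm = rng.permutation(N)                  # shuffle once per epoch
for batch_idx in np.array_split(perm, B):  # B minibatches of ~N/B samples each
    Xb, yb = X[batch_idx], y[batch_idx]
    grad = 2.0 * Xb.T @ (Xb @ theta - yb) / len(batch_idx)   # minibatch gradient
    theta -= lr * grad                     # one cheaper, noisier gradient step
```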
4
Q
A