Optimization Flashcards
What are the hyperparameters in gradient descent?
Weight initialization method
Number of iterations before the algorithm stops
Learning rate (step size per iteration)
What is batch gradient descent?
Batch gradient descent computes the loss (and gradient) over every example in the training set before making a single update to the model parameters.
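A minimal NumPy sketch of full-batch gradient descent on a least-squares loss. The data, learning rate, and step count are illustrative assumptions, not values from the card; note how each hyperparameter from the first card appears.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                        # full training set
y = X @ np.array([1., -2., 0.5, 3., 0.]) + 0.1 * rng.normal(size=1000)

w = np.zeros(5)                                       # weight initialization
learning_rate = 0.1                                   # learning rate
num_steps = 100                                       # number of steps before stopping

for _ in range(num_steps):
    # Gradient of the mean squared error over the ENTIRE training set:
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= learning_rate * grad                         # one update per full pass
```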
What is stochastic gradient descent?
It approximates the full-dataset loss (and its gradient) using a minibatch of examples; minibatch sizes of 32, 64, or 128 are common.
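Continuing the sketch above, a minibatch SGD variant of the same loop (reusing X, y, w, rng, learning_rate, and num_steps from the previous snippet); the batch size of 64 is one of the common choices the card mentions.

```python
batch_size = 64
for _ in range(num_steps):
    # Sample a random minibatch instead of using the full training set:
    idx = rng.choice(len(y), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size      # noisy gradient estimate
    w -= learning_rate * grad
```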
What new hyperparameters arise from SGD?
Batch size and data sampling strategy. Batch size does not matter too much; make it as large as your hardware can fit.
Data sampling: draw examples at random. This also does not matter much, especially for computer vision.
What is a high condition number?
A high condition number (the ratio of the largest to the smallest singular value) usually means the matrix is almost non-invertible; for a loss surface, it means the loss is much steeper in some directions than in others. For SGD this may lead to jitter along the steep directions and slow progress along the shallow ones.
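A small sketch of this effect (all values illustrative): gradient descent on a quadratic f(w) = 0.5 * wᵀAw whose Hessian A has condition number 100. The steep direction forces a small learning rate, so the iterate oscillates (jitters) along the steep axis while crawling along the shallow one.

```python
import numpy as np

A = np.diag([100.0, 1.0])                 # condition number = 100/1 = 100
print(np.linalg.cond(A))                  # -> 100.0

w = np.array([1.0, 1.0])
lr = 0.018                                # must satisfy lr < 2/100 for stability
for step in range(5):
    w = w - lr * (A @ w)                  # gradient of f is A @ w
    print(step, w)                        # oscillates along axis 0, crawls along axis 1
```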
What is a problem with SGD?
It can get stuck at a local minimum or a saddle point, where the gradient is zero. At a saddle point, the loss increases in some directions and decreases in others.
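A toy illustration (my own example, not from the card): f(x, y) = x² − y² has a saddle at the origin, where the gradient (2x, −2y) is zero; f increases along x and decreases along y. Gradient descent started exactly on the x-axis converges to the saddle and never escapes.

```python
def grad(x, y):
    return 2 * x, -2 * y                  # gradient of f(x, y) = x**2 - y**2

x, y = 0.5, 0.0                           # start on the x-axis (zero y-component)
lr = 0.1
for _ in range(50):
    gx, gy = grad(x, y)
    x, y = x - lr * gx, y - lr * gy
print(x, y)                               # -> (~0.0, 0.0): stuck at the saddle point
```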
What is another problem with SGD?
Gradients are computed from minibatches, so they can be noisy estimates of the true gradient.
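A quick sketch of that noise (data and batch sizes are illustrative assumptions): comparing minibatch gradient estimates against the full-batch gradient shows the error shrinking as the batch grows, roughly like 1/sqrt(batch size).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 5))
y = X @ np.ones(5)
w = np.zeros(5)

def mse_grad(Xb, yb, w):
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

full = mse_grad(X, y, w)                  # exact full-batch gradient
for batch_size in (32, 128, 1024):
    errs = []
    for _ in range(100):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        errs.append(np.linalg.norm(mse_grad(X[idx], y[idx], w) - full))
    print(batch_size, np.mean(errs))      # average error drops as batch size grows
```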