Optimization Flashcards

1
Q

What are the hyperparameters in gradient descent?

A

Weight initialization method
Number of steps before the algorithm stops
Learning rate per iteration
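
A minimal sketch of a vanilla gradient descent loop showing where each hyperparameter enters. The quadratic loss and all numeric values are illustrative assumptions, not from the card:

import numpy as np

def grad_loss(w):
    # Gradient of the assumed toy loss: loss(w) = ||w||^2
    return 2 * w

w = 0.01 * np.random.randn(10)   # weight initialization method
learning_rate = 1e-2             # learning rate per iteration
num_steps = 100                  # number of steps before the algorithm stops

for _ in range(num_steps):
    w = w - learning_rate * grad_loss(w)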

2
Q

What is batch gradient descent?

A

Batch gradient descent computes the loss (and its gradient) over every example in the training set, then makes a single update to the model.
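
A sketch of what this looks like in code, assuming a toy linear-regression setup (the data, model, and learning rate are made up for illustration). Note that every update touches the whole training set:

import numpy as np

X = np.random.randn(1000, 5)                      # entire training set (toy data)
y = X @ np.ones(5) + 0.1 * np.random.randn(1000)
w = np.zeros(5)

for _ in range(100):
    grad = 2 * X.T @ (X @ w - y) / len(X)         # gradient averaged over ALL examples
    w = w - 0.1 * grad                            # one update per full pass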

3
Q

What is stochastic gradient descent?

A

It approximates the sum of the loss over the full training set using a small minibatch of examples; batch sizes of 32/64/128 are common.
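
A sketch of minibatch SGD on the same kind of assumed toy linear-regression setup: each step estimates the full-data gradient from a small random sample.

import numpy as np

X = np.random.randn(1000, 5)                      # toy training set
y = X @ np.ones(5) + 0.1 * np.random.randn(1000)
w = np.zeros(5)
batch_size = 64                                   # 32/64/128 are common

for _ in range(500):
    idx = np.random.choice(len(X), batch_size, replace=False)
    grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch_size  # noisy gradient estimate
    w = w - 1e-2 * grad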

4
Q

What new hyperparameters arise from SGD?

A

Batch size and data sampling. Batch size does not matter too much; make it as big as your hardware can fit.

Data sampling: draw examples at random. It also does not matter too much, especially for computer vision.
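
Two common sampling schemes, sketched with assumed sizes (in practice the difference rarely matters much):

import numpy as np

n, batch_size = 1000, 64

# (a) draw a fresh random minibatch each step
idx = np.random.choice(n, batch_size, replace=False)

# (b) shuffle once per epoch, then walk through the permutation in order
perm = np.random.permutation(n)
for start in range(0, n, batch_size):
    batch_idx = perm[start:start + batch_size]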

5
Q

What is a high condition number?

A

A high condition number (the ratio of a matrix's largest to smallest singular value) usually means the matrix is nearly non-invertible. For SGD, a loss surface whose Hessian has a high condition number is steep in some directions and flat in others, which leads to jitter along the steep directions and slow progress along the flat ones.
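
A sketch of that effect on an assumed toy quadratic loss, loss(w) = 0.5 * w @ H @ w, with a made-up diagonal Hessian: the steep coordinate oscillates while the flat one crawls.

import numpy as np

H = np.diag([100.0, 1.0])   # toy Hessian; eigenvalue ratio 100
print(np.linalg.cond(H))    # 100.0: a high condition number

w = np.array([1.0, 1.0])
lr = 0.018                  # close to the 2/100 stability limit of the steep axis
for _ in range(5):
    w = w - lr * (H @ w)    # gradient of the quadratic is H @ w
    print(w)                # w[0] flips sign each step (jitter); w[1] barely moves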

6
Q

What is a problem with SGD?

A

It can get stuck at a local minimum or a saddle point, where the gradient is zero. At a saddle point the loss increases in some directions and decreases in others, yet the gradient is still zero.
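
A sketch with an assumed toy function f(x, y) = x**2 - y**2, which has a saddle at the origin: the loss increases along x, decreases along y, and the gradient there is exactly zero, so a gradient step makes no progress.

import numpy as np

def grad(p):
    x, y = p
    return np.array([2 * x, -2 * y])   # zero at the origin

p = np.zeros(2)             # start exactly at the saddle point
p = p - 1e-2 * grad(p)      # zero gradient: no movement
print(p)                    # still [0. 0.]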

7
Q

What is another problem with SGD?

A

Gradients come from minibatches, so they can be noisy.
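
A sketch of that noise, using an assumed toy regression problem: the minibatch gradient is an unbiased but noisy estimate of the full-batch gradient.

import numpy as np

X = np.random.randn(1000, 5)   # toy data
y = X @ np.ones(5)
w = np.zeros(5)

full_grad = 2 * X.T @ (X @ w - y) / len(X)
idx = np.random.choice(len(X), 32, replace=False)
mini_grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / 32

print(np.linalg.norm(mini_grad - full_grad))   # nonzero: the minibatch noise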
