C2W2 Optimization algorithms Flashcards
Batch
Gradient descent that processes the entire training set in a single step on each iteration
Mini-batch
Splitting the training set into smaller batches, typically with a size that is a power of 2 (64, 128, 256, 512, 1024)
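A minimal sketch of how mini-batches could be built with NumPy; the function name random_mini_batches and the (features × examples) shapes are illustrative assumptions, not part of the course material.

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle (X, Y) and split them into mini-batches of `batch_size` columns."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]                      # number of training examples (columns)
    permutation = rng.permutation(m)
    X_shuffled = X[:, permutation]
    Y_shuffled = Y[:, permutation]
    mini_batches = []
    for start in range(0, m, batch_size):
        end = start + batch_size        # the last batch may be smaller than batch_size
        mini_batches.append((X_shuffled[:, start:end], Y_shuffled[:, start:end]))
    return mini_batches
```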
Exponentially weighted average
Parameter β (beta): β = 0.9 averages over roughly the last 10 values, β = 0.98 over roughly the last 50 values (≈ 1 / (1 − β))
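A minimal sketch of the exponentially weighted average v_t = β·v_{t−1} + (1 − β)·θ_t; the function name is an illustrative assumption.

```python
def exponentially_weighted_average(values, beta=0.9):
    """Return the running exponentially weighted average of `values`."""
    v = 0.0
    averages = []
    for theta in values:
        v = beta * v + (1 - beta) * theta   # v_t = beta * v_{t-1} + (1 - beta) * theta_t
        averages.append(v)
    return averages

# beta = 0.9  -> averages over roughly 1 / (1 - 0.9)  = 10 values
# beta = 0.98 -> averages over roughly 1 / (1 - 0.98) = 50 values
```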
Bias correction
Is needed in the exponentially weighted average to correct the estimates at the beginning, which start out too low because v₀ = 0: divide vₜ by (1 − βᵗ)
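A sketch of the same average with bias correction, dividing v_t by (1 − β^t) so the early estimates are not biased toward zero; the function name is assumed for illustration.

```python
def bias_corrected_average(values, beta=0.9):
    """Exponentially weighted average with bias correction v_t / (1 - beta**t)."""
    v = 0.0
    corrected = []
    for t, theta in enumerate(values, start=1):
        v = beta * v + (1 - beta) * theta
        corrected.append(v / (1 - beta ** t))   # correction matters most for small t
    return corrected
```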
Gradient descent with momentum
An optimization algorithm (input parameter β) that slows down learning in the problematic, oscillating directions so that gradient descent does not bounce back and forth on its way to the minimum.
Momentum takes past gradients into account to smooth out the steps of gradient descent. It can be applied with batch gradient descent, mini-batch gradient descent or stochastic gradient descent.
You have to tune a momentum hyperparameter 𝛽 and a learning rate 𝛼
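A minimal sketch of one momentum update for a single parameter matrix W; the variable names (W, dW, v_dW) follow the course notation, but the function itself is an illustrative assumption.

```python
import numpy as np

def momentum_update(W, dW, v_dW, alpha=0.01, beta=0.9):
    """Apply one gradient-descent-with-momentum step to parameter W."""
    v_dW = beta * v_dW + (1 - beta) * dW    # exponentially weighted average of gradients
    W = W - alpha * v_dW                    # step in the smoothed gradient direction
    return W, v_dW
```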
RMSProp
Root Mean Square Propagation. An optimization algorithm similar in spirit to momentum (input parameter β₂): it keeps an exponentially weighted average of the squared gradients and divides the update by its square root
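A minimal sketch of one RMSProp update; the function name and the epsilon default are assumptions for illustration.

```python
import numpy as np

def rmsprop_update(W, dW, s_dW, alpha=0.001, beta2=0.999, epsilon=1e-8):
    """Apply one RMSProp step to parameter W."""
    s_dW = beta2 * s_dW + (1 - beta2) * np.square(dW)  # average of squared gradients
    W = W - alpha * dW / (np.sqrt(s_dW) + epsilon)     # damp the oscillating directions
    return W, s_dW
```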
ADAM
The Adam optimization algorithm basically takes momentum and RMSProp and puts them together.
Relatively low memory requirements (though higher than gradient descent and gradient descent with momentum)
Usually works well even with little tuning of hyperparameters (except 𝛼)
Best values for Beta1, Beta2
Beta1 = 0.9, Beta2 = 0.999
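A minimal sketch of one Adam update combining the momentum and RMSProp moving averages with bias correction; the function signature is an illustrative assumption, but the default hyperparameters match the values above (plus the commonly used ε = 1e-8).

```python
import numpy as np

def adam_update(W, dW, v_dW, s_dW, t,
                alpha=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Apply one Adam step to parameter W at time step t (t starts at 1)."""
    v_dW = beta1 * v_dW + (1 - beta1) * dW             # momentum (first moment)
    s_dW = beta2 * s_dW + (1 - beta2) * np.square(dW)  # RMSProp (second moment)
    v_corr = v_dW / (1 - beta1 ** t)                   # bias correction
    s_corr = s_dW / (1 - beta2 ** t)
    W = W - alpha * v_corr / (np.sqrt(s_corr) + epsilon)
    return W, v_dW, s_dW
```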
Learning rate decay
Helps to reduce noise as gradient descent approaches the optimum, by reducing the learning rate as the epoch number increases
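One common schedule is α = α₀ / (1 + decay_rate · epoch_num); a minimal sketch with an assumed function name:

```python
def decayed_learning_rate(alpha0, decay_rate, epoch_num):
    """Continuous learning rate decay: alpha shrinks as the epoch number grows."""
    return alpha0 / (1 + decay_rate * epoch_num)
```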
Local optimum
Rarely a problem in high-dimensional spaces (most zero-gradient points there are saddle points rather than local optima), but gradient descent can get stuck on plateaus; optimization algorithms such as momentum, RMSProp, and Adam help to move through them faster.
Fixed interval scheduling
Decaying the learning rate only once every fixed number of epochs (step decay), rather than on every epoch
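A sketch of the same decay applied only every fixed number of epochs; time_interval and the function name are illustrative assumptions.

```python
import math

def scheduled_learning_rate(alpha0, decay_rate, epoch_num, time_interval=1000):
    """Decay the learning rate only once every `time_interval` epochs."""
    return alpha0 / (1 + decay_rate * math.floor(epoch_num / time_interval))
```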
Three important optimization techniques
Apply three different optimization methods to your models
Build mini-batches for your training set
Use learning rate decay scheduling to speed up your training
On which phase do optimization algorithms (Adam, RMSProp, Momentum) work?
They work in the parameter-update step: after backward propagation has computed the gradients, these algorithms modify how the gradients are applied in the "update" routine.