C2W2 Optimization algorithms Flashcards

1
Q

Batch

A

Training on the entire data set at once: each gradient descent step processes all of the training examples (batch gradient descent)
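
A minimal sketch (not from the course material) of what this means in code: a toy linear model where every parameter update uses the whole training set at once. The data and variable names are illustrative.

```python
import numpy as np

# Toy linear model trained with *batch* gradient descent:
# every update uses all 1000 examples at once.
np.random.seed(0)
X = np.random.randn(3, 1000)      # 3 features, 1000 examples (one column each)
Y = np.random.randn(1, 1000)
W, b = np.zeros((1, 3)), 0.0
alpha = 0.01                      # learning rate

for epoch in range(100):
    Y_hat = W @ X + b                       # forward pass over the full training set
    dW = (Y_hat - Y) @ X.T / X.shape[1]     # mean-squared-error gradients
    db = float(np.mean(Y_hat - Y))
    W -= alpha * dW                         # exactly one update per pass over the data
    b -= alpha * db
```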

2
Q

Mini-batch

A

Splitting the training set into smaller batches (typically a power of 2: 64, 128, 256, 512, 1024) and taking a gradient descent step after each mini-batch
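
A minimal sketch of the splitting step, assuming the column-per-example data layout used in the course; the function name random_mini_batches and the toy data are illustrative.

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle the training set and split it into mini-batches of `batch_size` columns."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]                          # number of examples (columns)
    perm = rng.permutation(m)
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]
    return [(X_shuf[:, k:k + batch_size], Y_shuf[:, k:k + batch_size])
            for k in range(0, m, batch_size)]

# Example: 1000 examples -> 15 mini-batches of 64 plus a final one of 40.
X = np.random.randn(5, 1000)
Y = np.random.randn(1, 1000)
batches = random_mini_batches(X, Y, batch_size=64)
```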

3
Q

Exponentially weighted average

A

Computed as v_t = β·v_(t−1) + (1 − β)·θ_t with parameter β (beta); it averages over roughly 1/(1 − β) values, e.g. β = 0.9 ≈ last 10 values, β = 0.98 ≈ last 50 values
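
A minimal sketch of that recurrence; the helper name and the toy temperature data are illustrative.

```python
import numpy as np

def exp_weighted_average(theta, beta=0.9):
    """v_t = beta * v_{t-1} + (1 - beta) * theta_t; averages over ~1/(1-beta) values."""
    v, out = 0.0, []
    for x in theta:
        v = beta * v + (1 - beta) * x
        out.append(v)
    return np.array(out)

temps = np.array([10, 12, 11, 15, 14, 13, 16, 18, 17, 19], dtype=float)
smoothed = exp_weighted_average(temps, beta=0.9)   # beta=0.9 ~ average of the last 10 values
```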

4
Q

Bias correction

A

Needed in the exponentially weighted average to correct the values at the start, which come out too small because v_0 = 0: v_t_corrected = v_t / (1 − β^t)
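
A minimal sketch adding the correction v_t / (1 − β^t) to the averaging loop from the previous card; without it the first values would be far too small (e.g. 1.0 instead of 10.0 for β = 0.9).

```python
import numpy as np

def exp_weighted_average_corrected(theta, beta=0.9):
    """Exponentially weighted average with bias correction: v_t / (1 - beta**t)."""
    v, out = 0.0, []
    for t, x in enumerate(theta, start=1):
        v = beta * v + (1 - beta) * x
        out.append(v / (1 - beta ** t))    # fixes the underestimated values for small t
    return np.array(out)

temps = np.array([10, 12, 11, 15, 14], dtype=float)
print(exp_weighted_average_corrected(temps))   # first value is 10.0, not 1.0
```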

5
Q

Gradient descent with momentum

A

An optimization algorithm that damps oscillations in unwanted directions by averaging past gradients, effectively slowing the updates along the problematic direction. Input parameter β (beta).

Momentum takes past gradients into account to smooth out the steps of gradient descent. It can be applied with batch gradient descent, mini-batch gradient descent or stochastic gradient descent. You have to tune a momentum hyperparameter β and a learning rate α.
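
A minimal sketch of the update rule, assuming parameters and gradients stored in dictionaries; the keys W1/b1 and the helper name are illustrative: v = β·v + (1 − β)·dW, then W −= α·v.

```python
import numpy as np

def momentum_update(params, grads, velocities, beta=0.9, alpha=0.01):
    """One gradient-descent-with-momentum step: v = beta*v + (1-beta)*grad; p -= alpha*v."""
    for key in params:
        velocities[key] = beta * velocities[key] + (1 - beta) * grads[key]
        params[key] -= alpha * velocities[key]
    return params, velocities

# Illustrative shapes only; in practice the gradients come from backpropagation.
params = {"W1": np.random.randn(4, 3), "b1": np.zeros((4, 1))}
grads  = {"W1": np.random.randn(4, 3), "b1": np.random.randn(4, 1)}
velocities = {k: np.zeros_like(p) for k, p in params.items()}
params, velocities = momentum_update(params, grads, velocities)
```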

6
Q

RMSProp

A

Root mean square propagation, an optimization algorithm with input parameter β₂ (beta2). Similar to momentum, but it keeps an exponentially weighted average of the squared gradients and divides each update by its square root, damping the oscillating directions
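
A minimal sketch of the update rule under the same dictionary layout as the momentum example; the small constant eps (≈ 1e-8) that avoids division by zero is a standard addition, not something stated on this card.

```python
import numpy as np

def rmsprop_update(params, grads, s, beta2=0.999, alpha=0.01, eps=1e-8):
    """RMSprop: average the *squared* gradients (s) and divide each update by sqrt(s)."""
    for key in params:
        s[key] = beta2 * s[key] + (1 - beta2) * grads[key] ** 2
        params[key] -= alpha * grads[key] / (np.sqrt(s[key]) + eps)
    return params, s
```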

7
Q

ADAM

A

The Adam optimization algorithm is basically momentum and RMSprop put together.

Relatively low memory requirements (though higher than gradient descent and gradient descent with momentum)
Usually works well even with little tuning of hyperparameters (except 𝛼)
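
A minimal sketch combining the two previous update rules, with bias correction applied to both moving averages; the dictionary layout and helper name are illustrative.

```python
import numpy as np

def adam_update(params, grads, v, s, t, alpha=0.01,
                beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam step: momentum (v) + RMSprop (s), both bias-corrected using the step count t."""
    for key in params:
        v[key] = beta1 * v[key] + (1 - beta1) * grads[key]
        s[key] = beta2 * s[key] + (1 - beta2) * grads[key] ** 2
        v_hat = v[key] / (1 - beta1 ** t)          # bias correction
        s_hat = s[key] / (1 - beta2 ** t)
        params[key] -= alpha * v_hat / (np.sqrt(s_hat) + eps)
    return params, v, s
```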

8
Q

Best values for Beta1, Beta2

A

Beta1 = 0.9, Beta2 = 0.999

9
Q

Learning rate decay

A

Helps to reduce noise as gradient descent approaches the optimum, by reducing the learning rate as the epoch number increases
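
A minimal sketch of the decay rule α = α₀ / (1 + decay_rate · epoch); the function name is illustrative.

```python
def decayed_learning_rate(alpha0, epoch, decay_rate=1.0):
    """alpha = alpha0 / (1 + decay_rate * epoch): the step size shrinks as training proceeds."""
    return alpha0 / (1 + decay_rate * epoch)

# alpha0 = 0.2 -> 0.2, 0.1, 0.067, 0.05, ... over epochs 0, 1, 2, 3, ...
rates = [decayed_learning_rate(0.2, e) for e in range(4)]
```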

10
Q

Local optimum

A

Getting stuck in a local optimum is mostly not a concern in high-dimensional spaces (most zero-gradient points there are saddle points), but gradient descent may stall on plateaus; optimization algorithms help to overcome this.

11
Q

Fixed interval scheduling

A

Decaying the learning rate at fixed intervals (every few steps or epochs)
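
A minimal sketch of one common fixed-interval (step) schedule: the rate is only recomputed every `interval` epochs via a floor. The function name and the interval of 1000 are illustrative assumptions.

```python
import numpy as np

def scheduled_learning_rate(alpha0, epoch, decay_rate=1.0, interval=1000):
    """Step decay: np.floor keeps the rate constant within each interval of epochs."""
    return alpha0 / (1 + decay_rate * np.floor(epoch / interval))

# Epochs 0-999 use alpha0, epochs 1000-1999 use alpha0 / 2, and so on.
```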

12
Q

Three important optimization techniques

A

Apply different optimization methods (such as momentum, RMSprop, and Adam) to your models

Build mini-batches for your training set

Use learning rate decay scheduling to speed up your training

13
Q

On which phase do optimization algorithms (ADAM, RMSProp, Momentum) work?

A

They work during backward propagation, by modifying the gradient “update” routine: the gradients are computed as usual, but the optimizers change how they are applied to the parameters
