Optimizers Flashcards
What is optimization?
The choice of algorithm we use to vary our model's parameters so that the loss function is minimized.
What are some types of optimization algorithms?
- Gradient Descent (GD) - slow, because each update uses the entire dataset
- SGD (Stochastic Gradient Descent) - faster, because each update uses only a small random batch
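A minimal sketch of the two update rules in Python, assuming a hypothetical grad_loss(params, X, y) function that returns the gradient of the loss on the given data:

```python
import numpy as np

def gd_step(params, X, y, grad_loss, eta=0.01):
    # Batch gradient descent: each update looks at the *entire* dataset,
    # which is what makes it slow on large datasets.
    return params - eta * grad_loss(params, X, y)

def sgd_step(params, X, y, grad_loss, eta=0.01, batch_size=32):
    # Stochastic gradient descent: each update looks at a small random
    # batch, giving many cheap (if noisier) updates per pass over the data.
    idx = np.random.choice(len(X), size=batch_size, replace=False)
    return params - eta * grad_loss(params, X[idx], y[idx])
```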
What is the difference between the local minimum and global minimum?
A local minimum is a point where the loss is lower than at all nearby points; the global minimum is the point where the loss is the lowest over the whole parameter space. Gradient-based optimizers can get stuck in a local minimum instead of reaching the global one.
How do we extend GD and SGD?
By using momentum.
What is momentum?
An extension of GD/SGD that accumulates a decaying average of past gradients (a velocity) and adds it to each update, so the optimizer keeps moving in the prevailing direction, damps oscillations, and can roll over small bumps in the loss surface.
Is the alpha of momentum a hyperparameter?
Yes
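A minimal sketch of SGD with momentum; alpha is the momentum hyperparameter from the card above, and grad is assumed to be the current gradient of the loss:

```python
import numpy as np

def momentum_step(params, velocity, grad, eta=0.01, alpha=0.9):
    # The velocity is a decaying accumulation of past gradients, so updates
    # keep moving in the prevailing downhill direction and oscillations
    # across the minimum are damped out.
    velocity = alpha * velocity - eta * grad
    params = params + velocity
    return params, velocity

# Usage sketch: velocity starts at zero and is threaded through the loop.
# velocity = np.zeros_like(params)
# for _ in range(num_steps):
#     params, velocity = momentum_step(params, velocity, grad_loss(params), alpha=0.9)
```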
What is the learning rate?
Eta (η, a Greek letter) - the step size that scales how far we move the parameters along the gradient at each update. It should be:
- Small enough - so we gently descend through the loss function, instead of oscillating wildly around the minimum and never reaching it, or diverging to infinity
- Big enough - so we reach the minimum in a reasonable amount of time
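A tiny illustration of the trade-off, using plain gradient descent on f(x) = x² (gradient 2x); the specific learning rates are illustrative only:

```python
def gd_on_parabola(eta, x0=5.0, steps=20):
    # Minimize f(x) = x^2 with gradient 2x, so the update is x <- x - eta * 2x.
    x = x0
    for _ in range(steps):
        x = x - eta * 2 * x
    return x

print(gd_on_parabola(eta=0.1))    # converges toward the minimum at 0
print(gd_on_parabola(eta=1.1))    # |1 - 2*eta| > 1, so it diverges
print(gd_on_parabola(eta=0.001))  # so small it barely moves in 20 steps
```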
What are learning rate schedules?
The best of both worlds: start with a learning rate that is big enough and gradually lower it until it is small enough.
This makes the loss converge much faster.
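One common choice is an exponential decay schedule; a minimal sketch (the decay constants here are illustrative, not from the cards):

```python
def exponential_decay(eta0, step, decay_rate=0.96, decay_steps=1000):
    # Start with a big learning rate eta0 (fast early progress) and shrink
    # it as training goes on (gentle descent near the minimum).
    return eta0 * decay_rate ** (step / decay_steps)

# exponential_decay(0.1, 0)    -> 0.1
# exponential_decay(0.1, 5000) -> about 0.0815
```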
What are two examples of Adaptive Learning Rate Schedules?
AdaGrad and RMSProp
AdaGrad = Adaptive Gradient Algorithm
RMSProp = Root Mean Square Propagation
Adam = Adaptive Moment Estimation
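A minimal sketch of the AdaGrad and RMSProp updates in their usual formulations; the decay rate beta and the small eps for numerical stability are standard details assumed here, not from the cards:

```python
import numpy as np

def adagrad_step(params, accum, grad, eta=0.01, eps=1e-8):
    # AdaGrad: accumulate the *sum* of squared gradients per parameter and
    # scale each step by 1/sqrt(accum), so frequently-updated parameters
    # get smaller effective learning rates over time.
    accum = accum + grad ** 2
    params = params - eta * grad / (np.sqrt(accum) + eps)
    return params, accum

def rmsprop_step(params, cache, grad, eta=0.001, beta=0.9, eps=1e-8):
    # RMSProp: like AdaGrad, but with a *decaying average* of squared
    # gradients, so the effective learning rate does not shrink to zero.
    cache = beta * cache + (1 - beta) * grad ** 2
    params = params - eta * grad / (np.sqrt(cache) + eps)
    return params, cache
```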