Optimizers Flashcards
What is optimization?
The choice of algorithm we use to vary our model's parameters so that the loss function is minimized.
What are some types of optimization algorithms?
- Gradient Descent (GD) - slow, because each update uses the entire dataset
- SGD (Stochastic Gradient Descent) - faster, because each update uses only a small random batch
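A minimal sketch of the two update rules in Python, assuming a hypothetical grad_loss(params, X, y) function that returns the gradient of the loss on the given data:

```python
import numpy as np

def gd_step(params, X, y, grad_loss, eta=0.01):
    # Batch gradient descent: each update looks at the *entire* dataset,
    # which is what makes it slow on large datasets.
    return params - eta * grad_loss(params, X, y)

def sgd_step(params, X, y, grad_loss, eta=0.01, batch_size=32):
    # Stochastic gradient descent: each update looks at a small random
    # batch, giving many cheap (if noisier) updates per pass over the data.
    idx = np.random.choice(len(X), size=batch_size, replace=False)
    return params - eta * grad_loss(params, X[idx], y[idx])
```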
What is the difference between the local minimum and global minimum?
A local minimum is a point where the loss is lower than at all nearby points; the global minimum is the point where the loss is the lowest over the whole parameter space. Gradient-based optimizers can get stuck in a local minimum instead of reaching the global one.
How do we extend GD and SGD?
By using momentum.
What is momentum?
An extension of GD/SGD that accumulates a decaying average of past gradients (a velocity) and adds it to each update, so the optimizer keeps moving in the prevailing direction, damps oscillations, and can roll over small bumps in the loss surface.
Is the alpha of momentum a hyperparameter?
Yes
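A minimal sketch of SGD with momentum; alpha is the momentum hyperparameter from the card above, and grad is assumed to be the current gradient of the loss:

```python
import numpy as np

def momentum_step(params, velocity, grad, eta=0.01, alpha=0.9):
    # The velocity is a decaying accumulation of past gradients, so updates
    # keep moving in the prevailing downhill direction and oscillations
    # across the minimum are damped out.
    velocity = alpha * velocity - eta * grad
    params = params + velocity
    return params, velocity

# Usage sketch: velocity starts at zero and is threaded through the loop.
# velocity = np.zeros_like(params)
# for _ in range(num_steps):
#     params, velocity = momentum_step(params, velocity, grad_loss(params), alpha=0.9)
```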
What is the learning rate?
Eta (η, a Greek letter) - the step size that scales how far we move the parameters along the gradient at each update. It should be:
- Small enough - so we gently descend through the loss function, instead of oscillating wildly around the minimum and never reaching it, or diverging to infinity
- Big enough - so we reach the minimum in a reasonable amount of time
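A tiny illustration of the trade-off, using plain gradient descent on f(x) = x² (gradient 2x); the specific learning rates are illustrative only:

```python
def gd_on_parabola(eta, x0=5.0, steps=20):
    # Minimize f(x) = x^2 with gradient 2x, so the update is x <- x - eta * 2x.
    x = x0
    for _ in range(steps):
        x = x - eta * 2 * x
    return x

print(gd_on_parabola(eta=0.1))    # converges toward the minimum at 0
print(gd_on_parabola(eta=1.1))    # |1 - 2*eta| > 1, so it diverges
print(gd_on_parabola(eta=0.001))  # so small it barely moves in 20 steps
```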
What are learning rate schedules?
The best of both worlds: start with a learning rate that is big enough and gradually lower it until it is small enough.
This makes the loss converge much faster.
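One common choice is an exponential decay schedule; a minimal sketch (the decay constants here are illustrative, not from the cards):

```python
def exponential_decay(eta0, step, decay_rate=0.96, decay_steps=1000):
    # Start with a big learning rate eta0 (fast early progress) and shrink
    # it as training goes on (gentle descent near the minimum).
    return eta0 * decay_rate ** (step / decay_steps)

# exponential_decay(0.1, 0)    -> 0.1
# exponential_decay(0.1, 5000) -> about 0.0815
```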
What are two examples of Adaptive Learning Rate Schedules?
AdaGrad and RMSProp
AdaGrad = Adaptive Gradient Algorithm
RMSProp = Root Mean Square Propagation
Adam = Adaptive Moment Estimation
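A minimal sketch of the AdaGrad and RMSProp updates in their usual formulations; the decay rate beta and the small eps for numerical stability are standard details assumed here, not from the cards:

```python
import numpy as np

def adagrad_step(params, accum, grad, eta=0.01, eps=1e-8):
    # AdaGrad: accumulate the *sum* of squared gradients per parameter and
    # scale each step by 1/sqrt(accum), so frequently-updated parameters
    # get smaller effective learning rates over time.
    accum = accum + grad ** 2
    params = params - eta * grad / (np.sqrt(accum) + eps)
    return params, accum

def rmsprop_step(params, cache, grad, eta=0.001, beta=0.9, eps=1e-8):
    # RMSProp: like AdaGrad, but with a *decaying average* of squared
    # gradients, so the effective learning rate does not shrink to zero.
    cache = beta * cache + (1 - beta) * grad ** 2
    params = params - eta * grad / (np.sqrt(cache) + eps)
    return params, cache
```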