DL-02 - Improving DNN Flashcards
DL-02 - Improving DNN
What are 3 commonly used DL approaches to avoid overfitting? (3)
- regularization
- dropout
- early stopping
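A minimal sketch of all three ideas together, assuming PyTorch (the card does not name a framework): L2 regularization via weight_decay, a Dropout layer, and early stopping on a validation loss.

```python
import torch
import torch.nn as nn

# toy data split into train / validation
X, y = torch.randn(512, 20), torch.randint(0, 2, (512,))
Xtr, ytr, Xva, yva = X[:400], y[:400], X[400:], y[400:]

model = nn.Sequential(
    nn.Linear(20, 32), nn.ReLU(),
    nn.Dropout(0.5),                     # dropout: randomly zeroes activations while training
    nn.Linear(32, 2),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)  # weight_decay = L2 penalty
loss_fn = nn.CrossEntropyLoss()

best, patience, bad = float("inf"), 5, 0
for epoch in range(200):
    model.train()
    opt.zero_grad()
    loss_fn(model(Xtr), ytr).backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        val = loss_fn(model(Xva), yva).item()
    if val < best - 1e-4:
        best, bad = val, 0
    else:
        bad += 1
        if bad >= patience:              # early stopping: halt once validation stops improving
            break
```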
DL-02 - Improving DNN
What are two common regularization techniques?
- L1
- L2
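A tiny NumPy sketch (illustrative values, not from the card) of the two penalty terms that get added to the training loss for a weight vector w:

```python
import numpy as np

w = np.array([0.5, -1.2, 0.0, 3.0])   # example weight vector
lam = 0.01                            # regularization strength (hyperparameter)

l1_penalty = lam * np.sum(np.abs(w))  # L1 / Lasso: pushes weights toward exact zeros (sparsity)
l2_penalty = lam * np.sum(w ** 2)     # L2 / Ridge: shrinks all weights toward zero
```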
DL-02 - Improving DNN
What is another name for L1 regularization?
Lasso
DL-02 - Improving DNN
What is another name for Lasso regularization?
L1
DL-02 - Improving DNN
What is another name for L2 regularization?
Ridge
DL-02 - Improving DNN
What is another name for Ridge regularization?
L2
DL-02 - Improving DNN
What are the basic types of learning rate decay? (5)
- Common method
- Exponential
- Epoch number based
- Mini-batch number based
- Discrete staircase
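The formulas on the cards below are only available as images; the sketch here uses the commonly taught forms (an assumption, not taken from the deck), with alpha0 the initial learning rate and k, decay_rate tunable constants:

```python
import numpy as np

alpha0, decay_rate, k = 0.2, 1.0, 0.3   # illustrative values

def common(epoch):          # "common method": alpha0 / (1 + decay_rate * epoch)
    return alpha0 / (1 + decay_rate * epoch)

def exponential(epoch):     # exponential: 0.95 ** epoch * alpha0
    return 0.95 ** epoch * alpha0

def epoch_based(epoch):     # epoch number based: k / sqrt(epoch) * alpha0 (epoch >= 1)
    return k / np.sqrt(epoch) * alpha0

def minibatch_based(t):     # mini-batch number based: k / sqrt(t) * alpha0, t = mini-batch index
    return k / np.sqrt(t) * alpha0

def staircase(epoch):       # discrete staircase: e.g. halve the rate every 10 epochs
    return alpha0 * 0.5 ** (epoch // 10)
```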
DL-02 - Improving DNN
What is the formula for “common method” learning rate decay?
(See image)
DL-02 - Improving DNN
What is the name for this method of learning rate decay? (See image)
“Common method” learning rate decay
DL-02 - Improving DNN
What is the formula for exponential learning rate decay?
(See image)
DL-02 - Improving DNN
What is the name for this method of learning rate decay? (See image)
Exponential learning rate decay
DL-02 - Improving DNN
What is the formula for “epoch number based” learning rate decay?
(See image)
DL-02 - Improving DNN
What is the name for this method of learning rate decay? (See image)
“Epoch number based” learning rate decay
DL-02 - Improving DNN
What is the formula for “mini-batch number based” learning rate decay?
(See image; the formula should refer to the mini-batch number)
DL-02 - Improving DNN
What is the name for this method of learning rate decay? (See image; the formula should refer to the mini-batch number)
mini-batch number based
DL-02 - Improving DNN
What is the name of this method of learning rate decay? (See image)
Discrete staircase.
DL-02 - Improving DNN
What does the learning rate graph look like with a “discrete staircase” approach?
(See image)
DL-02 - Improving DNN
What is the role of momentum in neural network training?
Momentum accumulates past gradients into a velocity term, which damps oscillations and speeds up convergence for a smoother optimization trajectory.
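A hedged NumPy sketch of one classical momentum update (one common formulation; the deck does not give the exact formula):

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    v = beta * v + grad   # accumulate gradient history (velocity)
    w = w - lr * v        # step along the smoothed direction
    return w, v
```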
DL-02 - Improving DNN
How does AdaGrad help improve the performance of a neural network?
By adaptively scaling learning rates based on accumulated past gradients for each parameter, leading to faster convergence.
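A hedged NumPy sketch of one AdaGrad update: each parameter's step is scaled by the inverse square root of its accumulated squared gradients.

```python
import numpy as np

def adagrad_step(w, g_acc, grad, lr=0.01, eps=1e-8):
    g_acc = g_acc + grad ** 2                    # accumulated squared gradients (never decays)
    w = w - lr * grad / (np.sqrt(g_acc) + eps)   # per-parameter adaptive step size
    return w, g_acc
```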
DL-02 - Improving DNN
Which optimization algorithm is AdaDelta built on?
AdaGrad
DL-02 - Improving DNN
What’s the difference between AdaDelta and AdaGrad?
AdaDelta addresses AdaGrad’s diminishing learning rate problem by using a moving average of squared gradients instead of accumulating all historical squared gradients.
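A hedged NumPy sketch of one AdaDelta update: both squared gradients and squared updates are tracked as moving averages, so the step size does not shrink toward zero as in AdaGrad.

```python
import numpy as np

def adadelta_step(w, eg2, ed2, grad, rho=0.95, eps=1e-6):
    eg2 = rho * eg2 + (1 - rho) * grad ** 2                   # moving average of squared gradients
    delta = -np.sqrt(ed2 + eps) / np.sqrt(eg2 + eps) * grad   # update scaled by past update magnitudes
    ed2 = rho * ed2 + (1 - rho) * delta ** 2                  # moving average of squared updates
    return w + delta, eg2, ed2
```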
DL-02 - Improving DNN
How does RMSprop work?
RMSprop works by adapting the learning rate for each weight parameter using a running average of the magnitude of recent gradients.
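A hedged NumPy sketch of one RMSprop update: like AdaGrad, but with an exponential moving average of squared gradients instead of a full sum.

```python
import numpy as np

def rmsprop_step(w, s, grad, lr=0.001, beta=0.9, eps=1e-8):
    s = beta * s + (1 - beta) * grad ** 2      # running average of squared gradient magnitudes
    w = w - lr * grad / (np.sqrt(s) + eps)     # per-parameter adaptive step
    return w, s
```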
DL-02 - Improving DNN
How does the Adam optimizer work?
The Adam optimizer adaptively adjusts the learning rate for each parameter using exponentially decaying averages of past gradients (first moment) and past squared gradients (second moment).
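A hedged NumPy sketch of one Adam step: the two moment estimates are kept as exponential moving averages, bias-corrected, and combined into a per-parameter update.

```python
import numpy as np

def adam_step(w, m, v, grad, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad            # first moment: mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2       # second moment: mean of squared gradients
    m_hat = m / (1 - b1 ** t)               # bias correction (t counts steps from 1)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```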
DL-02 - Improving DNN
What are some benefits of using ADAM over other optimizers? (AESN)
- Adaptive learning rates
- efficient computation
- suitable for sparse data
- reduced noise in parameter updates
DL-02 - Improving DNN
Which optimizer performs the best on average?
Adam.
DL-02 - Improving DNN
What is a requirement for using batch normalization?
Reasonably large mini-batches; with small batches the per-batch statistics are noisy and training becomes unstable.
DL-02 - Improving DNN
How does batch normalization act as a regularizer?
The per-mini-batch statistics add noise to the activations during training, which has a mild regularizing effect.
DL-02 - Improving DNN
How does batch and layer normalization differ?
- BN normalizes each feature/channel over the mini-batch, e.g. the same statistics for a whole image.
- LN normalizes over the features of each individual sample, independently of the other samples in the batch.
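A hedged NumPy sketch of the difference for activations of shape (batch, features): BN computes statistics per feature over the batch, LN computes them per sample over its own features.

```python
import numpy as np

x = np.random.randn(32, 64)   # (batch, features)
eps = 1e-5

# Batch norm: one mean/variance per feature, shared by every sample in the batch
bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# Layer norm: one mean/variance per sample, independent of the rest of the batch
ln = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)
```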
DL-02 - Improving DNN
Is batch normalization stable for small mini-batch sizes?
No, need large batch sizes.
DL-02 - Improving DNN
Is layer normalization stable for small mini-batch sizes?
Yes, it’s not dependent on batch size.
DL-02 - Improving DNN
What are the 3 types of hyperparameter tuning approaches? (3)
- Manual
- Brute force (Grid search, random search etc.)
- Meta model (Machine learning, e.g. Optuna)
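A minimal Optuna sketch of the meta-model approach (the toy objective stands in for “train the network and return its validation loss”; the hyperparameter names are illustrative):

```python
import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)   # sampled hyperparameters
    dropout = trial.suggest_float("dropout", 0.0, 0.7)
    # ... train the model with (lr, dropout) here and return its validation loss ...
    return (lr - 0.01) ** 2 + (dropout - 0.3) ** 2          # toy stand-in objective

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```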
DL-02 - Improving DNN
What is HPO short for?
Hyperparameter optimization
DL-02 - Improving DNN
What is a surrogate model in terms of HPO?
A model trained on hyperparameter configurations as inputs, with the resulting model quality (e.g. validation score) as its output.
DL-02 - Improving DNN
What is a requirement for choosing what model to use for a surrogate model?
There is no way to compute a gradient of model quality with respect to the hyperparameters, so the surrogate must work with gradient-free optimization.
DL-02 - Improving DNN
When should you retune hyperparameters?
Occasionally/regularly, especially with a change in the data or problem to solve.
DL-02 - Improving DNN
What is NAG short for?
Nesterov accelerated gradient
DL-02 - Improving DNN
What is the purpose of Nesterov momentum in neural network training?
Nesterov momentum accelerates training by evaluating the gradient at the parameters’ approximate future (look-ahead) position, reducing oscillations and improving convergence.
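A hedged NumPy sketch of one NAG update (one common formulation; grad_fn is an assumed callable that returns the gradient at a given point):

```python
import numpy as np

def nag_step(w, v, grad_fn, lr=0.01, beta=0.9):
    lookahead = w - lr * beta * v        # where the momentum step alone would land
    v = beta * v + grad_fn(lookahead)    # gradient taken at the look-ahead point
    w = w - lr * v
    return w, v
```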
DL-02 - Improving DNN
What is RMSProp short for?
Root Mean Square Propagation