DL-02 - Improving DNN Flashcards

1
Q

DL-02 - Improving DNN

What are 3 commonly used DL approaches to avoid overfitting? (3)

A
  • regularization
  • dropout
  • early stopping
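
As a quick illustration, here is a minimal sketch combining all three techniques, assuming TensorFlow/Keras (the layer sizes, dropout rate, L2 strength, and patience value are arbitrary):

  import tensorflow as tf

  # Hypothetical binary classifier: L2 regularization on the hidden layer,
  # dropout after it, and early stopping on the validation loss.
  model = tf.keras.Sequential([
      tf.keras.layers.Dense(64, activation="relu",
                            kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # regularization
      tf.keras.layers.Dropout(0.5),                                              # dropout
      tf.keras.layers.Dense(1, activation="sigmoid"),
  ])
  model.compile(optimizer="adam", loss="binary_crossentropy")

  early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                                restore_best_weights=True)       # early stopping
  # model.fit(x_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])
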
2
Q

DL-02 - Improving DNN

What are two common regularization techniques?

A
  • L1
  • L2
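
For reference (standard definitions, added here rather than taken from the card), the penalty term added to the loss J is:

  L1 (Lasso): J = \text{loss} + \lambda \sum_i |w_i|
  L2 (Ridge): J = \text{loss} + \lambda \sum_i w_i^2
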
3
Q

DL-02 - Improving DNN

What is another name for L1 regularization?

A

Lasso

4
Q

DL-02 - Improving DNN

What is another name for Lasso regularization?

A

L1

5
Q

DL-02 - Improving DNN

What is another name for L2 regularization?

A

Ridge

6
Q

DL-02 - Improving DNN

What is another name for Ridge regularization?

A

L2

7
Q

DL-02 - Improving DNN

What are the basic types of learning rate decay? (5)

A
  • Common method
  • Exponential
  • Epoch number based
  • Mini-batch number based
  • Discrete staircase
8
Q

DL-02 - Improving DNN

What is the formula for “common method” learning rate decay?

A

(See image)
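
The image is not included here; the schedule usually taught as the “common method” is typically:

  \alpha = \frac{1}{1 + \text{decay rate} \cdot \text{epoch num}} \, \alpha_0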

9
Q

DL-02 - Improving DNN

What is the name for this method of learning rate decay? (See image)

A

“Common method” learning rate decay

10
Q

DL-02 - Improving DNN

What is the formula for exponential learning rate decay?

A

(See image)
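
The image is not included here; exponential decay is typically written as:

  \alpha = d^{\,\text{epoch num}} \cdot \alpha_0, \quad d < 1 \ (\text{e.g. } d = 0.95)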

11
Q

DL-02 - Improving DNN

What is the name for this method of learning rate decay? (See image)

A

Exponential learning rate decay

12
Q

DL-02 - Improving DNN

What is the formula for “epoch number based” learning rate decay?

A

(See image)
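
The image is not included here; this schedule is typically written as:

  \alpha = \frac{k}{\sqrt{\text{epoch num}}} \, \alpha_0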

13
Q

DL-02 - Improving DNN

What is the name for this method of learning rate decay? (See image)

A

“Epoch number based” learning rate decay

14
Q

DL-02 - Improving DNN

What is the formula for “mini-batch number based” learning rate decay?

A

(See image; should say “mini batch number”)
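
The image is not included here; this schedule is typically written as:

  \alpha = \frac{k}{\sqrt{t}} \, \alpha_0, \quad t = \text{mini-batch number}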

15
Q

DL-02 - Improving DNN

What is the name for this method of learning rate decay? (See image; should say “mini batch number”)

A

“Mini-batch number based” learning rate decay

16
Q

DL-02 - Improving DNN

What is the name of this method of learning rate decay? (See image)

A

Discrete staircase.

17
Q

DL-02 - Improving DNN

What does the learning rate graph look like with a “discrete staircase” approach?

A

(See image)
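
The image is not included here; as an illustration, a discrete staircase schedule holds the learning rate constant for a fixed number of epochs and then drops it by a constant factor, giving a step-shaped graph, e.g.:

  \alpha = \alpha_0 \cdot 0.5^{\lfloor \text{epoch num} / k \rfloor}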

18
Q

DL-02 - Improving DNN

What is the role of momentum in neural network training?

A

Momentum accumulates an exponentially weighted average of past gradients, which dampens oscillations and speeds up convergence for a smoother learning process.
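
A common formulation of the momentum update, added for reference (β ≈ 0.9 is a typical choice):

  v_{dW} := \beta \, v_{dW} + (1 - \beta) \, dW
  W := W - \alpha \, v_{dW}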

19
Q

DL-02 - Improving DNN

How does AdaGrad help improve the performance of a neural network?

A

By adaptively scaling the learning rate of each parameter based on its accumulated past squared gradients, leading to faster convergence.
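
A sketch of the AdaGrad update for a parameter θ with gradient g, added for reference:

  G := G + g^2  (accumulated squared gradients)
  \theta := \theta - \frac{\alpha}{\sqrt{G} + \epsilon} \, g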

20
Q

DL-02 - Improving DNN

Which optimization algorithm is AdaDelta built on?

A

AdaGrad

21
Q

DL-02 - Improving DNN

What’s the difference between AdaDelta and AdaGrad?

A

AdaDelta addresses AdaGrad’s diminishing-learning-rate problem by using a moving average of squared gradients instead of accumulating all past squared gradients.
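
For reference, the difference shows up in the gradient accumulator (ρ is AdaDelta’s decay rate):

  AdaGrad:  G_t = G_{t-1} + g_t^2  (grows without bound, so the effective learning rate keeps shrinking)
  AdaDelta: E[g^2]_t = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2  (moving average, so it does not vanish)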

22
Q

DL-02 - Improving DNN

How does RMSprop work?

A

RMSprop adapts the learning rate for each weight parameter using a running average of the squared magnitudes of recent gradients.
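
A common way to write the RMSprop update, added for reference (ε avoids division by zero):

  S_{dW} := \beta \, S_{dW} + (1 - \beta) \, dW^2
  W := W - \frac{\alpha}{\sqrt{S_{dW}} + \epsilon} \, dW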

23
Q

DL-02 - Improving DNN

How does the Adam optimizer work?

A

The Adam optimizer adapts the learning rate of each parameter using exponentially weighted averages of both past gradients (first moment) and past squared gradients (second moment), with bias correction.
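
The standard Adam update, added for reference (typical defaults are β₁ = 0.9, β₂ = 0.999):

  m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
  v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
  \hat{m}_t = m_t / (1 - \beta_1^t), \quad \hat{v}_t = v_t / (1 - \beta_2^t)
  \theta_t = \theta_{t-1} - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)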

24
Q

DL-02 - Improving DNN

What are some benefits of using ADAM over other optimizers? (AESN)

A
  • Adaptive learning rates
  • Efficient computation
  • Suitable for sparse data
  • Reduced noise in parameter updates
25
Q

DL-02 - Improving DNN

Which optimizer performs the best on average?

A

ADAM.

26
Q

DL-02 - Improving DNN

What is a requirement for using batch normalization?

A

Reasonably large mini-batches; with small batches the batch statistics become noisy and training is unstable.

27
Q

DL-02 - Improving DNN

How does batch normalization act as a regularizer?

A

Because the mean and variance are estimated on each mini-batch, they vary from batch to batch, which adds noise to the activations during training.

28
Q

DL-02 - Improving DNN

How do batch and layer normalization differ?

A
  • BN normalizes each feature across the mini-batch, so the same statistics are shared by all samples in the batch (e.g. per channel for a whole image).
  • LN normalizes across the features of each individual sample, so it is independent of the other samples in the batch.
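
A minimal NumPy sketch of that axis difference, assuming a 2-D activation tensor of shape (batch_size, features) and omitting the learnable scale/shift parameters:

  import numpy as np

  x = np.random.randn(32, 64)  # (batch_size, features)

  # Batch norm: statistics per feature, computed across the batch dimension.
  bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + 1e-5)

  # Layer norm: statistics per sample, computed across the feature dimension.
  ln = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + 1e-5)
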
29
Q

DL-02 - Improving DNN

Is batch normalization stable for small mini-batch sizes?

A

No; it needs reasonably large batch sizes.

30
Q

DL-02 - Improving DNN

Is layer normalization stable for small mini-batch sizes?

A

Yes, it’s not dependent on batch size.

31
Q

DL-02 - Improving DNN

What are the 3 types of hyperparameter tuning approaches? (3)

A
  • Manual
  • Brute force (grid search, random search, etc.)
  • Meta model (Machine learning, e.g. Optuna)
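
A minimal sketch of the meta-model approach using Optuna (the search space and the dummy objective are made up for illustration; in practice the objective would train and evaluate a real model):

  import optuna

  def objective(trial):
      # Hypothetical search space.
      lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
      dropout = trial.suggest_float("dropout", 0.0, 0.5)
      # val_accuracy = train_and_evaluate(lr=lr, dropout=dropout)  # user-supplied in practice
      val_accuracy = 1.0 - abs(lr - 1e-3) - 0.1 * dropout          # dummy stand-in
      return val_accuracy

  study = optuna.create_study(direction="maximize")
  study.optimize(objective, n_trials=50)
  print(study.best_params)
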
32
Q

DL-02 - Improving DNN

What is HPO short for?

A

Hyperparameter optimization

33
Q

DL-02 - Improving DNN

What is a surrogate model in terms of HPO?

A

A model trained on hyperparameter settings, whose output predicts the quality of the resulting model.

34
Q

DL-02 - Improving DNN

What is a requirement for choosing what model to use for a surrogate model?

A

There is no way to compute the gradient of the objective with respect to the hyperparameters, so the surrogate model must rely on gradient-free optimization.

35
Q

DL-02 - Improving DNN

When should you retune hyperparameters?

A

Occasionally/regularly, especially when the data or the problem to solve changes.

36
Q

DL-02 - Improving DNN

What is NAG short for?

A

Nesterov accelerated gradient

37
Q

DL-02 - Improving DNN

What is the purpose of Nesterov momentum in neural network training?

A

Nesterov momentum accelerates training by evaluating the gradient at the look-ahead (approximate future) position of the parameters, reducing oscillations and improving convergence.
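
One common way to write the NAG update, added for reference (μ is the momentum coefficient; the gradient is evaluated at the look-ahead point):

  v_t = \mu \, v_{t-1} - \alpha \, \nabla f(\theta_{t-1} + \mu \, v_{t-1})
  \theta_t = \theta_{t-1} + v_t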

38
Q

DL-02 - Improving DNN

What is RMSProp short for?

A

Root Mean Square Propagation