C2W3 Hyperparameter Tuning Flashcards

1
Q

Tuning order

A
  1. Learning rate
  2. Beta (momentum term), mini-batch size, number of hidden units
  3. Number of layers, learning rate decay
  4. Adam parameters (beta1 = 0.9, beta2 = 0.999, epsilon = 10^-8); these defaults are rarely tuned (see the sketch below)
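
A minimal sketch of how these defaults look in practice, assuming TensorFlow 2 / Keras (the framework choice and learning rate value are illustrative, not from the card):

  import tensorflow as tf

  # Adam with the card's recommended defaults; in practice only the
  # learning rate is tuned, the betas and epsilon are left at these values.
  optimizer = tf.keras.optimizers.Adam(
      learning_rate=0.001,  # illustrative starting point; tune this first
      beta_1=0.9,           # momentum term
      beta_2=0.999,         # second-moment (RMSprop-style) term
      epsilon=1e-8,         # numerical stability constant
  )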
2
Q

How to tune learning rate

A

Sample it on a logarithmic scale (e.g. from 10^-4 to 10^0): draw r uniformly from [-4, 0] and set alpha = 10^r, so every decade is sampled equally often.
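
A minimal NumPy sketch of this sampling scheme (the [-4, 0] range is from the card; variable names are illustrative):

  import numpy as np

  # Uniform in the exponent: 10^-4..10^-3 is as likely as 10^-1..10^0,
  # which a uniform draw over [0.0001, 1] would not give.
  r = np.random.uniform(-4, 0)
  alpha = 10 ** r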

3
Q

Pandas vs caviar approach to training models

A

Pandas: babysit one model at a time, adjusting it as it trains; caviar: train many models in parallel and keep the best one.

4
Q

Batch normalisation

A

Normalising values deep in the hidden layers: normalise the mean and variance of each layer's pre-activations z, then rescale them with the learnable parameters gamma (scale) and beta (shift).
Because the mean subtraction cancels out any additive constant, the bias b becomes redundant; the learnable shift beta takes over its role.
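
A minimal NumPy sketch of the forward pass for one layer (names and the eps constant are illustrative; gamma and beta are the learnable scale and shift):

  import numpy as np

  def batch_norm_forward(z, gamma, beta, eps=1e-8):
      # z: pre-activations of one layer, shape (units, batch_size)
      mu = z.mean(axis=1, keepdims=True)      # per-unit mean over the mini-batch
      var = z.var(axis=1, keepdims=True)      # per-unit variance
      z_norm = (z - mu) / np.sqrt(var + eps)  # zero mean, unit variance
      return gamma * z_norm + beta            # learnable rescale and shift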

5
Q

Batch norm at test time

A

During training, save the mean (mu) and variance (sigma^2) of z for each mini-batch and keep an exponentially weighted average of them as running estimates.

At test time, use those running estimates (instead of batch statistics) to normalise the hidden unit values when evaluating a single test example.
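
A minimal sketch of the running estimates, assuming an exponential-average momentum of 0.9 (the momentum value and all names are illustrative):

  import numpy as np

  momentum, eps = 0.9, 1e-8
  running_mu, running_var = 0.0, 1.0  # carried across training steps

  # During training, after computing each mini-batch's statistics:
  batch_mu, batch_var = 0.3, 0.8      # placeholder values for one batch
  running_mu = momentum * running_mu + (1 - momentum) * batch_mu
  running_var = momentum * running_var + (1 - momentum) * batch_var

  # At test time, normalise a single example with the running estimates:
  z = np.array([0.5])
  z_norm = (z - running_mu) / np.sqrt(running_var + eps)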

6
Q

Softmax regression

A

The softmax layer is the last layer of the NN, with the number of units equal to the number of classes.
Apply the softmax activation function to the z values of this layer to turn them into a probability distribution over the classes.
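
A minimal NumPy sketch of the softmax activation (the max subtraction is a standard numerical-stability trick, not mentioned on the card):

  import numpy as np

  def softmax(z):
      # Subtracting the max does not change the result but avoids overflow.
      t = np.exp(z - z.max())
      return t / t.sum()  # outputs are positive and sum to 1

  print(softmax(np.array([2.0, 1.0, 0.1])))  # approx. [0.659, 0.242, 0.099]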

7
Q

Deep learning frameworks

A

Caffe/Caffe2
CNTK
DL4J
Keras
Lasagne
mxnet
PaddlePaddle
TensorFlow
Theano
Torch
