C2W3 Hyperparameters Tuning Flashcards
Tuning order
- Learning rate
- Beta (momentum term), mini-batch size, hidden units
- Number of layers, learning rate decay
- Adam parameters (beta1 = 0.9, beta2 = 0.999, epsilon = 10^-8)
How to tune learning rate
Sample on a logarithmic scale (10^-4, … 10^0): draw the exponent uniformly at random rather than the learning rate itself.
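A minimal sketch of log-scale sampling in NumPy (the range 10^-4 to 10^0 follows the card; variable names are illustrative):

```python
import numpy as np

# Sample the exponent uniformly in [-4, 0], then exponentiate:
# this spreads trials evenly across decades, instead of crowding
# 90% of samples into [0.1, 1.0] as uniform sampling would.
r = -4 * np.random.rand()   # r ~ Uniform[-4, 0]
alpha = 10 ** r             # learning rate in [10^-4, 10^0]
```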
Pandas vs caviar approach to training models
Pandas: babysit one model at a time (when compute is scarce); caviar: train many models in parallel and keep the best.
Batch normalisation
Normalise values deep in the hidden layers: for each layer, normalise the mean and variance of z (before the activation), then scale and shift with learnable parameters gamma and beta.
This makes the bias b redundant, since subtracting the batch mean cancels any constant added to z; beta takes over that role.
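A minimal NumPy sketch of the batch-norm forward step for one layer (function and variable names are illustrative, not from the course):

```python
import numpy as np

def batchnorm_forward(z, gamma, beta, eps=1e-8):
    # z has shape (units, batch); statistics are per unit over the mini-batch.
    mu = z.mean(axis=1, keepdims=True)        # batch mean
    var = z.var(axis=1, keepdims=True)        # batch variance
    z_norm = (z - mu) / np.sqrt(var + eps)    # zero mean, unit variance
    z_tilde = gamma * z_norm + beta           # learnable scale and shift
    return z_tilde, mu, var
```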
Batch norm at test time
During training, save the mean mu and variance sigma^2 of z for each mini-batch.
Compute exponentially weighted averages of them as running estimates.
At test time, use these estimates to normalise the hidden-unit values when evaluating a single test example.
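A hedged sketch of the test-time procedure, pairing with the forward pass above; the momentum value 0.9 is an illustrative choice:

```python
import numpy as np

def update_running_stats(running_mu, running_var, mu, var, momentum=0.9):
    # Exponentially weighted averages of the per-mini-batch statistics,
    # updated once per training step.
    running_mu = momentum * running_mu + (1 - momentum) * mu
    running_var = momentum * running_var + (1 - momentum) * var
    return running_mu, running_var

def batchnorm_test(z, gamma, beta, running_mu, running_var, eps=1e-8):
    # A single test example has no meaningful batch statistics,
    # so normalise with the running estimates collected during training.
    z_norm = (z - running_mu) / np.sqrt(running_var + eps)
    return gamma * z_norm + beta
```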
Softmax regression
The softmax layer is the last layer of the NN, with one unit per class.
Apply the softmax activation to the z values of this layer to get class probabilities that sum to 1.
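A numerically stable softmax in NumPy (subtracting the column-wise max is standard practice to avoid overflow, not something stated on the card):

```python
import numpy as np

def softmax(z):
    # z has shape (classes, examples); softmax is taken over the class axis.
    z_shift = z - z.max(axis=0, keepdims=True)  # guard against exp overflow
    t = np.exp(z_shift)
    return t / t.sum(axis=0, keepdims=True)

# Example: 3 classes, one column per example; each column sums to 1.
probs = softmax(np.array([[5.0], [2.0], [-1.0]]))
```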
Deep learning frameworks
Caffe/Caffe2
CNTK
DL4J
Keras
Lasagne
mxnet
PaddlePaddle
TensorFlow
Theano
Torch