DL CNNs & RNNs Flashcards
How to calculate number of parameters (weights and biases) in a CNN?
((filter width * filter height * number of input channels) + 1 for the bias) * number of filters in the new layer
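A quick worked sketch in Python (the layer sizes are made up for illustration):

```python
# Conv layer parameters: (filter_w * filter_h * in_channels + 1 bias) * num_filters
def conv_params(filter_w, filter_h, in_channels, num_filters):
    return (filter_w * filter_h * in_channels + 1) * num_filters

# e.g. 3x3 filters over a 64-channel input producing 128 feature maps
print(conv_params(3, 3, 64, 128))  # (3*3*64 + 1) * 128 = 73856
```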
What is dropout?
Randomly setting a fraction of input units to 0 at each update during training time - helps prevent overfitting
How do we combat exploding and vanishing gradients?
- Normalization of inputs + careful initialization of weights
- Regularization
- Gradient clipping (see the sketch below)
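A minimal sketch of gradient clipping by global norm in NumPy (the max_norm of 5.0 is an arbitrary illustrative choice):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their combined L2 norm
    does not exceed max_norm (combats exploding gradients)."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > max_norm:
        grads = [g * (max_norm / global_norm) for g in grads]
    return grads
```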
Tips for Choosing Initial weights
- Never set all weights to zero
- Try values somewhere between -0.2 and 0.2, or scale by fan-in (e.g. 1/sqrt(fan_in))
- Biases are often initialized to 0.01 or a similar small value (see the sketch below)
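A NumPy sketch of these heuristics (the exact range and bias constant come from the tips above; they are rules of thumb, not universal rules):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(fan_in, fan_out):
    # Never all zeros: small random values break symmetry between units
    W = rng.uniform(-0.2, 0.2, size=(fan_in, fan_out))
    # Alternative: scale by fan-in, e.g.
    # W = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out))
    b = np.full(fan_out, 0.01)  # small constant bias
    return W, b
```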
What is regularization?
“Regularization is any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error.”
Goodfellow et al., 2016
Regularization methods for regression
- Lasso - L1 penalty; encourages sparseness in weight matrices
- Ridge - L2 penalty (weight decay/parameter shrinkage)
- Elastic net - combines the Lasso and Ridge penalties
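scikit-learn exposes all three directly; a minimal sketch (the alpha and l1_ratio values are arbitrary, and X/y are synthetic data for illustration):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

X = np.random.randn(100, 10)
y = X @ np.random.randn(10) + 0.1 * np.random.randn(100)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1: some coefficients become exactly 0
ridge = Ridge(alpha=0.1).fit(X, y)  # L2: all coefficients shrink toward 0
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # mix of L1 and L2
print(lasso.coef_)  # note the exact zeros
```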
Inverted dropout
During training, randomly drop units according to a dropout probability at each training update, then scale the surviving activations up by 1/keep_prob so that no rescaling is needed at test time
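A minimal NumPy sketch of inverted dropout on one layer's activations (keep_prob = 0.8 is just an example setting):

```python
import numpy as np

def inverted_dropout(a, keep_prob=0.8, training=True):
    if not training:
        return a  # no rescaling needed at test time
    mask = np.random.rand(*a.shape) < keep_prob  # keep each unit with prob keep_prob
    return a * mask / keep_prob  # scale survivors so expected activation is unchanged
```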
How does dropout work?
It spreads out the weights - no unit can rely on any single input too much, which improves generalization
Disadvantage of dropout
Introduces another hyperparameter - the dropout probability - often one per layer
What’s another type of regularization apart from inverted dropout?
- Dataset augmentation
- Synthesize new examples by flipping, rotating, cropping, and distorting existing ones
- effectively enlarges the dataset and makes the model more robust to these variations (see the sketch below)
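A sketch of two simple augmentations in NumPy, assuming the image is an H x W x C array:

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped and rotated copy of an H x W x C image."""
    if rng.random() < 0.5:
        image = np.fliplr(image)                # horizontal flip
    image = np.rot90(image, k=rng.integers(4))  # rotate by a random multiple of 90 degrees
    return image

augmented = augment(np.zeros((32, 32, 3)), np.random.default_rng(0))
```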
What is early stopping?
Allowing a model to begin to overfit, then rolling back to the point at which the error curves on the training and validation sets begin to diverge (i.e. the epoch with the lowest validation error)
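A sketch of the loop (train_epoch and val_loss are hypothetical stand-ins for your own training and validation code):

```python
import copy

def fit_with_early_stopping(model, patience=5, max_epochs=100):
    best_loss, best_model, wait = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_epoch(model)      # hypothetical: one pass over the training set
        loss = val_loss(model)  # hypothetical: loss on a held-out validation set
        if loss < best_loss:
            best_loss, best_model, wait = loss, copy.deepcopy(model), 0
        else:
            wait += 1
            if wait >= patience:  # validation loss has stopped improving
                break
    return best_model  # roll back to the best checkpoint
```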
1:1 RNN
A vanilla feed-forward network, no recurrence
Image classification
1:M RNN
Image captioning
M:1 RNN
Sentiment analysis
M:M RNN
Machine translation
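A shape-level sketch of how these patterns fall out of one vanilla RNN cell in NumPy (all dimensions are arbitrary):

```python
import numpy as np

H, D = 16, 8  # hidden size, input size
rng = np.random.default_rng(0)
Wxh, Whh = rng.normal(size=(D, H)), rng.normal(size=(H, H))

def rnn(xs):
    """Run a vanilla RNN over a sequence, returning every hidden state."""
    h, hs = np.zeros(H), []
    for x in xs:
        h = np.tanh(x @ Wxh + h @ Whh)
        hs.append(h)
    return hs

xs = [rng.normal(size=D) for _ in range(5)]
hs = rnn(xs)
last = hs[-1]  # M:1 (sentiment analysis): use only the final state
all_out = hs   # M:M (machine translation-style): an output at every step
# 1:M (image captioning): feed one input, then keep unrolling from the state
```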