Neural Network Basics 2 Flashcards
What is data augmentation?
Technique used to artificially increase the size and diversity of a training dataset by applying transformations to the existing data. This helps improve the model’s performance, generalization, and robustness without collecting new data.
What is softmax?
An activation function that is a “softer” version of the maximum function. Small differences are amplified by taking the exponential. Multi-category equivalent of the sigmoid function
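As a concrete illustration (a minimal NumPy sketch, not part of the fastai API), softmax exponentiates each score and normalizes, so a modest gap between inputs becomes a large gap between probabilities:

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability; result is unchanged
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.5])))  # ~[0.63, 0.23, 0.14]: the gap between 2.0 and 1.0 is amplified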
Types of augmentation?
For images: Rotations, flips, crops, color jitter, etc.; For text: Synonym replacement, back-translation, etc.; For time-series: Time-warping, jittering, etc.
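A hedged fastai sketch of image augmentation (the Oxford pets dataset, path, is_cat and the parameter values are purely illustrative, not prescribed by these cards):

from fastai.vision.all import *

path = untar_data(URLs.PETS)/'images'
def is_cat(x): return x[0].isupper()   # in this dataset, cat images start with an uppercase letter

# batch_tfms applies the augmentations to each mini-batch on the GPU;
# aug_transforms bundles flips, rotations, zooms, warps and lighting changes
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42, label_func=is_cat,
    item_tfms=Resize(224),
    batch_tfms=aug_transforms(max_rotate=10.0, max_lighting=0.2),
    bs=64)                             # bs sets the mini-batch size (see the mini-batch cards below)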
Why use data augmentation?
To tackle overfitting on the specific examples from the dataset, since the dataset is usually run through the training loop multiple times (epochs)
What is cross-entropy loss?
The combination of the softmax and the negative log likelihood. Taking the log of the softmax leads to more focus on incorrect predictions
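To make the combination concrete, a minimal PyTorch sketch (with made-up numbers) showing that cross-entropy is just log-softmax followed by the negative log likelihood:

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])   # raw model outputs for one item, three classes
target = torch.tensor([0])                  # ground-truth class index

nll = F.nll_loss(F.log_softmax(logits, dim=1), target)   # softmax -> log -> negative log likelihood
xent = F.cross_entropy(logits, target)                   # the same computation in one call
print(nll.item(), xent.item())                           # both print the same value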
What is log likelihood?
Measure of how similar two distributions p and q are (p: ground truth, q: class probability predictions)
What is mini-batch?
Train on a set of items at a time (the “batch size”), e.g. 16 or 32; usually as many as will fit onto the GPU at once (together with the parameters and gradients)
How does gradient descent connect to mini-batches?
Computing the gradient on a mini-batch instead of the full dataset gives stochastic (mini-batch) gradient descent; iterations are usually much faster
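A minimal PyTorch sketch (with made-up data and model) of one epoch of mini-batch stochastic gradient descent, taking one parameter update per mini-batch rather than per full pass over the data:

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

X, y = torch.randn(1000, 10), torch.randn(1000, 1)                  # dummy dataset
dl = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)   # mini-batches of 32 items

model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for xb, yb in dl:                                    # one gradient step per mini-batch
    loss = nn.functional.mse_loss(model(xb), yb)
    loss.backward()
    opt.step()
    opt.zero_grad()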
Where are mini-batches defined?
Using a DataLoader (e.g. bs=64 in ImageDataLoaders.from_name_func)
Code for learning rate finder
learn.lr_find()
Methodology of learning rate finder
- Start with a very small learning rate and train the model for one mini-batch
- Increase the learning rate by a certain percentage (e.g. double it) and train the model for another mini-batch
- Repeat this until the loss starts to worsen consistently
- Select the greatest learning rate for which the loss was still clearly decreasing (a rough sketch of this loop follows below)
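A rough standalone sketch of this procedure (plain PyTorch under assumed names, not fastai's actual lr_find implementation):

import torch
from torch import nn

def lr_finder_sketch(model, dl, start_lr=1e-7, mult=2.0, explode_factor=4.0):
    opt = torch.optim.SGD(model.parameters(), lr=start_lr)
    lr, best, history = start_lr, float('inf'), []
    for xb, yb in dl:
        for g in opt.param_groups: g['lr'] = lr              # set the current learning rate
        loss = nn.functional.cross_entropy(model(xb), yb)    # train on one mini-batch
        loss.backward(); opt.step(); opt.zero_grad()
        history.append((lr, loss.item()))
        best = min(best, loss.item())
        if loss.item() > explode_factor * best: break        # loss is clearly worsening: stop
        lr *= mult                                           # e.g. double the learning rate
    return history   # pick the greatest lr at which the loss was still clearly decreasing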
Techniques to improve model performance
Learning rate, training duration, model depth, numeric precision
How does the default approach (learn.fine_tune()) function?
Freeze the weights of the pretrained layers for one epoch, then unfreeze all layers and train them all for the remaining epochs. Unfreezing can be done manually with the unfreeze() method
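In fastai this looks roughly as follows (epoch counts and the learning rate are illustrative, not prescribed):

learn.fine_tune(4)                   # 1 frozen epoch by default (freeze_epochs=1), then 4 unfrozen epochs

# roughly equivalent manual version:
learn.fit_one_cycle(1)               # train only the new head while the pretrained layers stay frozen
learn.unfreeze()                     # make all layers trainable
learn.fit_one_cycle(4, lr_max=1e-4)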
What is discriminative learning rate?
Train the higher (later) layers with a higher learning rate than the lower (earlier) layers
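In fastai a Python slice passes such a range of learning rates; the first value goes to the earliest layers and the last to the final layers (the values here are illustrative):

learn.unfreeze()
learn.fit_one_cycle(6, lr_max=slice(1e-6, 1e-4))   # lowest layers train at 1e-6, highest at 1e-4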
How to plot the learning rate schedule?
learn.recorder.plot_sched()
How to select the number of epochs?
- At first, pick a number of epochs that fits within the time budget
- Then, plot the training and validation loss (the training loss after each batch and the validation loss after each epoch):
learn.recorder.plot_loss()
- If the validation loss is still decreasing, one can train longer for even better performance
- If the validation loss is increasing, the model is overfitting and should be retrained with a smaller number of epochs (the number of epochs at which the validation loss was lowest)
Concept of deeper architecture?
- The model used so far for transfer learning has several variants varying in depth (18-, 34-, 50-, 101-, and 152-layer versions)
- A larger model is generally more powerful but also more prone to overfitting
- With larger models more GPU RAM is required and to avoid an out-of-memory error a smaller batch size may be necessary
- Furthermore, training is slower with larger models
- Generally, simpler tasks are better solved with simpler models (a hedged example of choosing a deeper variant follows below)
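A hedged fastai sketch of moving to a deeper variant while compensating for the extra memory and compute (reusing path and is_cat from the augmentation sketch above; resnet50, the smaller batch size and the epoch counts are illustrative, and vision_learner is the current name of fastai's cnn_learner):

from fastai.vision.all import *

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42, label_func=is_cat,
    item_tfms=Resize(224), bs=32)              # smaller batch size so the larger model fits in GPU RAM
learn = vision_learner(dls, resnet50, metrics=error_rate).to_fp16()   # half precision speeds up training
learn.fine_tune(6, freeze_epochs=3)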