Neural Network Basics 2 Flashcards
What is data augmentation?
Technique used to artificially increase the size and diversity of a training dataset by applying transformations to the existing data. This helps improve the model’s performance, generalization, and robustness without collecting new data.
What is softmax?
An activation function that is a “softer” version of the maximum function: it turns raw scores (logits) into probabilities that sum to 1. Small differences are amplified by taking the exponential. Multi-category equivalent of the sigmoid function
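A minimal sketch (assuming PyTorch) of the softmax computation, done by hand and with the built-in function:

import torch

logits = torch.tensor([2.0, 1.0, 0.1])            # raw scores for 3 classes
probs_manual = logits.exp() / logits.exp().sum()  # exponentiate, then normalize so the outputs sum to 1
probs_builtin = torch.softmax(logits, dim=0)      # same result with the built-in
print(probs_manual, probs_builtin)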
Types of augmentation?
For images: Rotations, flips, crops, color jitter, etc.; For text: Synonym replacement, back-translation, etc.; For time-series: Time-warping, jittering, etc.
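A minimal sketch (assuming the fastai library used elsewhere in this deck; path is a hypothetical dataset folder) of adding image augmentations:

from fastai.vision.all import *

# aug_transforms() applies random flips, rotations, zooms, lighting changes, etc. per batch
dls = ImageDataLoaders.from_folder(path, item_tfms=Resize(224),
                                   batch_tfms=aug_transforms())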
Why use data augmentation?
To tackle overfitting on the specific examples of the dataset, since the dataset is usually run through the training loop multiple times (epochs)
What is cross-entropy loss?
The combination of the softmax and the negative log likelihood. Taking the log of the softmax leads to more focus on incorrect predictions
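A minimal sketch (assuming PyTorch) showing that cross-entropy is exactly log-softmax followed by negative log likelihood:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)              # 4 items, 10 classes
targets = torch.tensor([3, 0, 7, 1])     # ground-truth class indices

loss_combined = F.cross_entropy(logits, targets)                  # softmax, log and NLL in one call
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)   # the same, done in two steps
print(loss_combined, loss_manual)        # identical values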
What is log likelihood?
Measure of how similar two distributions p and q are (p: ground truth, q: class probability predictions)
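In formula form (standard notation; p is the one-hot ground-truth distribution, q the predicted class probabilities), the negative log likelihood used as a loss is NLL(p, q) = −Σᵢ pᵢ log qᵢ, which for a one-hot p reduces to −log q[correct class].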
What is a mini-batch?
A subset of the training data processed together in one forward/backward pass; its size (“batch size”), e.g. 16 or 32, is usually chosen as large as will fit at once (incl. parameters and gradients) onto the GPU
How does gradient descent relate to mini-batches?
Computing each gradient update on a mini-batch instead of the full dataset is called stochastic gradient descent (SGD); iterations are usually much faster
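A minimal sketch (assuming PyTorch; the data here is a random toy dataset) of mini-batch SGD, where each weight update uses only one batch:

import torch
from torch.utils.data import DataLoader, TensorDataset

xs, ys = torch.randn(1000, 20), torch.randint(0, 2, (1000,))   # toy data: 1000 items, 20 features, 2 classes
dl = DataLoader(TensorDataset(xs, ys), batch_size=32, shuffle=True)

model = torch.nn.Linear(20, 2)
loss_fn = torch.nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for xb, yb in dl:                  # one iteration per mini-batch
    loss = loss_fn(model(xb), yb)  # the gradient is estimated from 32 items only
    loss.backward()
    opt.step()                     # update the weights
    opt.zero_grad()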
Where are mini-batches defined?
Using a DataLoader (e.g. bs=64 in ImageDataLoaders.from_name_func)
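A hedged sketch of the call the card refers to (path, the label function and the file list are hypothetical placeholders):

from fastai.vision.all import *

def is_cat(fname): return fname[0].isupper()   # hypothetical labelling rule applied to the file name

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), label_func=is_cat,
    item_tfms=Resize(224), bs=64)              # bs=64 -> mini-batches of 64 images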
Code for learning rate finder
learn.lr_find()
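A hedged usage sketch; the value passed afterwards (3e-3) is just an illustration of a rate read off the resulting plot:

learn.lr_find()                    # plots loss vs. learning rate and suggests a value
learn.fine_tune(2, base_lr=3e-3)   # train with a learning rate picked from the plot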
Methodology of learning rate finder
- Start with a very small learning rate and train the model for one mini-batch
- Increase the learning rate by a certain percentage (e.g. double it) and train the model for another mini-batch
- Repeat this until the loss is clearly and continuously worsening
- Select the greatest learning rate for which the loss was still clearly decreasing
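A minimal sketch of this procedure in plain PyTorch (model, loss_fn and dl are assumed to exist; fastai's learn.lr_find() does all of this for you):

import torch

lr, lrs, losses = 1e-7, [], []
opt = torch.optim.SGD(model.parameters(), lr=lr)
for xb, yb in dl:                              # one mini-batch per learning-rate value
    for g in opt.param_groups: g['lr'] = lr
    loss = loss_fn(model(xb), yb)
    loss.backward(); opt.step(); opt.zero_grad()
    lrs.append(lr); losses.append(loss.item())
    if losses[-1] > 4 * min(losses): break     # stop once the loss is clearly worsening
    lr *= 2                                    # e.g. double the learning rate each step
# then pick the greatest lr for which the loss was still clearly decreasing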
Techniques to improve model performance
Learning rate, training duration, model depth, numeric precision
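Hedged examples of how some of these knobs look in fastai code (dls is the DataLoaders from the earlier cards; vision_learner is the current fastai name, older versions use cnn_learner; the numbers are illustrative):

learn = vision_learner(dls, resnet50, metrics=error_rate).to_fp16()  # deeper model + half precision
learn.fine_tune(12, base_lr=1e-3)                                    # longer training with a tuned learning rate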
How does the default approach (learn.fine_tune()) work?
Freezes the weights of the pretrained layers for one epoch (so only the newly added head is trained) and then unfreezes all layers and trains them all for the remaining epochs. Unfreezing can also be done manually with the unfreeze() method
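Roughly the manual equivalent, as a simplified sketch (fastai's actual fine_tune also adjusts the learning rates between the two phases):

learn.freeze()           # only the newly added head is trainable
learn.fit_one_cycle(1)   # one epoch with the pretrained layers frozen
learn.unfreeze()         # make all layers trainable again
learn.fit_one_cycle(4)   # train everything for the remaining epochs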
What is a discriminative learning rate?
The approach of training higher (later) layers with a higher learning rate than lower (earlier) layers, since the pretrained early layers usually need only small adjustments
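In fastai this is expressed by passing a slice of learning rates; a short example (values illustrative):

learn.unfreeze()
learn.fit_one_cycle(6, lr_max=slice(1e-6, 1e-4))   # earliest layers get 1e-6, the last layers 1e-4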
How to plot the learning rate schedule?
The schedule itself is set by the fitting method (e.g. fit_one_cycle); to plot it:
learn.recorder.plot_sched()
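A short usage sketch (assuming a fastai Learner named learn): the one-cycle fit defines the schedule, and plot_sched then shows it:

learn.fit_one_cycle(3, lr_max=3e-3)
learn.recorder.plot_sched()   # plots the learning-rate and momentum schedules used during training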