Neural Network Basics 2 Flashcards

1
Q

What is data augmentation?

A

Technique used to artificially increase the size and diversity of a training dataset by applying transformations to the existing data. This helps improve the model’s performance, generalization, and robustness without collecting new data.

2
Q

What is softmax?

A

An activation function that is a “softer” version of the maximum function: small differences between the inputs are amplified by taking the exponential, and the outputs sum to 1. It is the multi-category equivalent of the sigmoid function.
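
A minimal sketch of softmax in PyTorch (the activation values below are made up for illustration):

import torch

def softmax(x):
    # exponentiate, then normalize so the outputs sum to 1
    e = torch.exp(x)
    return e / e.sum()

acts = torch.tensor([2.0, 2.5])     # two activations that differ only slightly
print(softmax(acts))                # tensor([0.3775, 0.6225]) -- the gap is amplified
print(torch.softmax(acts, dim=0))   # same result with the built-in function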

3
Q

Types of augmentation?

A

For images: Rotations, flips, crops, color jitter, etc.; For text: Synonym replacement, back-translation, etc.; For time-series: Time-warping, jittering, etc.
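
For images, fastai bundles common random augmentations in the aug_transforms helper; a minimal sketch (the parameter values are just examples):

from fastai.vision.all import *

# aug_transforms() returns a set of random image augmentations (flips, rotations,
# zoom, lighting changes, warping) meant to be applied per batch on the GPU.
tfms = aug_transforms(do_flip=True, max_rotate=10.0, max_zoom=1.1, max_lighting=0.2)
# Pass the result as batch_tfms when building DataLoaders, e.g.
# ImageDataLoaders.from_name_func(..., batch_tfms=tfms)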

4
Q

Why use data augmentation?

A

To reduce overfitting to the specific examples in the dataset, since the dataset is usually run through the training loop multiple times (epochs).

5
Q

What is cross-entropy loss?

A

The combination of the softmax and the negative log likelihood. Taking the log of the softmax output penalizes confident but incorrect predictions more heavily.
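
A small PyTorch sketch (the logits and targets are made up) showing that cross-entropy is log-softmax followed by negative log likelihood:

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 0.1],      # made-up outputs for 2 items, 3 classes
                       [0.3, 0.2, 1.5]])
targets = torch.tensor([0, 2])               # correct class indices

loss_manual  = F.nll_loss(F.log_softmax(logits, dim=1), targets)
loss_builtin = F.cross_entropy(logits, targets)
print(loss_manual, loss_builtin)             # identical values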

6
Q

What is log likelihood?

A

A measure of how similar two distributions p and q are (p: ground truth, q: predicted class probabilities).
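
A tiny PyTorch illustration (made-up probabilities): with a one-hot ground truth p, the negative log likelihood reduces to minus the log of the probability predicted for the correct class:

import torch
import torch.nn.functional as F

q = torch.tensor([[0.1, 0.2, 0.7]])   # made-up predicted class probabilities
target = torch.tensor([2])            # the true class is index 2

print(F.nll_loss(q.log(), target))    # tensor(0.3567) == -log(0.7)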

7
Q

What is a mini-batch?

A

A small set of training items processed together in one step; its size (the “batch size”) is e.g. 16 or 32, and is usually chosen as large as will fit at once (incl. parameters and gradients) onto the GPU.
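
A minimal PyTorch sketch with made-up data, showing the batch size as the number of items returned per step:

import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))  # made-up dataset
dl = DataLoader(ds, batch_size=32, shuffle=True)                       # mini-batches of 32 items

xb, yb = next(iter(dl))
print(xb.shape)   # torch.Size([32, 10])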

8
Q

How does gradient descent relate to mini-batches?

A

Gradient descent applied to mini-batches (rather than the full dataset) is called stochastic gradient descent (SGD); its iterations are usually much faster because each update uses only one mini-batch.
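
A minimal sketch of one epoch of SGD in plain PyTorch (made-up data and a tiny linear model), with one parameter update per mini-batch:

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))   # made-up regression data
dl = DataLoader(ds, batch_size=32, shuffle=True)
model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for xb, yb in dl:                                   # one update per mini-batch
    loss = nn.functional.mse_loss(model(xb), yb)
    loss.backward()                                 # gradients from this mini-batch only
    opt.step()
    opt.zero_grad()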

9
Q

Where are mini-batches defined?

A

Using a DataLoader (e.g. bs=64 in ImageDataLoaders.from_name_func)
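
A sketch following the fastai pets example (the dataset path, labelling function, and bs value are illustrative):

from fastai.vision.all import *

path = untar_data(URLs.PETS) / "images"   # example dataset used in the fastai docs

def is_cat(fname):
    # in this dataset, cat images have an uppercase first letter in the filename
    return fname[0].isupper()

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), is_cat,
    valid_pct=0.2, seed=42,
    item_tfms=Resize(224),
    bs=64,   # the mini-batch size: 64 items per training step
)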

10
Q

Code for learning rate finder

A

learn.lr_find()

11
Q

Methodology of learning rate finder

A
  • Start with a very small learning rate and train the model for one mini-batch
  • Increase the learning rate by a certain percentage (e.g. double it) and train the model for another mini-batch
  • Repeat this until the loss is worsening continuously
  • Select the greatest learning rate for which the loss was clearly decreasing (see the sketch below)
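
An illustrative plain-PyTorch sketch of this procedure (not fastai's actual implementation; the data, model, and stopping threshold are made up):

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(512, 10), torch.randn(512, 1))   # made-up data
dl = DataLoader(ds, batch_size=32, shuffle=True)
model = nn.Linear(10, 1)

lr, history = 1e-7, []
for xb, yb in dl:
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss = nn.functional.mse_loss(model(xb), yb)
    loss.backward()
    opt.step()
    opt.zero_grad()
    history.append((lr, loss.item()))
    if loss.item() > 4 * min(l for _, l in history):   # stop once the loss clearly worsens
        break
    lr *= 2                                            # double the learning rate each step

# Inspect `history` and pick the greatest lr for which the loss was still clearly decreasing.
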
12
Q

Techniques to improve model performance

A

Learning rate, training duration, model depth, numeric precision

13
Q

How does the default approach (learn.fine_tune()) function?

A

Freeze the weights of the pretrained layers and train only the newly added head for one epoch, then unfreeze all layers and train them all for the remaining epochs. Unfreezing can also be done manually with the unfreeze() method.
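
Roughly what learn.fine_tune(4) does, written out with the manual methods (a simplified sketch; learning-rate handling is omitted, and `learn` is assumed to be an existing fastai Learner):

learn.freeze()            # train only the newly added head
learn.fit_one_cycle(1)    # one epoch with the pretrained layers frozen
learn.unfreeze()          # make all layers trainable again
learn.fit_one_cycle(4)    # train the whole model for the remaining epochs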

14
Q

What is a discriminative learning rate?

A

The approach of training the later (higher) layers with a higher learning rate than the earlier (lower) layers.
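
In fastai this can be expressed with a slice of learning rates; a sketch assuming `learn` is an existing, unfrozen Learner (the values are illustrative):

# the earliest layers get 1e-6, the final layers 1e-4,
# and the layer groups in between get values spread across that range
learn.unfreeze()
learn.fit_one_cycle(6, lr_max=slice(1e-6, 1e-4))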

15
Q

How to plot the learning rate schedule?

A

learn.recorder.plot_sched()
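
The schedule is recorded while training, so it can be inspected after a fit; a short usage sketch assuming an existing Learner:

learn.fit_one_cycle(3, 3e-3)
learn.recorder.plot_sched()   # plots the learning rate (and momentum) over the training batches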

16
Q

How to select the number of epochs?

A

– At first, pick a number of epochs that is within the time budget
– Then, plot the training and validation loss (the loss after each training step on the training and validation data): learn.recorder.plot_loss()
– If the validation loss is still decreasing, one can train longer for even better performance
– If the validation loss is increasing, the model is overfitting and it should be retrained with a smaller number of epochs (the number of epochs where the validation loss was lowest)

17
Q

Concept of deeper architecture?

A
  • The model used so far for transfer learning (a ResNet) has several variants varying in depth (18, 34, 50, 101, or 152 layers)
  • A larger model is generally more powerful but also more prone to overfitting
  • With larger models more GPU RAM is required, and to avoid an out-of-memory error a smaller batch size may be necessary (see the sketch below)
  • Furthermore, training is slower with larger models
  • Generally, simpler tasks are better solved with simpler models
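
A sketch following the fastai book's deeper-model example; `dls` is assumed to be an existing DataLoaders (possibly rebuilt with a smaller bs), and vision_learner is named cnn_learner in older fastai versions:

from fastai.vision.all import *

# deeper ResNet variant plus half precision (to_fp16) to reduce GPU memory use
learn = vision_learner(dls, resnet50, metrics=error_rate).to_fp16()
learn.fine_tune(6, freeze_epochs=3)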