Neural Network Basics 2 Flashcards

1
Q

What is data augmentation?

A

Technique used to artificially increase the size and diversity of a training dataset by applying transformations to the existing data. This helps improve the model’s performance, generalization, and robustness without collecting new data.

2
Q

What is softmax?

A

An activation function that is a “softer” version of the maximum function: small differences between the inputs are amplified by taking the exponential, and the outputs sum to 1. It is the multi-category equivalent of the sigmoid function.
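
A minimal sketch of softmax in PyTorch (the activation values below are made up for illustration):

import torch

def softmax(x):
    # exponentiate, then normalize so the outputs sum to 1
    e = torch.exp(x)
    return e / e.sum()

acts = torch.tensor([2.0, 2.5])     # two activations that differ only slightly
print(softmax(acts))                # tensor([0.3775, 0.6225]) -- the gap is amplified
print(torch.softmax(acts, dim=0))   # same result with the built-in function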

3
Q

Types of augmentation?

A

For images: Rotations, flips, crops, color jitter, etc.; For text: Synonym replacement, back-translation, etc.; For time-series: Time-warping, jittering, etc.
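
For images, fastai bundles common random augmentations in the aug_transforms helper; a minimal sketch (the parameter values are just examples):

from fastai.vision.all import *

# aug_transforms() returns a set of random image augmentations (flips, rotations,
# zoom, lighting changes, warping) meant to be applied per batch on the GPU.
tfms = aug_transforms(do_flip=True, max_rotate=10.0, max_zoom=1.1, max_lighting=0.2)
# Pass the result as batch_tfms when building DataLoaders, e.g.
# ImageDataLoaders.from_name_func(..., batch_tfms=tfms)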

4
Q

Why use data augmentation?

A

To reduce overfitting to the specific examples in the dataset, since the dataset is usually run through the training loop multiple times (epochs).

5
Q

What is cross-entropy loss?

A

The combination of the softmax and the negative log likelihood. Taking the log of the softmax output penalizes confident but incorrect predictions more heavily.
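
A small PyTorch sketch (the logits and targets are made up) showing that cross-entropy is log-softmax followed by negative log likelihood:

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 0.1],      # made-up outputs for 2 items, 3 classes
                       [0.3, 0.2, 1.5]])
targets = torch.tensor([0, 2])               # correct class indices

loss_manual  = F.nll_loss(F.log_softmax(logits, dim=1), targets)
loss_builtin = F.cross_entropy(logits, targets)
print(loss_manual, loss_builtin)             # identical values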

6
Q

What is log likelihood?

A

A measure of how similar two distributions p and q are (p: ground truth, q: predicted class probabilities).
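
A tiny PyTorch illustration (made-up probabilities): with a one-hot ground truth p, the negative log likelihood reduces to minus the log of the probability predicted for the correct class:

import torch
import torch.nn.functional as F

q = torch.tensor([[0.1, 0.2, 0.7]])   # made-up predicted class probabilities
target = torch.tensor([2])            # the true class is index 2

print(F.nll_loss(q.log(), target))    # tensor(0.3567) == -log(0.7)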

7
Q

What is a mini-batch?

A

A small set of training items processed together in one step; its size (the “batch size”) is e.g. 16 or 32, and is usually chosen as large as will fit at once (incl. parameters and gradients) onto the GPU.
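
A minimal PyTorch sketch with made-up data, showing the batch size as the number of items returned per step:

import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))  # made-up dataset
dl = DataLoader(ds, batch_size=32, shuffle=True)                       # mini-batches of 32 items

xb, yb = next(iter(dl))
print(xb.shape)   # torch.Size([32, 10])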

8
Q

How does gradient descent relate to mini-batches?

A

Gradient descent applied to mini-batches (rather than the full dataset) is called stochastic gradient descent (SGD); its iterations are usually much faster because each update uses only one mini-batch.
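
A minimal sketch of one epoch of SGD in plain PyTorch (made-up data and a tiny linear model), with one parameter update per mini-batch:

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))   # made-up regression data
dl = DataLoader(ds, batch_size=32, shuffle=True)
model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for xb, yb in dl:                                   # one update per mini-batch
    loss = nn.functional.mse_loss(model(xb), yb)
    loss.backward()                                 # gradients from this mini-batch only
    opt.step()
    opt.zero_grad()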

9
Q

Where are mini-batches defined?

A

Using a DataLoader (e.g. bs=64 in ImageDataLoaders.from_name_func)
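
A sketch following the fastai pets example (the dataset path, labelling function, and bs value are illustrative):

from fastai.vision.all import *

path = untar_data(URLs.PETS) / "images"   # example dataset used in the fastai docs

def is_cat(fname):
    # in this dataset, cat images have an uppercase first letter in the filename
    return fname[0].isupper()

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), is_cat,
    valid_pct=0.2, seed=42,
    item_tfms=Resize(224),
    bs=64,   # the mini-batch size: 64 items per training step
)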

10
Q

Code for learning rate finder

A

learn.lr_find()

11
Q

Methodology of learning rate finder

A
  • Start with a very small learning rate and train the model for one mini-batch
  • Increase the learning rate by a certain percentage (e.g. double it) and train the model for another mini-batch
  • Repeat this until the loss is worsening continuously
  • Select the greatest learning rate for which the loss was clearly decreasing (see the sketch below)
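
An illustrative plain-PyTorch sketch of this procedure (not fastai's actual implementation; the data, model, and stopping threshold are made up):

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(512, 10), torch.randn(512, 1))   # made-up data
dl = DataLoader(ds, batch_size=32, shuffle=True)
model = nn.Linear(10, 1)

lr, history = 1e-7, []
for xb, yb in dl:
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss = nn.functional.mse_loss(model(xb), yb)
    loss.backward()
    opt.step()
    opt.zero_grad()
    history.append((lr, loss.item()))
    if loss.item() > 4 * min(l for _, l in history):   # stop once the loss clearly worsens
        break
    lr *= 2                                            # double the learning rate each step

# Inspect `history` and pick the greatest lr for which the loss was still clearly decreasing.
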
12
Q

Techniques to improve model performance

A

Learning rate, training duration, model depth, numeric precision

13
Q

How does the default approach (learn.fine_tune()) function?

A

Freeze the weights of the pretrained layers and train only the newly added head for one epoch, then unfreeze all layers and train them all for the remaining epochs. Unfreezing can also be done manually with the unfreeze() method.
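
Roughly what learn.fine_tune(4) does, written out with the manual methods (a simplified sketch; learning-rate handling is omitted, and `learn` is assumed to be an existing fastai Learner):

learn.freeze()            # train only the newly added head
learn.fit_one_cycle(1)    # one epoch with the pretrained layers frozen
learn.unfreeze()          # make all layers trainable again
learn.fit_one_cycle(4)    # train the whole model for the remaining epochs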

14
Q

What is a discriminative learning rate?

A

The approach of training the later (higher) layers with a higher learning rate than the earlier (lower) layers.
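
In fastai this can be expressed with a slice of learning rates; a sketch assuming `learn` is an existing, unfrozen Learner (the values are illustrative):

# the earliest layers get 1e-6, the final layers 1e-4,
# and the layer groups in between get values spread across that range
learn.unfreeze()
learn.fit_one_cycle(6, lr_max=slice(1e-6, 1e-4))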

15
Q

How to plot the learning rate schedule?

A

learn.recorder.plot_sched()
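
The schedule is recorded while training, so it can be inspected after a fit; a short usage sketch assuming an existing Learner:

learn.fit_one_cycle(3, 3e-3)
learn.recorder.plot_sched()   # plots the learning rate (and momentum) over the training batches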

16
Q

How to select the number of epochs?

A

– At first, pick a number of epochs that is within the time budget
– Then, plot the training and validation loss (the loss after each training step on the training and validation data): learn.recorder.plot_loss()
– If the validation loss is still decreasing, one can train longer for even better performance
– If the validation loss is increasing, the model is overfitting and it should be retrained with a smaller number of epochs (the number of epochs where the validation loss was lowest)

17
Q

Concept of deeper architecture?

A
  • The model used so far for transfer learning (a ResNet) has several variants varying in depth (18, 34, 50, 101, or 152 layers)
  • A larger model is generally more powerful but also more prone to overfitting
  • With larger models more GPU RAM is required, and to avoid an out-of-memory error a smaller batch size may be necessary (see the sketch below)
  • Furthermore, training is slower with larger models
  • Generally, simpler tasks are better solved with simpler models
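
A sketch following the fastai book's deeper-model example; `dls` is assumed to be an existing DataLoaders (possibly rebuilt with a smaller bs), and vision_learner is named cnn_learner in older fastai versions:

from fastai.vision.all import *

# deeper ResNet variant plus half precision (to_fp16) to reduce GPU memory use
learn = vision_learner(dls, resnet50, metrics=error_rate).to_fp16()
learn.fine_tune(6, freeze_epochs=3)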