Lesson 6 - Learning & Optimization Flashcards

1
Q

Which optimizations can you do prior to training?

A
  • data augmentation
  • input normalization
  • Xavier/Glorot initialization of weights
2
Q

Which optimizations can you do during training?

A
  • Dropout
  • Batch Normalization
3
Q

Which optimization can you do when computing the loss?

A
  • training with weighted examples
  • focal loss: training with examples of different complexity
  • triplet loss: learning representations by comparison
  • using multiple loss functions: MinMaxCAM
4
Q

How can we optimize the training procedure (while searching for the best solution)?

A

By having a variable learning rate

5
Q

What is input normalization?

A

It is an optimization applied prior to training:

  • remove the mean image
  • standardize the input (dividing by standard deviation)
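A minimal NumPy sketch of these two steps (array shapes and names are illustrative, not from the lecture):

```python
import numpy as np

# X: training images flattened to shape (num_samples, num_features)
X = np.random.rand(100, 3072).astype(np.float32)

mean = X.mean(axis=0)            # per-feature mean over the training set
std = X.std(axis=0) + 1e-8       # small epsilon avoids division by zero

X_normalized = (X - mean) / std  # zero-mean, unit-variance input
# at test time, reuse the mean and std computed on the training set
```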
6
Q

To what problem is the Xavier/Glorot initialization a solution?

A

When initializing the weights of the network, the common practice was to initialize them randomly from a normal distribution.
The problem: the variance of a unit's pre-activation z = sum_i(w_i * x_i) grows with the number of inputs n, since var(z) = n * var(w) * var(x), so var(z) becomes large.

7
Q

What was the Xavier/Glorot solution?

A

Make the weights smaller so that var(w) = 1/n, which keeps var(z) from growing with the number of inputs n.
Therefore each weight is rescaled as
weight_i = weight_i * sqrt(1/n)
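A small NumPy sketch contrasting naive initialization with the Xavier/Glorot scaling (layer sizes are illustrative):

```python
import numpy as np

n_in, n_out = 512, 256                     # fan-in and layer width (illustrative)

# naive: unit-variance weights -> var(z) grows with n_in
W_naive = np.random.randn(n_in, n_out)

# Xavier/Glorot: shrink each weight by sqrt(1/n_in)
W_xavier = np.random.randn(n_in, n_out) * np.sqrt(1.0 / n_in)

x = np.random.randn(1000, n_in)            # unit-variance inputs
print(np.var(x @ W_naive))                 # roughly n_in (large)
print(np.var(x @ W_xavier))                # roughly 1
```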

8
Q

What problem does using dropout tackle?

A

It decreases the dependence on any given feature, since the network cannot rely on a single unit always being present.
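A minimal sketch of (inverted) dropout as a random mask, assuming NumPy and an illustrative drop probability:

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True):
    """Randomly zero units so the network cannot depend on any single feature."""
    if not training:
        return activations                       # no dropout (and no rescaling) at test time
    mask = np.random.rand(*activations.shape) > p_drop
    return activations * mask / (1.0 - p_drop)   # rescale to keep the expected value

h = np.random.randn(4, 8)                        # some hidden-layer activations
h_train = dropout(h, p_drop=0.5, training=True)
```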

9
Q

What is batch normalization?

A

It is an optimization technique applied during training.

-> Normalize internal activations by considering dataset statistics
-> with stochastic optimization, batch-level statistics are used in practice
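A minimal NumPy sketch of the batch-level normalization step (gamma and beta are the learnable scale and shift; the running averages used at inference time are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize activations with batch-level statistics, then scale and shift."""
    mu = x.mean(axis=0)                      # per-feature mean over the mini-batch
    var = x.var(axis=0)                      # per-feature variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta              # learnable scale and shift

x = np.random.randn(32, 64)                  # batch of 32 examples, 64 features
out = batch_norm(x, gamma=np.ones(64), beta=np.zeros(64))
```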

10
Q

What problem does batch normalization tackle?

A

During training, updates to the weights at a later layer should take into account changes at earlier layers (internal covariate shift)
-> those changes shift the distribution of internal activations
-> this requires careful initialization and a small learning rate

11
Q

What are the benefits of using batch normalization?

A
  • Less sensitivity to initialization
  • Allows using larger learning rates (faster training)
12
Q

What could be a potential problem/weakness with gradient descent as we have seen it so far? And how do we tackle it?

A

If 80% of the examples are from one class, the model will mainly learn the important features of that class, because the weight updates are dominated by the majority of examples.

Tackle this by training with weighted examples.
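One way to do this, sketched with the class-weight argument of PyTorch's cross-entropy loss (the weights below are illustrative, e.g. inverse class frequency):

```python
import torch
import torch.nn as nn

# up-weight the minority class (here assumed to be 20% of the data)
class_weights = torch.tensor([1.0 / 0.8, 1.0 / 0.2])

criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(8, 2)                 # model outputs for a batch of 8
labels = torch.randint(0, 2, (8,))         # ground-truth class indices
loss = criterion(logits, labels)           # minority-class errors now count more
```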

13
Q

What does Focal Loss do?

A
  • down-weights the loss from well-classified examples
  • focuses training on a sparse set of hard examples
14
Q

What problem does focal loss tackle?

A

When the dataset is balanced (e.g. 50-50) but one class has features that are more difficult to learn (more detail, smaller/finer structures)

15
Q

Where could focal loss be useful?

A
  • dense prediction tasks
  • in the presence of outliers
16
Q

How does focal loss work?

A

It adds an extra (focusing) parameter that down-weights the loss of easy examples, so the loss of examples that are harder to classify dominates, therefore forcing the model to keep training on those examples.

It is based on the probability the model assigns to the correct label: the higher that probability, the more the example's loss is down-weighted. So for very uncertain examples the loss stays high and the model is pushed to learn from them.
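A minimal PyTorch sketch of the focal loss modulating factor (gamma is the focusing parameter; the optional class-balancing term alpha is omitted):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Scale the cross-entropy of each example by (1 - p_t)^gamma."""
    ce = F.cross_entropy(logits, targets, reduction="none")  # per-example -log(p_t)
    p_t = torch.exp(-ce)                                      # probability of the true class
    return ((1.0 - p_t) ** gamma * ce).mean()                 # easy examples are down-weighted

logits = torch.randn(8, 2)
targets = torch.randint(0, 2, (8,))
loss = focal_loss(logits, targets)
```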

17
Q

What is Triplet Loss?

A
  • given three examples: Anchor, Positive, Negative
  • learn a representation such that distance(anchor, positive) < distance(anchor, negative)
18
Q

How is triplet loss different?

A

With a normal loss we compare the prediction to the ground truth (the original label).

With triplet loss, we use three examples and compare distances.

The anchor and the positive should share the same class.
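A minimal PyTorch sketch of a margin-based triplet loss (embedding size and margin are illustrative):

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Push d(anchor, positive) below d(anchor, negative) by at least `margin`."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()

# embeddings for a batch of (anchor, positive, negative) triplets
a, p, n = torch.randn(16, 128), torch.randn(16, 128), torch.randn(16, 128)
loss = triplet_loss(a, p, n)
# PyTorch also ships this as nn.TripletMarginLoss
```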

19
Q

What is the idea behind using multiple loss functions?

A

Object localization

-> regularize a high-performing classifier to enable localization (as in MinMaxCAM, which combines multiple loss functions for this)
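A generic sketch of combining two loss terms with a balancing weight (this shows the general pattern only, not MinMaxCAM's exact formulation):

```python
import torch

# two objectives combined into one training signal; lambda_loc balances them
classification_loss = torch.tensor(0.7)    # e.g. cross-entropy on the class labels
localization_loss = torch.tensor(0.3)      # e.g. a CAM-based regularization term
lambda_loc = 0.5                            # illustrative weighting factor

total_loss = classification_loss + lambda_loc * localization_loss
# in practice both terms come from the network and total_loss.backward() updates it
```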

20
Q

Why would we opt to use a variable learning rate? (Annealing)

A

As training progresses, the steps taken might be too large to reach the optimum (when using a fixed learning rate)
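A minimal PyTorch sketch of one common annealing scheme, step decay (the model, learning rate, and schedule values are illustrative):

```python
import torch

model = torch.nn.Linear(10, 2)                                  # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# step decay: multiply the learning rate by 0.1 every 30 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... forward pass, loss.backward(), optimizer.step() would go here ...
    scheduler.step()            # shrink the step size as training progresses
```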

21
Q

In supervised learning, we have the problem that data annotation is expensive. What could be a solution to this?

A

Self-supervised learning: supervise using labels generated from the data itself (without manual annotation)
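One common example of such "free" labels is rotation prediction; a minimal PyTorch sketch (this particular pretext task is an illustration, not necessarily the one from the lecture):

```python
import torch

def make_rotation_task(images):
    """Generate (input, label) pairs from unlabeled images: predict the rotation."""
    inputs, labels = [], []
    for img in images:                       # img: tensor of shape (C, H, W)
        for k in range(4):                   # 0, 90, 180, 270 degrees
            inputs.append(torch.rot90(img, k, dims=(1, 2)))
            labels.append(k)                 # the rotation index is the "free" label
    return torch.stack(inputs), torch.tensor(labels)

images = torch.randn(8, 3, 32, 32)           # unlabeled images
x, y = make_rotation_task(images)            # supervision without manual annotation
```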