Supervised Learning and Convolutional Networks Flashcards

1
Q

What does SGD stand for? Why do we use this approach and not full-batch gradient descent? Mention at least 3 variants of SGD

A
SGD = Stochastic gradient descent
Computing the exact gradient over the full training set at every step (full-batch gradient descent) is too computationally expensive and not necessary; the noise in mini-batch gradient estimates can even help exploration of the loss landscape.
Three variants:
1. SGD with momentum
2. RMSprop
3. Adam
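
A rough sketch of how the update rules of these three variants differ, in plain NumPy (hyperparameter values are illustrative defaults, not from the card):

```python
import numpy as np

# Sketch of the three update rules (not a full training loop).

def sgd_momentum(w, grad, v, lr=0.01, beta=0.9):
    v = beta * v + grad                   # velocity: running sum of past gradients
    return w - lr * v, v

def rmsprop(w, grad, s, lr=0.001, beta=0.9, eps=1e-8):
    s = beta * s + (1 - beta) * grad**2   # running average of squared gradients
    return w - lr * grad / (np.sqrt(s) + eps), s

def adam(w, grad, v, s, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    v = b1 * v + (1 - b1) * grad          # 1st moment (momentum-like)
    s = b2 * s + (1 - b2) * grad**2       # 2nd moment (RMSprop-like)
    v_hat = v / (1 - b1**t)               # bias correction; t starts at 1
    s_hat = s / (1 - b2**t)
    return w - lr * v_hat / (np.sqrt(s_hat) + eps), v, s
```
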
2
Q

Mention three strategies for architecture search

A
  1. Random search
  2. Genetic algorithms
  3. Reinforcement learning
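
A minimal sketch of the simplest strategy, random search; the search space and `train_and_evaluate` below are hypothetical stand-ins for a real setup:

```python
import random

# Hypothetical search space; a real one depends on the task.
SEARCH_SPACE = {
    "num_layers": [2, 4, 6, 8],
    "filters": [32, 64, 128],
    "kernel_size": [3, 5, 7],
}

def train_and_evaluate(arch):
    # Placeholder: a real implementation would build and train `arch`,
    # then return its validation accuracy.
    return random.random()

def random_search(n_trials=20):
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = {k: random.choice(opts) for k, opts in SEARCH_SPACE.items()}
        score = train_and_evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```
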
3
Q

Mention two learned neural architecture search networks and explain their difference.

A

NAS1 & NAS2.
NAS1 has a fixed layering form (conv2d, batchnorm, relu) with parameters: conv2d params, inputs to layer (skip connections).
NAS2 has a fixed overall architecture (CIFAR10/ImageNet), but wants to learn the cell structure of a “normal” and “reduction” cell.
The two basically (ish) handles and assumes the inverse of one another.

4
Q

What is dilation rate?

A

The spacing between the kernel taps when a convolution filter is applied to a feature map: a dilation rate of r leaves r − 1 gaps between consecutive taps, enlarging the receptive field without adding parameters.
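
For example, in PyTorch (a sketch; sizes chosen only to show the effect):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 8, 8)  # (batch, channels, height, width)

standard = nn.Conv2d(1, 1, kernel_size=3, dilation=1)  # dense 3x3 taps
dilated = nn.Conv2d(1, 1, kernel_size=3, dilation=2)   # one-pixel gap between taps

# With dilation=2 a 3x3 kernel covers a 5x5 region, so (without padding)
# the output shrinks more while the parameter count stays the same.
print(standard(x).shape)  # torch.Size([1, 1, 6, 6])
print(dilated(x).shape)   # torch.Size([1, 1, 4, 4])
```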

5
Q

What is another word for L2 regularization and why do we use it?

A

Weight decay (also known as ridge or Tikhonov regularization). We use it to penalize large weights, which limits the complexity of the model and reduces overfitting.
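
A minimal PyTorch sketch of the penalty, written both explicitly and via the optimizer's weight_decay (the model and λ are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)       # illustrative model
criterion = nn.MSELoss()
lam = 1e-4                     # regularization strength (illustrative)

# Option 1: add the squared L2 norm of the weights to the loss explicitly.
def loss_with_l2(pred, target):
    l2 = sum(p.pow(2).sum() for p in model.parameters())
    return criterion(pred, target) + lam * l2

# Option 2: for plain SGD this is equivalent (up to a constant factor in
# lambda) to weight decay, which PyTorch exposes on the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```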

6
Q

Mention ways to augment text and image data

A

Image: mirroring (horizontal flip), random crops, scaling, aspect-ratio changes, lighting/color jitter
Text: synonym replacement, back-translation (e.g. translating to another language and back with Google Translate)
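
A sketch of an image pipeline covering these operations with torchvision (parameter values are illustrative):

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),                    # mirroring
    transforms.RandomResizedCrop(                         # random crop, scale,
        224, scale=(0.8, 1.0), ratio=(0.75, 1.33)),       # and aspect ratio
    transforms.ColorJitter(                               # lighting/color jitter
        brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
])
```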
