Supervised learning and Convolutional Networks Flashcards
What does SGD stand for? Why do we use this approach and not full-batch gradient descent? Mention at least 3 variants of SGD
SGD = stochastic gradient descent. Computing the gradient over the entire training set at every step is too computationally expensive and not necessary; the noise from mini-batches can even help exploration of the loss landscape. Variants:
- SGD with momentum
- RMSprop
- Adam
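A minimal NumPy sketch of the three update rules; the hyperparameter values (lr, beta, eps) are illustrative defaults, not canonical:

```python
import numpy as np

def sgd_momentum(w, grad, v, lr=0.01, beta=0.9):
    # Accumulate a decaying moving average of past gradients.
    v = beta * v + grad
    return w - lr * v, v

def rmsprop(w, grad, s, lr=0.001, beta=0.9, eps=1e-8):
    # Scale each step by a running average of squared gradients.
    s = beta * s + (1 - beta) * grad**2
    return w - lr * grad / (np.sqrt(s) + eps), s

def adam(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Combine momentum (m) with RMSprop-style scaling (v); t starts at 1.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)  # bias correction for the zero-initialized averages
    v_hat = v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```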
Mention three strategies for architecture search
- Random search (see the sketch after this list)
- Genetic algorithms
- Reinforcement learning
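A toy random-search sketch for the first strategy; the search space, candidate encoding, and `evaluate` function are all hypothetical placeholders:

```python
import random

# Hypothetical search space: each candidate is a small CNN description.
SEARCH_SPACE = {
    "num_layers": [2, 4, 6],
    "filters": [16, 32, 64],
    "kernel_size": [3, 5],
}

def sample_candidate():
    # Draw one random architecture from the search space.
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def random_search(evaluate, budget=20):
    # evaluate(candidate) -> validation score; assumed to train the model.
    best, best_score = None, float("-inf")
    for _ in range(budget):
        cand = sample_candidate()
        score = evaluate(cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score
```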
Mention two learned neural architecture search networks and explain their difference.
NAS1 & NAS2.
NAS1 fixes the form of every layer (conv2d → batchnorm → relu) and searches over the conv2d parameters and over which earlier layers feed each layer (skip connections).
NAS2 fixes the overall architecture (one template per dataset, e.g. CIFAR-10/ImageNet) and instead learns the internal structure of two reusable cells, a “normal” cell and a “reduction” cell.
Roughly speaking, the two fix and search the inverse of one another: NAS1 fixes the per-layer form and searches the global wiring, while NAS2 fixes the global wiring and searches the per-cell structure.
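A toy sketch of the NAS1-style search space described above, sampling per-layer conv parameters and skip-connection inputs; the parameter ranges are illustrative assumptions:

```python
import random

def sample_nas1_layer(layer_idx):
    # Each layer has a fixed form (conv2d -> batchnorm -> relu);
    # only the conv2d parameters and the skip inputs are searched.
    return {
        "filters": random.choice([24, 36, 48, 64]),
        "kernel": random.choice([1, 3, 5, 7]),
        "stride": random.choice([1, 2]),
        # Any subset of earlier layers may feed this one (skip connections).
        "inputs": [i for i in range(layer_idx) if random.random() < 0.5],
    }

def sample_nas1_architecture(depth=6):
    return [sample_nas1_layer(i) for i in range(depth)]
```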
What is dilation rate?
The spacing between the input elements that a convolution filter samples: a dilation rate of d means the kernel taps are d elements apart in the feature map, enlarging the receptive field without adding parameters.
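A minimal PyTorch sketch; the input shape is illustrative:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)  # (batch, channels, H, W)

conv = nn.Conv2d(1, 1, kernel_size=3, dilation=1)     # ordinary 3x3 conv
dilated = nn.Conv2d(1, 1, kernel_size=3, dilation=2)  # taps 2 apart: 5x5 receptive field

print(conv(x).shape)     # torch.Size([1, 1, 30, 30])
print(dilated(x).shape)  # torch.Size([1, 1, 28, 28])
```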
What is another word for L2 regularization and why do we use it?
Weight decay (in linear regression: ridge regression). We use it to penalize large weights, which limits model complexity and reduces overfitting.
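A minimal sketch of the penalty term; the regularization strength is illustrative. In PyTorch the same effect comes from the optimizer's `weight_decay` argument:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
mse = nn.MSELoss()
lam = 1e-4  # illustrative regularization strength

def l2_regularized_loss(pred, target):
    # L2 penalty: lambda * sum of squared weights, added to the data loss.
    penalty = sum(p.pow(2).sum() for p in model.parameters())
    return mse(pred, target) + lam * penalty

# Equivalent in practice:
# torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
```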
Mention ways to augment text and image data
Image: mirroring (horizontal flip), random crop, scaling, aspect-ratio changes, lighting/color jitter
Text: synonym replacement, back-translation (e.g. via Google Translate)
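A minimal sketch of the image-side augmentations with torchvision; the parameter values are illustrative:

```python
from torchvision import transforms

# Compose the augmentations mentioned above.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),            # mirroring
    transforms.RandomResizedCrop(224,                  # random crop + scale
                                 scale=(0.8, 1.0),
                                 ratio=(0.75, 1.33)),  # aspect ratio
    transforms.ColorJitter(brightness=0.4,             # lighting
                           contrast=0.4,
                           saturation=0.4),
    transforms.ToTensor(),
])
```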