Computer Vision II Flashcards
Where is pooling located?
In the architecture block
What’s pooling?
It’s a way to reduce the feature map size: instead of applying a kernel with learnable weights, the average or maximum of the values at the kernel position is taken.
What are the 2 pooling layer types?
Either the kernel size is specified (classical pooling) or the desired output size (adaptive pooling).
No learnable weights are involved, and thus the kernel size can be adapted per batch.
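A minimal PyTorch sketch (assuming torch is installed; not from the slides) contrasting the two types:

import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)           # (batch, channels, height, width)

# Classical pooling: the kernel size (and stride) is specified explicitly
classical = nn.MaxPool2d(kernel_size=2, stride=2)
print(classical(x).shape)                # torch.Size([1, 64, 8, 8])

# Adaptive pooling: the desired output size is specified; the kernel size
# is derived from the input size, so it can vary per batch
adaptive = nn.AdaptiveAvgPool2d(output_size=(4, 4))
print(adaptive(x).shape)                 # torch.Size([1, 64, 4, 4])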
What are Bottleneck layers?
Convolutions with a kernel size of 1 × 1 are cheaper than those with larger kernel sizes. Bottleneck layers make use of this by first reducing the number of channels, then applying the computationally more expensive convolution, and finally increasing the number of channels to the original size again.
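A hedged sketch of such a bottleneck block in PyTorch (the channel numbers 256/64 are illustrative, not from the slides):

import torch.nn as nn

bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),             # 1 x 1: reduce channels 256 -> 64 (cheap)
    nn.Conv2d(64, 64, kernel_size=3, padding=1),   # 3 x 3: expensive convolution on fewer channels
    nn.Conv2d(64, 256, kernel_size=1),             # 1 x 1: expand channels back to the original 256
)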
What’s momentum?
In the loss landscape, small bumps are local minima that may be far away from the optimum. Momentum carries the previous update direction through such bumps, so it is desired for the weight optimization steps as well.
Does 1cycle training schedule the momentum parameter as well?
Yes
What’s the formula for momentum?
m_t = β · m_(t−1) + (1 − β) · g_t
θ_t = θ_(t−1) − γ · m_t
g_t = gradient at step t
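An illustrative single update step in plain Python (symbol names follow the formula; the default values are typical choices, not from the slides):

def momentum_step(theta, m, g, beta=0.9, gamma=0.01):
    m = beta * m + (1 - beta) * g    # exponential moving average of the gradients
    theta = theta - gamma * m        # weight update along the smoothed gradient
    return theta, m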
What’s RMSProp? What’s the formula?
Root Mean Square Propagation. It’s an optimizer that adapts the learning rate per weight.
v_t = α · v_(t−1) + (1 − α) · g_t^2
θ_t = θ_(t−1) − γ · g_t / √(v_t + ε)
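An illustrative single RMSProp step for one parameter (same sketch style as above; defaults are typical values):

def rmsprop_step(theta, v, g, alpha=0.99, gamma=0.01, eps=1e-8):
    v = alpha * v + (1 - alpha) * g**2            # running average of squared gradients
    theta = theta - gamma * g / (v + eps)**0.5    # per-parameter scaling of the learning rate
    return theta, v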
What’s ADAM?
Adaptive Moment Estimation
It combines the ideas of momentum and RMSProp in one algorithm
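An illustrative Adam step combining the two moving averages; the bias-correction terms follow the standard Adam formulation and may go beyond what the slides show:

def adam_step(theta, m, v, g, t, beta1=0.9, beta2=0.999, gamma=0.001, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g        # first moment (momentum)
    v = beta2 * v + (1 - beta2) * g**2     # second moment (RMSProp)
    m_hat = m / (1 - beta1**t)             # bias correction for the early steps
    v_hat = v / (1 - beta2**t)
    theta = theta - gamma * m_hat / (v_hat**0.5 + eps)
    return theta, m, v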
Which of the following statements is true about ResNets?
1. In the original ResNet the input to an operation is skipped forward and concatenated with the output for every network operation.
2. More layers in a network make the model more powerful and therefore always lead to improved performance.
3. In the two loss landscapes shown on slide 16, it is visible that ResNets cause steeper edges that make the optimizer “roll down the hill” faster.
4. Pooling is not allowed in a ResNet-Block as the size of the resulting feature map would not match the size of input feature map.
5. If the number of input channels does not equal the desired number of output channels a true identity path is never possible.
5
Which of the following statements is true about ResNets? (Multiple Choice)
1. A convolution with a kernel size of 1 x 1 would not make sense in the stem of a state-of-the-art ResNet, as these convolutions do not reduce the size of the feature map.
2. The reduction in number of operations from a 9 x 9 kernel to a 3 x 3 kernel is proportionally the same as from a 3 x 3 kernel to a 1 x 1 kernel if one disregards the bias related computations.
3. One reason for the effectiveness of ResNets is that the input is kept close to the network output via the skip connections.
4. Bottleneck layers do not necessarily have fewer kernels than plain ResNet layers.
5. The plus sign in the fifth line of the code on slide 18 is not the typical ResNet addition of a skip connection to the output.
All of them
Which of the following statements is true? (Multiple Choice)
1. If the value of a particular parameter has changed a lot in the last few updates, this means that active learning is taking place and the learning rate is most likely in a favourable range for that parameter.
2. With momentum the optimizer cannot get stuck in local optima.
3. When preprocessing the dataset, the image size for batch_tfms must be less than or equal to the image size for item_tfms.
4. The momentum parameter is a hyperparameter and its optimal value can be different for every task similar to the learning rate.
5. At the beginning of the training (due to the high stochasticity of the initialisation) and at the end of the training (to overcome the last bumps) a high momentum is desired.
3,4,5
What is the primary purpose of using skip connections in ResNet architectures?
A) To increase the model’s depth
B) To reduce the model’s computational cost
C) To bring the input closer to the output and smooth the loss function
D) To perform data augmentation
Answer: C) To bring the input closer to the output and smooth the loss function
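A minimal residual block sketch in PyTorch (batch norm and downsampling omitted; purely illustrative):

import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)    # skip connection keeps the input close to the output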
Which optimizer is known for adapting the learning rate for each parameter individually?
A) SGD
B) Adam
C) RMSprop
D) Adagrad
Answer: B) Adam
Given a convolutional layer output of size 16x16, apply a 2x2 max-pooling layer with a stride of 2. What is the resulting output size?
8x8
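The arithmetic: output size = floor((16 − 2) / 2) + 1 = 8 per spatial dimension. A quick PyTorch check (assuming torch is available):

import torch
import torch.nn as nn

x = torch.randn(1, 1, 16, 16)
print(nn.MaxPool2d(kernel_size=2, stride=2)(x).shape)   # torch.Size([1, 1, 8, 8])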