Computer Vision I Flashcards
Where do convolutions occur?
In the architecture block
What’s a feature in the context of feature engineering?
A feature is a transformation of the data that aggregates information helpful for a particular task
What’s a kernel and what’s its relation to convolutions?
A kernel is a (small) tensor of weights that are multiplied element-wise with the values of an input tensor at the kernel’s current position; the sum of those products plus an optional bias is the output
A convolution operation applies a kernel across an image
What’s padding in the context of convolutions?
If the output feature map should have the same size as the input tensor, one can apply padding. This virtually adds rows and columns filled with zeros along the height and width dimensions
What’s stride in the context of convolutions? What’s the formula to calculate the size of the output feature map?
Reducing the size of the output feature map is possible by increasing the stride of the convolution.
A higher stride means a larger step size between kernel positions.
Formula: ⌊(n + 2 · pad − ks) / stride⌋ + 1
n = number of pixels per dimension, pad = padding, ks = kernel size
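A minimal sketch of this formula as a hypothetical Python helper (the function name is illustrative):
def conv_output_size(n, ks, pad=0, stride=1):
    # floor((n + 2*pad - ks) / stride) + 1, per spatial dimension
    return (n + 2 * pad - ks) // stride + 1
# e.g. a 28x28 input with a 3x3 kernel, padding 1 and stride 2 gives a 14x14 output
print(conv_output_size(28, ks=3, pad=1, stride=2))  # 14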
How many channels does a color image have?
3 (red, green, blue)
What’s the convolution operation in Pytorch? What are the required arguments?
F.conv2d
Req:
The input tensor of shape (batch_size, in_channels, iH, iW) and the weight tensor of shape (out_channels, in_channels, ks, ks)
– batch_size: The number of samples in the minibatch represented by the input tensor
– in_channels: The number of channels of the input feature map
– out_channels: The number of kernels that will be applied
– iH and iW: The height and width of one input sample
– ks: The height and width of the kernels (here only square kernels are considered)
– Important optional arguments: bias, stride, padding
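A minimal usage sketch (tensor contents and sizes are arbitrary placeholders):
import torch
import torch.nn.functional as F
x = torch.randn(1, 3, 28, 28)      # (batch_size, in_channels, iH, iW): one RGB image
weight = torch.randn(8, 3, 3, 3)   # (out_channels, in_channels, ks, ks): 8 kernels of size 3x3
bias = torch.zeros(8)              # one bias per output channel
out = F.conv2d(x, weight, bias=bias, stride=1, padding=1)
print(out.shape)                   # torch.Size([1, 8, 28, 28])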
What are the 3 components or requirements of 1cycle training?
What are the 2 phases of learning rate during it?
– Training may diverge if starting with a high learning rate
– The final learning steps should be small such that the minimum is not skipped
– In the middle of the training phase a high learning rate is preferred:
– Faster convergence due to larger step sizes
– Less likely to get trapped, because sharp local optima are skipped
Learning rate schedule with two phases:
– Warmup phase: the learning rate is increased gradually
– Annealing phase: the learning rate is decreased again
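A sketch of this two-phase schedule using PyTorch’s built-in OneCycleLR scheduler (model, optimizer and all hyperparameter values are placeholders, not values from the course):
import torch
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, total_steps=100, pct_start=0.3)  # 30% warmup, 70% annealing
for step in range(100):
    # ... forward pass and loss.backward() would go here ...
    optimizer.step()   # update the weights
    scheduler.step()   # advance the learning rate schedule once per batch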
What’s batch normalization?
What are the 2 formulas related to it?
Normalizes the distribution of activations per channel in one batch.
Behavior during inference differs from training: during validation, the running statistics of the activations accumulated on the training data are used rather than the statistics of the activations of the current batch
Batch normalization tends to lead to better generalization, most likely due to the extra randomness it introduces
x̂ = (x − μ) / √(σ² + ε)
y = γ · x̂ + β
(μ, σ²: per-channel batch mean and variance; γ, β: learnable parameters; ε: small constant for numerical stability)
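A sketch of the per-channel normalization, checked against PyTorch’s nn.BatchNorm2d in training mode (tensor sizes are arbitrary; γ and β are left at their initial values 1 and 0):
import torch
x = torch.randn(4, 3, 8, 8)  # (batch, channels, H, W)
mu = x.mean(dim=(0, 2, 3), keepdim=True)                  # per-channel batch mean
var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)  # per-channel batch variance
x_hat = (x - mu) / torch.sqrt(var + 1e-5)                 # first formula
y = 1.0 * x_hat + 0.0                                     # second formula with gamma=1, beta=0
bn = torch.nn.BatchNorm2d(3)                              # training mode by default
torch.testing.assert_close(y, bn(x), atol=1e-5, rtol=1e-5)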
Which of the following statements is true about kernels in CNNs? (Multiple Choice)
1. A stride of one decreases the size of the feature map by one in every dimension.
2. The size of the output feature map is independent of the number of input channels, but dependent on the kernel size.
3. If a convolution with 3 x 3 kernels is applied on a color image there are 27 weights per kernel.
4. Kernels enable weight sharing as all weights of a kernel are the same.
5. The size of the output feature map can be larger than the size of the input feature map if padding is sufficiently high.
2,3,5
Which of the following statements is true about the learning behaviour of neural networks? (Multiple Choice)
1. When using 1cycle training, we need the learning rate finder to get an idea of the optimal learning rate.
2. In general, the standard deviation of activations should be as high as possible to allow the model to discriminate between different inputs.
3. Resetting the network during training helps the network to learn different patterns.
4. Only if there are no activations close to zero in the last layer has the network learned something useful.
1
Which of the following statements is true about batch normalization? (Multiple Choice)
1. Even with batch normalization the model can overfit.
2. Batch normalization means that all input values of a batch are normalized to have a mean of zero and a standard deviation of one.
3. Near zero activations can be fixed with batch normalization.
4. If we only have a width dimension and no height dimension, or vice versa, we compute the batch normalization also over the channel dimension instead.
5. During inference, we would overestimate the performance of a model if we would use the actual statistics of the current batch, because the model then would be using more information about the test data which it is not allowed to use.
1,3
Consider the following “image”:
0.83 0.80 0.41
0.76 0.03 0.43
0.13 0.20 0.27
and the following kernel:
0.11 0.35 0.27
0.85 0.23 0.12
0.55 0.90 0.71
Calculate the output feature map of the first convolutional layer if no padding, no activation, a bias of -6 and a stride of one are employed. Round to two decimal places and enter your solution in the following blank “image”. Note that the calculated output value should be placed in the output at the same position where the kernel was applied in the input. If there are “pixels” where the kernel was not applied, the corresponding output positions should be marked with -1.0:
-1 -1 -1
-1 -4.37 -1
-1 -1 -1
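The result can be checked with F.conv2d (shapes follow the (batch, channels, H, W) convention from above):
import torch
import torch.nn.functional as F
img = torch.tensor([[0.83, 0.80, 0.41],
                    [0.76, 0.03, 0.43],
                    [0.13, 0.20, 0.27]]).reshape(1, 1, 3, 3)
kernel = torch.tensor([[0.11, 0.35, 0.27],
                       [0.85, 0.23, 0.12],
                       [0.55, 0.90, 0.71]]).reshape(1, 1, 3, 3)
out = F.conv2d(img, kernel, bias=torch.tensor([-6.0]))
print(out)  # tensor([[[[-4.3703]]]]) -> -4.37 rounded to two decimals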
Give a 2 x 2 kernel that produces the highest possible activations when placed on a vertical edge with increasing color values from left to right and no activation when applied to equal values. Assume that the weights can only be in the range from -1 to 1.
-1 1
-1 1
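A quick check of this kernel on a small example (input values are illustrative):
import torch
import torch.nn.functional as F
kernel = torch.tensor([[-1.0, 1.0],
                       [-1.0, 1.0]]).reshape(1, 1, 2, 2)
edge = torch.tensor([[0.0, 1.0],
                     [0.0, 1.0]]).reshape(1, 1, 2, 2)  # values increase left to right
flat = torch.ones(1, 1, 2, 2)                          # equal values everywhere
print(F.conv2d(edge, kernel))  # tensor([[[[2.]]]]) -> maximal activation
print(F.conv2d(flat, kernel))  # tensor([[[[0.]]]]) -> no activation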
What is the primary advantage of using convolutional neural networks (CNNs) for image classification tasks?
A) They reduce the number of parameters compared to fully connected networks.
B) They are less computationally intensive than other neural network types.
C) They are specifically designed to handle sequential data.
D) They perform better with tabular data.
Answer: A) They reduce the number of parameters compared to fully connected networks.