Computer Vision I Flashcards

1
Q

Where do convolutions occur?

A

In the convolutional layers of the architecture (the body of the network)

2
Q

What’s a feature in the context of feature engineering?

A

A feature is a transformation of data, aggregating information that is helpful for a particular task

3
Q

What’s a kernel and what’s its relation to convolutions?

A

A kernel is a (small) tensor of weights that are multiplied element-wise with the values of an input tensor at the position of the kernel; the sum of those products plus an optional bias is the output value

A convolution operation applies (slides) a kernel across an image

4
Q

What’s padding in the context of convolutions?

A

When the output feature map should have the same size as the input tensor, one can apply padding. This virtually adds a number of rows and columns, filled with zeros, along the height and width dimensions

5
Q

What’s stride in the context of convolutions? What’s the formula to calculate the size of the output feature map?

A

The output feature map can be shrunk by increasing the stride of the convolution.
A higher stride means a larger step size between kernel positions.

Formula: output size = ⌊(n + 2 · pad – ks) / stride⌋ + 1
where n = number of input pixels per dimension, pad = padding, ks = kernel size
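The formula above can be sketched as a small helper function (a minimal illustration; the function name is made up for this card):

```python
def conv_output_size(n, ks, pad=0, stride=1):
    """Output size of a convolution along one spatial dimension.

    n: input size in pixels, ks: kernel size, pad: padding, stride: stride.
    Implements floor((n + 2*pad - ks) / stride) + 1.
    """
    return (n + 2 * pad - ks) // stride + 1

# 28x28 input, 3x3 kernel, padding 1, stride 1 -> size is preserved
print(conv_output_size(28, ks=3, pad=1, stride=1))  # 28
# Same kernel with stride 2 roughly halves the feature map
print(conv_output_size(28, ks=3, pad=1, stride=2))  # 14
```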

6
Q

How many channels does a color image have?

A

3 (red, green, blue)

7
Q

What’s the convolution operation in Pytorch? What are the required arguments?

A

F.conv2d (torch.nn.functional.conv2d)
Required arguments:
an input tensor of shape (batch_size, in_channels, iH, iW) and a weight tensor of shape (out_channels, in_channels, ks, ks)

– batch_size: The number of samples in the minibatch represented by the input tensor
– in_channels: The number of channels of the input feature map
– out_channels: The number of kernels that will be applied
– iH and iW: The height and width of one input sample
– ks and ks: The height and width of the kernels (here only square kernels are considered)
– Important optional arguments: bias, stride, padding
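A minimal usage sketch of F.conv2d with the shapes from the answer (the concrete sizes here are arbitrary examples):

```python
import torch
import torch.nn.functional as F

# One RGB image: batch_size=1, in_channels=3, iH=iW=5
x = torch.randn(1, 3, 5, 5)
# 4 kernels of size 3x3, each spanning all 3 input channels
w = torch.randn(4, 3, 3, 3)
b = torch.randn(4)  # one bias value per output channel

out = F.conv2d(x, w, bias=b, stride=1, padding=1)
print(out.shape)  # padding=1 with a 3x3 kernel preserves the 5x5 size
```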

8
Q

What are the 3 components or requirements of 1cycle training?
What are the 2 phases of learning rate during it?

A

Three components/requirements:
– Training may diverge if it starts with a high learning rate
– The final learning steps should be small so that the minimum is not skipped
– In the middle of training, a high learning rate is preferred:
– Faster convergence due to larger step sizes
– Less likely to get trapped, because sharp local optima are skipped

Learning rate schedule with two phases:
– Warmup phase: the learning rate is increased gradually
– Annealing phase: the learning rate is decreased again
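The two-phase schedule can be sketched as a toy function (linear warmup, then cosine annealing; the function name and the specific rates are illustrative assumptions — in practice PyTorch provides torch.optim.lr_scheduler.OneCycleLR):

```python
import math

def one_cycle_lr(step, total_steps, max_lr=1e-2, start_lr=1e-3,
                 end_lr=1e-5, warmup_frac=0.3):
    """Toy 1cycle schedule: linear warmup followed by cosine annealing."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Warmup phase: learning rate rises from start_lr to max_lr
        return start_lr + (max_lr - start_lr) * step / warmup_steps
    # Annealing phase: learning rate decays from max_lr toward end_lr
    t = (step - warmup_steps) / (total_steps - warmup_steps)
    return end_lr + (max_lr - end_lr) * (1 + math.cos(math.pi * t)) / 2

lrs = [one_cycle_lr(s, 100) for s in range(100)]
print(max(lrs))  # the peak learning rate, reached at the end of warmup
```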

9
Q

What’s batch normalization?
What are the 2 formulas related to it?

A

Normalizes the distribution of activations per channel within one batch.
Behavior during inference differs from training: during validation, the running statistics of the activations gathered over the training data are used, rather than the statistics of the current batch.
Batch normalization tends to lead to better generalization, most likely due to the extra randomness it introduces.

x̂ = (x – μ) / √(σ² + ε)
y = γ · x̂ + β
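The two formulas can be checked with a minimal sketch for a single channel across a batch (pure Python for illustration; real layers such as torch.nn.BatchNorm2d also track running statistics and learn γ and β):

```python
def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel's activations across a batch,
    then apply the affine transform y = gamma * x_hat + beta."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    x_hat = [(x - mu) / (var + eps) ** 0.5 for x in xs]
    return [gamma * xh + beta for xh in x_hat]

acts = [2.0, 4.0, 6.0, 8.0]
normed = batch_norm(acts)
print(sum(normed) / len(normed))  # ~0: normalized activations have zero mean
```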

10
Q

Which of the following statements is true about kernels in CNNs? (Multiple Choice)
1. A stride of one decreases the size of the feature map by one in every dimension.
2. The size of the output feature map is independent of the number of input channels, but dependent on the kernel size.
3. If a convolution with 3 x 3 kernels is applied on a color image there are 27 weights per kernel.
4. Kernels enable weight sharing as all weights of a kernel are the same.
5. The size of the output feature map can be larger than the size of the input feature map if padding is sufficiently high.

A

2,3,5

11
Q

Which of the following statements is true about the learning behaviour of neural networks? (Multiple Choice)
1. When using 1cycle training, we need the learning rate finder to get an idea of the optimal learning rate.
2. In general, the standard deviation of activations should be as high as possible to allow the model to discriminate between different inputs.
3. Resetting the network during training helps the network to learn different patterns.
4. Only if there are no activations close to zero in the last layer has the network learned something useful.

A

1

12
Q

Which of the following statements is true about batch normalization? (Multiple Choice)
1. Even with batch normalization the model can overfit.
2. Batch normalization means that all input values of a batch are normalized to have a mean of zero and a standard deviation of one.
3. Near zero activations can be fixed with batch normalization.
4. If we only have a width dimension and no height dimension, or vice versa, we compute the batch normalization also over the channel dimension instead.
5. During inference, we would overestimate the performance of a model if we would use the actual statistics of the current batch, because the model then would be using more information about the test data which it is not allowed to use.

A

1,3

13
Q

Consider the following “image”:

0.83 0.80 0.41
0.76 0.03 0.43
0.13 0.20 0.27
and the following kernel:

0.11 0.35 0.27
0.85 0.23 0.12
0.55 0.90 0.71
Calculate the output feature map of the first convolutional layer if no padding, no activation, a bias of -6 and a stride of one is employed. Round to two decimal places and enter your solution in the following blank “image”. Note that the calculated output value should be placed in the output at the same position where the kernel was applied in the input. If there are “pixels” where the kernel was not applied, the corresponding output positions should be marked with -1.0:

A

-1.0 -1.0 -1.0
-1.0 -4.37 -1.0
-1.0 -1.0 -1.0
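The single output value can be checked with a few lines of Python: the 3×3 kernel fits in only one position, so the output is the element-wise product sum plus the bias.

```python
image  = [[0.83, 0.80, 0.41],
          [0.76, 0.03, 0.43],
          [0.13, 0.20, 0.27]]
kernel = [[0.11, 0.35, 0.27],
          [0.85, 0.23, 0.12],
          [0.55, 0.90, 0.71]]
bias = -6

# Multiply image and kernel element-wise, sum, then add the bias
out = sum(image[i][j] * kernel[i][j]
          for i in range(3) for j in range(3)) + bias
print(round(out, 2))  # -4.37
```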

14
Q

Give a 2 x 2 kernel that produces the highest possible activations when placed on a vertical edge with increasing color values from left to right and no activation when applied to equal values. Assume that the weights can only be in the range from -1 to 1.

A

-1 1
-1 1
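The kernel's behavior can be verified on two example patches (the helper function and the concrete pixel values are illustrative):

```python
def apply_kernel(patch, kernel):
    """Sum of element-wise products of a 2x2 patch and a 2x2 kernel."""
    return sum(patch[i][j] * kernel[i][j]
               for i in range(2) for j in range(2))

kernel = [[-1, 1],
          [-1, 1]]

edge = [[0, 1],   # dark on the left, bright on the right
        [0, 1]]
flat = [[5, 5],   # equal values everywhere
        [5, 5]]

print(apply_kernel(edge, kernel))  # 2 -- maximal response on the vertical edge
print(apply_kernel(flat, kernel))  # 0 -- no response on equal values
```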

15
Q

What is the primary advantage of using convolutional neural networks (CNNs) for image classification tasks?

A) They reduce the number of parameters compared to fully connected networks.
B) They are less computationally intensive than other neural network types.
C) They are specifically designed to handle sequential data.
D) They perform better with tabular data.

A

Answer: A) They reduce the number of parameters compared to fully connected networks.

16
Q

Which of the following techniques is used to reduce overfitting in CNNs during training?

A) Data augmentation
B) Gradient clipping
C) Early stopping
D) All of the above

A

Answer: D) All of the above

17
Q

Given a convolutional layer with a kernel size of 3x3, stride of 1, and no padding, calculate the output size for an input image of size 32x32.

A

The output size is 30 x 30: (32 + 2 · 0 – 3)/1 + 1 = 30 along each spatial dimension.

18
Q

How does using convolutional layers benefit the processing of image data compared to fully connected layers?

A

Answer:
Convolutional layers preserve the spatial structure of the image by using local connections and shared weights, which makes them more efficient in terms of parameters and computation. They can capture local patterns and hierarchies in the data, such as edges, textures, and more complex structures, which is crucial for tasks like image classification, object detection, and segmentation.