Lecture 4 Flashcards
Name and define 3 properties of convolution.
- Sparse connectivity: the convolution kernel is much smaller than the input, so each output unit connects to only a few input units (fewer connections)
- Parameter sharing: the kernel coefficients are identical for each input location
- Equivariant representations: the convolution output covaries with the input (if you shift your input image, the output is shifted by the same amount)
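A minimal sketch of the equivariance property, using a 1D circular cross-correlation (circular shifts are an assumption here, chosen to keep boundary effects out of the picture):

```python
import numpy as np

# circular cross-correlation: out[i] = sum_j x[(i + j) % n] * k[j]
def circ_corr(x, k):
    n = len(x)
    return np.array([sum(x[(i + j) % n] * k[j] for j in range(len(k)))
                     for i in range(n)])

x = np.array([1., 2., 3., 4., 5., 0.])
k = np.array([1., 0., -1.])

# shifting the input first, or shifting the output afterwards,
# gives the same result: conv(shift(x)) == shift(conv(x))
a = circ_corr(np.roll(x, 2), k)
b = np.roll(circ_corr(x, k), 2)
assert np.allclose(a, b)
```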
If you have a 2d convolution with a 32x32 input and a 3x3 filter, how many parameters do you have to learn? And what will be the size of the feature map?
- 9 (3x3), plus 1 if you count the bias
- 30 x 30 (32 - 3 + 1 = 30, with stride 1 and no padding)
If you apply 6 filters in convolution how many feature maps do you get?
6
What is the filter size of a 2d convolution for an input with N channels?
(3, 3, N) for a 3x3 kernel — the filter always spans all N input channels
What is the result of applying 2d max pooling with a 2x2 filter and stride 2 to this input?
input =
1, 1, 2, 4
5, 6, 7, 8
3, 2, 1, 0
1, 2, 3, 4
output =
6, 8
3, 4
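The pooling card above can be checked with a small NumPy sketch (a straightforward loop, not an optimized implementation):

```python
import numpy as np

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

def max_pool_2x2(x):
    # 2x2 max pooling with stride 2: take the max of each 2x2 block
    h, w = x.shape
    out = np.zeros((h // 2, w // 2), dtype=x.dtype)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            out[i // 2, j // 2] = x[i:i+2, j:j+2].max()
    return out

print(max_pool_2x2(x))
# [[6 8]
#  [3 4]]
```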
If you have a convolution over a 7 by 7 input
Filter size: 3x3
stride: 1
What is the output size?
5x5
If you have a convolution over a 7 by 7 input
Filter size: 3x3
stride: 2
What is the output size?
3x3
What can be an advantage of maxpooling?
more robustness (to little shifts in the input) / better generalisation
What can be an advantage of increasing strides?
efficiency / space reduction
If you have a convolution over a 8 by 8 input
Filter size: 3x3
stride: 3
padding: 2
What is the output size?
4x4
How do you calculate the width and height of the output size
width_out = ((width_in - filter_width + 2 x padding) / stride) + 1
height_out = ((height_in - filter_height + 2 x padding) / stride) + 1
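The output-size formula, checked against the cards in this deck:

```python
def conv_out_size(in_size, filter_size, stride=1, padding=0):
    # width_out = ((width_in - filter_width + 2*padding) / stride) + 1
    return (in_size - filter_size + 2 * padding) // stride + 1

assert conv_out_size(7, 3, stride=1) == 5               # 7x7 input, stride 1
assert conv_out_size(7, 3, stride=2) == 3               # 7x7 input, stride 2
assert conv_out_size(8, 3, stride=3, padding=2) == 4    # 8x8, stride 3, pad 2
assert conv_out_size(5, 3, stride=2, padding=1) == 3    # 5x5, stride 2, pad 1
```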
If you have a convolution over a 5 by 5 input
Filter size: 3x3
stride: 2
padding: 1
What is the output size?
3x3
If you have a convolution over a 64 by 64 by 3 input
Filter size: 4x4x3
filters: 32
stride:2
What is the number of feature maps?
What is the output width and height?
What is the number of parameters of the convolutional layer?
output width: 31
output height: 31
number of feature maps: 32
number of parameters: 1568 (4x4x3 x 32 weights + 32 biases, one bias per filter)
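The parameter count and output size for this card, spelled out:

```python
def conv_layer_params(fh, fw, in_ch, n_filters):
    # one weight per kernel coefficient per filter, plus one bias per filter
    return fh * fw * in_ch * n_filters + n_filters

# 4x4x3 filters, 32 of them: 4*4*3*32 = 1536 weights + 32 biases = 1568
assert conv_layer_params(4, 4, 3, 32) == 1568

# 64x64 input, 4x4 filter, stride 2, no padding
assert (64 - 4) // 2 + 1 == 31
```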
Define: transposed convolution
A fixed upsampling transformation is not always useful, so a more robust way to upsample is to learn some filters that map a feature map to a larger one.
Apply transposed convolution:
input =
0, 1
2, 3
kernel =
0, 1
2, 3
output =
0, 0, 1
0, 4, 6
4, 12, 9
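A minimal sketch of the transposed convolution above: each input value scales a copy of the kernel, stamped at that input's position, and overlapping stamps are summed (stride 1 is assumed here):

```python
import numpy as np

def transposed_conv2d(x, k):
    # each input value x[i, j] adds x[i, j] * kernel at offset (i, j)
    (h, w), (kh, kw) = x.shape, k.shape
    out = np.zeros((h + kh - 1, w + kw - 1))
    for i in range(h):
        for j in range(w):
            out[i:i+kh, j:j+kw] += x[i, j] * k
    return out

x = np.array([[0., 1.], [2., 3.]])
k = np.array([[0., 1.], [2., 3.]])
print(transposed_conv2d(x, k))
# [[ 0.  0.  1.]
#  [ 0.  4.  6.]
#  [ 4. 12.  9.]]
```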
Name an application of transposed convolution
automatic colorization (encoder-decoder)
what is the size of a filter with 1 by 1 convolution?
1x1
why would you apply 1 by 1 convolution?
The deeper you get into the network, the more feature maps you have. If the network gets too big, you can apply 1x1 convolutions to reduce the number of feature maps, giving a compressed representation.
How does 1x1 convolution work?
input feature map with shape (W, H, N)
m filters of size (1, 1, N), with m < N
output feature map with shape (W, H, m)
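A sketch of this channel reduction in NumPy (the sizes are made-up examples): a 1x1 convolution is just a per-pixel linear map over the channel dimension.

```python
import numpy as np

W, H, N, m = 8, 8, 64, 16            # spatial size, input channels, output channels
x = np.random.rand(W, H, N)          # input feature maps
filters = np.random.rand(m, 1, 1, N) # m filters of size (1, 1, N)

# at every (w, h) position, mix the N input channels into m output channels
out = np.einsum('whn,mn->whm', x, filters[:, 0, 0, :])
assert out.shape == (W, H, m)        # channel dimension reduced from 64 to 16
```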
What is the goal of inception architectures?
Increasing the depth and width of the network while keeping the computational budget constant
How does the naïve version of the inception module work?
The information from the previous layer goes, in parallel, through:
- 1x1 convolutions
- 3x3 convolutions
- 5x5 convolutions
- 3x3 max pooling
and the resulting feature maps are then concatenated.
What was the problem with the naïve version of the inception module
It did not keep the computational budget constant.
What were the improvements made to the naïve version of the inception module?
1x1 convolutions were applied before the 3x3 and 5x5 convolutions, and after the 3x3 max pooling, to reduce the number of channels first.
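Why the 1x1 bottleneck keeps the budget in check can be seen by counting multiplications (the sizes below are assumed, GoogLeNet-like examples, not from this deck):

```python
def conv_mults(w, h, k, in_ch, out_ch):
    # multiplications for a kxk convolution, stride 1, same-size output
    return w * h * k * k * in_ch * out_ch

W, H, N, M, r = 28, 28, 192, 32, 16   # feature map size, channels in/out, reduced channels

# 5x5 convolution applied directly on all N channels
direct = conv_mults(W, H, 5, N, M)

# 1x1 reduction to r channels first, then the 5x5 convolution
bottleneck = conv_mults(W, H, 1, N, r) + conv_mults(W, H, 5, r, M)

assert bottleneck < direct            # far fewer multiplications
```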
What happens when you add more layers (get a deeper network)?
The intuition is that more layers mean more parameters, a more powerful network and better performance, but that is not always the case: a 20-layer network can outperform a 56-layer network. Each layer passes its output to the next; if one layer extracts information that is not useful, the next layer has to learn from that non-useful information. The later layers receive input that has been processed many times and has probably lost information that was in the original input.
What is the idea of Res-Nets?
Give the network the option to just copy the input: the block outputs F(x) + x, so if the learned function F(x) (the information passed through from the previous layer) is not informative, the network can fall back on the identity (skip connection).
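The residual idea in one line (a toy sketch, with F standing in for any learned layer):

```python
import numpy as np

def residual_block(x, F):
    # output is F(x) + x: if F learns nothing useful (F(x) ~ 0),
    # the block just passes the input through unchanged
    return F(x) + x

x = np.array([1., 2., 3.])
zero_F = lambda x: np.zeros_like(x)   # an "uninformative" layer
assert np.allclose(residual_block(x, zero_F), x)
```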
What are attention mechanisms?
Attention mechanisms highlight the most informative features (in an image). This process runs parallel to the feature extraction and creates a mask that suppresses the features that are not important.
What are the main methods for object detection?
- Region proposals: R-CNN (Fast R-CNN and Faster R-CNN)
- You Only Look Once - YOLO
- Single Shot MultiBox Detector - SSD
- RetinaNet