CNN Flashcards
What are the advantages of a CNN over a fully connected DNN?
Because consecutive layers are only partially connected and because it heavily reuses its weights, a CNN has far fewer parameters than a fully connected DNN. This makes it much faster to train, reduces the risk of overfitting, and means it requires much less training data.
When a CNN has learned a kernel that can detect a particular feature, it can detect that feature anywhere in the image. In contrast, when a DNN learns a feature in one location, it can detect it only in that particular location. Since images typically have very repetitive features, CNNs are able to generalize much better than DNNs for image processing tasks such as classification, using fewer training examples.
Finally, a DNN has no prior knowledge of how pixels are organized; it does not know that nearby pixels are close. A CNN’s architecture embeds this prior knowledge. Lower layers typically identify features in small areas of the images, while higher layers combine the lower-level features into larger features. This works well with most natural images, giving CNNs a decisive head start compared to DNNs.
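To make the parameter argument concrete, here is a small sketch that counts parameters for one layer. It assumes a hypothetical 32x32 RGB input mapped to a 32x32x32 output, either by a fully connected layer or by a 3x3 convolution with "same" padding. The helper names `dense_params` and `conv_params` are illustrative, not from any library.

```python
def dense_params(n_inputs, n_outputs):
    # One weight per input/output pair, plus one bias per output neuron.
    return n_inputs * n_outputs + n_outputs

def conv_params(kernel_h, kernel_w, in_channels, n_filters):
    # Each filter has one weight per kernel position per input channel,
    # plus one bias. The same weights are reused at every image position,
    # so the count does not depend on the image size.
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

# Hypothetical example: 32x32 RGB image -> 32x32x32 output.
h, w, c_in, c_out = 32, 32, 3, 32
print(dense_params(h * w * c_in, h * w * c_out))  # → 100696064
print(conv_params(3, 3, c_in, c_out))             # → 896
```

The convolutional layer needs roughly 100,000 times fewer parameters here, precisely because of partial connectivity and weight sharing.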
If your GPU runs out of memory while training a CNN, what are five things you could try to solve the problem?
- Reduce the mini-batch size.
- Reduce dimensionality using a larger stride in one or more layers.
- Remove one or more layers.
- Use 16-bit floats instead of 32-bit floats.
- Distribute the CNN across multiple devices.
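A rough way to see why several of these options help is to estimate how much memory one layer's output feature maps take for a mini-batch. This is a simplified sketch: it counts only activations for a single hypothetical layer (100 feature maps of 150x100, mini-batch of 64), and ignores weights, gradients, optimizer state and any extra buffers a framework may allocate.

```python
def activation_bytes(batch, height, width, channels, bytes_per_value):
    # Bytes needed to hold one layer's output feature maps for a whole mini-batch.
    return batch * height * width * channels * bytes_per_value

# Hypothetical layer: 100 feature maps of 150x100, mini-batch of 64.
base = activation_bytes(64, 150, 100, 100, 4)  # float32 = 4 bytes per value
variants = {
    "baseline (float32)": base,
    "mini-batch halved": activation_bytes(32, 150, 100, 100, 4),
    "stride 2 (75x50 maps)": activation_bytes(64, 75, 50, 100, 4),
    "float16 (2 bytes)": activation_bytes(64, 150, 100, 100, 2),
}
for name, b in variants.items():
    print(f"{name}: {b / 1e6:.0f} MB")
```

Halving the mini-batch or switching to 16-bit floats halves this figure, while a stride of 2 divides it by four because both height and width are halved.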
Why would you want to add a max-pooling layer rather than a convolutional layer with the same stride?
A max-pooling layer has no parameters at all, whereas a convolutional layer has quite a few.
What is a pooling layer, what is it used for, and what stride is most commonly used?
A pooling layer combines multiple neighboring inputs into a single value (for example, their maximum or their average).
Pooling is a downsampling method: it shrinks the feature maps, which reduces the computational load, the memory usage, and the number of parameters in the layers that follow (the pooling layer itself has no parameters). In addition to downsampling, pooling makes the detection of certain features somewhat invariant to small translations and, to a lesser degree, to small rotations and scale changes.
The stride is generally chosen so that the pooling windows do not overlap but still cover the whole input. The most common setting is a 2x2 window with a stride of 2.
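The following is a minimal pure-Python sketch of 2x2 max pooling with stride 2 on one feature map. The function name `max_pool2d` and the sample values are illustrative. There is no padding here, and the layer has nothing to learn: it only takes the maximum of each non-overlapping window.

```python
def max_pool2d(image, size=2, stride=2):
    # image: 2D list (one feature map). No padding: windows that would
    # run past the edge are simply not computed.
    out_h = (len(image) - size) // stride + 1
    out_w = (len(image[0]) - size) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the size x size window starting at (i*stride, j*stride).
            window = [image[i * stride + di][j * stride + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        out.append(row)
    return out

fmap = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]
print(max_pool2d(fmap))  # → [[6, 5], [7, 9]]
```

The 4x4 map becomes 2x2, so the next layer sees 75% fewer values.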
Why is a convolution layer so often followed by a pooling layer?
The filter size of the convolutional layers generally stays the same from layer to layer. Pooling shrinks the feature maps, so the same filter size covers a relatively larger part of the original image. In other words, each subsequent filter looks for bigger and bigger patterns that are combinations of the smaller patterns found earlier.
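One way to quantify "bigger and bigger patterns" is the receptive field: how many input pixels (along one side) a single unit can see. This sketch uses the standard recurrence r += (k - 1) * jump, jump *= stride, on a hypothetical stack of 3x3 convolutions (stride 1) and 2x2 pools (stride 2); the layer names are illustrative.

```python
def receptive_fields(layers):
    # layers: list of (name, kernel_size, stride).
    # r: receptive field (in input pixels) of one unit of the current layer.
    # jump: distance in input pixels between adjacent units of the current layer.
    r, jump = 1, 1
    result = []
    for name, k, s in layers:
        r += (k - 1) * jump
        jump *= s
        result.append((name, r))
    return result

with_pool = [("conv1", 3, 1), ("pool1", 2, 2), ("conv2", 3, 1),
             ("pool2", 2, 2), ("conv3", 3, 1)]
no_pool = [("conv1", 3, 1), ("conv2", 3, 1), ("conv3", 3, 1)]
print(receptive_fields(with_pool))  # → [('conv1', 3), ('pool1', 4), ('conv2', 8), ('pool2', 10), ('conv3', 18)]
print(receptive_fields(no_pool))    # → [('conv1', 3), ('conv2', 5), ('conv3', 7)]
```

With the same 3x3 filters, the third convolution sees an 18x18 region when pooling is interleaved, versus only 7x7 without it.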
What is the basic CNN architecture?
- Conv > Pool > Conv > Pool > …
- Flatten layer
- Dense layers (a traditional fully connected ANN)
- Output layer
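The architecture above can be traced shape by shape. This sketch assumes a hypothetical 28x28 grayscale input, two Conv > Pool stages (3x3 convolutions with padding 1 and 32, then 64 filters; 2x2 max pooling with stride 2), a dense layer of 128 units and 10 outputs. The helpers `conv2d_shape` and `pool2d_shape` are illustrative and only compute shapes, not values.

```python
def conv2d_shape(shape, f, n_filters, s=1, p=0):
    # Convolution: spatial size from floor((n + 2p - f) / s) + 1;
    # depth becomes the number of filters.
    h, w, _ = shape
    return ((h + 2 * p - f) // s + 1, (w + 2 * p - f) // s + 1, n_filters)

def pool2d_shape(shape, f=2, s=2):
    # Pooling without padding: spatial size shrinks, depth is unchanged.
    h, w, c = shape
    return ((h - f) // s + 1, (w - f) // s + 1, c)

shape = (28, 28, 1)                        # hypothetical 28x28 grayscale input
trace = [("input", shape)]
for n_filters in (32, 64):                 # two Conv > Pool stages
    shape = conv2d_shape(shape, 3, n_filters, p=1)
    trace.append(("conv 3x3", shape))
    shape = pool2d_shape(shape)
    trace.append(("max-pool 2x2", shape))
h, w, c = shape
trace.append(("flatten", (h * w * c,)))    # 3D tensor -> 1D vector
trace.append(("dense", (128,)))            # hypothetical hidden size
trace.append(("output", (10,)))            # e.g. 10 classes
for name, out_shape in trace:
    print(f"{name:>13}: {out_shape}")
```

The convolutional part ends with a 7x7x64 tensor, which the flatten step turns into a vector of 3136 values for the dense layers.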
Why do we use padding with the pooling operation?
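When the input size does not fit a whole number of pooling windows, the last window would go over the edge of the matrix, and the values at that edge would be dropped. To avoid this, we pad the input. Zeros are the usual padding value for average pooling; for max pooling, frameworks such as PyTorch pad with negative infinity so that the padding never wins the max.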
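A one-dimensional sketch makes the edge problem visible. It assumes a hypothetical row of five values, a window of 2 and a stride of 2, and pads only on the right (a simplification; real frameworks have their own rules for how padding is split between the two sides). The function name `max_pool_1d` and its `pad_end`/`pad_value` parameters are illustrative.

```python
import math

def max_pool_1d(values, size=2, stride=2, pad_end=False, pad_value=-math.inf):
    # Without padding, only windows that fit entirely inside the input are used.
    # With pad_end=True, the row is extended on the right with pad_value until
    # the last, partially filled window fits.
    if pad_end:
        n_windows = math.ceil((len(values) - size) / stride) + 1
        needed = (n_windows - 1) * stride + size
        values = values + [pad_value] * (needed - len(values))
    n = (len(values) - size) // stride + 1
    return [max(values[i * stride:i * stride + size]) for i in range(n)]

row = [3, -1, 4, -5, -2]
print(max_pool_1d(row))                              # → [3, 4]  (-2 is dropped)
print(max_pool_1d(row, pad_end=True))                # → [3, 4, -2]
print(max_pool_1d(row, pad_end=True, pad_value=0))   # → [3, 4, 0]
```

The last line shows why zero padding is a poor fit for max pooling: when the edge value is negative, the padding itself becomes the maximum.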
Why do we need to flatten the output of the convolutional layers before feeding it to the fully connected network?
The convolutional layers output a 3D tensor for each example (height x width x channels, i.e., a stack of 2D feature maps). In contrast, fully connected layers take each example as a 1D vector.
Therefore, the output of the convolutional part of the network needs to be flattened.
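Flattening only changes the layout, not the values. This sketch flattens a hypothetical 2x2x3 tensor stored as nested lists indexed [row][column][channel]; the function name `flatten` is illustrative.

```python
def flatten(tensor):
    # tensor: nested lists indexed [row][col][channel] (height x width x channels).
    # Row-major order: the channel index varies fastest, then column, then row.
    return [v for row in tensor for pixel in row for v in pixel]

# Hypothetical 2x2x3 output of a convolutional layer.
t = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
vec = flatten(t)
print(len(vec), vec)  # → 12 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

The exact ordering depends on the memory layout (PyTorch, for example, stores tensors channels-first), but any fixed ordering works as long as it is the same for every example.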
What is the formula used to calculate the output size of a convolutional layer? Say you have a 39x39x3 tensor and you apply ten 3x3 filters to it: what will be the shape of the resulting tensor?
The formula is: floor((n + 2p - f) / s + 1)
n: the height and width of the input tensor
p: the padding
f: the filter size
s: the stride
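For the second part, assuming no padding (p = 0) and a stride of 1 (s = 1), the resulting tensor will be 37x37x10: floor((39 + 0 - 3) / 1 + 1) = 37, and the depth equals the number of filters. Note that the formula assumes the height and width of the tensor are equal. It also assumes the filter is square (the PyTorch documentation gives the formula for non-square inputs and filters). Andrew Ng has a very good explanation.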
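The formula translates directly into code. This is a sketch with an illustrative helper `conv_output_size`; it reproduces the 39x39x3 example (with p = 0 and s = 1 assumed) and adds two hypothetical variations, padding 1 and stride 2.

```python
def conv_output_size(n, f, p=0, s=1):
    # floor((n + 2p - f) / s) + 1. Python's // is floor division, which matches
    # the formula whenever the filter fits inside the padded input.
    return (n + 2 * p - f) // s + 1

n_filters = 10
side = conv_output_size(39, 3)        # p = 0, s = 1
print((side, side, n_filters))        # → (37, 37, 10)
print(conv_output_size(39, 3, p=1))   # → 39  (padding 1 keeps the size)
print(conv_output_size(39, 3, s=2))   # → 19
```

The input depth (3) does not appear in the result: it only determines the depth of each filter, while the output depth is the number of filters.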
What are filters (or kernels) used for, and how are they applied?
Filters perform pattern recognition: each filter learns to detect a particular pattern. A given filter is applied to all the feature maps of the previous layer at once (it has one 2D slice per input feature map). At each position, the element-wise products over all of these maps are added up (usually plus a bias term) to form one value of one output feature map. The same process, repeated with each of the other filters, computes the other output feature maps. Note: the theory visualization has a nice picture explaining the process.
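Here is a minimal sketch of one filter applied to a multi-channel input, with stride 1 and no padding. The input (two hypothetical 3x3 feature maps), the 2x2 kernel values and the function name `conv2d_single_filter` are all illustrative. Like most deep learning libraries, it actually computes a cross-correlation (the kernel is not flipped), which is what CNNs call a convolution.

```python
def conv2d_single_filter(inputs, kernel, bias=0):
    # inputs: list of C feature maps, each H x W (nested lists).
    # kernel: list of C 2D slices, each f x f (one slice per input feature map).
    # Returns ONE output feature map of size (H - f + 1) x (W - f + 1).
    c = len(inputs)
    h, w = len(inputs[0]), len(inputs[0][0])
    f = len(kernel[0])
    out = []
    for i in range(h - f + 1):
        row = []
        for j in range(w - f + 1):
            total = bias
            # Sum the element-wise products over ALL input feature maps.
            for ch in range(c):
                for di in range(f):
                    for dj in range(f):
                        total += inputs[ch][i + di][j + dj] * kernel[ch][di][dj]
            row.append(total)
        out.append(row)
    return out

inputs = [
    [[1, 2, 0], [0, 1, 3], [2, 1, 0]],  # feature map 0
    [[0, 1, 1], [1, 0, 2], [3, 0, 1]],  # feature map 1
]
kernel = [
    [[1, 0], [0, 1]],  # slice applied to feature map 0
    [[0, 1], [1, 0]],  # slice applied to feature map 1
]
print(conv2d_single_filter(inputs, kernel))  # → [[4, 6], [4, 3]]
```

A layer with ten filters would simply run this ten times with ten different kernels and stack the ten resulting maps.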
Why do we use an activation function after a convolution layer?
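To add non-linearity. A convolution is a linear operation, so without an activation function a stack of convolutional layers would be equivalent to a single linear operation and could not learn complex patterns.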
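The collapse can be checked on a deliberately tiny case. This sketch replaces each layer with a hypothetical one-input "layer" (a scalar weight and bias) rather than a real convolution; the weights and function names are illustrative. Two stacked linear layers are exactly equal to one merged linear layer, whereas inserting a ReLU between them breaks that.

```python
def relu(x):
    return max(0.0, x)

# Two hypothetical one-input "layers": scalar weight and bias.
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_linear_layers(x):
    return w2 * (w1 * x + b1) + b2

# The stack above equals ONE linear layer with these merged parameters.
w_merged, b_merged = w2 * w1, w2 * b1 + b2

def with_relu(x):
    return w2 * relu(w1 * x + b1) + b2

xs = [-2.0, -0.5, 0.0, 1.0, 3.0]
print(all(two_linear_layers(x) == w_merged * x + b_merged for x in xs))  # → True
print(with_relu(-2.0), with_relu(0.0), with_relu(3.0))  # → 0.5 -2.5 -20.5
```

With the ReLU, the slope is -1.5 between x = -2 and x = 0 but -6 between x = 0 and x = 3, so no single linear layer can reproduce it.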