CNNs Flashcards
What is the advantage of max pooling layers?
Max-pooling layers reduce the spatial size of feature maps (so fewer activations and parameters downstream) and thereby help mitigate overfitting; they also add a small amount of translation invariance.
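A minimal sketch (PyTorch; the tensor shapes are illustrative) of how a 2×2 max-pooling layer halves each spatial dimension of a feature map:

import torch
import torch.nn as nn

# 2x2 max pooling with stride 2: keep only the maximum of every 2x2 patch
pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.randn(1, 8, 32, 32)  # (batch, channels, height, width)
y = pool(x)
print(x.shape, "->", y.shape)  # [1, 8, 32, 32] -> [1, 8, 16, 16]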
How do convolutional filters work?
Convolutional filters find features in images: sliding the kernel over the image produces another (smaller) matrix, the feature map, whose entries measure the “degree of overlap” between each image patch and the filter kernel.
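As a sketch of the mechanics (a minimal NumPy implementation; the function name is mine, and what deep-learning libraries call “convolution” is really this cross-correlation):

import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over the image; each output entry is the elementwise
    # product-sum ("degree of overlap") of the kernel with one image patch.
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))  # output is smaller than the input
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out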
What is the purpose of a convolutional filter?
A filter acts as a “feature detector”: it returns high values where the corresponding image patch is similar to the filter matrix.
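A small illustration of the “feature detector” view (SciPy; the image and kernel are made up): a vertical-edge kernel responds strongly exactly where the image contains a dark-to-bright vertical transition:

import numpy as np
from scipy.signal import correlate2d

image = np.zeros((5, 6))
image[:, 3:] = 1.0                # left half dark, right half bright
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])  # pattern of a dark -> bright vertical edge
response = correlate2d(image, kernel, mode="valid")
print(response)                   # high values (2.0) only along the edge column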
What is LeNet? Describe the architecture
– It has 1,256 nodes
– 64,660 connections
– 9,760 trainable parameters (and not millions!)
– trained with the Backpropagation algorithm!
Draw the picture of the LeNet architecture
What is ILSVRC?
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) evaluates algorithms for object detection and image classification at large scale.
How did CNNs progress as per the ILSVRC?
(Top-5 classification error of the winning entries:)
ILSVRC 2010 – 28.2% error, shallow model
ILSVRC 2011 – 25.8% error, again a shallow model
ILSVRC 2012 – AlexNet: 8 layers, 16.4% error (an ensemble of CNNs)
ILSVRC 2014 – VGG: 19 layers, 7.3% error
ILSVRC 2014 – GoogLeNet: 22 layers, 6.7% error
ILSVRC 2015 – ResNet: 152 layers, 3.57% error
At the moment, which CNN has the highest Top-1 accuracy?
As per the lectures: ResNet.
As of 2021: CoAtNet-7, with 90.88% top-1 accuracy on ImageNet.
What are additional tricks the creators of AlexNet used to improve accuracy?
• Data augmentation: increase the number of training records by applying some modifications: shifts, contrast changes, …
• Computations distributed over 2 GPUs
• Local Contrast Normalization
• ReLU (Rectified Linear Unit) instead of sigmoid activation functions
• L2 weight decay: punish big weights
• Dropout: when training, in every iteration, randomly disable 50% of the nodes (disabling weights instead doesn’t work!) – see the sketch after this list
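A minimal sketch of three of these tricks in PyTorch (an illustrative classifier head, not the actual AlexNet): ReLU activations, 50% dropout, and L2 weight decay applied via the optimizer:

import torch.nn as nn
import torch.optim as optim

head = nn.Sequential(
    nn.Dropout(p=0.5),       # training only: randomly disable 50% of the nodes
    nn.Linear(4096, 4096),
    nn.ReLU(),               # instead of sigmoid: cheap and non-saturating for x > 0
    nn.Dropout(p=0.5),
    nn.Linear(4096, 1000),   # 1000 ImageNet classes
)
# weight_decay adds an L2 penalty that punishes big weights
optimizer = optim.SGD(head.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

Note that dropout is only active in head.train() mode; head.eval() disables it for testing.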
What kind of data augmentation techniques did the creators of AlexNet use?
They increased the number of training records by applying some modifications: shifts, contrast changes, …
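A minimal sketch of such an augmentation pipeline (torchvision; the specific transforms and parameters are illustrative, not the exact AlexNet recipe):

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomCrop(224, padding=8),                 # random shifts
    transforms.RandomHorizontalFlip(),                     # mirror images
    transforms.ColorJitter(brightness=0.4, contrast=0.4),  # contrast changes
    transforms.ToTensor(),
])
# Each epoch now sees a different random variant of every training image.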
What is the key idea behind Residual networks?
It’s easier to learn the modification of the original image than the modified image itself: instead of learning a target mapping H(x) directly, a residual block learns only the residual F(x) = H(x) − x and outputs F(x) + x.
What technique does ResNet adopt to improve accuracy, and what problem does it solve?
Implementation of the key idea: add identity shortcuts between 2 (or more) layers. These skip connections (shortcuts) jump over some layers, which reduces the vanishing-gradient problem because gradients have fewer layers to propagate through. The network then gradually restores the skipped layers as it learns the feature space.
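A minimal sketch of a residual block (simplified: no batch norm, and it assumes input and output shapes match):

import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        out = self.conv2(F.relu(self.conv1(x)))
        return F.relu(out + x)  # identity shortcut: the layers learn only the residual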
Define overfitting
Overfitting: the model learns “small details” of the training set and is unable to correctly classify cases from the test set (usual cause: too many parameters / degrees of freedom)
Define regularisation
Preventing overfitting by imposing some constraints on the values or the number of model parameters.
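A minimal sketch (PyTorch; the function name is mine) of the “constraint on values” flavor, with L2 regularization written out as an explicit penalty on the weights:

def l2_regularized_loss(criterion, outputs, targets, model, lam=1e-4):
    data_loss = criterion(outputs, targets)
    l2_penalty = sum(p.pow(2).sum() for p in model.parameters())
    return data_loss + lam * l2_penalty  # big weights are punished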
Define cross-validation
Monitoring the error on both the training set and the test set, to detect when the model starts to overfit (in the wider literature, “cross-validation” usually means repeatedly splitting the data into training and validation folds, e.g. k-fold).
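A runnable toy sketch of this monitoring (PyTorch; the data is synthetic random noise, so the network can only memorize): the training error keeps falling while the held-out error stays high, which is exactly the overfitting signature this monitoring is meant to reveal:

import torch
import torch.nn as nn

torch.manual_seed(0)
X_train, y_train = torch.randn(64, 10), torch.randint(0, 2, (64,))
X_test,  y_test  = torch.randn(64, 10), torch.randint(0, 2, (64,))

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(201):
    opt.zero_grad()
    loss_fn(model(X_train), y_train).backward()
    opt.step()
    if epoch % 50 == 0:
        with torch.no_grad():
            train_err = (model(X_train).argmax(1) != y_train).float().mean().item()
            test_err = (model(X_test).argmax(1) != y_test).float().mean().item()
        print(f"epoch {epoch}: train error {train_err:.2f}, test error {test_err:.2f}")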