CNNs Flashcards
What is the advantage of max pooling layers?
Max-pooling layers reduce the size of feature maps (downsampling), which means fewer values and fewer parameters in later layers, and so helps mitigate overfitting.
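A minimal NumPy sketch of 2×2 max pooling (illustrative, not the lecture’s code):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Downsample a 2D feature map with a 2x2 max-pooling window (stride 2)."""
    h, w = feature_map.shape
    # Keep the maximum of each non-overlapping 2x2 patch.
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]])
print(max_pool_2x2(fm))
# [[4 2]
#  [2 8]]
```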
How do convolutional filters work?
Convolutional filters are capable of finding features in images: sliding the filter kernel over the image produces another (smaller) matrix of “degrees of overlap” between each image patch and the filter kernel
What is the purpose of a convolutional filter?
A filter acts as a “feature detector” – it returns high values when the corresponding image patch is similar to the filter matrix
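A minimal NumPy sketch of this (plain “valid” cross-correlation, which is what deep-learning libraries compute as convolution); the vertical-edge kernel is an illustrative assumption:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image and record the degree of
    overlap (dot product) at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector: responds strongly where dark meets bright.
edge_kernel = np.array([[-1.0, 1.0],
                        [-1.0, 1.0]])
image = np.array([[0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9]], dtype=float)
print(conv2d_valid(image, edge_kernel))  # high values exactly at the edge
```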
What is LeNet? Describe the architecture
– a small stack of convolutional layers (with subsampling) followed by fully connected layers
– 1,256 nodes
– 64,660 connections
– 9,760 trainable parameters (and not millions!)
– trained with the Backpropagation algorithm!
Draw a picture of the LeNet architecture
What is ILSVRC?
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) evaluates algorithms for object detection and image classification at large scale.
How did CNNs progress as per the ILSVRC?
ILSVRC 2010 – 28.2% error with a shallow (pre-CNN) model
ILSVRC 2011 – 25.8% error, again with a shallow model
in 2012, AlexNet used 8 layers and reached 16.4% error
in 2014, VGG used 19 layers and reached 7.3% error
in 2014, GoogLeNet used 22 layers and reached 6.7% error
Finally, in 2015, ResNet used 152 layers and reached 3.57% error
At the moment, which CNN has the highest Top-1 accuracy?
As per the lectures, ResNet;
as of 2021, CoAtNet-7, with 90.88% Top-1 accuracy on ImageNet
What are additional tricks the creators of AlexNet used to improve accuracy?
• Data Augmentation: increase the number of training records by applying some modifications: shifts, contrasts, …
• Computations distributed over 2 GPUs
• Local Response Normalization
• ReLU (Rectified Linear Unit) instead of sigmoid activation functions
• L2 weight regularization: punish big weights
• Dropout: when training, in every iteration, disable 50% of the nodes (disabling weights doesn’t work!)
Several of these tricks are sketched in code below.
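A minimal PyTorch sketch of three of these tricks in one place (ReLU, 50% dropout, and an L2 penalty via the optimizer’s weight_decay); the layer sizes and hyperparameters are illustrative assumptions, not AlexNet’s:

```python
import torch
import torch.nn as nn

# Toy classifier (not AlexNet itself) combining three of the tricks.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    nn.ReLU(),           # ReLU instead of sigmoid
    nn.Dropout(p=0.5),   # during training, disable 50% of the nodes per iteration
    nn.Linear(256, 10),
)

# weight_decay adds the L2 penalty ("punish big weights") to every update.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=5e-4)

x = torch.randn(8, 1, 28, 28)  # dummy batch of 8 grayscale images
loss = nn.CrossEntropyLoss()(model(x), torch.randint(0, 10, (8,)))
loss.backward()
optimizer.step()
```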
What kind of data augmentation techniques did the creators of AlexNet use?
Increased the number of training records by applying some modifications: shifts, contrasts, …
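A hedged torchvision sketch of such augmentations (the specific transforms and strengths are my assumptions, not necessarily AlexNet’s exact recipe):

```python
import torchvision.transforms as T

# Each pass of the same image through this pipeline yields a different
# training record, effectively enlarging the training set.
augment = T.Compose([
    T.RandomResizedCrop(224),                      # random crops/shifts
    T.RandomHorizontalFlip(),                      # mirroring
    T.ColorJitter(brightness=0.4, contrast=0.4),   # intensity/contrast changes
    T.ToTensor(),
])
```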
What is the key idea behind Residual networks?
it’s easier to learn the modification of the input than the modified output itself: instead of forcing layers to learn a full mapping H(x), let them learn the residual F(x) = H(x) − x and add x back
What technique does ResNet adopt to improve accuracy and what problem does it solve?
Implementation of the key idea: add identity shortcuts between 2 (or more) layers. ResNet uses skip connections, or shortcuts, to jump over some layers, and this reduces the vanishing-gradient problem because gradients have fewer layers to propagate through. The network then gradually restores the skipped layers as it learns the feature space.
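A minimal PyTorch sketch of a residual block with an identity shortcut (batch norm and downsampling variants omitted; sizes are illustrative):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = ReLU(x + F(x)), where the
    identity shortcut x skips over the two convolutional layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))  # F(x): the learned "modification"
        return self.relu(x + out)                   # shortcut jumps over both convs

block = ResidualBlock(16)
y = block(torch.randn(1, 16, 32, 32))  # shape preserved: (1, 16, 32, 32)
```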
Define overfitting
Overfitting: the model learns “small details” of the training set and is unable to correctly classify examples from the test set (usually caused by too many parameters/degrees of freedom)
Define regularisation
Preventing overfitting by imposing constraints on the values or the number of model parameters.
Define cross-validation
Monitoring the error on both the training set and a held-out test set (more generally: averaging the error over several train/test splits, as in k-fold cross-validation)
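A minimal scikit-learn sketch of k-fold cross-validation (the dataset and model here are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)

# 5-fold cross-validation: average accuracy over 5 different train/test splits.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("estimated test error:", 1 - scores.mean())
```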
What happens when you use |x-y| instead of (x-y)^2?
The error function is no longer “smooth”: |x-y| is not differentiable at x = y, which makes gradient-based optimization harder
Give an example of regularisation
Add an extra term to the error function: λ times “the sum of squared coefficients of your model”, where λ is a tunable parameter that controls the size of the “punishment” for too-big coefficient values. (Slide 70)
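In symbols, a standard form of this (L2/ridge) penalty is E_reg(w) = E(w) + λ · Σ_j w_j^2: large coefficients inflate the error and are thereby “punished”.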
What is shrinkage, ridge regression, and weight decay in the context of neural networks?
Minimize the training error while keeping the weights small; shrinkage, ridge regression, and weight decay are all names for (variants of) adding such an L2 penalty on the weights to the error function.
Say we have polynomial degree 9. Under regular circumstances, it would overfit the data. How do you correct for it without changing the degree of the polynomial?
You introduce a regularisation term with λ = 1.5230e-08, i.e. ln λ = −18
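A minimal NumPy sketch of the idea; the sin-curve data is the classic textbook setup for this example and is assumed here, not taken from the lectures:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)  # noisy targets

X = np.vander(x, 10)   # degree-9 polynomial design matrix (10 coefficients)
lam = 1.5230e-08       # ln(lambda) = -18, the value from the flashcard

# Closed-form ridge solution: w = (X^T X + lam * I)^(-1) X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)

# With lam = 0 the degree-9 polynomial threads every noisy point (overfitting);
# even this tiny penalty keeps the coefficients, and hence the curve, tame.
print(np.abs(w).max())
```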
Why does the Atari network need 4 consecutive frames for training?
A single frame is a static snapshot; 4 consecutive frames are needed to contain information about the ball’s direction, speed, acceleration, etc.
The Atari network consists of 18 output nodes. What do they represent?
the output consists of 18 nodes corresponding to all possible joystick inputs: 9 stick positions (left, right, up, down, the 4 diagonals, neutral), each with and without the “red button” pressed – 9 × 2 = 18
Describe the reinforcement learning technique briefly.
• Assume that the network can estimate the “quality” of possible actions
• initialize the network at random and use it to play many games =>
generate some training data
• “learn from experience” => use the generated data to improve the network
(with help of the Bellman’s equation)
• use the improved network to generate “better data” and return to the previous
step; iterate till optimum reached
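A tabular Q-learning sketch of this loop (a lookup table stands in for the network here; the `env` interface – reset() returning a state, step() returning (next_state, reward, done) – is a hypothetical assumption):

```python
import random

def q_learning(env, n_actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Iteratively improve action-"quality" estimates Q via the Bellman update."""
    Q = {}  # Q[(state, action)] = estimated quality of taking `action` in `state`
    for _ in range(episodes):
        state, done = env.reset(), False   # hypothetical env API (assumption)
        while not done:
            # Play with the current estimates, plus some random exploration.
            if random.random() < eps:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q.get((state, a), 0.0))
            next_state, reward, done = env.step(action)  # hypothetical env API
            # "Learn from experience": Bellman update,
            # target = r + gamma * max_a' Q(s', a')
            best_next = max(Q.get((next_state, a), 0.0) for a in range(n_actions))
            target = reward if done else reward + gamma * best_next
            old = Q.get((state, action), 0.0)
            Q[(state, action)] = old + alpha * (target - old)
            state = next_state
    return Q
```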
What equation do you use to improve the network?
Bellman’s equation
What is Bellman’s equation?
The equation writes the “value” of a decision problem at a certain point in time in terms of the payoff from some initial choices and the “value” of the remaining decision problem that results from those initial choices.
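For the action-value (“quality”) estimates used above, a standard form is Q*(s, a) = E[ r + γ · max_{a'} Q*(s', a') ]: the value of action a in state s is the immediate payoff r plus the discounted (γ) value of the best remaining decision in the resulting state s'.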
What technique does AlphaGo Zero use to get “better estimates”?
Extensive use of Monte Carlo Tree Search (MCTS)