CNNs Flashcards
What is the advantage of max pooling layers?
Max-pooling layers reduce the size of feature maps (downsampling), which means fewer values and fewer parameters in later layers, and so helps mitigate overfitting.
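A minimal NumPy sketch of 2×2 max pooling (illustrative, not the lecture’s code):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Downsample a 2D feature map with a 2x2 max-pooling window (stride 2)."""
    h, w = feature_map.shape
    # Keep the maximum of each non-overlapping 2x2 patch.
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]])
print(max_pool_2x2(fm))
# [[4 2]
#  [2 8]]
```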
How do convolutional filters work?
Convolutional filters are capable of finding features in images: sliding the filter kernel over the image produces another (smaller) matrix of “degrees of overlap” between each image patch and the filter kernel
What is the purpose of a convolutional filter?
A filter acts as a “feature detector” – it returns high values when the corresponding image patch is similar to the filter matrix
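A minimal NumPy sketch of this (plain “valid” cross-correlation, which is what deep-learning libraries compute as convolution); the vertical-edge kernel is an illustrative assumption:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image and record the degree of
    overlap (dot product) at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector: responds strongly where dark meets bright.
edge_kernel = np.array([[-1.0, 1.0],
                        [-1.0, 1.0]])
image = np.array([[0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9]], dtype=float)
print(conv2d_valid(image, edge_kernel))  # high values exactly at the edge
```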
What is LeNet? Describe the architecture
– a small stack of convolutional layers (with subsampling) followed by fully connected layers
– 1,256 nodes
– 64,660 connections
– 9,760 trainable parameters (and not millions!)
– trained with the Backpropagation algorithm!
Draw a picture of the LeNet architecture
What is ILSVRC?
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) evaluates algorithms for object detection and image classification at large scale.
How did CNNs progress as per the ILSVRC?
ILSVRC 2010 – 28.2% error with a shallow (pre-CNN) model
ILSVRC 2011 – 25.8% error, again with a shallow model
in 2012, AlexNet used 8 layers and reached 16.4% error
in 2014, VGG used 19 layers and reached 7.3% error
in 2014, GoogLeNet used 22 layers and reached 6.7% error
Finally, in 2015, ResNet used 152 layers and reached 3.57% error
At the moment, which CNN has the highest Top-1 accuracy?
As per the lectures, ResNet;
as of 2021, CoAtNet-7, with 90.88% Top-1 accuracy on ImageNet
What are additional tricks the creators of AlexNet used to improve accuracy?
• Data Augmentation: increase the number of training records by applying some modifications: shifts, contrasts, …
• Computations distributed over 2 GPUs
• Local Response Normalization
• ReLU (Rectified Linear Unit) instead of sigmoid activation functions
• L2 weight regularization: punish big weights
• Dropout: when training, in every iteration, disable 50% of the nodes (disabling weights doesn’t work!)
Several of these tricks are sketched in code below.
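A minimal PyTorch sketch of three of these tricks in one place (ReLU, 50% dropout, and an L2 penalty via the optimizer’s weight_decay); the layer sizes and hyperparameters are illustrative assumptions, not AlexNet’s:

```python
import torch
import torch.nn as nn

# Toy classifier (not AlexNet itself) combining three of the tricks.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    nn.ReLU(),           # ReLU instead of sigmoid
    nn.Dropout(p=0.5),   # during training, disable 50% of the nodes per iteration
    nn.Linear(256, 10),
)

# weight_decay adds the L2 penalty ("punish big weights") to every update.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=5e-4)

x = torch.randn(8, 1, 28, 28)  # dummy batch of 8 grayscale images
loss = nn.CrossEntropyLoss()(model(x), torch.randint(0, 10, (8,)))
loss.backward()
optimizer.step()
```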
What kind of data augmentation techniques did the creators of AlexNet use?
Increased the number of training records by applying some modifications: shifts, contrasts, …
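A hedged torchvision sketch of such augmentations (the specific transforms and strengths are my assumptions, not necessarily AlexNet’s exact recipe):

```python
import torchvision.transforms as T

# Each pass of the same image through this pipeline yields a different
# training record, effectively enlarging the training set.
augment = T.Compose([
    T.RandomResizedCrop(224),                      # random crops/shifts
    T.RandomHorizontalFlip(),                      # mirroring
    T.ColorJitter(brightness=0.4, contrast=0.4),   # intensity/contrast changes
    T.ToTensor(),
])
```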
What is the key idea behind Residual networks?
it’s easier to learn the modification of the input than the modified output itself: instead of forcing layers to learn a full mapping H(x), let them learn the residual F(x) = H(x) − x and add x back
What technique does ResNet adopt to improve accuracy and what problem does it solve?
Implementation of the key idea: add identity shortcuts between 2 (or more) layers. ResNet uses skip connections, or shortcuts, to jump over some layers, and this reduces the vanishing-gradient problem because gradients have fewer layers to propagate through. The network then gradually restores the skipped layers as it learns the feature space.
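A minimal PyTorch sketch of a residual block with an identity shortcut (batch norm and downsampling variants omitted; sizes are illustrative):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = ReLU(x + F(x)), where the
    identity shortcut x skips over the two convolutional layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))  # F(x): the learned "modification"
        return self.relu(x + out)                   # shortcut jumps over both convs

block = ResidualBlock(16)
y = block(torch.randn(1, 16, 32, 32))  # shape preserved: (1, 16, 32, 32)
```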
Define overfitting
Overfitting: the model learns “small details” of the training set and is unable to correctly classify examples from the test set (usually caused by too many parameters/degrees of freedom)
Define regularisation
Preventing overfitting by imposing constraints on the values or the number of model parameters.
Define cross-validation
Monitoring the error on both the training set and a held-out test set (more generally: averaging the error over several train/test splits, as in k-fold cross-validation)
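A minimal scikit-learn sketch of k-fold cross-validation (the dataset and model here are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)

# 5-fold cross-validation: average accuracy over 5 different train/test splits.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("estimated test error:", 1 - scores.mean())
```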
What happens when you use |x-y| instead of (x-y)^2?
The error function is no longer “smooth”: |x-y| is not differentiable at x = y, which makes gradient-based optimization harder
Give an example of regularisation
Add an extra term to the error function: λ times “the sum of squared coefficients of your model”, where λ is a tunable parameter that controls the size of the “punishment” for too-big coefficient values. (Slide 70)
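In symbols, a standard form of this (L2/ridge) penalty is E_reg(w) = E(w) + λ · Σ_j w_j^2: large coefficients inflate the error and are thereby “punished”.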
What is shrinkage, ridge regression, and weight decay in the context of neural networks?
Minimize the training error while keeping the weights small; shrinkage, ridge regression, and weight decay are all names for (variants of) adding such an L2 penalty on the weights to the error function.
Say we have polynomial degree 9. Under regular circumstances, it would overfit the data. How do you correct for it without changing the degree of the polynomial?
You introduce a regularisation term with λ = 1.5230e-08, i.e. ln λ = −18
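A minimal NumPy sketch of the idea; the sin-curve data is the classic textbook setup for this example and is assumed here, not taken from the lectures:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)  # noisy targets

X = np.vander(x, 10)   # degree-9 polynomial design matrix (10 coefficients)
lam = 1.5230e-08       # ln(lambda) = -18, the value from the flashcard

# Closed-form ridge solution: w = (X^T X + lam * I)^(-1) X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)

# With lam = 0 the degree-9 polynomial threads every noisy point (overfitting);
# even this tiny penalty keeps the coefficients, and hence the curve, tame.
print(np.abs(w).max())
```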
Why does the Atari network need 4 consecutive frames for training?
A single frame is a static snapshot; 4 consecutive frames are needed to contain information about the ball’s direction, speed, acceleration, etc.
The Atari network consists of 18 output nodes. What do they represent?
the output consists of 18 nodes corresponding to all possible joystick inputs: 9 stick positions (left, right, up, down, the 4 diagonals, neutral), each with and without the “red button” pressed – 9 × 2 = 18
Describe the reinforcement learning technique briefly.
• Assume that the network can estimate the “quality” of possible actions
• initialize the network at random and use it to play many games =>
generate some training data
• “learn from experience” => use the generated data to improve the network
(with help of the Bellman’s equation)
• use the improved network to generate “better data” and return to the previous
step; iterate till optimum reached
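A tabular Q-learning sketch of this loop (a lookup table stands in for the network here; the `env` interface – reset() returning a state, step() returning (next_state, reward, done) – is a hypothetical assumption):

```python
import random

def q_learning(env, n_actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Iteratively improve action-"quality" estimates Q via the Bellman update."""
    Q = {}  # Q[(state, action)] = estimated quality of taking `action` in `state`
    for _ in range(episodes):
        state, done = env.reset(), False   # hypothetical env API (assumption)
        while not done:
            # Play with the current estimates, plus some random exploration.
            if random.random() < eps:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q.get((state, a), 0.0))
            next_state, reward, done = env.step(action)  # hypothetical env API
            # "Learn from experience": Bellman update,
            # target = r + gamma * max_a' Q(s', a')
            best_next = max(Q.get((next_state, a), 0.0) for a in range(n_actions))
            target = reward if done else reward + gamma * best_next
            old = Q.get((state, action), 0.0)
            Q[(state, action)] = old + alpha * (target - old)
            state = next_state
    return Q
```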
What equation do you use to improve the network?
Bellman’s equation
What is Bellman’s equation?
The equation writes the “value” of a decision problem at a certain point in time in terms of the payoff from some initial choices and the “value” of the remaining decision problem that results from those initial choices.
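For the action-value (“quality”) estimates used above, a standard form is Q*(s, a) = E[ r + γ · max_{a'} Q*(s', a') ]: the value of action a in state s is the immediate payoff r plus the discounted (γ) value of the best remaining decision in the resulting state s'.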
What technique does AlphaGo Zero use to get “better estimates”?
Extensive use of Monte Carlo Tree Search (MCTS)