CNNs Flashcards

1
Q

What is the advantage of max pooling layers?

A

Max-pooling layers reduce the size of the feature maps and help prevent overfitting.
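
A minimal NumPy sketch of 2×2 max pooling (illustrative, not from the lecture):

```python
import numpy as np

def max_pool_2x2(fmap):
    """Downsample a feature map by keeping only the max of each 2x2 block."""
    h, w = fmap.shape
    blocks = fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 1, 5, 6],
                 [2, 2, 7, 8]])
print(max_pool_2x2(fmap))  # [[4 2]
                           #  [2 8]]
```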

2
Q

How do convolutional filters work?

A

Convolutional filters find features in images: sliding a filter kernel over the image yields another (smaller) matrix of “degrees of overlap” between the kernel and each local image patch.
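
A minimal NumPy sketch of this sliding-window computation (what CNN libraries actually compute is cross-correlation, as here):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image; each output entry is the 'degree of
    overlap' (elementwise product, summed) with the local image patch."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out  # a smaller (oh x ow) matrix of filter responses
```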

3
Q

What is the purpose of a convolutional filter?

A

A filter acts as a “feature detector”: it returns high values where the corresponding image patch is similar to the filter matrix.
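
A tiny NumPy illustration of this: a vertical-edge filter responds strongly to a patch containing a vertical edge and not at all to a flat patch (the patch values are made up):

```python
import numpy as np

kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])          # a vertical-edge "feature detector"
edge_patch = np.array([[9, 5, 1]] * 3)   # bright-to-dark transition: a vertical edge
flat_patch = np.full((3, 3), 5)          # uniform region: no edge

print(np.sum(edge_patch * kernel))  # 24 -> high response: patch matches the filter
print(np.sum(flat_patch * kernel))  # 0  -> low response: no matching feature
```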

4
Q

What is LeNet? Describe the architecture.

A

– A pioneering convolutional network (LeCun et al., late 1980s) built from alternating convolutional and subsampling (pooling) layers followed by fully connected layers
– It has 1,256 nodes
– 64,660 connections
– 9,760 trainable parameters (and not millions!)
– trained with the Backpropagation algorithm!
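
A rough PyTorch sketch of a LeNet-style network; the layer sizes below follow the later LeNet-5, so they won't reproduce the parameter counts above exactly:

```python
import torch.nn as nn

# Alternating convolution and pooling, then fully connected layers.
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),   # 32x32 -> 28x28
    nn.AvgPool2d(2),                             # 28x28 -> 14x14
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),  # 14x14 -> 10x10
    nn.AvgPool2d(2),                             # 10x10 -> 5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),                           # 10 output classes
)
```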

5
Q

Draw the picture of the LeNet architecture

A
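
In text form (a sketch of the general data flow):

input image → convolution → subsampling (pooling) → convolution → subsampling (pooling) → fully connected layers → output classes
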
6
Q

What is ILSVRC?

A

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) evaluates algorithms for object detection and image classification at large scale.

7
Q

How did CNNs progress as per the ILSVRC?

A

ILSVRC 2010 – 28.2% error with shallow (pre-deep-learning) methods

ILSVRC 2011 – 25.8% error, again with shallow methods

In 2012, AlexNet used 8 layers and reached 16.4% error (averaging an ensemble of CNNs; a single CNN scored 18.2%)

In 2014, VGG used 19 layers and reached 7.3% error

In 2014, GoogLeNet used 22 layers and reached 6.7% error

Finally, in 2015, ResNet used 152 layers and reached 3.57% error

(All figures are top-5 classification error on ImageNet.)

8
Q

At the moment, which CNN has the highest Top-1 accuracy?

A

As per the lectures: ResNet.

As of 2021: CoAtNet-7, with 90.88% top-1 accuracy on ImageNet.

9
Q

What additional tricks did the creators of AlexNet use to improve accuracy?

A

• Data augmentation: increase the number of training records by applying some modifications (shifts, contrasts, …)
• Computations distributed over 2 GPUs
• Local Contrast Normalization
• ReLU (Rectified Linear Unit) instead of sigmoid activation functions
• L2 weight regularization: punish big weights
• Dropout: when training, in every iteration disable 50% of the nodes (disabling weights doesn’t work!) – see the sketch below
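
A minimal NumPy sketch of the node-disabling idea (this is the modern “inverted dropout” variant, not necessarily AlexNet’s exact formulation):

```python
import numpy as np

def dropout(activations, p=0.5, training=True):
    """During training, zero out each node with probability p; the 1/(1-p)
    rescaling keeps the expected activation unchanged (inverted dropout)."""
    if not training:
        return activations            # at test time, all nodes stay active
    mask = np.random.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

h = np.array([0.5, 1.2, -0.3, 2.0])   # some layer's activations
print(dropout(h))                     # about half the entries zeroed, rest scaled up
```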

10
Q

What kind of data augmentation techniques did the creators of AlexNet use?

A

They increased the number of training records by applying modifications to existing images: shifts, contrast changes, …
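
A small NumPy sketch of such modifications; the functions here are illustrative (AlexNet’s actual pipeline used random crops, horizontal flips, and PCA-based color shifts):

```python
import numpy as np

def hflip(img):
    """Mirror the image left-right: one 'free' extra training record."""
    return img[:, ::-1]

def shift_right(img, dx):
    """Shift the image dx pixels to the right, filling the left edge with zeros."""
    out = np.zeros_like(img)
    out[:, dx:] = img[:, :img.shape[1] - dx]
    return out

def adjust_contrast(img, factor):
    """Scale pixel intensities around the mean to raise or lower contrast."""
    return np.clip(img.mean() + factor * (img - img.mean()), 0, 255)
```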

11
Q

What is the key idea behind Residual networks?

A

It’s easier to learn the modification of the original input than the modified output itself: instead of forcing a stack of layers to learn a full mapping H(x), let it learn the residual F(x) = H(x) − x and add x back via a shortcut.

12
Q

What technique does ResNet adopt to improve accuracy and what problem does it solve?

A

Implementation of the key idea: add identity shortcuts that skip over 2 (or more) layers. These skip connections let the signal jump over some layers, which reduces the vanishing-gradient problem: gradients have fewer layers to propagate through. The network then gradually restores the skipped layers as it learns the feature space. (See the sketch below.)
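
A minimal PyTorch sketch of a residual block with an identity shortcut (real ResNet blocks also use batch normalization and handle dimension changes):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two conv layers whose output is added back onto the input."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # The block only has to learn the residual F(x); the shortcut (+ x)
        # gives gradients a direct path around the conv layers.
        return self.relu(self.conv2(self.relu(self.conv1(x))) + x)
```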

13
Q

Define overfitting

A

Overfitting: the model learns “small details” of the training set and is unable to correctly classify cases in the test set (usual cause: too many parameters/degrees of freedom).

14
Q

Define regularisation

A

Preventing overfitting by imposing constraints on the values or on the number of model parameters.

15
Q

Define cross-validation

A

Monitoring the error on both the training set and the test set.

16
Q

What happens when you use |x-y| instead of (x-y)²?

A

The error won’t be “smooth”: |x-y| is not differentiable at x = y (its derivative jumps from −1 to +1), which makes gradient-based optimization harder, whereas (x-y)² has a smooth gradient everywhere.

17
Q

Give an example of regularisation

A

Add an extra term to the error function: “the sum of squared coefficients of your model” multiplied by λ, i.e. E_reg(w) = E(w) + λ · Σ_j w_j², where λ is a tunable parameter that controls the size of the “punishment” for too-big coefficient values. (Slide 70)

18
Q

What is shrinkage, ridge regression, and weight decay in the context of neural networks?

A

All three are names for the same idea: minimize the training error while keeping the weights small.
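
A minimal NumPy sketch of ridge regression in closed form (the λ·I term is what “shrinks” the weights):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize ||X w - y||^2 + lam * ||w||^2; the closed-form solution is
    w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```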

19
Q

Say we have a polynomial of degree 9. Under regular circumstances it would overfit the data. How do you correct for this without changing the degree of the polynomial?

A

You introduce a regularisation term with λ = 1.5230e-08, i.e. ln λ = −18 (see the sketch below).
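
A NumPy sketch of the scenario with synthetic data (the setup mirrors the classic curve-fitting example; values are for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)   # noisy samples

X = np.vander(x, 10)              # degree-9 polynomial features
lam = np.exp(-18)                 # ln(lambda) = -18

w_plain = np.linalg.lstsq(X, y, rcond=None)[0]                   # overfits: typically huge coefficients
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)   # regularized: tame coefficients
print(np.abs(w_plain).max(), np.abs(w_ridge).max())
```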

20
Q

Why does the Atari network need 4 consecutive frames for training?

A

Four frames are needed to contain information about the ball’s direction, speed, acceleration, etc. – a single frame shows only positions.

21
Q

The Atari network has 18 output nodes. What do they represent?

A

The 18 output nodes correspond to all possible joystick actions: 9 stick positions (left, right, up, down, the 4 diagonals, and neutral), each with or without the “red button” pressed – 9 × 2 = 18.

22
Q

Describe the reinforcement learning technique briefly.

A

• Assume that the network can estimate the “quality” of possible actions
• Initialize the network at random and use it to play many games => generate some training data
• “Learn from experience” => use the generated data to improve the network (with the help of Bellman’s equation)
• Use the improved network to generate “better data” and return to the previous step; iterate till the optimum is reached (see the toy sketch below)
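
A toy tabular Q-learning sketch of this loop (a stand-in for the network version; the environment here is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, alpha = 5, 2, 0.9, 0.1
Q = np.zeros((n_states, n_actions))       # "quality" estimate of each action

def step(s, a):
    """Made-up environment: action 1 moves right; the last state pays reward 1."""
    s2 = min(s + a, n_states - 1)
    return s2, float(s2 == n_states - 1)

for episode in range(500):                # "play many games" to gather experience
    s = 0
    while s != n_states - 1:
        # mostly follow the current estimates, sometimes explore
        a = int(Q[s].argmax()) if rng.random() > 0.1 else int(rng.integers(n_actions))
        s2, r = step(s, a)
        # Bellman update: pull Q(s,a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q.argmax(axis=1))   # greedy action per non-terminal state (1 = move right)
```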

23
Q

What equation do you use to improve the network?

A

Bellman’s equation

24
Q

What is Bellman’s equation?

A

The equation writes the “value” of a decision problem at a certain point in time in terms of the payoff from some initial choices and the “value” of the remaining decision problem that results from those initial choices.
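
In the Q-learning form used for such networks, it reads (standard notation, added here for reference):

Q(s, a) = r(s, a) + γ · max over a′ of Q(s′, a′)

where s′ is the state reached by taking action a in state s, and γ is a discount factor for future rewards.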

25
Q

What technique does AlphaGo Zero use to get “better estimates”?

A

Extensive use of Monte Carlo Tree Search (MCTS).