Lesson 3 - Convolutional Neural Networks Flashcards
What is the goal or the use of having the cross entropy block there during training?
It tells us how close our prediction ŷ is to the ground truth that was provided with the input. During training it therefore gives an indication of how well the model is behaving at that moment.
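As a hedged illustration (a minimal NumPy sketch, not code from the lesson), cross-entropy is small when ŷ puts its probability mass on the true class and large otherwise:

```python
import numpy as np

def cross_entropy(y_hat, y):
    """Cross-entropy between predicted probabilities y_hat and a one-hot target y."""
    eps = 1e-12                               # avoid log(0)
    return -np.sum(y * np.log(y_hat + eps))

y    = np.array([0.0, 1.0, 0.0])              # ground truth: class 1
good = np.array([0.05, 0.90, 0.05])           # confident, correct prediction
bad  = np.array([0.70, 0.20, 0.10])           # confident, wrong prediction

print(cross_entropy(good, y))                 # ~0.105 (low loss, close to the ground truth)
print(cross_entropy(bad, y))                  # ~1.609 (high loss, far from the ground truth)
```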
Why do we use the gradient information during the backward pass? What is the gradient telling us?
The gradient tells us in which direction to change the parameters of the network (in this case the weights of the layers) to change the loss. Because we want to minimize the loss, we always step along the negative of the gradient.
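A toy sketch (plain gradient descent on a one-parameter loss, not the exact training loop from class) of following the negative gradient:

```python
# Minimize L(w) = (w - 3)^2, whose gradient is dL/dw = 2 * (w - 3).
w, lr = 0.0, 0.1                  # initial weight and learning rate (step size)

for _ in range(50):
    grad = 2 * (w - 3)            # the gradient points in the direction of increasing loss
    w = w - lr * grad             # so we step in the opposite (negative) direction

print(w)                          # approaches 3, the minimizer of the loss
```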
How do we know when to stop training?
When we see that the validation loss starts increasing again (while the training loss keeps decreasing)
= overfitting
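A sketch of that stopping rule (early stopping on a made-up validation-loss curve; the numbers are illustrative, not from the lesson):

```python
# Toy validation-loss curve: it decreases, then starts increasing again (overfitting).
val_losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]

best_val, patience, bad_epochs = float("inf"), 2, 0
for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0    # still improving: keep training
    else:
        bad_epochs += 1                       # validation loss rose again
        if bad_epochs >= patience:
            print(f"stop at epoch {epoch}")   # overfitting detected: stop here
            break
```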
ReLU allows us to circumvent some issues that we had when computing the gradient. What were those issues? What was the problem with, for example, Sigmoid?
Sigmoid tends to saturate at the two extremes, which has the side-effect that in those regions the gradient tends to vanish (become 0).
A gradient of 0 means that when you navigate the parameter space to update your model, you do not get enough information to do so.
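A quick numeric check (a sketch, not lesson code) of the saturation: the sigmoid derivative collapses toward 0 at the extremes, while ReLU keeps a gradient of 1 for positive inputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])

sig_grad  = sigmoid(x) * (1 - sigmoid(x))     # derivative of the sigmoid
relu_grad = (x > 0).astype(float)             # derivative of ReLU (0 for x <= 0, else 1)

print(sig_grad)    # ~[4.5e-05, 0.105, 0.25, 0.105, 4.5e-05] -> vanishes at the extremes
print(relu_grad)   # [0. 0. 0. 1. 1.]                        -> stays 1 for positive inputs
```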
What are some characteristics of visual data?
- Locality: neighboring pixels are highly correlated
- Translation Invariance: meaningful patterns can appear anywhere
- Compositionality: Learning feature hierarchies
Why is it not sufficient to just flatten an image?
All sorts of translations can happen, and we don't want the model to learn specific positions.
For example, if the image shifts 6 pixels to the left, the weights learned for the original positions may suddenly no longer be any good.
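A small illustration (a sketch with a made-up 1-D "image", not from the lesson) of why position-specific weights break under a shift while a sliding kernel still finds the pattern:

```python
import numpy as np

pattern = np.array([1.0, 2.0, 1.0])                   # the feature we care about
image   = np.array([0, 0, 1, 2, 1, 0, 0, 0], float)   # pattern at position 2
shifted = np.roll(image, 3)                           # same pattern, shifted 3 pixels

# "Flattened" model: one fixed weight per pixel position, tuned to the original position.
w_flat = image.copy()
print(w_flat @ image, w_flat @ shifted)               # 6.0 vs 0.0 -> the response collapses

# Convolutional model: the same small kernel slides over every position.
print(np.convolve(image,   pattern[::-1], mode="valid").max())   # 6.0
print(np.convolve(shifted, pattern[::-1], mode="valid").max())   # 6.0 -> the response just moves
```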
What are the differences between locally connected layers and convolutional layers?
Locally Connected Layers
Locally Connected Layer: In a locally connected layer, each neuron is connected to a small, local region of the input, but these connections are not shared across the spatial dimensions. This means that the weights for connections in different regions are independent.
Drawback: This lack of weight sharing leads to a significant increase in the number of parameters, which can be inefficient and prone to overfitting, especially for large input sizes.
Convolutional Layers
Weight Sharing: A convolutional layer addresses this by sharing weights across different spatial locations. A single set of weights (called a filter or kernel) is used to slide across the entire input, creating a feature map.
Efficiency: This drastically reduces the number of parameters compared to locally connected layers, making the model more efficient and less prone to overfitting.
Parameter Sharing: By using the same filter for all spatial locations, convolutional layers are able to detect the same feature (like an edge or a texture) regardless of its position in the input.
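Back-of-the-envelope parameter counts (a sketch with assumed sizes, not numbers from the lesson) that make the difference concrete:

```python
# Assume a 32x32 grayscale input and 3x3 neighborhoods producing a 30x30 output map.
out_h, out_w, k = 30, 30, 3

# Locally connected: every output position owns its own 3x3 weights (no sharing).
local_params = out_h * out_w * k * k      # 30 * 30 * 9 = 8100 weights

# Convolutional: a single 3x3 kernel is shared across all positions.
conv_params = k * k                       # 9 weights

print(local_params, conv_params)          # 8100 vs 9
```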
Can you explain what the response/feature map is?
The output (result) of applying the convolution operation in a convolutional layer.
As the kernel slides over the input, its outputs at each position together form the response map.
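A minimal sketch (using SciPy and a hypothetical vertical-edge kernel, not lesson code) of the response map produced by sliding a kernel over an input:

```python
import numpy as np
from scipy.signal import correlate2d

image = np.zeros((6, 6))
image[:, 3:] = 1.0                        # left half dark, right half bright

kernel = np.array([[-1.0, 1.0]])          # responds to a left-to-right intensity jump

feature_map = correlate2d(image, kernel, mode="valid")
print(feature_map)                        # strong responses exactly along the vertical edge
```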
What is the receptive field?
The receptive field of a specific neuron is the part of the input that this neuron perceives, i.e. the region of the input it can observe and that contributes to its activation (response value).
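A sketch (assuming stacked layers with made-up kernel sizes and strides, not values from the lesson) of how the receptive field of a neuron grows with depth:

```python
# (kernel_size, stride) for each layer, from the first layer to the last.
layers = [(3, 1), (3, 1), (2, 2), (3, 1)]

rf, jump = 1, 1                    # receptive field size and input-pixel step of one output unit
for k, s in layers:
    rf += (k - 1) * jump           # each layer widens the window the neuron can observe
    jump *= s                      # striding makes later layers skip over input pixels

print(rf)                          # receptive field (in input pixels) of a neuron in the last layer
```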
When you say “my layer has 10 kernels”, what does that mean?
It means you have ten masks (kernels) sliding over your input, so the layer is looking for ten different features.
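In PyTorch terms (a sketch assuming a single-channel input, not lesson code), "a layer with 10 kernels" is a conv layer with out_channels=10, which produces 10 response maps:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3)   # ten different 3x3 kernels

x = torch.randn(1, 1, 28, 28)      # one single-channel 28x28 input
y = conv(x)
print(y.shape)                     # torch.Size([1, 10, 26, 26]): one 26x26 map per kernel
```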
What convolution operations did we see in class?
- Valid Convolution -> every considered point lies within the input
- Full Convolution -> at least one value of the kernel covers the input
- Same Convolution -> kernel evaluated (centered) at every location of the input
- Strided Convolution -> sparser kernel evaluations
- Dilated Convolution -> points considered in the kernel are spread
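For the last two entries above, a PyTorch sketch (with made-up sizes) showing how stride and dilation change where the kernel is evaluated and hence the output size:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 16, 16)

plain   = nn.Conv2d(1, 1, kernel_size=3)               # kernel evaluated at every valid position
strided = nn.Conv2d(1, 1, kernel_size=3, stride=2)     # sparser kernel evaluations
dilated = nn.Conv2d(1, 1, kernel_size=3, dilation=2)   # kernel points spread apart

print(plain(x).shape)     # torch.Size([1, 1, 14, 14])
print(strided(x).shape)   # torch.Size([1, 1, 7, 7])
print(dilated(x).shape)   # torch.Size([1, 1, 12, 12])
```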
What is VALID convolution? How is the size ratio for input and output?
Normal convolution, every considered point lies within the input
the output is smaller than the input (for input size n and kernel size k, the output has size n − k + 1)
What is FULL convolution? How is the size ratio for input and output?
At least one value of the kernel covers the input
the output is larger than the input (for input size n and kernel size k, the output has size n + k − 1)
What is a potential issue with full convolution?
Points outside the input are also considered, but those points are undefined.
Therefore we need padding.
We cannot pad with 0 because that may influence the result
What is SAME convolution? How is the size ratio for input and output?
Kernel evaluated (centered) at every location of the input
Sizes are equal
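The three modes map directly onto NumPy's 1-D convolution modes (a sketch with a made-up signal, not lesson code), which makes the size ratios easy to verify:

```python
import numpy as np

x = np.arange(1.0, 8.0)     # input of length 7
k = np.ones(3)              # kernel of length 3

print(np.convolve(x, k, mode="valid").shape)   # (5,) -> output smaller than input (7 - 3 + 1)
print(np.convolve(x, k, mode="full").shape)    # (9,) -> output larger than input (7 + 3 - 1)
print(np.convolve(x, k, mode="same").shape)    # (7,) -> output the same size as the input
```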