Lesson 3 - Convolutional Neural Networks Flashcards

1
Q

What is the goal or use of having the cross-entropy block during training?

A

It tells us how close our prediction ŷ is to the ground truth that was provided with the input. During training, it gives an indication of how well our model is behaving at that moment.
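
A minimal sketch of the idea, assuming a softmax output and a one-hot ground truth:

import numpy as np

# Hypothetical softmax output of the network and a one-hot ground truth.
y_hat = np.array([0.7, 0.2, 0.1])   # predicted class probabilities
y = np.array([1.0, 0.0, 0.0])       # ground truth: class 0

# Cross-entropy: -sum_i y_i * log(y_hat_i); near 0 for a confident correct prediction.
loss = -np.sum(y * np.log(y_hat))
print(loss)  # ~0.357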

2
Q

Why do we use the gradient information during the backward pass? What is the gradient telling us?

A

The gradient tells us in which direction to change the parameters of the network (in this case the weights of the layers) in order to minimize the loss. That is why we always follow the negative of the gradient: the gradient points uphill, and we want to minimize.
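
A minimal sketch of one gradient-descent loop, assuming a toy quadratic loss:

# Toy loss L(w) = (w - 3)^2 with gradient dL/dw = 2 * (w - 3).
w = 0.0    # stands in for the network weights
lr = 0.1   # learning rate

for _ in range(50):
    grad = 2 * (w - 3)   # the gradient points uphill ...
    w -= lr * grad       # ... so we step in the negative gradient direction

print(w)  # ~3.0, the minimizer of the loss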

3
Q

How do we know when to stop training?

A

When we see that the loss on the validation set is increasing again (while the training loss keeps decreasing)
= overfitting
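
A minimal sketch of early stopping; the validation-loss values are made up for illustration:

# Stop when the validation loss has not improved for `patience` epochs.
val_losses = [0.9, 0.7, 0.55, 0.5, 0.48, 0.49, 0.53, 0.6]  # hypothetical curve

best = float("inf")
patience, bad_epochs = 2, 0
for epoch, loss in enumerate(val_losses):
    if loss < best:
        best, bad_epochs = loss, 0   # improvement: reset the counter
    else:
        bad_epochs += 1              # loss increased again
        if bad_epochs >= patience:
            print(f"stop at epoch {epoch}, best validation loss {best}")
            break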

4
Q

ReLU allows us to circumvent some issues that we had when computing the gradient. What were those issues? What was the problem with, for example, the sigmoid?

A

Sigmoid tends to saturate at the two extremes, which has the side effect that in those regions the gradient tends to vanish (become 0).

A gradient of 0 means that when you start to navigate the parameter space to update your model, you will not get sufficient information to do so.
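
A quick numerical check of the saturation effect, using sigma'(x) = sigma(x) * (1 - sigma(x)):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

for x in [0.0, 5.0, 10.0]:
    s = sigmoid(x)
    print(x, s * (1 - s))  # 0.25, ~0.0066, ~0.000045: the gradient vanishes at the extremes

# ReLU, by contrast, has gradient 1 everywhere on its positive side.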

5
Q

What are some characteristics of visual data?

A
  • Locality: neighboring pixels are highly correlated
  • Translation invariance: meaningful patterns can appear anywhere
  • Compositionality: complex features are composed hierarchically from simpler ones
6
Q

Why is it not sufficient to just flatten an image?

A

All sorts of translations can occur, and we don't want the model to learn weights tied to specific positions.

For example, if the image shifts 6 pixels to the left, the learned weights can suddenly no longer be good.

7
Q

What are the differences between locally connected layers and convolutional layers?

A

Locally Connected Layers

Locally connected layer: each neuron is connected to a small, local region of the input, but these connections are not shared across the spatial dimensions. The weights for connections in different regions are independent.

Drawback: this lack of weight sharing leads to a significant increase in the number of parameters, which is inefficient and prone to overfitting, especially for large inputs.

Convolutional Layers

Weight sharing: a convolutional layer addresses this by sharing weights across spatial locations. A single set of weights (called a filter or kernel) slides across the entire input, creating a feature map.

Efficiency: this drastically reduces the number of parameters compared to locally connected layers, making the model more efficient and less prone to overfitting.

Parameter sharing: by using the same filter for all spatial locations, convolutional layers can detect the same feature (like an edge or a texture) regardless of its position in the input.
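
A back-of-the-envelope parameter count; the sizes are assumptions for illustration:

# 32x32 single-channel input, 5x5 receptive fields, one output map.
h = w = 32          # input size
k = 5               # kernel / local window size
out = h - k + 1     # valid output size per dimension: 28

# Locally connected: an independent 5x5 weight set at every output location.
locally_connected = out * out * k * k   # 19600 weights

# Convolutional: one shared 5x5 kernel slid over the whole input.
convolutional = k * k                   # 25 weights

print(locally_connected, convolutional)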

8
Q

Can you explain what the response/feature map is?

A

The output or result of applying a kernel in a convolutional layer.

As the kernel slides over the input, each output value of the kernel becomes one entry of the response map.

9
Q

What is the receptive field?

A

The receptive field of a given neuron is the part of the input that this neuron perceives (can observe) in order to produce its activation, its response value.
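
A worked example, assuming stride-1 convolutions without dilation, where the receptive field grows by (kernel - 1) per layer:

kernels = [3, 3, 3]  # three stacked 3x3 conv layers (assumed architecture)
r = 1
for k in kernels:
    r += k - 1
print(r)  # 7: a neuron in the third layer sees a 7x7 patch of the input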

10
Q

When you say “my layer has 10 kernels”, what does that mean?

A

That means you have ten masks sliding over your input, so you are looking for (at least) ten different features; the layer produces ten response maps.
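
A small shape sketch with assumed sizes (implemented as cross-correlation, as in most deep learning frameworks):

import numpy as np

x = np.random.rand(8, 8)             # single-channel 8x8 input
kernels = np.random.rand(10, 3, 3)   # 10 masks, each looking for one feature

# Naive valid convolution: each kernel yields its own 6x6 response map.
maps = np.array([
    [[np.sum(x[i:i+3, j:j+3] * k) for j in range(6)] for i in range(6)]
    for k in kernels
])
print(maps.shape)  # (10, 6, 6): one response map per kernel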

11
Q

What convolution operations did we see in class?

A
  • Valid Convolution -> every considered point lies within the input
  • Full Convolution -> at least one value of the kernel covers the input
  • Same Convolution -> kernel evaluated (centered) at every location of the input
  • Strided Convolution -> sparser kernel evaluations
  • Dilated Convolution -> the points the kernel considers are spread apart
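
A 1-D illustration of the resulting sizes, using numpy's convolve modes:

import numpy as np

x = np.arange(8.0)   # input of size n = 8
k = np.ones(3)       # kernel of size m = 3

print(np.convolve(x, k, mode="valid").size)  # 6  = n - m + 1 (kernel fully inside the input)
print(np.convolve(x, k, mode="full").size)   # 10 = n + m - 1 (any overlap counts)
print(np.convolve(x, k, mode="same").size)   # 8  = n         (kernel centered at every input location)
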
12
Q

What is VALID convolution? What is the size relation between input and output?

A

Normal convolution: every considered point lies within the input

The input is larger than the output (output size = input - kernel + 1)

13
Q

What is FULL convolution? What is the size relation between input and output?

A

At least one value of the kernel covers the input

The input is smaller than the output (output size = input + kernel - 1)

14
Q

What is a potential issue with full convolution?

A

Points outside the input are also considered, but these points are undefined.
Therefore we need padding.
We cannot simply pad with 0, because that may influence the result.

15
Q

What is SAME convolution? What is the size relation between input and output?

A

The kernel is evaluated (centered) at every location of the input

Input and output sizes are equal

16
Q

Why would we use strided convolution?

A

It is effective for decreasing the output size (downsampling)
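
A worked size example under assumed numbers (valid convolution, no padding):

n, k, s = 8, 3, 2          # input size, kernel size, stride
out = (n - k) // s + 1
print(out)  # 3, versus 6 with stride 1: sparser evaluations shrink the output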

17
Q

Why would we use dilated convolution?

A

It is effective for increasing the receptive field without increasing the number of parameters
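
A worked example of the effective kernel extent, assuming dilation factor d:

k, d = 3, 2
print(d * (k - 1) + 1)  # 5: a 3-tap kernel covers a 5-wide input span with no extra weights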

18
Q

What is pooling?

A

For each region that the kernel observes, compute a single value.
–> the region size is variable

Most common: max pooling, which simply takes the maximum value of each region
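
A minimal sketch of 2x2 max pooling with stride 2, using assumed sizes:

import numpy as np

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 3, 2],
              [6, 2, 0, 1]])

# Split the 4x4 input into 2x2 blocks and keep the maximum of each block.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[4 5]
               #  [6 3]]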

19
Q

What is 1x1 convolution and for what is it used?

A
  • performs a neuron-level operation
  • integrates over the channels

–> effective for modifying the number of channels
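
A minimal sketch with assumed sizes, showing how 1x1 convolution mixes channels per pixel:

import numpy as np

x = np.random.rand(3, 4, 4)   # (channels, height, width): 3-channel 4x4 input
w = np.random.rand(2, 3)      # 1x1 kernels: 2 output channels, 3 input channels

# At every spatial location, the output is a weighted sum over the input channels.
y = np.einsum('oc,chw->ohw', w, x)
print(y.shape)  # (2, 4, 4): spatial size unchanged, channel count modified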