Conv Net Flashcards
How do Convolutional Neural Networks work?
By picking up (identifying) features in an image, sound, etc. to place it in a classification (give it a label)
How does a Conv Net predict a label for an image?
E.g. determine this is a picture of a cat
- You feed (train) the CNN pictures of cats (input)
- Until the CNN has identified all the relevant features related to a ‘cat’
- Now when you put in new pictures of cats (test set) it should be able to predict they’re cats based on the ‘features’ the CNN has stored/learned.
I.e. if an image has all the features derived from cat images, then it must be a cat
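A toy sketch of that train-then-predict flow (not from the flashcards: it assumes tf.keras, uses random noise as stand-in 'photos', and a deliberately tiny placeholder network instead of a full Conv Net):

```python
import numpy as np
import tensorflow as tf

# Made-up data: random noise standing in for real cat / not-cat photos
train_images = np.random.rand(100, 64, 64, 3).astype('float32')  # 100 fake 64x64 RGB "photos"
train_labels = np.random.randint(0, 2, size=(100,))              # 1 = cat, 0 = not cat (made up)
test_images = np.random.rand(10, 64, 64, 3).astype('float32')    # 10 new "photos" it hasn't seen

# Placeholder model (a proper Conv Net is sketched under "What are the steps for Conv Nets?")
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

model.fit(train_images, train_labels, epochs=1, verbose=0)  # "feed" it labelled pictures (training)
predictions = model.predict(test_images)                    # probability each new picture is a cat
```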
How do Conv Nets process pictures?
Each pixel in an image is given a number based on its colour/shade.
Those numbers are the numerical representation of the image.
E.g. black and white images (2D array):
0 = Black, 255 = White, numbers in between = grey scale
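A quick NumPy illustration of that 2D-array idea (the tiny "image" values are made up):

```python
import numpy as np

# A made-up 3x3 black-and-white "image": one number per pixel
grey_image = np.array([
    [  0, 128, 255],   # black, mid-grey, white
    [ 64, 200,  30],
    [255,   0,  90],
], dtype=np.uint8)

print(grey_image.shape)  # (3, 3) -> a 2D array: height x width
```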
How do coloured pictures work with regard to pixels and Conv Nets?
Coloured pictures require a 3D array
RGB
- Red … each red pixel is from 0-255
- Green … each green pixel is from 0-255
- Blue … each blue pixel is from 0-255
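And the same idea as a NumPy 3D array, height x width x 3 channels (the pixel values are again made up):

```python
import numpy as np

# A made-up 2x2 colour "image": each pixel has a Red, Green and Blue value (0-255)
colour_image = np.array([
    [[255, 0, 0], [0, 255, 0]],     # a red pixel, a green pixel
    [[0, 0, 255], [90, 60, 120]],   # a blue pixel, a brownish pixel
], dtype=np.uint8)

print(colour_image.shape)  # (2, 2, 3) -> a 3D array: height x width x RGB channels
```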
What are the steps for Conv Nets?
Step 1: Convolution
Step 2: Max Pooling
Step 3: Flattening
Step 4: Full Connection
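A minimal sketch of those 4 steps as layers, assuming tf.keras (the 64x64 input size, 32 filters and 128 units are just illustrative choices, not from the flashcards):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),                      # 64x64 RGB input (illustrative size)
    # Step 1: Convolution - 32 feature detectors (kernels), each 3x3, with ReLU applied
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    # Step 2: Max Pooling - slide a 2x2 window and keep the biggest value
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    # Step 3: Flattening - turn the pooled feature maps into one long vector
    tf.keras.layers.Flatten(),
    # Step 4: Full Connection - regular dense layers ending in a cat / not-cat output
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
```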
How are features (from an image, sound, etc) identified during convolution?
Via a ‘feature detector’
Aka Kernel or Filter
Most of the time 3x3 … sometimes other sizes, e.g. 7x7
What is the output of a convolution?
How is it formed?
A Feature Map!
Aka activation/convolved map
Via:
1. Element-wise multiplication (multiply each square in the Kernel by the respective square of the image it’s on top of)…
- Then add the results of the element-wise multiplication together (e.g. 0 + 1 + 1 + 0 + 0 + 0 + 0 + 1 + 0 = 3 … notice there are 9 terms because most kernels are 3x3) to make up 1 square of the feature map (e.g. 3)
2. Then the Kernel moves (‘strides’ 1 square across/down) and the element-wise multiplication happens again (see the sketch below)
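A small NumPy sketch of that multiply-and-add, sliding a 3x3 kernel over a made-up 5x5 binary image one square at a time (stride 1):

```python
import numpy as np

image = np.array([            # made-up 5x5 black/white image (just 1s and 0s)
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
])
kernel = np.array([           # a made-up 3x3 feature detector
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
])

h, w = image.shape
k = kernel.shape[0]
feature_map = np.zeros((h - k + 1, w - k + 1), dtype=int)

for i in range(h - k + 1):
    for j in range(w - k + 1):
        patch = image[i:i + k, j:j + k]             # the part of the image under the kernel
        feature_map[i, j] = np.sum(patch * kernel)  # element-wise multiply, then add up

print(feature_map)  # each square = how strongly the kernel "matched" at that spot
```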
What is the point of a feature map?
Why’s it important?
To create a smaller version of the original (input) image
Thus, if you switch from striding 1 square at a time to striding 2 squares at a time… the output (feature map) becomes even smaller
If it’s smaller, it’s quicker to process!
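A quick way to see the "smaller output" point in numbers, using the standard output-size formula without padding (the input size, kernel size and strides here are just example values):

```python
def feature_map_size(input_size, kernel_size, stride):
    # (input - kernel) // stride + 1 squares along each dimension
    return (input_size - kernel_size) // stride + 1

print(feature_map_size(7, 3, 1))  # 5 -> 7x7 image, 3x3 kernel, stride 1 gives a 5x5 feature map
print(feature_map_size(7, 3, 2))  # 3 -> same image and kernel, stride 2 gives a smaller 3x3 map
```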
How do we know the feature detectors picked up a feature from our image?
We’ll have a square on the feature map with a number that comes from the kernel and that part of the image completely adding up (matching).
Aka if the kernel and part of an image match up… then the highest possible number gets calculated:
E.g. brown eyes feature detector might be
0 1 0
0 0 0
1 0 1
If the image has a brown eye… when the kernel is on top of it, they’ll match up and the respective square on the feature map will have ‘3’, the highest number possible for that specific kernel
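A tiny sketch of that "perfect match" idea using the brown-eyes kernel above (the image patches are made up):

```python
import numpy as np

kernel = np.array([[0, 1, 0],
                   [0, 0, 0],
                   [1, 0, 1]])          # the 'brown eyes' feature detector from above

matching_patch = np.array([[0, 1, 0],   # a part of the image that lines up with the kernel exactly
                           [0, 0, 0],
                           [1, 0, 1]])
other_patch = np.array([[1, 0, 0],      # some other part of the image
                        [0, 0, 1],
                        [0, 0, 0]])

print(np.sum(matching_patch * kernel))  # 3 -> the highest possible value for this kernel
print(np.sum(other_patch * kernel))     # 0 -> no match here
```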
How do we prevent the loss of information if a kernel reduces the size of our original input?
We use multiple feature detectors (kernels) and create multiple feature maps (for our 1 image/sound)
This preserves the spatial relationships between features
The CNN determines which features are important through continuous training
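One way to see the "many feature maps" idea in code: a single tf.keras Conv2D layer with 32 filters produces 32 feature maps for one image (the image here is random and the sizes are illustrative):

```python
import numpy as np
import tensorflow as tf

image = np.random.rand(1, 64, 64, 3).astype('float32')         # one made-up 64x64 RGB image
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3))  # 32 different feature detectors

feature_maps = conv(image)
print(feature_maps.shape)  # (1, 62, 62, 32) -> 32 feature maps, one per kernel
```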
What’s the point of applying a rectifier function?
I.e. Rectified linear unit: ReLU layer
To increase non-linearity in our Conv net
Because images are non-linear…
i.e. transitions between features in images aren’t smooth and blended; there are harsh lines and borders etc.
Aka to break up the gradual progression of shading that can occur when only a filter/kernel’s used
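A small sketch of the ReLU step: it just replaces every negative value in a feature map with 0 (the feature map values are made up):

```python
import numpy as np

feature_map = np.array([[ 3, -1,  0],
                        [-2,  5, -4],
                        [ 1, -3,  2]])   # made-up convolution output (can contain negatives)

relu_map = np.maximum(feature_map, 0)    # ReLU: max(0, x) for every value
print(relu_map)
# [[3 0 0]
#  [0 5 0]
#  [1 0 2]]
```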
What is spatial invariance…
How does it relate to max pooling?
And what’s the point of it all?
Means the kernel can pick up a feature even if it is:
Distorted, squished, a different texture, facing a different direction etc.
Max pooling enables the above
Gives our neural network flexibility
How does Max Pooling work? What are the steps?
You create a Pooled Feature Map!
Kinda like creating a feature map…
You have a 2x2 empty filter that you run over the feature map!
As the max pooling filter strides over the feature map:
The highest number (max) on the feature map that’s currently under the 2x2 filter gets extracted and put onto 1 (respective) square of the pooled feature map
E.g. feature map (a 2x4 fragment), sliding the 2x2 filter 1 column at a time:
1 1 0 0
0 0 1 4
First square in the pooled feature map will have: 1
Second square in the pooled feature map will have: 1
Third square in the pooled feature map will have: 4
Pooled feature map: 1 1 4 (see the sketch below)
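A NumPy sketch of that pooling example, sliding a 2x2 window one column at a time over the 2x4 feature map above:

```python
import numpy as np

feature_map = np.array([[1, 1, 0, 0],
                        [0, 0, 1, 4]])

pooled = []
for j in range(feature_map.shape[1] - 1):   # slide the 2x2 window one column at a time
    window = feature_map[0:2, j:j + 2]      # the 2x2 patch currently under the filter
    pooled.append(int(window.max()))        # keep only the biggest value

print(pooled)  # [1, 1, 4] -> the pooled feature map
```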
What are the benefits of max pooling?
We’re only extracting the most important features
We reduce size again (get rid of unimportant information) …
This helps ensure the ML model doesn’t overfit to the data
If the model doesn’t overfit it has flexibility!
Is there only max pooling?
No, there’s…
Mean pooling
Sum pooling
Etc
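For the same 2x2 window, the variants just differ in which number they keep (the window values are made up):

```python
import numpy as np

window = np.array([[1, 0],
                   [2, 4]])    # one 2x2 patch of a feature map

print(window.max())    # 4    -> max pooling keeps the strongest signal
print(window.mean())   # 1.75 -> mean (average) pooling
print(window.sum())    # 7    -> sum pooling
```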