Conv Net Flashcards
How do Convolutional Neural Networks work?
By picking up (identifying) features in an image, sound, etc. to place it in a classification (give it a label)
How does a Conv Net predict a label for an image?
E.g. determine this is a picture of a cat
- You feed (train) the CNN pictures of cats (input)
- Until the CNN has identified all the relevant features related to a ‘cat’
- Now when you put in new pictures of cats (test set) it should be able to predict they’re cats based on the ‘features’ the CNN has stored/learned.
I.e. if an image has all the features derived from cat images, then it must be a cat
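A toy sketch of that train-then-predict flow (not from the flashcards: it assumes tf.keras, uses random noise as stand-in 'photos', and a deliberately tiny placeholder network instead of a full Conv Net):

```python
import numpy as np
import tensorflow as tf

# Made-up data: random noise standing in for real cat / not-cat photos
train_images = np.random.rand(100, 64, 64, 3).astype('float32')  # 100 fake 64x64 RGB "photos"
train_labels = np.random.randint(0, 2, size=(100,))              # 1 = cat, 0 = not cat (made up)
test_images = np.random.rand(10, 64, 64, 3).astype('float32')    # 10 new "photos" it hasn't seen

# Placeholder model (a proper Conv Net is sketched under "What are the steps for Conv Nets?")
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

model.fit(train_images, train_labels, epochs=1, verbose=0)  # "feed" it labelled pictures (training)
predictions = model.predict(test_images)                    # probability each new picture is a cat
```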
How do Conv Nets process pictures?
Each pixel in an image is given a number based on its colour/shade.
Those numbers are the numerical representation of the image.
E.g. black and white images (2D array):
0 = Black, 255 = White, numbers in between = grey scale
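A quick NumPy illustration of that 2D-array idea (the tiny "image" values are made up):

```python
import numpy as np

# A made-up 3x3 black-and-white "image": one number per pixel
grey_image = np.array([
    [  0, 128, 255],   # black, mid-grey, white
    [ 64, 200,  30],
    [255,   0,  90],
], dtype=np.uint8)

print(grey_image.shape)  # (3, 3) -> a 2D array: height x width
```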
How do coloured pictures work with regard to pixels and Conv Nets?
Coloured pictures require a 3D array
RGB
- Red … each red pixel is from 0-255
- Green … each green pixel is from 0-255
- Blue … each blue pixel is from 0-255
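And the same idea as a NumPy 3D array, height x width x 3 channels (the pixel values are again made up):

```python
import numpy as np

# A made-up 2x2 colour "image": each pixel has a Red, Green and Blue value (0-255)
colour_image = np.array([
    [[255, 0, 0], [0, 255, 0]],     # a red pixel, a green pixel
    [[0, 0, 255], [90, 60, 120]],   # a blue pixel, a brownish pixel
], dtype=np.uint8)

print(colour_image.shape)  # (2, 2, 3) -> a 3D array: height x width x RGB channels
```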
What are the steps for Conv Nets?
Step 1: Convolution
Step 2: Max Pooling
Step 3: Flattening
Step 4: Full Connection
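A minimal sketch of those 4 steps as layers, assuming tf.keras (the 64x64 input size, 32 filters and 128 units are just illustrative choices, not from the flashcards):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),                      # 64x64 RGB input (illustrative size)
    # Step 1: Convolution - 32 feature detectors (kernels), each 3x3, with ReLU applied
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    # Step 2: Max Pooling - slide a 2x2 window and keep the biggest value
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    # Step 3: Flattening - turn the pooled feature maps into one long vector
    tf.keras.layers.Flatten(),
    # Step 4: Full Connection - regular dense layers ending in a cat / not-cat output
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
```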
How are features (from an image, sound, etc) identified during convolution?
Via a ‘feature detector’
Aka Kernel or Filter
Most of the time 3x3 … sometimes other sizes, e.g. 7x7
What is the output of a convolution?
How is it formed?
A Feature Map!
Aka activation/convolved map
Via:
1. Element-wise multiplication (multiply each square in the Kernel by the respective square of the image it’s on top of)…
- Then add the results of the element-wise multiplication together (e.g. 0 + 1 + 1 + 0 + 0 + 0 + 0 + 1 + 0 = 3 … notice there are 9 terms because most kernels are 3x3) to make up 1 square of the feature map (e.g. 3)
2. Then the Kernel moves (‘strides’ 1 square across/down) and the element-wise multiplication happens again (see the sketch below)
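A small NumPy sketch of that multiply-and-add, sliding a 3x3 kernel over a made-up 5x5 binary image one square at a time (stride 1):

```python
import numpy as np

image = np.array([            # made-up 5x5 black/white image (just 1s and 0s)
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
])
kernel = np.array([           # a made-up 3x3 feature detector
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
])

h, w = image.shape
k = kernel.shape[0]
feature_map = np.zeros((h - k + 1, w - k + 1), dtype=int)

for i in range(h - k + 1):
    for j in range(w - k + 1):
        patch = image[i:i + k, j:j + k]             # the part of the image under the kernel
        feature_map[i, j] = np.sum(patch * kernel)  # element-wise multiply, then add up

print(feature_map)  # each square = how strongly the kernel "matched" at that spot
```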
What is the point of a feature map?
Why’s it important?
To create a smaller version of the original (input) image
Thus, if you switch from striding 1 square at a time to striding 2 squares at a time… the output (feature map) becomes even smaller
If it’s smaller, it’s quicker to process!
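A quick way to see the "smaller output" point in numbers, using the standard output-size formula without padding (the input size, kernel size and strides here are just example values):

```python
def feature_map_size(input_size, kernel_size, stride):
    # (input - kernel) // stride + 1 squares along each dimension
    return (input_size - kernel_size) // stride + 1

print(feature_map_size(7, 3, 1))  # 5 -> 7x7 image, 3x3 kernel, stride 1 gives a 5x5 feature map
print(feature_map_size(7, 3, 2))  # 3 -> same image and kernel, stride 2 gives a smaller 3x3 map
```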
How do we know the feature detectors picked up a feature from our image?
We’ll have a square on the feature map with a number that comes from the kernel and that part of the image completely adding up (matching).
Aka if the kernel and part of an image match up… then the highest possible number gets calculated:
E.g. brown eyes feature detector might be
0 1 0
0 0 0
1 0 1
If the image has a brown eye… when the kernel is on top of it, they’ll match up and the respective square on the feature map will have ‘3’, the highest number possible for that specific kernel
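A tiny sketch of that "perfect match" idea using the brown-eyes kernel above (the image patches are made up):

```python
import numpy as np

kernel = np.array([[0, 1, 0],
                   [0, 0, 0],
                   [1, 0, 1]])          # the 'brown eyes' feature detector from above

matching_patch = np.array([[0, 1, 0],   # a part of the image that lines up with the kernel exactly
                           [0, 0, 0],
                           [1, 0, 1]])
other_patch = np.array([[1, 0, 0],      # some other part of the image
                        [0, 0, 1],
                        [0, 0, 0]])

print(np.sum(matching_patch * kernel))  # 3 -> the highest possible value for this kernel
print(np.sum(other_patch * kernel))     # 0 -> no match here
```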
How do we prevent the loss of information if a kernel reduces the size of our original input?
We use multiple feature detectors (kernels) and create multiple feature maps (for our 1 image/sound)
This preserves the spatial relationships between features
The CNN determines which features are important through continuous training
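One way to see the "many feature maps" idea in code: a single tf.keras Conv2D layer with 32 filters produces 32 feature maps for one image (the image here is random and the sizes are illustrative):

```python
import numpy as np
import tensorflow as tf

image = np.random.rand(1, 64, 64, 3).astype('float32')         # one made-up 64x64 RGB image
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3))  # 32 different feature detectors

feature_maps = conv(image)
print(feature_maps.shape)  # (1, 62, 62, 32) -> 32 feature maps, one per kernel
```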
What’s the point of applying a rectifier function?
I.e. Rectified linear unit: ReLU layer
To increase non-linearity in our Conv net
Because images are non-linear…
i.e. transitions between features in images aren’t smooth and blended; there are harsh lines and borders etc.
Aka to break up the gradual progression of shading that can occur when only a filter/kernel’s used
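A small sketch of the ReLU step: it just replaces every negative value in a feature map with 0 (the feature map values are made up):

```python
import numpy as np

feature_map = np.array([[ 3, -1,  0],
                        [-2,  5, -4],
                        [ 1, -3,  2]])   # made-up convolution output (can contain negatives)

relu_map = np.maximum(feature_map, 0)    # ReLU: max(0, x) for every value
print(relu_map)
# [[3 0 0]
#  [0 5 0]
#  [1 0 2]]
```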
What is spatial invariance…
How does it relate to max pooling?
And what’s the point of it all?
Means the kernel can pick up a feature even if it is:
Distorted, squished, a different texture, facing a different direction etc.
Max pooling enables the above
Gives our neural network flexibility
How does Max Pooling work? What are the steps?
You create a Pooled Feature Map!
Kinda like creating a feature map…
You have a 2x2 empty filter that you run over the feature map!
As the max pooling filter strides over the feature map:
The highest number (max) on the feature map that’s currently under the 2x2 filter gets extracted and put onto 1 (respective) square of the pooled feature map
E.g. feature map (a 2x4 fragment), sliding the 2x2 filter 1 column at a time:
1 1 0 0
0 0 1 4
First square in the pooled feature map will have: 1
Second square in the pooled feature map will have: 1
Third square in the pooled feature map will have: 4
Pooled feature map: 1 1 4 (see the sketch below)
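A NumPy sketch of that pooling example, sliding a 2x2 window one column at a time over the 2x4 feature map above:

```python
import numpy as np

feature_map = np.array([[1, 1, 0, 0],
                        [0, 0, 1, 4]])

pooled = []
for j in range(feature_map.shape[1] - 1):   # slide the 2x2 window one column at a time
    window = feature_map[0:2, j:j + 2]      # the 2x2 patch currently under the filter
    pooled.append(int(window.max()))        # keep only the biggest value

print(pooled)  # [1, 1, 4] -> the pooled feature map
```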
What are the benefits of max pooling?
We’re only extracting the most important features
We reduce size again (get rid of unimportant information) …
This helps ensure the ML model doesn’t overfit to the data
If the model doesn’t overfit it has flexibility!
Is there only max pooling?
No, there’s…
Mean pooling
Sum pooling
Etc
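For the same 2x2 window, the variants just differ in which number they keep (the window values are made up):

```python
import numpy as np

window = np.array([[1, 0],
                   [2, 4]])    # one 2x2 patch of a feature map

print(window.max())    # 4    -> max pooling keeps the strongest signal
print(window.mean())   # 1.75 -> mean (average) pooling
print(window.sum())    # 7    -> sum pooling
```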