Convolutional Neural Networks Flashcards
statistical invariants
Things that don’t change across space or time (e.g., the image of a cat in the top or bottom of a picture, or the meaning of "kitten" in multiple parts of a paragraph).
What are convnets (convolutional networks)?
networks that share their parameters across space
What do strides represent?
Number of pixels your patch/kernel/filter slides across the image per step. A stride of one creates an output roughly the same size as the input; a stride of two creates an output roughly half the size of the input. Increasing the stride value reduces the number of network parameters but sacrifices accuracy.
Valid Padding
- Valid: your filter DOES NOT go off the edge of the image.
CNN intuition. What the hell is it doing?
Learns basic shapes (horizontal/vertical lines) first, then uses this knowledge to learn more complex (non-linear) shapes. It slides a filter across the image, multiplying the filter weights with each pixel to create a new image. This new image is the result of "sharing" weights and biases to address statistical invariance, which allows the CNN to learn the shapes.
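The weight-sharing idea above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a single-channel image, one 3x3 filter, stride 1, and valid padding; the filter values and function name are made up for the example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the SAME kernel weights over every position of the image
    # (this reuse of one set of weights is the "sharing" the card describes).
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1  # valid padding: filter stays inside
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # same weights at every location
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
vertical_edge = np.array([[1., 0., -1.]] * 3)  # toy vertical-edge filter
print(conv2d_valid(image, vertical_edge).shape)  # (3, 3)
```

Note that one tiny set of 9 weights produces the whole output map, instead of a separate weight per pixel as in a fully connected layer.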
Explain filter depth and its effect on the number of neurons each patch is connected to.
It's common to use more than one filter on a specific patch; the filter depth is the number of filters used. Different filters are designed for different purposes. The number of filters tells us how many neurons each patch is connected to: with 3 filters, each patch connects to 3 neurons.
Generally, filter depth increases with each layer to capture increased complexity (think: start with basic shapes, then more complex shapes).
Why connect each patch to multiple neurons?
Gives the CNN the ability to capture interesting characteristics of the image.
Stride/Padding Question
- Valid padding: the filter doesn't go off the edge, so the output size is width - (filter size - 1). Think of sliding a 3x3 filter across the bottom of a 28x28 image: you can only do this 26 times across and 26 times up.
- Same padding: the filter goes off the edge and creates an output the same size as the input when the stride is one (half the size when the stride is two). Think of sliding a 3x3 filter across a 28x28 image 28 times: unlike valid padding, you need to start outside the image. How far outside depends on the filter size: the total padding is (filter size - 1), split between the two sides of the image.
- The output of each filter is referred to as a feature map.
How do we reduce the dimensionality of the image?
Increase the stride with each convolution; any stride greater than one shrinks the width and height of the output.
Formula for output height, width, and depth after a convolution is applied
- new height = (input height - filter height + 2 × padding) / stride + 1
- new width = (input width - filter width + 2 × padding) / stride + 1
- new depth = number of filters
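The formula can be written as a small function. A minimal sketch, assuming integer division and a square filter; the function name and the example numbers are illustrative, not from the cards:

```python
def conv_output_shape(h, w, filter_size, num_filters, stride, pad):
    # Height/width follow (size - filter + 2 * padding) / stride + 1;
    # depth is simply the number of filters applied.
    out_h = (h - filter_size + 2 * pad) // stride + 1
    out_w = (w - filter_size + 2 * pad) // stride + 1
    return out_h, out_w, num_filters

# e.g. a 32x32 input, 8x8 filter, 20 filters, stride 2, padding 1:
print(conv_output_shape(32, 32, filter_size=8, num_filters=20, stride=2, pad=1))
# (14, 14, 20)
```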
How does Structure help learning
If you know something about your data and your model doesn't have to learn it from scratch, it's going to perform better. For example, when classifying the letter A: if you know color doesn't matter, transform the data into grayscale.
translation invariance
Objects in images are roughly the same regardless of their location.
Explain the image. What is the operation called?
The white grid is the image.
The blue piece is a filter, which has weights. We apply this filter (multiply pixel values by filter weights and sum) to a small set of pixels (across each color channel). Visually, think of the output being stacked above in the purple box, with the depth of the output represented by the red tube and the width/height represented by the purple box.
Slide the filter over and repeat the operation above, which adds a new value to the feature map each time.
Operation = convolution
Explain this image
After running the convolution, you create another image with a different width, height, and depth. These dimensions are based on the parameters of the convolution operation (filter size, padding, stride, etc.). It's important to note that while the original image started out with 3 color channels (assuming the image is RGB), the convolution output has K channels, where K is the number of filters applied.
Explain this image
The image starts out shallow (only 3 color channels) with its original width/height. However, as you apply filters with strides greater than 1, the output gains more channels (depth) and a smaller width/height. Think of this as squeezing spatial information out of the image so that only parameters that map to the content remain.
What are feature maps?
The individual 2D matrix of values for each channel: one per color channel in the input image, or one per filter in a convolution's output.
Explain stride of 1 vs 2
1 = output is roughly the same size (width/height); also depends on the padding parameter.
2 = output is roughly half the size (width/height); also depends on the padding parameter.