Convolutional Neural Networks Flashcards
statistical invariants
Things that don’t change across space or time (e.g., the image of a cat in the top or bottom of a picture, or the meaning of "kitten" in multiple parts of a paragraph).
What are convnets (convolutional networks)?
networks that share their parameters across space
What do strides represent?
Number of pixels your patch/kernel/filter slides across the image per step. A stride of one creates an output roughly the same size as the input; a stride of two creates an output roughly half the size of the input. Increasing the stride value reduces the number of network parameters but sacrifices accuracy.
Valid Padding
- Valid: your filter DOES NOT go off the edge of the image.
CNN intuition. What the hell is it doing?
Learns basic shapes (horizontal/vertical lines) first, then uses this knowledge to learn more complex (non-linear) shapes. It slides a filter across the image, multiplying the filter weights with each pixel to create a new image. This new image is the result of "sharing" weights and biases to address statistical invariance, which allows the CNN to learn the shapes.
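The weight-sharing idea above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a single-channel image, one 3x3 filter, stride 1, and valid padding; the filter values and function name are made up for the example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the SAME kernel weights over every position of the image
    # (this reuse of one set of weights is the "sharing" the card describes).
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1  # valid padding: filter stays inside
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # same weights at every location
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
vertical_edge = np.array([[1., 0., -1.]] * 3)  # toy vertical-edge filter
print(conv2d_valid(image, vertical_edge).shape)  # (3, 3)
```

Note that one tiny set of 9 weights produces the whole output map, instead of a separate weight per pixel as in a fully connected layer.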
Explain filter depth and its effect on the number of neurons each patch is connected to.
It's common to use more than one filter on a specific patch; the filter depth is the number of filters used. Different filters are designed for different purposes. The number of filters tells us how many neurons each patch is connected to: with 3 filters, each patch connects to 3 neurons.
Generally, filter depth increases with each layer to capture increased complexity (think: start with basic shapes, then more complex shapes).
Why connect each patch to multiple neurons?
Gives the CNN the ability to capture interesting characteristics of the image.
Stride/Padding Question
- Valid padding: the filter doesn't go off the edge, so the output size is width - (filter size - 1). Think of sliding a 3x3 filter across the bottom of a 28x28 image: you can only do this 26 times across and 26 times up.
- Same padding: the filter goes off the edge and creates an output the same size as the input when the stride is one (half the size when the stride is two). Think of sliding a 3x3 filter across a 28x28 image 28 times: unlike valid padding, you need to start outside the image. How far outside depends on the filter size: the total padding is (filter size - 1), split between the two sides of the image.
- The output of each filter is referred to as a feature map.
How do we reduce the dimensionality of the image?
Increase the stride with each convolution; any stride greater than one shrinks the width and height of the output.
Formula for output height, width, and depth after a convolution is applied
- new height = (input height - filter height + 2 × padding) / stride + 1
- new width = (input width - filter width + 2 × padding) / stride + 1
- new depth = number of filters
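The formula can be written as a small function. A minimal sketch, assuming integer division and a square filter; the function name and the example numbers are illustrative, not from the cards:

```python
def conv_output_shape(h, w, filter_size, num_filters, stride, pad):
    # Height/width follow (size - filter + 2 * padding) / stride + 1;
    # depth is simply the number of filters applied.
    out_h = (h - filter_size + 2 * pad) // stride + 1
    out_w = (w - filter_size + 2 * pad) // stride + 1
    return out_h, out_w, num_filters

# e.g. a 32x32 input, 8x8 filter, 20 filters, stride 2, padding 1:
print(conv_output_shape(32, 32, filter_size=8, num_filters=20, stride=2, pad=1))
# (14, 14, 20)
```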
How does Structure help learning
If you know something about your data and your model doesn't have to learn it from scratch, it's going to perform better. For example, when classifying the letter A: if you know color doesn't matter, transform the data into grayscale.
translation invariance
Objects in images are roughly the same regardless of their location.
Explain the image. What is the operation called?
The white grid is the image.
The blue piece is a filter, which has weights. We apply this filter (multiply pixel values by filter weights and sum) to a small set of pixels (across each color channel). Visually, think of the output being stacked above in the purple box, with the depth of the output represented by the red tube and the width/height represented by the purple box.
Slide the filter over and repeat the operation above, which adds a new value to the feature map each time.
Operation = convolution
Explain this image
After running the convolution, you create another image with a different width, height, and depth. These dimensions are based on the parameters of the convolution operation (filter size, padding, stride, etc.). It's important to note that while the original image started out with 3 color channels (assuming the image is RGB), the convolution output has K channels, where K is the number of filters applied.
Explain this image
The image starts out shallow (only 3 color channels) with its original width/height. However, as you apply filters with strides greater than 1, the output gains more channels (depth) and a smaller width/height. Think of this as squeezing spatial information out of the image so that only parameters that map to the content remain.
What are feature maps?
The individual 2D matrix of values for each channel: one per color channel in the input image, or one per filter in a convolution's output.
Explain stride of 1 vs 2
1 = output is roughly the same size (width/height); also depends on the padding parameter.
2 = output is roughly half the size (width/height); also depends on the padding parameter.