Convolutional Neural Networks Flashcards

1
Q

statistical invariants

A

Things that don’t change across space or time (e.g. the image of a cat at the top or bottom of a picture, or the meaning of “kitten” in multiple parts of a paragraph).

2
Q

What are convnets (convolutional networks)?

A

networks that share their parameters across space

3
Q

What do strides represent?

A

The number of pixels your patch/kernel/filter slides across the image between applications. A stride of one creates an output roughly the same size as the input; a stride of two creates an output roughly half the size of the input. Increasing the stride reduces the number of network parameters but sacrifices accuracy.

4
Q

Valid Padding

A
  • Valid - the filter DOES NOT go off the edge of the image, so the output is smaller than the input.
5
Q

CNN intuition. What the hell is it doing?

A

Learns basic shapes (horizontal/vertical lines) first, then uses this knowledge to learn more complex (non-linear) shapes. A filter slides across the image, multiplying the filter weights with each pixel to create a new image. This new image is the result of “sharing” weights and biases to address statistical invariance, which is what allows the CNN to learn the shapes.

6
Q

Explain filter depth and how it determines the number of neurons each patch/filter is connected to.

A

It’s common to use more than one filter on a specific patch; the filter depth is the number of filters used. Different filters are designed for different purposes, and the number of filters tells us how many neurons each patch is connected to: with 3 filters, each patch connects to 3 neurons.

Generally, filter depth increases with each layer to capture increased complexity (think: start with basic shapes, then more complex shapes).

7
Q

Why connect each patch to multiple neurons?

A

Gives the CNN the ability to capture interesting characteristics of the image.

8
Q

Stride/Padding Question

A
  • Valid padding doesn’t go off the edge: output width = width - (filter size - 1). Think of sliding a 3x3 filter across the bottom of a 28x28 image: you can only do this 26 times across and 26 times up.
  • Same padding goes off the edge and creates an output the same size as the input, assuming the stride is one (half the size if the stride is 2). Think of sliding a 3x3 filter across a 28x28 image 28 times: unlike valid padding, you need to start outside the image. How far outside depends on filter size: take (filter size - 1), and that’s the total zero padding, split between the two sides of the image.
  • The output of each filter is referred to as a feature map
9
Q

How do we reduce the dimensionality of the image?

A

Increase the stride with each convolution; larger strides shrink the output’s width and height.

10
Q

Formula for output height, width, and depth after a convolution is applied

A

For height and width: subtract the filter size from the input size, add twice the padding, divide by the stride, then add 1: out = (W - F + 2P)/S + 1. The output depth is the number of filters.
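As a quick sketch of the formula (helper name is my own):

```python
# Output shape of a convolution: (W - F + 2P) // S + 1 per spatial
# dimension; the depth is just the number of filters.
def conv_output_shape(h, w, f, stride, pad, num_filters):
    out_h = (h - f + 2 * pad) // stride + 1
    out_w = (w - f + 2 * pad) // stride + 1
    return out_h, out_w, num_filters

# 28x28 input, 3x3 filter, stride 1, no padding, 16 filters:
print(conv_output_shape(28, 28, 3, 1, 0, 16))  # (26, 26, 16)
```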

11
Q

How does Structure help learning

A

If you know something about your data and your model doesn’t have to learn it from scratch, it’s going to perform better. For example, when classifying the letter A: if you know color doesn’t matter, transform the data into grayscale.

12
Q

translation invariance

A

Objects in images are roughly the same regardless of location.

13
Q

Explain image and what is the operation called

A

The white region is the image.

The blue piece is a filter, which has weights. We apply this filter (multiply pixel values by filter weights) to a small set of pixels in each color channel. Visually, think of the output being stacked above in the purple box: the depth of the output is represented by the red tube and the width/height by the purple box.

Slide the filter over and repeat the operation, which continues to fill in new values of the feature map.

Operation = convolution

14
Q

Explain this image

A

After running the convolution, you create another image with a different width, height, and depth. These dimensions are based on the parameters of the convolution operation (filter size, padding, stride, etc.). It’s important to note that while the original image started out with 3 color channels (assuming the image is RGB), the convolution output has K channels, where K is the number of filters applied.

15
Q

Explain this image

A

The image starts shallow (only 3 color channels) with the original width/height. However, as you apply filters with strides greater than 1, the output gains channels (depth) and shrinks in width/height. Think of this as squeezing spatial information out of the image so that only parameters that map to the content remain.

16
Q

What are feature maps?

A

The individual 2-D matrix of values for each channel; each filter’s output forms one feature map.

17
Q

Explain stride of 1 vs 2

A

1 = output is roughly the same size (width/height); also depends on the padding parameter

2 = output is roughly half the size (width/height); also depends on the padding parameter

18
Q

Same Padding

A

Same - the filter goes off the edge, and the input is padded with zeros in such a way that the output map is the SAME size as the input map (assuming a stride of one). Think “S” sails off.

19
Q

What’s a really important concept about filters and the pixels they assess?

A

What’s important here is that we are grouping together adjacent pixels and treating them as a collective.

In a normal, non-convolutional neural network, we would have ignored this adjacency. In a normal network, we would have connected every pixel in the input image to a neuron in the next layer. In doing so, we would not have taken advantage of the fact that pixels in an image are close together for a reason and have special meaning.

By taking advantage of this local structure, our CNN learns to classify local patterns, like shapes and objects, in an image.

20
Q

Why connect a single patch to multiple neurons?

A

Multiple neurons can be useful because a patch can have multiple interesting characteristics that we want to capture.

For example, one patch might include some white teeth, some blonde whiskers, and part of a red tongue. In that case, we might want a filter depth of at least three - one for each of teeth, whiskers, and tongue.

Having multiple neurons for a given patch ensures that our CNN can learn to capture whatever characteristics the CNN learns are important.

21
Q

Are weights shared between filters?

A

As we increase the depth of our filter, the number of weights and biases we have to learn still increases, as the weights aren’t shared across the output channels.

22
Q

How is padding calculated in TF

A
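A sketch of the output-size (and implied zero-padding) rules TensorFlow applies for its two padding modes, based on the tf.nn.conv2d documentation (helper name is my own):

```python
import math

# TensorFlow output size per spatial dimension:
#   'SAME':  out = ceil(in / stride), with zeros added as needed
#   'VALID': out = ceil((in - filter + 1) / stride), no padding
def tf_output_size(in_size, filter_size, stride, padding):
    if padding == 'SAME':
        out = math.ceil(in_size / stride)
        # total zero padding needed along this dimension:
        pad = max((out - 1) * stride + filter_size - in_size, 0)
        return out, pad
    elif padding == 'VALID':
        return math.ceil((in_size - filter_size + 1) / stride), 0

print(tf_output_size(28, 3, 1, 'SAME'))   # (28, 2): 1 zero pixel each side
print(tf_output_size(28, 3, 1, 'VALID'))  # (26, 0)
```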
23
Q

How to setup this in TF

A
24
Q

How many parameters?

A

Remember: you build every value of the output by multiplying (matmul) the weight matrix with a patch of pixels, then sliding and repeating.
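A sketch of the two counts (the 8x8x3 filter and 14x14x20 output are example numbers, not from a specific card):

```python
# Parameter counts for a conv layer (weights plus one bias each).
def params_no_sharing(f_h, f_w, in_depth, out_h, out_w, out_depth):
    # without sharing, every output neuron has its own filter weights + bias
    return (f_h * f_w * in_depth + 1) * (out_h * out_w * out_depth)

def params_with_sharing(f_h, f_w, in_depth, out_depth):
    # with sharing, there is one filter (plus bias) per output channel
    return (f_h * f_w * in_depth + 1) * out_depth

# e.g. an 8x8x3 filter producing a 14x14x20 output:
print(params_no_sharing(8, 8, 3, 14, 14, 20))  # 756560
print(params_with_sharing(8, 8, 3, 20))        # 3860
```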

25
Q
A
26
Q

With parameter sharing, how many parameters?

A
27
Q

Explain Max Pooling

A

The image is an example of max pooling with a 2x2 filter and a stride of 2. The four 2x2 colored regions represent each time the filter was applied to find the maximum value.

For example, [[1, 0], [4, 6]] becomes 6, because 6 is the maximum value in this set. Similarly, [[2, 3], [6, 8]] becomes 8.
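The same operation sketched in NumPy (the input array is built so its top two 2x2 blocks match the values above):

```python
import numpy as np

# 2x2 max pooling with stride 2: take the max of each 2x2 block.
def max_pool_2x2(x):
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 0, 2, 3],
              [4, 6, 6, 8],
              [3, 1, 1, 0],
              [1, 2, 2, 4]])
print(max_pool_2x2(x))
# [[6 8]
#  [3 4]]
```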

28
Q

Advantages of Max Pooling

A
29
Q

Implement Max Pooling in TF

A
30
Q
A
31
Q

Why has max pooling fallen out of favor

A
32
Q

Is pooling applied for each depth slice? How is output depth determined?

A

For a pooling layer the output depth is the same as the input depth. Additionally, the pooling operation is applied individually for each depth slice.

33
Q

1x1 convolution

A

Turns a linear model into a non-linear one; it is just a matrix multiply across the channels of each pixel, using relatively few parameters.
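A sketch of why it’s just a matrix multiply (sizes are made up for illustration):

```python
import numpy as np

# A 1x1 convolution maps each pixel's C input channels to K output
# channels with the same (C, K) weight matrix at every location.
h, w, c, k = 4, 4, 3, 8
x = np.random.rand(h, w, c)
weights = np.random.rand(c, k)       # only c*k parameters (plus k biases)
out = (x.reshape(-1, c) @ weights)   # (h*w, c) @ (c, k)
out = out.reshape(h, w, k)
print(out.shape)  # (4, 4, 8): same width/height, new depth
```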

34
Q

Inception Model

A
  1. Average pooling followed by a 1x1 convolution
  2. A 1x1 convolution followed by a 3x3
  3. A 1x1 convolution followed by a 5x5
  4. Concatenate the output of each of them
  5. Performs better than a simple convolution
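A shape-level sketch of the concatenation step (the branch depths here are assumptions, not from the source):

```python
import numpy as np

# Each inception branch preserves width/height (same padding, stride 1),
# so the branch outputs can be concatenated along the depth axis.
h, w = 28, 28
pool_1x1 = np.zeros((h, w, 16))   # avg pooling -> 1x1 conv
conv_3x3 = np.zeros((h, w, 32))   # 1x1 conv -> 3x3 conv
conv_5x5 = np.zeros((h, w, 16))   # 1x1 conv -> 5x5 conv
module_out = np.concatenate([pool_1x1, conv_3x3, conv_5x5], axis=-1)
print(module_out.shape)  # (28, 28, 64): depths add up
```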
35
Q

how to specify weights/biases for convolutional net in TF

A

tf.truncated_normal creates values from a truncated normal distribution for every entry of the 4-D weight tensor: [height, width, input_depth, output_depth].

36
Q

Explain this chart

A

They still compute a dot product of their weights with the input followed by a non-linearity, but their connectivity is now restricted to be local spatially.

The Krizhevsky et al. architecture that won the ImageNet challenge in 2012 accepted images of size [227x227x3]. On the first Convolutional Layer, it used neurons with receptive field size F=11, stride S=4 and no zero padding P=0. Since (227 - 11)/4 + 1 = 55, and since the Conv layer had a depth of K=96, the Conv layer output volume had size [55x55x96]. Each of the 55*55*96 neurons in this volume was connected to a region of size [11x11x3] in the input volume. Moreover, all 96 neurons in each depth column are connected to the same [11x11x3] region of the input, but of course with different weights.

37
Q

What hyperparameters control the size of the output volume?

A

depth, stride and zero-padding

depth = number of filters we want to use

stride = how many pixels we slide the filter over before applying it again

zero-padding = do we want to keep the output size the same?

38
Q
A
39
Q

What is the constraint on strides when combined with other hyperparameters such as filter size and padding?

A

The output dimensions need to be integers. If, when calculating the output dimensions, the height/width does not come out to an integer (e.g. 4.5), then zero padding, a different stride, or some other change must be used.
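A quick check of this constraint (helper name and numbers are my own):

```python
# The output width (W - F + 2P) / S + 1 is an integer only when the
# stride evenly divides (W - F + 2P).
def stride_fits(w, f, pad, stride):
    return (w - f + 2 * pad) % stride == 0

print(stride_fits(10, 3, 0, 2))  # False: (10 - 3)/2 + 1 = 4.5
print(stride_fits(7, 3, 0, 2))   # True:  (7 - 3)/2 + 1 = 3
```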

40
Q

Explain this chart

A

Shows 2 filters (sets of weights) being applied to a 3-D image. The depth of each filter is determined by the depth of the input image. The number of filters determines the number of output slices. Each value in the output slice (green) is a neuron associated with a specific input region and one of the filters’ weights.

41
Q
A
42
Q

Max Pooling in TF

A
43
Q

What does it mean to have a fully connected layer?

A

You take, let’s say, a 3-D volume of 5x5x16 and flatten it so that you have 400 neurons in a single column vector, effectively stripping the spatial connectivity. The output size is the number of classes you are trying to classify.
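A NumPy sketch of the flatten-then-classify step (the 5x5x16 input and 10 classes are assumptions):

```python
import numpy as np

# Flatten a 5x5x16 conv output into a 400-neuron column, then apply a
# fully connected layer mapping those 400 values to 10 class scores.
conv_out = np.random.rand(5, 5, 16)
flat = conv_out.reshape(-1)          # 400 values; spatial layout is gone
W = np.random.rand(400, 10)          # every input connects to every class
b = np.random.rand(10)
logits = flat @ W + b
print(flat.shape, logits.shape)      # (400,) (10,)
```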