4/5 - Deep Convolutional Neural Networks Flashcards
Deep learning
Learning on a network with more than 3 layers
Overfitting in deep learning (parameters)
More parameters increase the risk of overfitting.
Too many parameters and not enough data points = overfitting
Shape analogy for layers (Abstraction)
Layer 1 could detect a horizontal or vertical line
Layer 2 could then detect a shape
Layer 3 could then detect an item.
Abstraction
MNIST
Handwritten digit data set (digits 0–9)
4 AI uses in images
Image segmentation (item separation)
Image captioning
Question answering (is there a boat in this image?)
Action recognition
How do you choose the values in a kernel/filter?
Randomly initialised; the system then learns the best filter values during training
CNN Do you flatten the input image?
No, it stays as 2D
2D Convolution Layer (5x5 input and 3x3 kernel example)
5x5 Input for example with a 3x3 kernel.
You slide the 3x3 kernel across the image from the top left; at each position you multiply elementwise, sum the products, and write the result into the corresponding location of a new 3x3 output matrix
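A minimal NumPy sketch of this operation (stride 1, no padding; strictly this is cross-correlation, which is what CNN libraries actually compute):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding); at each
    position, multiply elementwise and sum into the output."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kH, j:j+kW] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # 5x5 input
kernel = np.ones((3, 3)) / 9.0                    # 3x3 averaging kernel
print(conv2d_valid(image, kernel).shape)          # (3, 3)
```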
Stride
Number of pixels the kernel moves each step when sliding.
E.g. stride 1 moves one pixel at a time,
stride 2 moves two pixels.
A higher stride moves faster through the image.
Higher stride does what to output?
Reduces the output size (fewer positions are sampled)
Padding
Add zeros all around the edge of the input, then scan with the kernel
Computing the output size of a convolutional layer:
(W − F + 2P)/S + 1
W = input size
F = filter size
P = padding
S = stride
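A quick sanity check of the formula in Python (the example values are the 5x5 input / 3x3 kernel case from above):

```python
def conv_output_size(W, F, P=0, S=1):
    # (W - F + 2P)/S + 1
    return (W - F + 2 * P) // S + 1

print(conv_output_size(W=5, F=3))            # 3  (5x5 input, 3x3 kernel)
print(conv_output_size(W=5, F=3, P=1, S=2))  # 3  (padding 1, stride 2)
```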
Images have how many channels and why?
3 - RGB
(Where the image is RGB of course)
3 Channel input and a 3 channel filter. How many channels is the output?
1
To make a 3 channel output of a 3 channel input, we need how many 3 channel filters?
3 filters
More filters you have means what for parameters?
More parameters. Potential overfitting if too many filters.
Must your number of input channels match the number of filter channels?
YES
Stack of convolutional layers example
Input → Filters → Feature Maps → Filters → Feature Maps → Filters → …
ReLU
Rectified Linear Unit
max(0,x)
On a graph: y = 0 for x < 0, otherwise y = x
When is ReLU needed?
After every convolutional layer.
Why can’t we have two convolutional layers right next to each other?
Convolution is a linear operation, so two convolutional layers one after the other would collapse into the equivalent of a single layer.
We need non-linearity between them (ReLU, sigmoid, tanh, etc.)
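A hedged PyTorch-style sketch of the resulting conv → ReLU pattern (the channel counts are made up for illustration):

```python
import torch.nn as nn

# Non-linearity after each convolution; without the ReLUs, the two
# stacked convolutions would collapse into one linear operation.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
)
```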
Advantages of ReLU vs tanh/sigmoid
- Faster convergence than tanh
- Easier and faster calculation
- Lower probability of vanishing gradient
Disadvantages of ReLU:
- Dying ReLU: when inputs are all or mostly negative, recovery is hard because the gradient is 0 in the negative half
Pooling Layer: Max Pooling
Scans a kernel over the image LIKE CONVOLUTIONAL, but takes the highest value in each window as the output (rather than multiplying and summing)
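A minimal NumPy sketch of 2x2 max pooling with stride 2 (the common case; even input sizes assumed):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling, stride 2: keep the largest value in each window."""
    H, W = x.shape
    out = np.zeros((H // 2, W // 2))
    for i in range(0, H - 1, 2):
        for j in range(0, W - 1, 2):
            out[i // 2, j // 2] = x[i:i+2, j:j+2].max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 0],
              [7, 2, 9, 8],
              [3, 1, 4, 5]], dtype=float)
print(max_pool_2x2(x))  # [[6. 4.]
                        #  [7. 9.]]
```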
Pooling: does it collapse the channels, like convolutional?
No, 3 channels in 3 channels out
Advantages of pooling layer
- Improved translational invariance (moving pixels to slightly different locations has little effect)
- Downsampling (reduced resources)
Batch Normalisation concept
Network is easier to train if the input is normalised
Issue with normalisation
During training, the parameters are changing, which changes the distribution of the data flowing into different layers
Batch normalisation
Input: values of x over a mini-batch B = {x₁, …, xₘ}
Parameters to be learned: γ (scale) and β (shift)
Mini-batch mean: μ = (1/m) Σ xᵢ
Mini-batch variance: σ² = (1/m) Σ (xᵢ − μ)²
Normalise: x̂ᵢ = (xᵢ − μ) / √(σ² + ε)
Scale and shift: yᵢ = γx̂ᵢ + β
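A minimal NumPy sketch of those four steps for one mini-batch (ε is the usual small constant for numerical stability; in practice γ and β are learned):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """x: mini-batch of activations, shape (m, features)."""
    mu = x.mean(axis=0)                    # mini-batch mean
    var = x.var(axis=0)                    # mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalise
    return gamma * x_hat + beta            # scale and shift

x = np.random.randn(32, 8) * 5 + 3         # 32 samples, 8 features
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0 and ~1
```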
Fully Connected Layer
Each neuron/unit in the previous layer connects to every unit in this layer.
If you had a 9x9 patch and compared a 4 layer approach to a 1 layer approach (ending at 1x1), how many parameters does each have?
Approach 1 (four stacked 3×3 filters): 4 biases + 4 × 3 × 3 = 36 weights
4 + 36 = 40 parameters
Approach 2 (one 9×9 filter): 1 bias + 1 × 9 × 9 = 81 weights
1 + 81 = 82 parameters
Approach 2 is more likely to overfit
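A quick PyTorch check of that arithmetic (single-channel convolutions, matching the counts above):

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

four_small = nn.Sequential(*[nn.Conv2d(1, 1, kernel_size=3) for _ in range(4)])
one_big = nn.Conv2d(1, 1, kernel_size=9)

print(n_params(four_small))  # 40  (4 x (3*3 weights + 1 bias))
print(n_params(one_big))     # 82  (9*9 weights + 1 bias)
```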
2 advantages of using more, small filters
- Fewer parameters: lower chance of overfitting and lower computational cost
- More ReLUs: more non-linearity so higher representation capacity
Why does the GoogLeNet architecture use several different filter sizes in parallel? What is the problem with this?
Some filter sizes might be better in certain circumstances than others.
The problem is computational cost
How did GoogLeNet solve the computational cost issue?
They use dimension reduction with bottleneck layers:
the 1x1 convolutional layer
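A hedged sketch of the bottleneck idea (channel counts are made up): a cheap 1x1 conv shrinks the channels before the expensive 3x3 conv.

```python
import torch.nn as nn

# Bottleneck: 256 channels -> 64 via a cheap 1x1 conv, so the 3x3 conv
# runs on 64 channels instead of 256, cutting the computational cost.
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(),
)
```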
Residual Block
Takes the input, bypasses the layers, and sums it with the layer outputs: output = f(x) + x.
If the mapping has already been learned, f(x) only has to learn to barely affect the input.
ResNet uses them every 2 layers.
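A minimal PyTorch-style residual block matching that pattern (batch norm omitted for brevity; the channel count is preserved so the sum works):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))  # f(x): two conv layers
        return self.relu(out + x)                   # skip connection: f(x) + x
```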
Densely connected Networks
Like residual blocks, but each layer's output is connected to every future layer.
Pros of Densely connected networks
- Stronger gradient in back propagation so it’s easier to train
- Learn features from multiple levels
- Reuse of features (fewer filters but same num of feature maps)
Drawback of densely connected network
High memory cost. You must save all intermediate feature maps.
Dense Block Networks
Uses smaller blocks of densely connected layers, separated by standard layers.
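A hedged PyTorch-style sketch of one dense block (growth rate and layer count are illustrative); the growing concatenation is also where the memory cost above comes from:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer takes the concatenation of all earlier feature maps."""
    def __init__(self, in_channels, growth, n_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(in_channels + i * growth, growth, kernel_size=3, padding=1)
            for i in range(n_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = torch.relu(layer(torch.cat(features, dim=1)))
            features.append(out)  # reused by every later layer
        return torch.cat(features, dim=1)
```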
Fine-Tuning data requirements
Lots of data is required to train from scratch. You could instead use a pre-trained network and then fine-tune it.
Fine tuning with less data
You could freeze all layers other than the last and retrain that with only a few data points.
You could also just train a linear classifier on top of the frozen features.
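A hedged PyTorch sketch of the freeze-all-but-the-last-layer case (the `weights` argument and `fc` attribute follow recent torchvision and are assumptions here):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")  # pre-trained network

for param in model.parameters():
    param.requires_grad = False                   # freeze every layer

# Replace the last layer; only its parameters are now trainable.
model.fc = nn.Linear(model.fc.in_features, 10)    # e.g. 10 target classes
# With more data: unfreeze more layers, working backwards from the output.
```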
Fine tuning with more data
Freeze fewer layers and retrain more of them, working backwards from the output.
Fine tuning with a LOT data
Freeze no layers and retrain all.
Be careful of overfitting
Data augmentation
Add transforms like rotation, translation, zooming, noise, etc.
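A small torchvision sketch of such a pipeline (the specific transforms and magnitudes are illustrative):

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(15),                        # rotation
    transforms.RandomAffine(0, translate=(0.1, 0.1)),     # translation
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # zooming
    transforms.ToTensor(),
])
```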
Dropout
Randomly black out (zero) some units during each training pass; dropout is switched off at inference.
Harder to overfit because the network can't rely on any single neuron anymore
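In PyTorch this is a single layer; note it is only active in training mode:

```python
import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # randomly zeroes half the units on each training pass
    nn.Linear(256, 10),
)
# classifier.train() enables dropout; classifier.eval() disables it.
```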
Early stop
Stop training when the network starts to overfit (e.g. when validation error begins to rise).