4/5 - Deep Convolutional Neural Networks Flashcards

1
Q

Deep learning

A

Learning on a network with more than 3 layers

2
Q

Overfitting in deep learning (parameters)

A

More parameters increase the risk of overfitting.

Too many parameters and insufficient data points = overfitting

3
Q

Shape analogy for layers (Abstraction)

A

Layer 1 could detect a horizontal or vertical line
Layer 2 could then detect a shape
Layer 3 could then detect an item.

Abstraction

4
Q

MNIST

A

Handwritten digit data set (28x28 greyscale images of the digits 0-9)

5
Q

4 AI uses in images

A

Image segmentation (item separation)
Image captioning
Question answering (is there a boat in this image?)
Action recognition

5
Q

How do you choose the values in a kernel/filter?

A

They are randomly initialised, and the network learns the best filter values during training

6
Q

CNN: Do you flatten the input image?

A

No, it stays as 2D

6
Q

2D Convolution Layer (5x5 input and 3x3 kernel example)

A

Example: a 5x5 input with a 3x3 kernel.

You slide the 3x3 kernel across the image from the top left, multiply it element-wise with the patch underneath, sum the products, and write the result into the corresponding position of the output, which here is a new 3x3 matrix.
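
A minimal NumPy sketch of this sliding-window step (the input and kernel values are made up for illustration):

```python
import numpy as np

x = np.arange(25, dtype=float).reshape(5, 5)   # example 5x5 input
k = np.array([[1., 0., -1.],
              [1., 0., -1.],
              [1., 0., -1.]])                  # example 3x3 kernel

out = np.zeros((3, 3))                          # (5 - 3)/1 + 1 = 3
for i in range(3):
    for j in range(3):
        patch = x[i:i+3, j:j+3]                 # 3x3 window at this position
        out[i, j] = np.sum(patch * k)           # multiply element-wise, then sum
print(out)                                      # the 3x3 feature map
```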

7
Q

Stride

A

The number of pixels the kernel moves each time it slides.

E.g. stride 1 moves one pixel at a time,
stride 2 moves two pixels.

A higher stride moves through the image faster.

8
Q

Higher stride does what to output?

A

Reduces the output size (a smaller feature map)

9
Q

Padding

A

Add zeros around the edge of the input and then scan with the kernel (so edge pixels are covered and the output size can be preserved)

10
Q

Computing the output size of a convolutional layer:

A

Output size = (W - F + 2P)/S + 1

W = input size
F = filter size
S = stride
P = padding
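
A quick sanity check of the formula in Python (the numbers are illustrative):

```python
def conv_output_size(W, F, S=1, P=0):
    """Output size of a convolution: (W - F + 2P)/S + 1."""
    return (W - F + 2 * P) // S + 1

print(conv_output_size(W=5, F=3))         # 3: the 5x5 input / 3x3 kernel example
print(conv_output_size(W=5, F=3, P=1))    # 5: padding of 1 preserves the size
print(conv_output_size(W=7, F=3, S=2))    # 3: a higher stride shrinks the output
```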

11
Q

Images have how many channels and why?

A

3 - RGB

(Where the image is RGB of course)

12
Q

3 Channel input and a 3 channel filter. How many channels is the output?

A

1

13
Q

To make a 3 channel output of a 3 channel input, we need how many 3 channel filters?

A

3 filters
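
A PyTorch sketch of the same point (assuming PyTorch is available; the sizes are illustrative): three 3-channel filters give a 3-channel output.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3)
print(conv.weight.shape)        # torch.Size([3, 3, 3, 3]): 3 filters, each 3x3x3

x = torch.randn(1, 3, 32, 32)   # a batch of one 3-channel image
print(conv(x).shape)            # torch.Size([1, 3, 30, 30]): 3 output channels
```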

14
Q

Having more filters means what for the number of parameters?

A

More parameters. Potential overfitting if too many filters.

15
Q

Must your number of input channels match the number of filter channels?

A

YES

16
Q

Stack of convolutional layers example

A

Input
→ Filters
→ Feature Maps
→ Filters
→ Feature Maps
→ Filters

17
Q

ReLU

A

Rectified Linear Unit

max(0,x)

On a graph, y = 0 for x < 0 and y = x for x ≥ 0
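
A one-line NumPy version of max(0, x):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)   # element-wise max(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))   # [0.  0.  0.  1.5]
```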

18
Q

When is ReLU needed?

A

After every convolutional layer.

19
Q

Why can’t we have two convolutional layers right next to each other?

A

Convolution is a linear operation (matrix multiplication): two convolutional layers one after the other collapse into the equivalent of a single layer.

We need non-linearity between them (ReLU, sigmoid and others…)

20
Q

Advantages of ReLU vs tanh/sigmoid

A
  • Faster convergence than tanh
  • Easier and faster calculation
  • Lower probability of vanishing gradient
21
Q

Disadvantages of ReLU:

A
  • Dying ReLU: when a unit's inputs are all or mostly negative, recovery is hard because the gradient is 0 in the negative half
22
Q

Pooling Layer: Max Pooling

A

Scans a window over the image LIKE A CONVOLUTION (using a sliding kernel) but takes the highest value in each window as the output (rather than multiplying and adding)
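
A small NumPy sketch of 2x2 max pooling with stride 2 (the values are illustrative):

```python
import numpy as np

x = np.array([[1., 3., 2., 4.],
              [5., 6., 1., 2.],
              [7., 2., 9., 1.],
              [3., 4., 6., 8.]])

out = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        window = x[2*i:2*i+2, 2*j:2*j+2]   # 2x2 window, stride 2
        out[i, j] = window.max()           # keep only the largest value
print(out)                                 # [[6. 4.]
                                           #  [7. 9.]]
```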

23
Q

Pooling: does it collapse the channels, like a convolution does?

A

No, 3 channels in, 3 channels out (each channel is pooled independently)

24
Q

Advantages of pooling layer

A
  • Improved translational invariance (moving pixels to slightly different locations has little effect)
  • Downsampling (reduced resources)
25
Q

Batch Normalisation concept

A

A network is easier to train if its inputs are normalised; batch normalisation applies the same idea to the inputs of intermediate layers

26
Q

Issue with normalisation

A

During training, the parameters keep changing, which changes the distribution of the data arriving at each layer (internal covariate shift)

27
Q

Batch normalisation

Input: values of x over a mini-batch B = {x1…m}
Parameters to be learned: γ (gamma) and β (beta)

A

mini-batch mean:      μ = (1/m) Σ xi
mini-batch variance:  σ² = (1/m) Σ (xi − μ)²
normalise:            x̂i = (xi − μ) / √(σ² + ε)
scale and shift:      yi = γ·x̂i + β
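
A NumPy sketch of the four steps for a single feature over one mini-batch (the gamma, beta and epsilon values are illustrative):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])       # one feature across a mini-batch of 4
gamma, beta, eps = 1.0, 0.0, 1e-5        # learned scale/shift (illustrative values)

mu = x.mean()                            # mini-batch mean
var = x.var()                            # mini-batch variance
x_hat = (x - mu) / np.sqrt(var + eps)    # normalise
y = gamma * x_hat + beta                 # scale and shift
print(y)
```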

28
Q

Fully Connected Layer

A

Each neuron/unit in the previous layer connects to every unit in this layer.

29
Q

If you had a 9x9 patch and compared a 4-layer approach (3x3 filters) to a 1-layer approach (one 9x9 filter), both ending at 1x1, how many parameters does each have?

A

Approach 1 (four 3x3 layers):
4 biases (one per filter)
4 x 3 x 3 = 36 weights
4 + 36 = 40 parameters

Approach 2 (one 9x9 layer):
1 bias
1 x 9 x 9 = 81 weights
1 + 81 = 82 parameters

Approach 2 is more likely to overfit.

30
Q

2 advantages of using more, small filters

A
  • Fewer parameters: lower chance of overfitting and lower computational cost
  • More ReLUs: more non-linearity so higher representation capacity
31
Q

Why does the GoogLeNet architecture use several different filter sizes in parallel? What is the problem with this?

A

Some filter sizes might be better in certain circumstances than others.

The problem is computational cost

32
Q

How did GoogLeNet solve the computational cost issue?

A

They use dimension reduction with bottleneck layers:

a 1x1 convolutional layer reduces the number of channels before the more expensive filters.
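
A PyTorch sketch of the idea (the channel counts are illustrative, not GoogLeNet's exact numbers): a 1x1 convolution shrinks the channel dimension before the expensive 3x3 convolution.

```python
import torch.nn as nn

def count_params(m):
    return sum(p.numel() for p in m.parameters())

# Direct 3x3 convolution on 256 channels
direct = nn.Conv2d(256, 256, kernel_size=3, padding=1)

# Bottleneck: 1x1 reduces 256 -> 64, 3x3 works on 64, 1x1 expands back to 256
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.Conv2d(64, 256, kernel_size=1),
)

print(count_params(direct))       # ~590,000 parameters
print(count_params(bottleneck))   # ~70,000 parameters
```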

33
Q

Residual Block

A

Takes the input, bypasses the layers and sums it with the layer outputs: output = x + f(x).

If the input is already good, the block only has to learn an f(x) that does little to affect it.
ResNet uses a residual block every 2 layers.
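
A minimal PyTorch sketch of a residual block (the layer sizes are illustrative):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        fx = self.conv2(self.relu(self.conv1(x)))   # f(x): the two bypassed layers
        return self.relu(fx + x)                    # skip connection: add the input back
```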

34
Q

Densely connected Networks

A

Like residual blocks, but each layer's output is connected to every future layer.

35
Q

Pros of Densely connected networks

A
  • Stronger gradient in backpropagation so it's easier to train
  • Learns features from multiple levels
  • Reuse of features (fewer filters but the same number of feature maps)
36
Q

Drawback of densely connected network

A

High memory cost. You must save all intermediate feature maps.

37
Q

Dense Block Networks

A

Uses smaller blocks of densely connected layers, separated by standard layers.

38
Q

Fine-Tuning data requirements

A

Training a deep network from scratch requires lots of data. With less data, you can start from a pre-trained network and then fine-tune it.

39
Q

Fine tuning with less data

A

You could freeze all layers other than the last and retrain that with only a few data points.

You could also train a linear classifier on top of the frozen features.
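
A PyTorch sketch of freezing everything except a new final layer (assuming a recent torchvision and a pre-trained ResNet-18 as the backbone; the class count is illustrative):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="DEFAULT")      # pre-trained backbone

for p in model.parameters():
    p.requires_grad = False                     # freeze every existing layer

model.fc = nn.Linear(model.fc.in_features, 10)  # new final layer for 10 classes
# Only model.fc's parameters are passed to the optimiser and updated.
```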

40
Q

Fine tuning with more data

A

Freeze fewer layers and retrain more of them, working backwards from the output.

41
Q

Fine tuning with a LOT of data

A

Freeze no layers and retrain all.

Be careful of overfitting

42
Q

Data augmentation

A

Apply transforms (rotation, translation, zooming, noise, etc.) to generate extra training examples
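
A torchvision sketch (the particular transforms and their parameters are illustrative):

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(15),        # small random rotations
    transforms.RandomResizedCrop(224),    # random zoom/translation via cropping
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
# Pass as transform=augment to a dataset so every epoch sees slightly different images.
```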

43
Q

Dropout

A

Randomly black out (drop) some units on every training pass.

Harder to overfit because the network can't rely on any one neuron anymore.
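
A PyTorch sketch; note that nn.Dropout is only active in training mode and is switched off by model.eval():

```python
import torch
import torch.nn as nn

layer = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Dropout(p=0.5))
x = torch.randn(1, 128)

layer.train()
print(layer(x))   # ~half the activations are zeroed (the rest are scaled up by 2)
layer.eval()
print(layer(x))   # dropout disabled: no units are dropped
```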

44
Q

Early stop

A

Stop training when the validation error starts to rise, i.e. when the network starts to overfit.