Deep Convolutional Neural Networks Flashcards

Notes from this lecture that may prove helpful in the exam

1
Q

What are some popular methods of Deep Learning?

A

Convolutional Neural Networks

Reinforcement Learning Networks

Generative Neural Networks

Recurrent Neural Networks

Graph Neural Networks

2
Q

How is a Convolutional Neural Network structured?

A

Neurons are arranged in 3D (width, height, depth), and each neuron is connected only to a small region of the previous layer

Features are learned in a hierarchical structure, from low level features to high level features

3
Q

What are the basic components of a Convolutional Neural Network?

A

Convolution Operator
Pooling
Activation Function
Fully connected layer
Loss Functions
Optimisation methods
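
A minimal sketch of how these components fit together, assuming PyTorch as the framework (the lecture doesn't prescribe one); the layer sizes are illustrative:

```python
import torch
import torch.nn as nn

# A tiny CNN wiring together the components listed above.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolution operator
    nn.ReLU(),                                    # activation function
    nn.MaxPool2d(2),                              # pooling
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),                  # fully connected layer
)
loss_fn = nn.CrossEntropyLoss()                   # loss function
optimiser = torch.optim.Adam(model.parameters())  # optimisation method

x = torch.randn(8, 3, 32, 32)                     # dummy batch of 32x32 RGB images
loss = loss_fn(model(x), torch.randint(0, 10, (8,)))
```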

4
Q

What is the Convolution Operator in CNNs?

A

It is a small grid of values (a kernel, or filter) that is slid over the image. At each position, the output pixel is computed by multiplying each input pixel in the window by the corresponding kernel value and summing the results. For example:
0 1 2 is overlaid with a kernel that looks like this: 4 2 1. The result for that middle pixel is (0 * 4) + (1 * 2) + (2 * 1) = 4.
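
A minimal sketch of that arithmetic in NumPy (note that CNN libraries actually compute cross-correlation, i.e. the kernel is not flipped):

```python
import numpy as np

def conv1d_at(signal, kernel, centre):
    """Cross-correlate a kernel with a signal at one position."""
    half = len(kernel) // 2
    window = signal[centre - half : centre + half + 1]
    return np.dot(window, kernel)

pixels = np.array([0, 1, 2])
kernel = np.array([4, 2, 1])
# (0 * 4) + (1 * 2) + (2 * 1) = 4, matching the example above
print(conv1d_at(pixels, kernel, centre=1))  # -> 4
```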

5
Q

What steps do you need to take to work out the total number of learnable parameters in a CNN?

A
  1. Take the number of input channels per filter, e.g. RGB = 3 channels
  2. Multiply step 1’s result by the spatial size of the filter, e.g. 3 x 3 filter * 3 (RGB channels)
  3. Multiply step 2’s result by the number of filters to get the total weight parameters
  4. Work out the total bias parameters: one bias per filter, so this equals the number of filters
  5. Add step 3’s and 4’s results together to get the total number of learnable parameters (see the worked example below)
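
A worked instance of these steps, assuming 32 filters of size 3 x 3 over an RGB input (the numbers are illustrative), with PyTorch used as a cross-check:

```python
import torch.nn as nn

in_channels, filter_size, num_filters = 3, 3, 32

weights = in_channels * filter_size * filter_size * num_filters  # steps 1-3
biases = num_filters                                             # step 4
print(weights + biases)                                          # step 5 -> 896

# The same count reported by PyTorch, for comparison:
conv = nn.Conv2d(in_channels, num_filters, kernel_size=filter_size)
print(sum(p.numel() for p in conv.parameters()))                 # -> 896
```
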
6
Q

What is the main purpose of adding activation functions to a network in ML?

A

Adding activation functions introduces non-linearity into the network; without them, stacked layers collapse into a single linear transformation, so the network could not learn complex, non-linear relationships in the data.

7
Q

What are the main features of the Sigmoid function in CNNs?

A

It can convert the input range to [0, 1]
It saturates and kills gradients: the gradient becomes extremely small where the output approaches 0 or 1

8
Q

What are the main features of the Tanh function in CNNs?

A

It converts the data into a zero-centered range [-1, 1]
It saturates and kills gradients: the gradient becomes extremely small where the output approaches -1 or 1

9
Q

What are the main features of the ReLU function in CNNs?

A

It doesn’t saturate for positive inputs and converges faster
Simple to calculate
Some neurons can ‘die’ on negative inputs, where the output and gradient are both zero (so this works better with a smaller learning rate)

10
Q

What are the main features of Leaky ReLU in CNNs?

A

It overcomes the ‘dying neuron’ problem found in ReLU
Its performance is not consistent
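
A small NumPy sketch of the four activations covered in cards 7-10 (an illustration assumed here, not taken from the lecture):

```python
import numpy as np

def sigmoid(x):        # squashes input into [0, 1]; saturates at both ends
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):           # zero-centred squashing into [-1, 1]; also saturates
    return np.tanh(x)

def relu(x):           # no saturation for positive inputs; zero for negatives
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):  # small negative slope avoids dead neurons
    return np.where(x > 0, x, slope * x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x), leaky_relu(x))
```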

11
Q

Why are CNNs better than MLPs when using image data?

A

They are more efficient for image data because they:
- Share weights
- Exploit spatial hierarchies
- Require fewer parameters
- Are translation invariant
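
A rough illustration of the parameter saving; the layer sizes here are assumptions chosen only for the comparison:

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# An MLP layer connecting a flattened 32x32 RGB image to 100 hidden units
mlp = nn.Linear(3 * 32 * 32, 100)
# A conv layer with 100 filters sharing 3x3 weights across the whole image
cnn = nn.Conv2d(3, 100, kernel_size=3)

print(n_params(mlp))  # 307,300
print(n_params(cnn))  # 2,800
```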

12
Q

What does Pooling do in CNNs?

A

Pooling takes a Feature Map, i.e. a matrix of activations, and downsamples it, i.e. reduces its spatial size, whilst preserving its depth (number of channels)

13
Q

What two types of Pooling are there, and how do they work in CNNs?

A

Max pooling - Takes the maximum value from each region/window of the feature map

Average Pooling - Computes the average of values in each region/window of the feature map.
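
A minimal sketch of both pooling types with a 2 x 2 window, assuming PyTorch:

```python
import torch
import torch.nn.functional as F

# A 4x4 single-channel feature map (batch and channel dims added)
fmap = torch.arange(16.0).reshape(1, 1, 4, 4)

print(F.max_pool2d(fmap, kernel_size=2))  # max of each 2x2 window -> 2x2 map
print(F.avg_pool2d(fmap, kernel_size=2))  # mean of each 2x2 window -> 2x2 map
```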

14
Q

What does the Fully Connected Layer effectively become for a CNN?

A

The Fully Connected Layer fundamentally becomes an MLP, where all nodes in a layer are connected to all nodes in the next layer, etc…

15
Q

What is the equation for the loss function Cross Entropy (Classification)?

A

L = -\sum_{n=1}^{N} \sum_{k=1}^{C} y_{kn} \log(\hat{y}_{kn})
In other words:
For each data sample n and each class k, the true label is multiplied by the log of the predicted probability; the results are summed over all samples and classes and negated.
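
A worked instance for a single sample (the probabilities are assumed values):

```python
import numpy as np

y = np.array([0.0, 1.0, 0.0])       # one-hot true label: class 2 of 3
y_hat = np.array([0.2, 0.7, 0.1])   # predicted class probabilities

# Only the true class contributes: L = -log(0.7) ≈ 0.357
loss = -np.sum(y * np.log(y_hat))
print(loss)
```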

16
Q

What optimisation methods are used on CNNs?

A

Gradient Descent
Gradient Descent with momentum
Adam
Backpropagation with the chain rule to optimise the weights W and biases b

17
Q

What is the key feature of Gradient Descent with Momentum in optimisation?

A

It helps avoid local minima: the update effectively ‘slides down’ the curve, then carries ‘the momentum’ slightly up ‘the hill’ so as to break out of local optima.

18
Q

What is the key feature of RMSProp in Optimisation?

A

It maintains an adaptive learning rate for each parameter, scaling updates by a running average of squared gradients

19
Q

What is a short explanation of Adam with regard to optimisation?

A

It combines both RMSProp and Gradient Descent with Momentum, and is highly popular in Deep Learning
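
A hedged sketch of the update rules from cards 17-19 for a single parameter, in NumPy; the hyperparameter values are typical defaults, not from the lecture:

```python
import numpy as np

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    v = beta * v + grad                  # accumulate 'momentum' from past gradients
    return w - lr * v, v

def rmsprop_step(w, grad, s, lr=0.01, beta=0.9, eps=1e-8):
    s = beta * s + (1 - beta) * grad**2  # running average of squared gradients
    return w - lr * grad / (np.sqrt(s) + eps), s  # per-parameter adaptive rate

def adam_step(w, grad, v, s, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    v = b1 * v + (1 - b1) * grad         # momentum term
    s = b2 * s + (1 - b2) * grad**2      # RMSProp term
    v_hat, s_hat = v / (1 - b1**t), s / (1 - b2**t)  # bias correction
    return w - lr * v_hat / (np.sqrt(s_hat) + eps), v, s
```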

20
Q

What three optimisation strategies are commonly used with Big Data?

A

Batch Gradient Descent - Parameters are updated based on the loss averaged over all the training examples; too slow to compute when there are too many training examples

Stochastic Gradient Descent - Randomly selects one training example for gradient calculation

Mini-Batch Gradient Descent - Randomly selects a batch of training examples for gradient calculation
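
A minimal sketch of the mini-batch loop on a toy linear-regression problem (the dataset sizes and learning rate are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)  # toy dataset
w = np.zeros(5)
lr, batch_size = 0.1, 32

for epoch in range(10):
    idx = rng.permutation(len(X))             # shuffle, then take random batches
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        err = X[batch] @ w - y[batch]
        grad = X[batch].T @ err / len(batch)  # gradient on the mini-batch only
        w -= lr * grad
# batch_size = 1 gives SGD; batch_size = len(X) gives batch gradient descent
```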

21
Q

What are some major applications of Convolutional Neural Networks?

A

Object Classification
Facial Recognition
Real-time object detection

22
Q

What methods are employed in CNNs to reduce overfitting?

A

Data Augmentation
Dropout
Transfer Learning

23
Q

How does Data Augmentation work in CNNs?

A

Data augmentation increases the number of training samples by varying the original image’s geometry and appearance
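
A typical sketch using torchvision's transforms (an assumed library choice) to vary geometry and appearance:

```python
from torchvision import transforms

# Geometric and appearance variations applied randomly to each image
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),                     # geometry
    transforms.RandomRotation(degrees=15),                 # geometry
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # appearance
    transforms.ToTensor(),
])
```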

24
Q

When is Data Augmentation performed?

A

It is normally performed on the fly during training, rather than beforehand as a separate preprocessing step

25
Q

How does Dropout work in CNNs?

A

It randomly disables a fraction of neurons during the training process
This is typically applied within a fully connected layer, with a dropout rate such as 20%.
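
A minimal sketch, assuming PyTorch, of dropout placed in a fully connected stack with a 20% rate:

```python
import torch.nn as nn

fc = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Dropout(p=0.2),   # randomly zeroes 20% of activations during training
    nn.Linear(128, 10),
)
fc.train()               # dropout active while training
fc.eval()                # dropout disabled at inference time
```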

26
Q

How does Transfer Learning work in CNNs?

A

The CNN is first trained on a large dataset
Afterwards, the parameters of the first few layers are frozen, and only the high-level feature layers or fully connected layers are retrained for a new application with a small number of training samples
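
A minimal sketch, assuming torchvision's ResNet-18 as the pretrained network (the lecture doesn't specify one):

```python
import torch.nn as nn
from torchvision import models

# Start from a network pretrained on a large dataset (ImageNet here)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for param in model.parameters():
    param.requires_grad = False          # freeze the pretrained layers

# Replace the final fully connected layer for the new task (e.g. 5 classes)
model.fc = nn.Linear(model.fc.in_features, 5)
# Only model.fc's parameters are now trained on the small new dataset
```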