Deep Convolutional Neural Networks Flashcards
Notes from this lecture that may prove to be helpful in the exam
What are some popular methods of Deep Learning?
Convolutional Neural Networks
Reinforcement Learning Networks
Generative Neural Networks
Recurrent Neural Networks
Graph Neural Networks
How is a Convolutional Neural Network structured?
Neurons are arranged in 3D (width, height, depth), where each neuron connects only to a small region of the previous layer
Features are learned in a hierarchical structure, from low level features to high level features
What are the basic components for Convolutional Neural Networks?
Convolution Operator
Pooling
Activation Function
Fully connected layer
Loss Functions
Optimisation methods
What is the Convolution Operator in CNNs?
It is a ‘panel’ of pre-set weights (the filter, or kernel) that slides over the image. At each position, the output pixel is computed by multiplying each input pixel by the corresponding filter weight and summing the products. For example:
0 1 2 overlaid with a filter of 4 2 1 gives (0 × 4) + (1 × 2) + (2 × 1) = 4 for that output pixel.
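A minimal NumPy sketch of this sliding-window sum of products (strictly speaking, CNN libraries compute cross-correlation and call it convolution; the function below follows that convention):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over an image (no padding, stride 1) and
    return the map of weighted sums (cross-correlation)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Multiply each pixel in the window by the matching
            # kernel weight, then sum the products.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# 1-row flavour of the flashcard example: [0, 1, 2] against filter [4, 2, 1]
row = np.array([[0.0, 1.0, 2.0]])
kernel = np.array([[4.0, 2.0, 1.0]])
print(conv2d(row, kernel))  # [[4.]] -> (0*4) + (1*2) + (2*1) = 4
```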
What steps do you need to take to be able to work out the Total Learnable Parameters in CNNs?
- Take the number of weights per filter position, i.e. the number of input channels, e.g. RGB = 3
- Multiply step 1's result by the filter's spatial size, e.g. 3 x 3 filter * 3 (RGB channels) = 27 weights per filter
- Take step 2's result, and multiply it by the number of filters
- Work out the total bias parameters by multiplying the number of filters by the number of biases per filter (normally 1)
- Add step 3's and step 4's results together to get the total number of learnable parameters (see the worked sketch below)
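A minimal sketch of the calculation above (the function name and example values are illustrative):

```python
def conv_layer_params(filter_h, filter_w, in_channels, num_filters):
    """Total learnable parameters in one convolutional layer."""
    weights_per_filter = filter_h * filter_w * in_channels  # steps 1-2
    total_weights = weights_per_filter * num_filters        # step 3
    total_biases = num_filters * 1                          # step 4: one bias per filter
    return total_weights + total_biases                     # step 5

# Worked example: 32 filters of size 3x3 over an RGB input
print(conv_layer_params(3, 3, 3, 32))  # (3*3*3)*32 + 32 = 896
```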
What is the main purpose of adding activation functions to anything in ML?
Adding activation functions introduces non-linearity to the network; without them, a stack of layers would collapse into a single linear mapping, so the network could not learn more complex relationships in the data.
What are the main features surrounding the Sigmoid function in CNNs?
It squashes the input into the range [0, 1]
It saturates: near outputs of 0 and 1 the gradient becomes extremely small, which ‘kills’ gradient flow
What are the main features surrounding the Tanh function in CNNs?
It converts the data into a zero-centered range [-1, 1]
It saturates: near outputs of -1 and 1 the gradient becomes extremely small, which ‘kills’ gradient flow
What are the main features surrounding the ReLU function in CNNs?
It doesn’t saturate (for positive inputs) and converges faster
Simple to compute
Neurons can ‘die’ when they only ever receive negative input (their gradient is zero there), so ReLU works better with a smaller learning rate
What are the main features of Leaky ReLU in CNNs?
It overcomes the ‘dying neuron’ problem of ReLU by giving negative inputs a small, non-zero slope
Its performance improvement is not consistent across tasks
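A minimal NumPy sketch of the four activations above (the 0.01 leak slope is a common default, not a value given in the lecture):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # output in [0, 1]; saturates at both ends

def tanh(x):
    return np.tanh(x)                      # zero-centered output in [-1, 1]; also saturates

def relu(x):
    return np.maximum(0.0, x)              # zero gradient for negative inputs ("dead" neurons)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small slope keeps negative inputs alive

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x), leaky_relu(x))
```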
Why are CNNs better than MLPs when using image data?
They are more efficient for image data because they:
- Use shared weights
- Exploit spatial hierarchies
- Require fewer parameters
- Are translation invariant
What does Pooling do in CNNs?
Pooling takes a Feature Map, i.e. a matrix of activations, and downsamples it, i.e. reduces its spatial size, whilst preserving its depth
What two types of Pooling are there, and how do they work in CNNs?
Max Pooling - Takes the maximum value from each region/window of the feature map
Average Pooling - Computes the average of values in each region/window of the feature map
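A minimal NumPy sketch of both pooling types over non-overlapping 2 x 2 windows (assuming the input's sides divide evenly by the window size):

```python
import numpy as np

def pool2d(fmap, size=2, mode="max"):
    """Downsample a feature map by reducing each non-overlapping
    size x size window to a single value."""
    h, w = fmap.shape
    # Reshape into (rows, size, cols, size) blocks, then reduce each block.
    blocks = fmap.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))    # max pooling
    return blocks.mean(axis=(1, 3))       # average pooling

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 0],
                 [7, 2, 9, 8],
                 [0, 1, 3, 4]])
print(pool2d(fmap, mode="max"))   # [[6 4] [7 9]]
print(pool2d(fmap, mode="avg"))   # [[3.75 1.75] [2.5 6.]]
```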
What does the Fully Connected Layer effectively become for a CNN?
The Fully Connected Layer fundamentally becomes an MLP, where all nodes in a layer are connected to all nodes in the next layer, etc…
What is the equation for the loss function Cross Entropy (Classification)?
$L = -\sum_{n=1}^{N} \sum_{k=1}^{C} y_{kn} \log(\hat{y}_{kn})$, where $N$ is the number of samples and $C$ the number of classes.
In other words:
For each data sample and for each class, the true label is multiplied by the log of the prediction; these terms are summed over all samples and classes, and the result is negated.
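A minimal NumPy sketch of this loss for one-hot labels (the small epsilon is an added numerical-stability guard, not part of the lecture formula):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """y_true: one-hot labels, shape (N, C).
    y_pred: predicted class probabilities, shape (N, C).
    Returns the loss summed over all samples and classes."""
    return -np.sum(y_true * np.log(y_pred + eps))

y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])
print(cross_entropy(y_true, y_pred))  # -(log 0.7 + log 0.8) ~= 0.580
```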
What optimisation methods are used on CNNs?
Gradient Descent
Gradient Descent with momentum
Adam
Backpropagation with the chain rule to optimise the weights W and biases b
What is the key feature of Gradient Descent with Momentum in optimisation?
It helps escape local minima: the update accumulates momentum while ‘sliding down’ the curve, which can carry it slightly up the next ‘hill’ and out of a local optimum.
What is the key feature of RMSProp in Optimisation?
It contains an adaptive learning rate for each parameter
What is a short explanation of Adam in regards to Optimisation?
It combines both RMSProp and Gradient Descent with Momentum, and is highly popular in Deep Learning
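A minimal sketch of the three update rules side by side (the formulations and hyperparameter defaults are common conventions, not lecture-given values):

```python
import numpy as np

# One gradient step per optimiser for a parameter array w,
# where g is the gradient dL/dw at the current step.

def momentum_step(w, g, v, lr=0.01, beta=0.9):
    v = beta * v + g                           # accumulate velocity down the slope
    return w - lr * v, v

def rmsprop_step(w, g, s, lr=0.001, beta=0.9, eps=1e-8):
    s = beta * s + (1 - beta) * g**2           # running average of squared gradients
    return w - lr * g / (np.sqrt(s) + eps), s  # per-parameter adaptive step size

def adam_step(w, g, v, s, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """t is a 1-based step counter used for bias correction."""
    v = b1 * v + (1 - b1) * g                  # momentum term (first moment)
    s = b2 * s + (1 - b2) * g**2               # RMSProp term (second moment)
    v_hat = v / (1 - b1**t)                    # correct the zero-initialisation bias
    s_hat = s / (1 - b2**t)
    return w - lr * v_hat / (np.sqrt(s_hat) + eps), v, s
```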
What three optimisation strategies are commonly used with Big Data?
Batch Gradient Descent - Parameters are updated based on the loss averaged over all the training examples; too slow to compute when there are too many training examples
Stochastic Gradient Descent - Randomly selects one training example for gradient calculation
Mini Batch Gradient Descent - Randomly selects a batch of training examples for gradient calculation
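A minimal sketch of how each strategy selects the examples for one gradient calculation (names are illustrative):

```python
import numpy as np

def get_batch(X, y, strategy="mini", batch_size=32):
    """Select the training examples used for one gradient step."""
    if strategy == "batch":              # all examples: accurate but slow on big data
        idx = np.arange(len(X))
    elif strategy == "stochastic":       # one random example: fast, noisy updates
        idx = np.random.randint(len(X), size=1)
    else:                                # mini-batch: a random subset, the usual compromise
        idx = np.random.choice(len(X), size=batch_size, replace=False)
    return X[idx], y[idx]
```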
What are some major applications of Convolutional Neural Networks?
Object Classification
Facial Recognition
Real-time Object Detection
What methods are employed in CNNs to reduce overfitting?
Data Augmentation
Drop Out
Transfer Learning
How does Data Augmentation work in regards to CNNs?
Data augmentation increases the number of training samples by varying the original image’s geometry and appearance
When is Data Augmentation performed?
It is normally performed on the fly during training, rather than beforehand
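A minimal NumPy sketch of on-the-fly augmentation (the specific flip and shift transforms are illustrative):

```python
import numpy as np

def augment(image):
    """Return a randomly varied copy of an image (an H x W x C array)."""
    if np.random.rand() < 0.5:
        image = image[:, ::-1, :]              # random horizontal flip
    shift = np.random.randint(-2, 3)
    image = np.roll(image, shift, axis=1)      # small random horizontal shift
    return image

# Applied fresh each time a sample is drawn during training,
# so the network rarely sees exactly the same image twice.
```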
How does Drop Out work in regards to CNNs?
It randomly disables certain neurons during the training process
This is typically applied within a fully connected layer, with a drop out rate such as 20%.
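A minimal NumPy sketch of dropout during training (this is the common ‘inverted’ formulation, which rescales at training time so nothing changes at test time):

```python
import numpy as np

def dropout(activations, rate=0.2, training=True):
    """Randomly zero out a fraction `rate` of neurons during training.
    Inverted scaling keeps the expected activation unchanged, so no
    rescaling is needed at test time."""
    if not training:
        return activations
    mask = np.random.rand(*activations.shape) >= rate  # keep ~80% of neurons at rate=0.2
    return activations * mask / (1.0 - rate)
```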
How does Transfer Learning work in regards to CNNs?
It first trains the CNN on a large dataset
Afterwards, it freezes the parameters of the first few layers, and retrains only the high-level feature layers or fully connected layers for a new application with a small number of training samples
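A minimal sketch of this workflow, assuming PyTorch and torchvision are available (the two-class head is illustrative):

```python
import torch.nn as nn
from torchvision import models

# Start from a network pre-trained on a large dataset (ImageNet).
model = models.resnet18(weights="DEFAULT")

# Freeze the pre-trained low-level feature layers.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer; only this new layer
# (its parameters default to trainable) is retrained on the new,
# smaller dataset, e.g. a 2-class problem.
model.fc = nn.Linear(model.fc.in_features, 2)
```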