Deep Convolutional Neural Networks Flashcards
Notes from this lecture that may prove to be helpful in the exam
What are some popular methods of Deep Learning?
Convolutional Neural Networks
Reinforcement Learning Networks
Generative Neural Networks
Recurrent Neural Networks
Graph Neural Networks
How is a Convolutional Neural Network structured?
Neurons are arranged in 3D (width, height, depth), where each neuron connects only to a small region of the previous layer
Features are learned in a hierarchical structure, from low level features to high level features
What are the basic components for Convolutional Neural Networks?
Convolution Operator
Pooling
Activation Function
Fully connected layer
Loss Functions
Optimisation methods
What is the Convolution Operator in CNNs?
It is a ‘panel’ of pre-set weights (the filter, or kernel) that slides over the image. At each position, the output pixel is computed by multiplying each input pixel by the corresponding filter weight and summing the products. For example:
0 1 2 overlaid with a filter of 4 2 1 gives (0 × 4) + (1 × 2) + (2 × 1) = 4 for that output pixel.
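A minimal NumPy sketch of this sliding-window sum of products (strictly speaking, CNN libraries compute cross-correlation and call it convolution; the function below follows that convention):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over an image (no padding, stride 1) and
    return the map of weighted sums (cross-correlation)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Multiply each pixel in the window by the matching
            # kernel weight, then sum the products.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# 1-row flavour of the flashcard example: [0, 1, 2] against filter [4, 2, 1]
row = np.array([[0.0, 1.0, 2.0]])
kernel = np.array([[4.0, 2.0, 1.0]])
print(conv2d(row, kernel))  # [[4.]] -> (0*4) + (1*2) + (2*1) = 4
```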
What steps do you need to take to be able to work out the Total Learnable Parameters in CNNs?
- Take the number of weights per filter position, i.e. the number of input channels, e.g. RGB = 3
- Multiply step 1's result by the filter's spatial size, e.g. 3 x 3 filter * 3 (RGB channels) = 27 weights per filter
- Take step 2's result, and multiply it by the number of filters
- Work out the total bias parameters by multiplying the number of filters by the number of biases per filter (normally 1)
- Add step 3's and step 4's results together to get the total number of learnable parameters (see the worked sketch below)
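A minimal sketch of the calculation above (the function name and example values are illustrative):

```python
def conv_layer_params(filter_h, filter_w, in_channels, num_filters):
    """Total learnable parameters in one convolutional layer."""
    weights_per_filter = filter_h * filter_w * in_channels  # steps 1-2
    total_weights = weights_per_filter * num_filters        # step 3
    total_biases = num_filters * 1                          # step 4: one bias per filter
    return total_weights + total_biases                     # step 5

# Worked example: 32 filters of size 3x3 over an RGB input
print(conv_layer_params(3, 3, 3, 32))  # (3*3*3)*32 + 32 = 896
```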
What is the main purpose of adding activation functions to anything in ML?
Adding activation functions introduces non-linearity to the network; without them, a stack of layers would collapse into a single linear mapping, so the network could not learn more complex relationships in the data.
What are the main features surrounding the Sigmoid function in CNNs?
It squashes the input into the range [0, 1]
It saturates: near outputs of 0 and 1 the gradient becomes extremely small, which ‘kills’ gradient flow
What are the main features surrounding the Tanh function in CNNs?
It converts the data into a zero-centered range [-1, 1]
It saturates: near outputs of -1 and 1 the gradient becomes extremely small, which ‘kills’ gradient flow
What are the main features surrounding the ReLU function in CNNs?
It doesn’t saturate (for positive inputs) and converges faster
Simple to compute
Neurons can ‘die’ when they only ever receive negative input (their gradient is zero there), so ReLU works better with a smaller learning rate
What are the main features of Leaky ReLU in CNNs?
It overcomes the ‘dying neuron’ problem of ReLU by giving negative inputs a small, non-zero slope
Its performance improvement is not consistent across tasks
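A minimal NumPy sketch of the four activations above (the 0.01 leak slope is a common default, not a value given in the lecture):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # output in [0, 1]; saturates at both ends

def tanh(x):
    return np.tanh(x)                      # zero-centered output in [-1, 1]; also saturates

def relu(x):
    return np.maximum(0.0, x)              # zero gradient for negative inputs ("dead" neurons)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small slope keeps negative inputs alive

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x), leaky_relu(x))
```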
Why are CNNs better than MLPs when using image data?
They are more efficient for image data because they:
- Use shared weights
- Exploit spatial hierarchies
- Require fewer parameters
- Are translation invariant
What does Pooling do in CNNs?
Pooling takes a Feature Map, i.e. a matrix of activations, and downsamples it, i.e. reduces its spatial size, whilst preserving its depth
What two types of Pooling are there, and how do they work in CNNs?
Max Pooling - Takes the maximum value from each region/window of the feature map
Average Pooling - Computes the average of values in each region/window of the feature map
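A minimal NumPy sketch of both pooling types over non-overlapping 2 x 2 windows (assuming the input's sides divide evenly by the window size):

```python
import numpy as np

def pool2d(fmap, size=2, mode="max"):
    """Downsample a feature map by reducing each non-overlapping
    size x size window to a single value."""
    h, w = fmap.shape
    # Reshape into (rows, size, cols, size) blocks, then reduce each block.
    blocks = fmap.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))    # max pooling
    return blocks.mean(axis=(1, 3))       # average pooling

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 0],
                 [7, 2, 9, 8],
                 [0, 1, 3, 4]])
print(pool2d(fmap, mode="max"))   # [[6 4] [7 9]]
print(pool2d(fmap, mode="avg"))   # [[3.75 1.75] [2.5 6.]]
```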
What does the Fully Connected Layer effectively become for a CNN?
The Fully Connected Layer fundamentally becomes an MLP, where all nodes in a layer are connected to all nodes in the next layer, etc…
What is the equation for the loss function Cross Entropy (Classification)?
$L = -\sum_{n=1}^{N} \sum_{k=1}^{C} y_{kn} \log(\hat{y}_{kn})$, where $N$ is the number of samples and $C$ the number of classes.
In other words:
For each data sample and for each class, the true label is multiplied by the log of the prediction; these terms are summed over all samples and classes, and the result is negated.
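A minimal NumPy sketch of this loss for one-hot labels (the small epsilon is an added numerical-stability guard, not part of the lecture formula):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """y_true: one-hot labels, shape (N, C).
    y_pred: predicted class probabilities, shape (N, C).
    Returns the loss summed over all samples and classes."""
    return -np.sum(y_true * np.log(y_pred + eps))

y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])
print(cross_entropy(y_true, y_pred))  # -(log 0.7 + log 0.8) ~= 0.580
```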
What optimisation methods are used on CNNs?
Gradient Descent
Gradient Descent with momentum
Adam
Backpropagation with the chain rule to optimise the weights W and biases b
What is the key feature of Gradient Descent with Momentum in optimisation?
It helps escape local minima: the update accumulates momentum while ‘sliding down’ the curve, which can carry it slightly up the next ‘hill’ and out of a local optimum.
What is the key feature of RMSProp in Optimisation?
It contains an adaptive learning rate for each parameter
What is a short explanation of Adam in regards to Optimisation?
It combines both RMSProp and Gradient Descent with Momentum, and is highly popular in Deep Learning
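A minimal sketch of the three update rules side by side (the formulations and hyperparameter defaults are common conventions, not lecture-given values):

```python
import numpy as np

# One gradient step per optimiser for a parameter array w,
# where g is the gradient dL/dw at the current step.

def momentum_step(w, g, v, lr=0.01, beta=0.9):
    v = beta * v + g                           # accumulate velocity down the slope
    return w - lr * v, v

def rmsprop_step(w, g, s, lr=0.001, beta=0.9, eps=1e-8):
    s = beta * s + (1 - beta) * g**2           # running average of squared gradients
    return w - lr * g / (np.sqrt(s) + eps), s  # per-parameter adaptive step size

def adam_step(w, g, v, s, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """t is a 1-based step counter used for bias correction."""
    v = b1 * v + (1 - b1) * g                  # momentum term (first moment)
    s = b2 * s + (1 - b2) * g**2               # RMSProp term (second moment)
    v_hat = v / (1 - b1**t)                    # correct the zero-initialisation bias
    s_hat = s / (1 - b2**t)
    return w - lr * v_hat / (np.sqrt(s_hat) + eps), v, s
```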
What three optimisation strategies are commonly used with Big Data?
Batch Gradient Descent - Parameters are updated based on the loss averaged over all the training examples; too slow to compute when there are too many training examples
Stochastic Gradient Descent - Randomly selects one training example for gradient calculation
Mini Batch Gradient Descent - Randomly selects a batch of training examples for gradient calculation
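A minimal sketch of how each strategy selects the examples for one gradient calculation (names are illustrative):

```python
import numpy as np

def get_batch(X, y, strategy="mini", batch_size=32):
    """Select the training examples used for one gradient step."""
    if strategy == "batch":              # all examples: accurate but slow on big data
        idx = np.arange(len(X))
    elif strategy == "stochastic":       # one random example: fast, noisy updates
        idx = np.random.randint(len(X), size=1)
    else:                                # mini-batch: a random subset, the usual compromise
        idx = np.random.choice(len(X), size=batch_size, replace=False)
    return X[idx], y[idx]
```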
What are some major applications of Convolutional Neural Networks?
Object Classification
Facial Recognition
Real-time Object Detection
What methods are employed in CNNs to reduce overfitting?
Data Augmentation
Drop Out
Transfer Learning
How does Data Augmentation work in regards to CNNs?
Data augmentation increases the number of training samples by varying the original image’s geometry and appearance
When is Data Augmentation performed?
It is normally performed on the fly during training, rather than beforehand
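A minimal NumPy sketch of on-the-fly augmentation (the specific flip and shift transforms are illustrative):

```python
import numpy as np

def augment(image):
    """Return a randomly varied copy of an image (an H x W x C array)."""
    if np.random.rand() < 0.5:
        image = image[:, ::-1, :]              # random horizontal flip
    shift = np.random.randint(-2, 3)
    image = np.roll(image, shift, axis=1)      # small random horizontal shift
    return image

# Applied fresh each time a sample is drawn during training,
# so the network rarely sees exactly the same image twice.
```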
How does Drop Out work in regards to CNNs?
It randomly disables certain neurons during the training process
This is typically applied within a fully connected layer, with a drop out rate such as 20%.
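A minimal NumPy sketch of dropout during training (this is the common ‘inverted’ formulation, which rescales at training time so nothing changes at test time):

```python
import numpy as np

def dropout(activations, rate=0.2, training=True):
    """Randomly zero out a fraction `rate` of neurons during training.
    Inverted scaling keeps the expected activation unchanged, so no
    rescaling is needed at test time."""
    if not training:
        return activations
    mask = np.random.rand(*activations.shape) >= rate  # keep ~80% of neurons at rate=0.2
    return activations * mask / (1.0 - rate)
```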
How does Transfer Learning work in regards to CNNs?
It first trains the CNN on a large dataset
Afterwards, it freezes the parameters of the first few layers, and retrains only the high-level feature layers or fully connected layers for a new application with a small number of training samples
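A minimal sketch of this workflow, assuming PyTorch and torchvision are available (the two-class head is illustrative):

```python
import torch.nn as nn
from torchvision import models

# Start from a network pre-trained on a large dataset (ImageNet).
model = models.resnet18(weights="DEFAULT")

# Freeze the pre-trained low-level feature layers.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer; only this new layer
# (its parameters default to trainable) is retrained on the new,
# smaller dataset, e.g. a 2-class problem.
model.fc = nn.Linear(model.fc.in_features, 2)
```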