Unit 3: Convolutional Neural Networks Flashcards
Cross-correlation process
The centre of the kernel is placed at each location of the image.
Each value in the kernel is then multiplied with the value beneath and the resulting products are summed, with this value being inserted into a new image at the corresponding position.
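The procedure above can be sketched in NumPy (the function name and example arrays are my own; without padding, only positions where the whole kernel fits inside the image are covered):

```python
import numpy as np

def cross_correlate(image, kernel):
    """Slide the kernel over the image; at each position, multiply each
    kernel value with the image value beneath it and sum the products."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1   # "valid" positions only (no padding)
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(kernel * image[i:i + kh, j:j + kw])
    return out

img = np.arange(16.0).reshape(4, 4)
avg = np.ones((3, 3)) / 9.0        # 3x3 averaging kernel
result = cross_correlate(img, avg)
```

Note that the 4x4 input shrinks to a 2x2 output, which motivates the padding card below.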
Padding
Because the kernel can’t be centered on the boundary rows and columns of the image, there are fewer rows and columns in the output than the input.
To make the output image the same size as the input, it is common practice to expand the original image with additional rows and columns to the left and right and on the top and bottom.
This means that the kernel can be centered at all pixel locations of the original image.
The additional rows and columns are normally filled with zeros.
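Zero padding can be done with NumPy's `np.pad`; a minimal sketch (the helper name is my own), where an odd-sized kernel needs `kernel_size // 2` extra rows and columns on each side:

```python
import numpy as np

def pad_with_zeros(image, kernel_shape):
    # Add kernel_size // 2 rows/columns of zeros on each side so the
    # kernel can be centred at every pixel of the original image.
    ph, pw = kernel_shape[0] // 2, kernel_shape[1] // 2
    return np.pad(image, ((ph, ph), (pw, pw)), mode="constant")

padded = pad_with_zeros(np.ones((4, 4)), (3, 3))
```

A 4x4 image padded for a 3x3 kernel becomes 6x6, so the cross-correlation output is again 4x4.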
Stride
A variation of the procedure where the kernel is moved one or more steps at a time across the columns and down the rows.
The step size is called the stride.
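A sketch of strided cross-correlation (names and example values are my own), where a stride of 2 halves the output resolution:

```python
import numpy as np

def cross_correlate_strided(image, kernel, stride):
    kh, kw = kernel.shape
    # Move the kernel `stride` steps at a time across columns and rows.
    row_starts = range(0, image.shape[0] - kh + 1, stride)
    col_starts = range(0, image.shape[1] - kw + 1, stride)
    out = np.zeros((len(row_starts), len(col_starts)))
    for oi, i in enumerate(row_starts):
        for oj, j in enumerate(col_starts):
            out[oi, oj] = np.sum(kernel * image[i:i + kh, j:j + kw])
    return out

out = cross_correlate_strided(np.arange(25.0).reshape(5, 5),
                              np.ones((3, 3)), stride=2)
```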
Dilation
Another variation of the procedure is to interspace the values in the kernel as they are applied to the image values.
We refer to the degree of interspacing as the dilation.
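Interspacing can be implemented with strided slicing; a sketch under my own naming, where a 3x3 kernel with dilation 2 covers a 5x5 footprint on the image:

```python
import numpy as np

def cross_correlate_dilated(image, kernel, dilation):
    kh, kw = kernel.shape
    # A dilated kernel has gaps between its values, so its footprint
    # on the image grows to (k - 1) * dilation + 1 in each direction.
    eh = (kh - 1) * dilation + 1
    ew = (kw - 1) * dilation + 1
    out = np.zeros((image.shape[0] - eh + 1, image.shape[1] - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = np.sum(kernel * patch)
    return out

out = cross_correlate_dilated(np.arange(25.0).reshape(5, 5),
                              np.ones((3, 3)), dilation=2)
```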
Spread of a Gaussian kernel
The amount of smoothing, controlled by the value of sigma.
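A small sketch (1-D for brevity; the function name is my own) showing how sigma controls the spread: a larger sigma distributes kernel weight away from the centre, giving more smoothing:

```python
import numpy as np

def gaussian_kernel_1d(size, sigma):
    # Sample a Gaussian on an odd-sized grid and normalise to sum to 1.
    x = np.arange(size) - size // 2
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

narrow = gaussian_kernel_1d(5, 0.5)   # small sigma: little smoothing
wide = gaussian_kernel_1d(5, 2.0)     # large sigma: heavy smoothing
```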
Convolution
If we rotate the kernel by a half turn (180 degrees) before applying cross-correlation, the resulting operation is known as convolution.
The convolution of an image f with a kernel h is written as h * f for short.
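The half-turn rotation amounts to reversing the kernel along both axes; a minimal sketch with an asymmetric kernel, where the result differs from plain cross-correlation:

```python
import numpy as np

def convolve(image, kernel):
    # Convolution = cross-correlation with the kernel rotated 180 degrees.
    flipped = kernel[::-1, ::-1]
    kh, kw = flipped.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(flipped * image[i:i + kh, j:j + kw])
    return out

h = np.array([[1.0, 2.0], [3.0, 4.0]])   # asymmetric kernel
f = np.arange(9.0).reshape(3, 3)
result = convolve(f, h)
```

For symmetric kernels (such as a Gaussian), convolution and cross-correlation give the same result.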
Convolutional Neural Network
A ConvNet applied to images is organised as a series of layers.
Each layer uses convolution to produce a set of feature maps using a different kernel for each feature map.
The feature maps from one layer are passed as input to the next layer.
The first layer receives the image as input.
Pooling
A pooling layer is used in CNNs to reduce the spatial size of the feature maps and give some invariance to small spatial transformations of the input image, which might arise from small translations or deformations.
The idea is to tile the feature maps with a fixed window and then to aggregate the values in the window at each position into a single scalar value.
The aggregate value provides a summary of the values within the window.
Max-pooling is the maximum of the values.
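Max-pooling over non-overlapping windows can be sketched as follows (the reshape trick and names are my own):

```python
import numpy as np

def max_pool(fmap, window=2):
    # Tile the feature map with non-overlapping windows and keep the
    # maximum value within each window.
    h = fmap.shape[0] - fmap.shape[0] % window   # drop any ragged edge
    w = fmap.shape[1] - fmap.shape[1] % window
    tiled = fmap[:h, :w].reshape(h // window, window, w // window, window)
    return tiled.max(axis=(1, 3))

fmap = np.array([[1, 2, 5, 6],
                 [3, 4, 7, 8],
                 [9, 1, 2, 3],
                 [4, 5, 6, 7]])
pooled = max_pool(fmap)
```

Each 2x2 tile collapses to its maximum, so the 4x4 map becomes 2x2.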
Data augmentation (5)
One way to avoid over-fitting is to increase the size of the training set.
Data augmentation produces new images by:
- Applying random transformations to the images in the training set.
- Simulating changes in camera position and orientation by translating, rotating, and scaling the image.
- Simulating changes in scene lighting by scaling intensity values.
- Simulating intraclass variations in shape and appearance by applying small image deformations.
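A few of these transformations sketched with NumPy (the particular transformations and parameter ranges are my own illustrative choices):

```python
import numpy as np

def augment(image, rng):
    img = image.copy()
    if rng.random() < 0.5:
        img = img[:, ::-1]                           # mirror: camera orientation
    img = np.roll(img, rng.integers(-2, 3), axis=1)  # small translation
    img = img * rng.uniform(0.8, 1.2)                # intensity scaling: lighting
    return img

aug = augment(np.arange(64.0).reshape(8, 8), np.random.default_rng(0))
```

Each call with a fresh random state yields a new training image with the same label.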
Dropout
The idea is to perturb training examples by zeroing input values to a layer with some given probability p.
We are effectively removing features at a chosen layer in the CNN.
Dropout is a form of regularisation that can reduce generalisation error and thereby improve performance on unseen data.
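A minimal dropout sketch; scaling survivors by 1/(1 - p) ("inverted dropout") is a common convention not stated in the card, kept so expected activations are unchanged:

```python
import numpy as np

def dropout(x, p, rng):
    # Zero each value with probability p. Surviving values are scaled
    # by 1 / (1 - p) ("inverted dropout", an assumed convention) so the
    # expected activation is unchanged; at test time dropout is a no-op.
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
out = dropout(np.ones(10_000), p=0.5, rng=rng)
```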
Batch normalisation
Training can be improved by reducing the variability of the data using batch normalisation.
Here we normalise the values input to a layer by standardising over each batch of data.
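The standardisation step can be sketched as follows (the learnable scale and shift parameters used in practice are omitted for brevity):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Standardise each feature over the batch dimension (axis 0):
    # subtract the batch mean and divide by the batch standard deviation.
    # eps guards against division by zero for constant features.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

batch = np.random.default_rng(1).normal(5.0, 3.0, size=(64, 4))
normed = batch_norm(batch)
```

After normalisation each feature has (approximately) zero mean and unit variance over the batch.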
Receptive field
In a CNN, the value at each position in a feature map derives from a region of values in the input image known as the receptive field for that feature map position.
Typically the receptive field grows as we move through the layers of the CNN.
Grad-CAM
A way to find out which parts of an input image contribute most to the selection of a given class label.
Semantic segmentation task
To label each and every pixel in a given image with the object class to which it belongs, including a catch-all class for the background.
Jaccard Index
Performance of a semantic segmentation process is normally measured using the Jaccard index, which is a statistic for measuring the similarity between a pair of sets, A and B.
Defined as the size of the intersection over the size of the union of the two sets.
J(A, B) = |A ∩ B| / |A ∪ B|
Aka Intersection over Union
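For binary segmentation masks the formula can be computed directly (the handling of two empty masks as perfect agreement is my own convention):

```python
import numpy as np

def jaccard_index(a, b):
    # |A intersect B| / |A union B| for two binary segmentation masks.
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0          # both masks empty: treat as perfect agreement
    return np.logical_and(a, b).sum() / union

pred = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [1, 0]])
iou = jaccard_index(pred, truth)
```

Here the intersection has 1 pixel and the union has 3, giving an IoU of 1/3; identical masks score 1.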