Neural Networks Flashcards
Neural networks are a form of artificial intelligence model, which utilise interconnected layers made up of artificial neurons. The term is a broad umbrella, with numerous possible arrangements and structures (or architectures), which are suited to different tasks. Neural networks have proved a very versatile and powerful machine learning tool, providing state-of-the-art performance in a large range of settings.
Architecture
In artificial intelligence, ‘architecture’ refers to the overall structure of a network or model; the types and size of the different layers and the setup of the various connections between them. This term can be broad (referring to a range of similar structures) or specific (referring to one individual example).
To build a basic understanding of the structure and workings of neural networks, we will focus on one particular category of neural network architecture – the convolutional neural network (CNN), widely employed in tasks involving image processing and interpretation.
An overview of neural networks
Neural networks are a specific approach used in machine learning. A neural network is made up of lots of interconnected ‘nodes’ (like neurons in the human brain). The nodes usually represent ‘artificial neurons’, structures that apply mathematical functions to their input and pass the result on to the next node. The nodes are arranged in layers.
The input layer receives the raw data.
The hidden layers are where mathematical functions are applied to the data in order to process the data.
The output layer produces final predictions based on the data processed in the hidden layers.
These layers all work together to find patterns in data (like a dataset of images).
Artificial neurons
Artificial neurons are the fundamental building blocks of neural networks. They ingest a given number of inputs and perform a mathematical function to generate an output value. The nature of this mathematical function can be modified by adjusting various parameters (the weights and bias). Artificial neurons with different parameters could produce different output values given the same input values.
Inputs and bias
An artificial neuron has a defined number of numerical input values, which will typically be fed from the outputs of connected neurons in the preceding layer of the network. The example above shows three inputs. Each input has an associated weight, which it will be multiplied by before the neuron sums the inputs.
The bias value forms a further internal input into the neuron which is added regardless of the external inputs.
Activation function
The activation function is a mathematical function applied to the initial output (the sum of the inputs multiplied by their respective weights, added to the bias value) to form the final output value of the neuron. Some examples of activation functions are shown below, with the x-axis representing the initial output value, and the y-axis representing the final output value.
Without an activation function, the neuron’s output is entirely linear – scaling directly with the input values, and only capable of modelling linear relationships. This would remain the case even with the combined effort of any number of artificial neurons in any structure. As any linear relationship can be modelled perfectly with a standard algebraic formula, this would not be a very useful approach.
By introducing non-linearity (that is, the output values not directly scaling with the input values) into the system, the neuron becomes able to model complex, non-linear relationships. There are numerous options for activation functions, of varying complexities.
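As an illustrative sketch, some common activation functions (ReLU, sigmoid and tanh – the sample values here are made up) can be written as simple Python functions:

```python
import math

def relu(x):
    """Rectified Linear Unit: passes positive values through, zeroes negatives."""
    return max(0.0, x)

def sigmoid(x):
    """Squashes any input into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squashes any input into the range -1 to 1."""
    return math.tanh(x)

# Sample initial output values (x-axis) and their final outputs (y-axis)
for x in [-2.0, 0.0, 2.0]:
    print(x, relu(x), round(sigmoid(x), 3), round(tanh(x), 3))
```

Note how none of these scale linearly with their input – this is the non-linearity that lets networks model complex relationships.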
Output
In summary, to generate an output value for a given set of input values with this three-input artificial neuron:
All input values are multiplied by their respective weights and the resulting values are summed together
The bias is added to this sum
The activation function is applied to produce the final output value.
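The three steps above can be sketched in Python as follows (the input values, weights, and bias here are made up for illustration, and ReLU is used as an example activation function):

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of inputs, plus bias, passed through an activation function."""
    initial = sum(i * w for i, w in zip(inputs, weights)) + bias
    return max(0.0, initial)  # ReLU activation as an example

# A three-input neuron with illustrative (made-up) parameters
inputs = [0.5, -1.0, 2.0]
weights = [0.8, 0.2, 0.4]
bias = 0.1
print(neuron_output(inputs, weights, bias))
```

Changing the weights or bias would change the output for the same inputs – this is exactly what training adjusts.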
Parameter
In artificial intelligence, ‘parameter’ refers to a value we can modify within a model to affect future outputs. For artificial neurons, these are the input weights and bias.
When training a model using machine learning, these values are gradually modified to improve overall performance. Depending on the complexity of the model, the number of parameters might range from thousands to trillions.
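To get a feel for how quickly parameters accumulate, here is a rough sketch counting the parameters of fully connected layers (the layer sizes are illustrative, not taken from the text):

```python
def dense_layer_params(n_inputs, n_neurons):
    """Each neuron has one weight per input, plus a single bias."""
    return n_neurons * (n_inputs + 1)

# A small hypothetical network: 784 inputs -> 128 hidden neurons -> 10 outputs
total = dense_layer_params(784, 128) + dense_layer_params(128, 10)
print(total)  # over 100,000 parameters even for this tiny network
```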
To better understand neural networks and how they work, you will now get to grips with a specific example of a neural network architecture: a convolutional neural network (CNN).
A neural network is typically composed of sequential layers of artificial neurons, which each process (or transform) the data and pass that information on to the next layer in the network. The design of these layers will be informed by what the model’s ‘task’ is. A convolutional neural network (CNN, or ConvNet) is a type of neural network architecture with convolutional layers, which are particularly optimised for recognising features that can occur in any portion of the input data. CNNs are one of the most successful artificial intelligence architectures for image interpretation and analysis.
- Layers in a convolutional neural network
Take a look at the graphic below, showing the layers in a simple 2 dimensional convolutional neural network.
Layer
In neural networks, ‘layer’ refers to a group of nodes (typically artificial neurons) that receive input at the same point in the network.
Layers are often specialised by structuring their connections in a particular way, or by utilising specific functions instead of using artificial neurons.
Many modern network architectures use several hidden layers to recognise increasingly complex patterns within the input data; this approach is often referred to as deep learning.
The layers within a neural network can be divided into the input, hidden, and output layers, depending upon where they fall within the network’s structure.
- Input layer
The input layer receives the raw data.
Generally, this does not utilise artificial neurons, instead just providing a consistent entry-point for data into the network.
In order to achieve this, data normally undergoes preprocessing prior to entry to the input layer, where it is transformed into a consistent format. This process often involves rescaling (‘normalising’) the data to a standard range of values.
The input layer needs to provide capacity for all data we wish to feed into the model for a single output.
- Hidden layers
The hidden layers utilise artificial neurons to process the data in a useful way, depending on the task of the network.
The size, structure, and connections between these layers varies, to optimise the ability of the layer to transform the data in a useful way.
In this lesson, we’ll discuss a few different kinds of layers that might be used within the hidden portion of a network: convolutional, pooling, and fully connected layers.
- Output layer
The output layer provides the final output values.
The shape of this layer will be determined by the desired output of the network; this could be a single value or multiple.
Similarly to the input layer, the output layer often involves special formatting (such as a sigmoid, or softmax function) to normalise the final values into the required limits.
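As an illustrative sketch, the softmax function mentioned above can be written in plain Python – it rescales a set of raw output scores into values between 0 and 1 that sum to 1 (the input scores here are made up):

```python
import math

def softmax(values):
    """Rescale raw output scores into probabilities that sum to 1."""
    # Subtracting the maximum first improves numerical stability
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Raw scores from a hypothetical 3-class output layer
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # highest score -> highest probability
```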
Normalisation
The term ‘normalisation’ is used to refer to several similar but distinct concepts within mathematics, statistics, machine learning, and imaging science.
In machine learning, normalisation usually refers to the process of rescaling values to fit within a specified range – frequently 0 → 1, or -1 → 1.
This might be performed on input data to try and mitigate differences in the way it is acquired or stored – for example, values used to record signal intensity within MRI images are arbitrary, and can vary wildly from one scanner to the next, or with differing (but visually similar) pulse sequences. The values within an MRI image might be rescaled by setting the minimum value to 0, and the maximum value to 1 (called ‘min-max normalisation’). In this way, the relative difference between bright and dark parts of the image would be preserved, and would be comparable in absolute difference to any other image rescaled in the same way.
A common alternative method is Z-score normalisation, which rescales the data based on its mean and standard deviation, so that the result has a mean of 0 and a standard deviation of 1.
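Both methods can be sketched in a few lines of Python (the ‘signal intensity’ values below are made up to mimic the two-scanner example):

```python
import statistics

def min_max_normalise(values):
    """Rescale so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score_normalise(values):
    """Rescale so the data has mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Arbitrary 'signal intensity' values from two hypothetical scanners
scanner_a = [100, 250, 400]
scanner_b = [1000, 2500, 4000]
print(min_max_normalise(scanner_a))  # [0.0, 0.5, 1.0]
print(min_max_normalise(scanner_b))  # [0.0, 0.5, 1.0] - now comparable
```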
Input layer
The input layer is determined by the type and format of data intended to be fed into the model in a single input, which could have any number of dimensions:
1 dimensional
A list of different blood test values, or an ECG waveform
2 dimensional
A grid (‘array’, or ‘matrix’) of values, like a grayscale image or a chest radiograph.
3 dimensional
A set of 2d image slices, or a 3d volume, like a standard MRI volume.
4+ dimensions
A series of 3d inputs across different time-points or modalities; this could be something like a functional MRI acquisition, where multiple 3d volumes of data have been captured during a continuous time period.
In more complex models, the input layer could contain multiple different types of data with varying dimensions, such as a 2d or 3d image alongside the patient’s age.
Array
An ‘array’ is a grid-like data structure. The above input types can all be described as types of array; an array has a set size and number of dimensions.
The contents of an array could be any type of data, but all individual elements of the array must be of the same type. The majority of arrays used in machine learning are float arrays – arrays of numbers which allow for decimal points within the stored data.
Each piece of data within the array (‘element’) can be located using its index, which is effectively a coordinate system for the array.
The graphic below shows one, two and three dimensional float arrays.
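The three shapes can also be sketched in code – here as nested Python lists, with each element located by its index (the values are made up for illustration):

```python
# 1d array: a list of values, like a panel of blood test results
one_d = [4.5, 7.2, 1.1]

# 2d array: a 2x3 grid of values, like a tiny grayscale image
two_d = [[0.0, 0.5, 1.0],
         [0.2, 0.8, 0.4]]

# 3d array: two 2x2 slices stacked, like a small image volume
three_d = [[[1, 2], [3, 4]],
           [[5, 6], [7, 8]]]

# Indices act as a coordinate system, counting from 0
print(one_d[1])          # 7.2
print(two_d[0][2])       # 1.0
print(three_d[1][0][1])  # 6
```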
- Hidden layers
Convolutional layers, pooling layers and fully connected layers are all examples of hidden layers, which you will now explore below.
Kernel
Kernels are arrays which are applied to part of the input (‘convolved’) to generate a single output value (which is often subsequently passed through an activation function).
The graphic below shows a simple 2d 3x3 kernel that could be used to detect vertical edges.
A convolutional layer is usually referred to by its kernel’s dimensions – a ‘5x5 convolution layer’ uses a 2d kernel of weights 5 wide and 5 tall. The dimensions of the layer’s kernel dictate how each neuron will connect to the preceding layer; in a 3x3 kernel layer, each neuron would connect to the outputs of a 3x3 grid of neurons in the preceding layer. Properties of a layer that are set when initially describing the network, such as kernel size, are termed ‘hyperparameters’.
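As an illustrative sketch, a vertical-edge kernel like the one described can be convolved across a small input array (the input values are made up; a dark-to-bright vertical edge runs through the image):

```python
def convolve2d(image, kernel, stride=1):
    """Slide the kernel across the image, producing a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(0, len(image) - kh + 1, stride):
        row = []
        for c in range(0, len(image[0]) - kw + 1, stride):
            total = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(kh) for j in range(kw))
            row.append(total)
        out.append(row)
    return out

# A simple 3x3 kernel tuned to respond to vertical (dark -> bright) edges
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

# A 4x4 'image': flat on the left, with a vertical edge before the last column
image = [[0, 0, 0, 9],
         [0, 0, 0, 9],
         [0, 0, 0, 9],
         [0, 0, 0, 9]]

print(convolve2d(image, vertical_edge))  # [[0, 27], [0, 27]]
```

The flat region produces 0, while positions spanning the edge produce a strong response – the same shared kernel detects the feature wherever it occurs.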
Hyperparameter
In contrast to parameters, which are values modified during training based on example data, hyperparameters are properties decided upon during creation of the model. They can be broadly split into two categories: model hyperparameters and algorithm hyperparameters.
Model hyperparameters are static properties of the model, such as:
The size and type of the layers
How the layers are connected
What activation functions might be used and where.
Algorithm hyperparameters are choices and values affecting how the model is actively trained, such as:
What formula is used to calculate inaccuracy (the ‘loss function’)
How rapidly changes are made to the model’s parameters (‘learning rate’)
Which areas of the network are prioritised for parameter updates.
In reality, rather than a layer’s kernel being defined manually, it is learned during the training process. This allows the kernel to pick up on specific relevant features within the input data.
Because the kernel weights are shared by all neurons across the layer, rather than each neuron having its own individual set, the layer is very computationally efficient. It also means that the layer can spatially generalise the feature it is tuned to detect – it is applied across the entire input by the different neurons within the layer. By including further convolutional layers within a network, increasingly complex features can be identified; the first detecting simple features like edges, with subsequent layers detecting arrangements or clusters of these simpler features.
The parts of the input data that have the kernel applied can be varied depending on the layer’s settings via concepts like stride, amongst others. These factors can affect the size of the feature map, due to the kernel being applied more or fewer times across the input data.
Stride (Part 1)
Stride is the overlap or spacing between adjacent applications of the convolution kernel within the input data. The ‘stride’ in the example above is 1 – the areas covered by adjacent kernel applications differ by a single row and column.
The diagram below shows the difference between a stride of 1 and a stride of 2 – illustrating how the first and second neurons within the convolutional layer would have inputs further apart with a higher stride value.
Stride (Part 2)
If the stride is larger than the size of the kernel, there may be areas of the input data that are not involved in any convolution!
A higher stride value reduces the computational complexity of the model, as fewer calculations need to be performed.
Note that due to stride affecting the number of times a kernel is applied to the input layer, it impacts the overall size of the feature map:
In this case, the increased stride has reduced the ability of the kernel to detect the relatively small features within the input data. This could potentially be mitigated by using a larger kernel (e.g., 5x5).
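The relationship between stride and feature map size can be sketched with a small formula (assuming no padding; the input and kernel sizes below are illustrative):

```python
def feature_map_size(input_size, kernel_size, stride):
    """Number of kernel applications along one dimension (no padding)."""
    return (input_size - kernel_size) // stride + 1

# A 3x3 kernel slid across a 28-wide input
print(feature_map_size(28, 3, 1))  # 26 applications per row
print(feature_map_size(28, 3, 2))  # 13 - a higher stride shrinks the map
```

Halving the number of applications in each dimension roughly quarters the computation for a 2d input, which is why a higher stride reduces computational cost.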