Neural Networks Flashcards

1
Q

Neural networks are a form of artificial intelligence model, which utilise interconnected layers made up of artificial neurons. The term is a broad umbrella, with numerous possible arrangements and structures (or architectures), which are suited to different tasks. Neural networks have proved a very versatile and powerful machine learning tool, providing state-of-the-art performance in a large range of settings.

A

Architecture
In artificial intelligence, ‘architecture’ refers to the overall structure of a network or model; the types and size of the different layers and the setup of the various connections between them. This term can be broad (referring to a range of similar structures) or specific (referring to one individual example).

2
Q

To build a basic understanding of the structure and workings of neural networks, we will focus on one particular category of neural network architecture – the convolutional neural network (CNN), widely employed in tasks involving image processing and interpretation.

A
3
Q

An overview of neural networks

Neural networks are a specific approach used in machine learning. A neural network is made up of lots of interconnected ‘nodes’ (like neurons in the human brain). The nodes usually represent ‘artificial neurons’, structures that apply mathematical functions to their input and pass the result on to the next node. The nodes work in layers.

The input layer receives the raw data.

The hidden layers are where mathematical functions are applied to the data in order to process the data.

The output layer produces final predictions based on the data processed in the hidden layers.

These layers all work together to find patterns in data (like a dataset of images).

A

Artificial neurons

Artificial neurons are the fundamental building blocks of neural networks. They ingest a given number of inputs and perform a mathematical function to generate an output value. The nature of this mathematical function can be modified by adjusting various parameters (the weights and bias). Artificial neurons with different parameters could produce different output values given the same input values.

4
Q

Inputs and bias

An artificial neuron has a defined number of numerical input values, which will typically be fed from the outputs of connected neurons in the preceding layer of the network. The example above shows three inputs. Each input has an associated weight, which it will be multiplied by before the neuron sums the inputs.

The bias value forms a further internal input into the neuron which is added regardless of the external inputs.

A

Activation function

The activation function is a mathematical function applied to the initial output (the sum of the inputs multiplied by their respective weights, added to the bias value) to form the final output value of the neuron. Some examples of activation functions are shown below, with the x-axis representing the initial output value, and the y-axis representing the final output value.

Without an activation function, the neuron’s output is entirely linear – scaling directly with the input values, and only capable of modelling linear relationships. This would remain the case even with the combined effort of any number of artificial neurons in any structure. As any linear relationship can be modelled perfectly with a standard algebraic formula, this would not be a very useful approach.

By introducing non-linearity (that is, the output values not directly scaling with the input values) into the system, the neuron becomes able to model complex, non-linear relationships. There are numerous options for activation functions, of varying complexities.
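
To make the point about linearity concrete, here is a minimal Python sketch. The weights and biases are made-up values chosen purely for illustration, and ReLU is used only as one common example of an activation function. It chains two single-input neurons without an activation function and shows that the pair behaves exactly like one linear function, whereas adding the activation breaks that equivalence.

```python
def linear_neuron(x, weight, bias):
    """A neuron with no activation function: output = weight * x + bias."""
    return weight * x + bias


def relu(value):
    """Rectified linear unit: negative values become 0."""
    return max(0.0, value)


# Hypothetical parameters for two neurons connected in series.
W1, B1 = 2.0, 1.0
W2, B2 = -3.0, 0.5


def two_linear_neurons(x):
    return linear_neuron(linear_neuron(x, W1, B1), W2, B2)


def single_equivalent_neuron(x):
    # The chain collapses to one linear function:
    # W2 * (W1 * x + B1) + B2 = (W2 * W1) * x + (W2 * B1 + B2)
    return (W2 * W1) * x + (W2 * B1 + B2)


def two_neurons_with_relu(x):
    hidden = relu(linear_neuron(x, W1, B1))
    return linear_neuron(hidden, W2, B2)


for x in (-2.0, 0.0, 1.5):
    print(x, two_linear_neurons(x), single_equivalent_neuron(x),
          two_neurons_with_relu(x))
```

For every input, the first two results match. The third differs whenever the hidden value is negative, which is where the non-linearity takes effect.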

5
Q

Output

In summary, to generate an output value for a given set of input values with this three-input artificial neuron:

All input values are multiplied by their respective weights and the resulting values are summed together

The bias is added to this sum

The activation function is applied to produce the final output value.
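
The three steps above can be sketched in a few lines of Python. This is an illustration only: the input values, weights and bias are hypothetical, and the sigmoid is assumed as the activation function simply because it is a common choice.

```python
import math


def sigmoid(value):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs, plus the bias, passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    # Step 1: multiply each input by its weight and sum the results.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Step 2: add the bias. Step 3: apply the activation function.
    return activation(weighted_sum + bias)


# Hypothetical values for a three-input neuron.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1

print(neuron_output(inputs, weights, bias))
```

Changing any weight or the bias changes the output for the same inputs, which is what makes these values useful as trainable parameters.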

A
6
Q

Parameter

In artificial intelligence, ‘parameter’ refers to a value we can modify within a model to affect future outputs. For artificial neurons, these are the input weights and bias.

When training a model using machine learning, these values are gradually modified to improve overall performance. Depending on the complexity of the model, the number of parameters might range from thousands to trillions.

A

To better understand neural networks and how they work, in the next lesson, you will get to grips with a specific example of a neural network architecture: a convolutional neural network (CNN).

7
Q

To better understand neural networks and how they work, you will now get to grips with a specific example of a neural network architecture: a convolutional neural network (CNN).

A

A neural network is typically comprised of sequential layers of artificial neurons, which each process (or transform) the data and pass that information on to the next layer in the network. The design of these layers will be informed by what the model’s ‘task’ is. A convolutional neural network (CNN, or ConvNet) is a type of neural network architecture with convolutional layers, which are particularly optimised for recognising features that can occur in any portion of the input data. CNNs are one of the most successful artificial intelligence architectures for image interpretation and analysis.

8
Q
  1. Layers in a convolutional neural network

Take a look at the graphic below, showing the layers in a simple 2 dimensional convolutional neural network.

A
8
Q

Layer
In neural networks, ‘layer’ refers to a group of nodes (typically artificial neurons) that receive input at the same point in the network.

Layers are often specialised by structuring their connections in a particular way, or by utilising specific functions instead of using artificial neurons.

Many modern network architectures use several hidden layers to recognise increasingly complex patterns within the input data; this approach is often referred to as deep learning.

The layers within a neural network can be divided into the input, hidden, and output layers, depending upon where they fall within the network’s structure.

A
  1. Input layer

The input layer receives the raw data.

Generally, this does not utilise artificial neurons, instead just providing a consistent entry-point for data into the network.

In order to achieve this, data normally undergoes preprocessing prior to entry to the input layer, where it is transformed into a consistent format. This process often involves rescaling (‘normalising’) the data to a standard range of values.

The input layer needs to provide capacity for all data we wish to feed into the model for a single output.

9
Q
  1. Hidden layers

The hidden layers utilise artificial neurons to process the data in a useful way, depending on the task of the network.

The size, structure, and connections between these layers vary, to optimise the ability of the layer to transform the data in a useful way.

In this lesson, we’ll discuss a few different kinds of layers that might be used within the hidden portion of a network: convolutional, pooling, and fully connected layers.

A
  1. Output layer

The output layer provides the final output values.

The shape of this layer will be determined by the desired output of the network; this could be a single value or multiple.

Similarly to the input layer, the output layer often involves special formatting (such as a sigmoid, or softmax function) to normalise the final values into the required limits.

10
Q

Normalisation
The term ‘normalisation’ is used to refer to several similar but distinct concepts within mathematics, statistics, machine learning, and imaging science.

In machine learning, normalisation usually refers to the process of rescaling values to fit within a specified range – frequently 0 → 1, or -1 → 1.

A

This might be performed on input data to try to mitigate differences in the way it is acquired or stored – for example, values used to record signal intensity within MRI images are arbitrary, and can vary wildly from one scanner to the next, or with differing (but visually similar) pulse sequences. The values within an MRI image might be rescaled by setting the minimum value to 0, and the maximum value to 1 (called ‘min-max normalisation’). In this way, the relative difference between bright and dark parts of the image would be preserved, and would be comparable in absolute difference to any other image rescaled in the same way.
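
As a rough illustration of min-max normalisation, the Python sketch below rescales two lists of made-up intensity values. The names `scanner_a` and `scanner_b` and their values are hypothetical. Although the raw scales differ, both end up on the same 0 to 1 range.

```python
def min_max_normalise(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant input has no range to rescale; return zeros by convention.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical raw intensities from two scanners with different scales.
scanner_a = [0, 250, 500, 1000]
scanner_b = [100, 325, 550, 1000]

print(min_max_normalise(scanner_a))
print(min_max_normalise(scanner_b))
```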

11
Q

A common alternative method is Z-score normalisation, which rescales the data by subtracting its mean and dividing by its standard deviation.
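
A minimal Python sketch of Z-score normalisation follows, using the standard library `statistics` module. The example values are hypothetical, and the population standard deviation is assumed here; some implementations use the sample standard deviation instead.

```python
import statistics


def z_score_normalise(values):
    """Subtract the mean, then divide by the (population) standard deviation."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        # All values identical: nothing to rescale.
        return [0.0 for _ in values]
    return [(v - mean) / spread for v in values]


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical measurements
print(z_score_normalise(values))
```

After rescaling, the values have a mean of 0 and a standard deviation of 1, so each one expresses how many standard deviations it lies from the mean.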

A
12
Q

Input layer

The input layer is determined by the type and format of data intended to be fed into the model in a single input, which could have any number of dimensions:

A
13
Q

1 dimensional

A list of different blood test values, or an ECG waveform

A

2 dimensional

A grid (‘array’, or ‘matrix’) of values, like a grayscale image or a chest radiograph.

14
Q

3 dimensional

A set of 2d image slices, or a 3d volume, like a standard MRI volume.

A

4+ dimensions

A series of 3d inputs across different time-points or modalities; this could be something like a functional MRI acquisition, where multiple 3d volumes of data have been captured during a continuous time period.

15
Q

In more complex models, the input layer could contain multiple different types of data with varying dimensions, such as including both a 2d or 3d image as well as the patient’s age.

A
16
Q

Array
An ‘array’ is a grid-like data structure. The above input types can all be described as types of array, and like them, arrays have a set size and number of dimensions.

The contents of an array could be any type of data, but all individual elements of the array must be of the same type. The majority of arrays used in machine learning are float arrays – arrays of numbers which allow for decimal points within the stored data.

Each piece of data within the array (‘element’) can be located using its index, which is effectively a coordinate system for the array.

The graphic below shows one, two and three dimensional float arrays.
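
For readers who prefer code, arrays can be sketched as nested Python lists. The values are arbitrary, and `shape` is a hypothetical helper written for this example. Note that Python counts indices from 0.

```python
# Arrays sketched as nested Python lists of floats (hypothetical values).
one_d = [0.1, 0.2, 0.3]                      # 3 elements
two_d = [[0.0, 0.5],
         [1.0, 1.5],
         [2.0, 2.5]]                         # 3 rows x 2 columns
three_d = [[[0.0, 0.1], [0.2, 0.3]],
           [[1.0, 1.1], [1.2, 1.3]]]         # 2 slices x 2 rows x 2 columns


def shape(array):
    """Return the size of each dimension of a regular, non-empty nested list."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0]
    return tuple(dims)


print(shape(one_d), one_d[2])            # one index locates an element
print(shape(two_d), two_d[1][0])         # row 1, column 0
print(shape(three_d), three_d[1][0][1])  # slice 1, row 0, column 1
```

Each extra dimension adds one more index to the 'coordinate' needed to locate an element.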

A
17
Q
  1. Hidden layers

Convolutional layers, pooling layers and fully connected layers are all examples of hidden layers, which you will now explore below.

A

Kernel
Kernels are arrays which are applied to part of the input (‘convolved’) to generate a single output value (which is often subsequently passed through an activation function).

The graphic below shows a simple 2d 3x3 kernel that could be used to detect vertical edges.
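
As a rough illustration of how such a kernel is applied, the Python sketch below slides a 3x3 kernel across a small image and records one output value per position. The kernel values and the image are assumptions made for this example, not taken from the graphic. As in most deep learning libraries, the kernel is applied without flipping, which is strictly a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for top in range(out_rows):
        row = []
        for left in range(out_cols):
            # Multiply each kernel weight by the input value beneath it, then sum.
            total = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 3x3 vertical-edge kernel: responds where brightness
# changes from left to right.
vertical_edge_kernel = [[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]]

# 4x6 image: dark (0) on the left half, bright (1) on the right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

for row in convolve2d(image, vertical_edge_kernel):
    print(row)
```

The output is largest where the kernel straddles the boundary between the dark and bright halves, and zero over the uniform regions.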

18
Q

The convolution layer being used will usually be referred to by reference to the kernel’s dimensions – ‘5x5 convolution layer’ corresponding to a 2d kernel with a 5 wide and 5 tall array of weights. The dimensions of the layer’s kernel dictate how each neuron will connect to the preceding layer; in a 3x3 kernel layer, each neuron would connect to the outputs of a 3x3 grid of neurons in the preceding layer. Properties of a layer that are set when initially describing the network, such as kernel size, are termed ‘hyperparameters’.

A

Hyperparameter
In contrast to parameters, which are values modified during training based on example data, hyperparameters are properties decided upon during creation of the model. They can be broadly split into two categories: model hyperparameters and algorithm hyperparameters.

Model hyperparameters are static properties of the model, such as:

The size and type of the layers

How the layers are connected

What activation functions might be used and where.

Algorithm hyperparameters are choices and values affecting how the model is actively trained, such as:

What formula is used to calculate inaccuracy (the ‘loss function’)

How rapidly changes are made to the model’s parameters (‘learning rate’)

Which areas of the network are prioritised for parameter updates.

19
Q
A
20
Q
A
21
Q

In reality, rather than a layer’s kernel being defined manually, it is learned during the training process. This allows the kernel to pick up on specific relevant features within the input data.

Because the kernel weights are shared by all neurons across the layer, rather than each neuron having its own individual set, the layer is very computationally efficient. It also means that the layer can spatially generalise the feature it is tuned to detect – it is applied across the entire input by the different neurons within the layer. By including further convolutional layers within a network, increasingly complex features can be identified; the first detecting simple features like edges, with subsequent layers detecting arrangements or clusters of these simpler features.

A

The parts of the input data that have the kernel applied can be varied depending on the layer’s settings via concepts like stride, amongst others. These factors can affect the size of the feature map, due to the kernel being applied more or fewer times across the input data.

22
Q

Stride (Part 1)

Stride is the overlap or spacing between adjacent applications of the convolution kernel within the input data. The ‘stride’ in the example above is 1 – the area in which adjacent kernels are applied differs by a single row and column.

The diagram below shows the difference between a stride of 1 and a stride of 2 – illustrating how the first and second neurons within the convolutional layer would have inputs further apart with a higher stride value.

A

Stride (Part 2)

If the stride is larger than the size of the filter, there may be areas of the input data that are not involved in any convolution!

A higher stride value reduces the computational complexity of the model, as fewer calculations need to be performed.

Note that due to stride affecting the number of times a kernel is applied to the input layer, it impacts the overall size of the feature map:

In this case, the increased stride has reduced the ability of the kernel to detect the relatively small features within the input data. This could potentially be mitigated by using a larger kernel (e.g., 5x5).
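
The effect of stride on feature map size can be estimated with a small calculation. The sketch below assumes no padding and a square input and kernel; `feature_map_size` is a hypothetical helper name, and the 7x7 input is an arbitrary example.

```python
def feature_map_size(input_size, kernel_size, stride=1):
    """Number of kernel positions along one dimension, with no padding."""
    if kernel_size > input_size:
        raise ValueError("kernel does not fit inside the input")
    return (input_size - kernel_size) // stride + 1


# Hypothetical 7x7 input with a 3x3 kernel.
for stride in (1, 2, 3):
    size = feature_map_size(7, 3, stride)
    print(f"stride {stride}: feature map is {size}x{size}")
```

A larger stride means fewer kernel positions and therefore a smaller feature map.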

23
Q

In practice, convolutional layers will typically apply several different kernels to the input data to produce several feature maps, which are then ‘stacked’ and passed on to the next layer. This concept is complex, but the important result is that each convolutional layer can detect several different kinds of features, meaning that the later layers can draw on diverse information from the input data.

A
24
Q

3.2 Pooling layers

Pooling layers are typically placed immediately after convolution layers, receiving the feature map as an input. Pooling layers function to directly reduce the size of the feature map, aiming to reduce complexity whilst still retaining salient features. Pooling layers do not use any weights, biases, or kernels – they have no trainable parameters, so do not change during the training process. There are a few types of pooling layers, but fundamentally all of these types have nodes which do the same thing – perform an operation on a subsection of the input data and reduce it to a single output.

A

The most commonly used type of pooling is max pooling – the output of a given node is the maximum value found within all of its inputs. A max pooling operation is shown in the diagram above. You could consider maximum intensity projections (MIP), as commonly used in angiographic imaging, to be a kind of max pooling. Another kind is average pooling, where the output is an average of all values in the input area.
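
Max and average pooling can be sketched with one small Python function. The 2x2 non-overlapping window and the feature map values are assumptions chosen for illustration.

```python
def pool2d(feature_map, window=2, operation=max):
    """Apply `operation` to non-overlapping window x window blocks."""
    pooled = []
    for top in range(0, len(feature_map) - window + 1, window):
        row = []
        for left in range(0, len(feature_map[0]) - window + 1, window):
            # Gather every value inside the current block.
            block = [feature_map[top + i][left + j]
                     for i in range(window) for j in range(window)]
            row.append(operation(block))
        pooled.append(row)
    return pooled


def average(values):
    return sum(values) / len(values)


# Hypothetical 4x4 feature map.
feature_map = [[1, 3, 2, 0],
               [5, 6, 1, 2],
               [0, 2, 9, 4],
               [1, 1, 3, 8]]

print(pool2d(feature_map))                     # max pooling
print(pool2d(feature_map, operation=average))  # average pooling
```

Both variants shrink the 4x4 feature map to 2x2 without using any trainable parameters.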

Global pooling is a subset of pooling where the input is an entire feature map. Global pooling can help the network identify or compare between spatially general features (diffuse features like generalised cerebral atrophy, as opposed to more focal ones like individual lesions).

By reducing the number of elements within the network’s feature map, pooling layers help to restrict the network into learning generalisable patterns within the data, rather than learning patterns or values specific to individual examples within the training data (‘overfitting’). This also reduces the computational complexity of subsequent parts of the network.

25
Q

3.3 Fully connected (or ‘dense’) layers

Unlike the other types of layer we’ve described, neurons within the fully connected layers do not have spatially limited input; all neurons within the layer are connected to all neurons within the preceding layer. As a result, these layers are able to capture global patterns and relationships within the input data.

They also serve as a flexible means of transforming the features recognised by earlier parts of the network into values that are useful and specific to the particular task being performed.
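
A fully connected layer can be sketched as a list of neurons that each receive every input. The feature values, weights and biases below are hypothetical, and ReLU is assumed as the activation function for illustration only.

```python
def relu(value):
    """Rectified linear unit: negative values become 0."""
    return max(0.0, value)


def dense_layer(inputs, weights, biases, activation=relu):
    """Every neuron sees every input: one weight row and one bias per neuron."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(x * w for x, w in zip(inputs, neuron_weights)) + bias
        outputs.append(activation(total))
    return outputs


# Hypothetical flattened features from earlier layers (4 values) feeding
# a dense layer of 2 neurons.
features = [1.0, 0.0, 2.0, 3.0]
weights = [[0.5, 1.0, -1.0, 0.0],
           [0.0, 2.0, 1.0, 1.0]]
biases = [0.5, -1.0]

print(dense_layer(features, weights, biases))
```

Unlike a convolutional layer, each neuron here has its own full set of weights, so the parameter count grows with the number of inputs multiplied by the number of neurons.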

A
26
Q
  1. Output layer

The output layer structure is determined by the desired function of the model – that model’s task. The output layer needs to contain the correct number and type of nodes to feed out all the desired output data in the right format.

In the next lesson, you will consider various tasks that a model might be used to perform, and how the output layer might be specialised to provide the desired information. Before moving on, test your knowledge with the question below.

A
27
Q

Tasks are categories of functions that a model might be designed to perform. To be suitable for a particular task, a model’s layers (particularly the input and output layers) will need to be built in such a way that the relevant data can be input and extracted to achieve the task. In this lesson, you will examine a variety of different kinds of task, and how their output layers might be specialised.

A

Classification tasks can be divided into binary classification, where the model needs to categorise the input into one of two predefined classes; multiclass classification, where there are three or more classes; or multilabel classification, where there are three or more classes, which might potentially overlap. Segmentation tasks are a particular kind of classification task most frequently used with image data. Regression tasks aim to predict a continuous numerical output value for a given input.

You will learn more about these five categories of tasks in this lesson.

28
Q
  1. Binary classification
A

Binary classification uses a single value output layer with an activation function that limits outputs to between 0 and 1, such as a sigmoid function. The network can then be trained in a way where features from one class will cause the output to tend towards 0, while the other class will cause it to tend towards 1. The model will always output a single value between these two figures. Intuitively, you might think that anything below 0.5 should be classified to the first class, with the other higher values being classified to the second class, but this would not necessarily result in optimal performance depending on the use-case. This dividing value is called the classification threshold.

29
Q

Classification threshold
Typical binary classification models will always output a single value between 0 and 1, with each extreme being associated with one of the two classes the model has been trained to distinguish.

During training, the model is being optimised to minimise the difference between its current output for a given piece of training data and the ‘correct’ value (0 or 1) for that case. The standard training process does not consider the classification threshold, or whether the current output value for a case would be ‘correct’ by that measure – it just seeks to minimise the difference between the current and ‘correct’ output value.

Once a model has been trained, a test dataset can be used to help inform the choice of threshold. This process is sometimes called threshold optimisation. It involves using the test dataset to compare the sensitivity and specificity across the entire range of possible threshold values.

A

Consider a model trained to detect acute haemorrhage on CT Head:
0 → No haemorrhage

1 → Acute haemorrhage

A threshold of 0.0 would flag every input as acute haemorrhage; it would be entirely sensitive (picking up all positive cases) but completely non-specific (flagging all negative cases also).

A threshold of 1.0 would be entirely specific (giving no false-positives) but completely insensitive (picking up none of the positive cases).

The sensitivity and specificity will both vary between these two extremes as the threshold value is changed. Depending on the use-case of the model, it could be optimised to achieve a target sensitivity or specificity based on the test dataset. In this way it is possible for any model to achieve either a specific sensitivity or specificity value (but not both concurrently!) in populations that the test dataset is representative of.
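
The trade-off described above can be explored with a short Python sketch. The scores and labels are invented for illustration, and the sketch assumes that an output at or above the threshold is classified as positive and that both classes are present in the data.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive, then compare with true labels."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical model outputs and true labels (1 = acute haemorrhage).
scores = [0.05, 0.20, 0.35, 0.60, 0.40, 0.70, 0.85, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

for threshold in (0.0, 0.3, 0.5, 0.65, 1.0):
    sens, spec = sensitivity_specificity(scores, labels, threshold)
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Sweeping the threshold in this way is the basis of threshold optimisation: as the threshold rises, sensitivity falls and specificity rises.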

30
Q

Having more than two potential classes for a single piece of input data is complicated by the possibility that classes might overlap (think of classes like consolidation and pneumothorax on a chest radiograph model – both could be present concurrently). The output layer approach will be different depending on whether overlap is possible.

A
31
Q
  1. Multilabel classification

Multilabel classification refers to where overlap is possible. The output layer will typically be a series of standard sigmoid function neurons, similar to those used in binary classification, with one neuron present for each class within the possible outputs. Each neuron is then trained such that cases negative for that particular class should tend towards 0, whilst positive cases should tend towards 1, with a classification threshold for each neuron utilised when the model is used to make actual predictions.

A
32
Q
  1. Multiclass classification

Multiclass classification is the converse; the classes have no overlap, and each case will have a single ‘correct’ class – say, if a network was being trained to identify the sequence type from an MR image. Typically a softmax function will be used. This function takes all final layer outputs and normalises them such that their sum is equal to 1.0; the final output of the model is then the class with the highest output value.

A
33
Q

Softmax function
The softmax function takes the initial outputs of all of the output layer neurons and performs a complex (non-linear) rescaling, returning a new value for each output neuron where all neurons’ final outputs sum to 1.0.

The underlying mechanics of this rescaling are mathematically quite dense, but effectively softmax accentuates the largest absolute differences within the input data. This has benefits during the training process, where it assists with stabilising the pace of the model’s improvement.

It can be tempting to think of the softmax outputs as a kind of percentage certainty of the model, but this is not actually the case. Higher values will result from the input data closely matching the features the model has learned to associate with a given class, but ultimately the model is not being trained to provide percentage values.

There are techniques that serve to calibrate these outputs to provide a better idea of how ‘certain’ the model is about predictions, but these methods do not improve the model’s performance, only its interpretability.
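
For reference, the standard softmax calculation can be sketched in a few lines of Python. The raw output values below are hypothetical. Subtracting the maximum before exponentiating is a common numerical safeguard and does not change the result.

```python
import math


def softmax(raw_outputs):
    """Exponentiate each value and divide by the total so the results sum to 1."""
    # Subtracting the maximum first avoids overflow without changing the result.
    largest = max(raw_outputs)
    exponentials = [math.exp(v - largest) for v in raw_outputs]
    total = sum(exponentials)
    return [e / total for e in exponentials]


# Hypothetical raw outputs from a three-class output layer.
raw_outputs = [2.0, 1.0, 0.1]
normalised = softmax(raw_outputs)
predicted_class = normalised.index(max(normalised))

print(normalised, sum(normalised), predicted_class)
```

The predicted class is simply the position of the largest normalised value. As noted above, these values should not be read as calibrated certainties.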

A
34
Q
  1. Segmentation tasks

Segmentation tasks are technically speaking a subset of classification – division of data into particular groups. The term is most commonly applied in reference to image segmentation, where the task is to categorise each pixel or voxel within an image or volume. Models performing segmentation tasks could be used to measure tumour sizes, or assess cardiac morphology based on MR.

Particular types of CNNs are used to perform image segmentation tasks, the most popular of these being the U-Net.

Segmentation models:

Have output layers mimicking the input layers – such that a single output value is produced for each input value fed in

Tend not to utilise dense connections, as they aim to retain spatial information about the input values throughout the entire network

Often use up-sampling convolution layers, which increase the resolution of the feature map – effectively functioning like a typical convolution layer in reverse

A

Sometimes utilise skip connections where outputs from earlier layers are sent both directly to the final layers as well as the intermediate ones. This helps preserve information learned from finer features in the input data, whilst also enabling more global features to be identified by serial convolution.

The broad function of the layers within these models tends to be that the first half of the network gradually reduces the feature map resolution into the most important features, with the second half of the network then using this feature map via up-sampling to create a final output equal in resolution to the input layer.

The kinds of functions applied to the output layer for segmentation tasks are similar to those for standard classification networks, depending on whether each output could belong to only one class or many.

35
Q
  1. Regression tasks

Regression tasks aim to predict a continuous numerical output value for a given input. This could be something like patient age based on imaging appearances. The number of neurons within this final layer is equal to the number of values the model is designed to predict for a given input.

As a neuron’s native output is already a continuous numerical value, generally no activation function is applied to the output layer in these cases, with the neuron being trained to directly output the correct value for each case.

A
36
Q

Backpropagation and Optimisation

Typically CNNs are trained by supervised learning. As we learned in Section 2, supervised learning involves a training dataset composed of input data with known, correct output values.

A

During the training process, a ‘batch’ of input data is fed into the network. The resulting output data is then compared to the ‘correct’ outputs for each case within the batch. The difference between the current model output, and the actual correct output, is the model’s current error, or loss. Using backpropagation, the parameters contributing to this loss from earlier layers in the network can be identified.

37
Q

Once the loss throughout the network has been calculated and attributed, an optimizer function is employed to update the model’s parameters, with the aim of improving performance. The optimizer function makes decisions about which parameters to update, and how large a change to make in each case. One important hyperparameter of the optimizer function is the learning rate, which affects how large a change is made to parameters.
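
The role of the learning rate can be illustrated with a deliberately tiny example: plain gradient descent on a single weight, using mean squared error as the loss. This is a simplified stand-in rather than backpropagation through a multi-layer network, and the data, starting value and learning rates are all hypothetical.

```python
def train_weight(inputs, targets, learning_rate, steps=50):
    """Fit y = weight * x by gradient descent on the mean squared error."""
    weight = 0.0  # arbitrary starting value
    for _ in range(steps):
        # Gradient of mean((weight * x - y) ** 2) with respect to weight.
        gradient = sum(2 * (weight * x - y) * x
                       for x, y in zip(inputs, targets)) / len(inputs)
        # The learning rate scales how far the parameter moves each step.
        weight -= learning_rate * gradient
    return weight


# Hypothetical batch in which the correct relationship is y = 3 * x.
inputs = [1.0, 2.0, 3.0]
targets = [3.0, 6.0, 9.0]

for learning_rate in (0.001, 0.05, 0.25):
    print(learning_rate, train_weight(inputs, targets, learning_rate))
```

In this toy setup, the smallest learning rate is still well short of 3 after 50 steps, the middle one settles very close to 3, and the largest overshoots further on every step and never settles.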

A
38
Q

Learning rate
The ‘learning rate’ is an optimizer hyperparameter relating to how drastically the optimizer will change the model’s weights, biases, and kernels during a single training ‘batch’.

Intuitively, it might seem that a higher learning rate will surely be better, but things are rarely so easy in data science.

The objective of training is for the generalisable (that is, applicable to data outside of the training set) patterns within the data to be learned by the network. There is a danger, however, that rather than internalising features and patterns that could be recognised in other cases, the network overfits to the specific data contained within the current training batch. A higher learning rate increases the likelihood of overfitting, as the model is very rapidly updated to minimise the loss within the current batch of data.

Conversely, an overly low learning rate will slow down the process of training.

A
39
Q

Welcome to Section 4: Artificial Intelligence Applications in Radiology and Healthcare.

Now that you have a solid understanding of AI and related areas like machine learning, neural networks, and deep learning, you are ready to learn more about specific applications of AI in radiology and healthcare.

You will first learn about the many varied ways in which AI can be applied in radiology and healthcare before a quick note on data mining and analytics.

Then you will start on the core content of this section: specific applications including how Generative Adversarial Networks (GANs) can assist radiologists, as well as looking at how AI can assist with chest x-ray classification and lung nodule detection.

In the final part of this section, you will dive deep into a popular topic in medical imaging data research: radiomics. Up next, start with a more general introduction to AI applications in radiology and healthcare.

A
40
Q

Uses of AI in radiology

A

In this next part of Section 4, you will learn about three specific applications of AI in radiology and healthcare:

1 Generative Adversarial Networks (GANs)

2 Image classification

3 Radiomics

These methods, when utilised effectively, show potential to have a huge impact on the way in which healthcare services are delivered for patients by creating more time for clinicians and by finding patterns in data to make diagnoses quickly.

41
Q
  1. Generative Adversarial Networks

Generative Adversarial Networks (GANs) consist of two neural networks that compete with one another. The generator generates new cases (fake cases). The discriminator differentiates between the fake cases and real-world cases. The discriminator and generator are then updated to get better at their tasks. The fake (or synthetic) data that GANs create are useful because they help the neural network to improve. They are also useful when there are privacy concerns or when there is a lack of real data.

A
42
Q

The use of GANs for image reconstruction and denoising
GANs can be applied to image denoising and reconstruction tasks.

In the example shown here, a U-Net-based generator is applied to an undersampled MRI to amplify its features and construct a “generated image”. This generated image is then passed to the discriminator alongside the ground truth image, and the two are compared in terms of pixel loss (comparing corresponding pixels in the real and generated images), k-space loss (comparing patterns and structures in the images) and perceptual loss (comparing features like edges and textures).

The discriminator then attempts to identify the generated image based on this information.

Both the generator and discriminator are updated/trained so that, ultimately, the performance of the discriminator falls to 50%, meaning it can no longer distinguish the generated image from the ground truth.

Although this might sound counterintuitive, this is actually positive, as it means that the discriminator can no longer distinguish between the real and fake images because the generator has learned to produce fake images that are indistinguishable from real images.

Note: FFT (see graphic) refers to fast Fourier transform.2,3

A
43
Q
  2. Image classification

Image classification in radiology is a great example of the benefits of AI use in healthcare. AI models can be utilised to accurately and efficiently identify patterns (and therefore features like abnormalities) in data, which can lead to faster diagnosis as well as better outcomes for patients, radiologists and other healthcare professionals.

As you learnt in Section 2, machine learning models can be trained using supervised or unsupervised learning.

Take a look at the graphic below for a visual representation of supervised learning.

A
44
Q

2.1 Chest x-ray classification

For chest x-ray (CXR) classification, models are trained on labelled data in which each CXR carries one label. The example labels here are normal, pneumonia and COVID-19. The AI output for the provided CXR is ‘Pneumonia’. The AI model makes this classification decision based on its training.

This also demonstrates a limitation of classification models. As radiologists, we know that pneumonia and COVID-19 can look identical on CXR. Using two separate labels may therefore not be the best approach to training, as there is overlap between the two.
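
A classifier of this kind typically produces one score per label and reports the label with the highest probability. The sketch below assumes a softmax output layer, which the source does not specify, and uses made-up scores for a single CXR. It shows how a ‘Pneumonia’ output can sit only marginally ahead of ‘COVID-19’ when the two appearances overlap.

```python
import math

LABELS = ["Normal", "Pneumonia", "COVID-19"]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores):
    """Return the most probable label and the probability of every label."""
    probs = softmax(scores)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], dict(zip(LABELS, probs))

# Hypothetical raw scores from a trained model for one CXR
label, probs = classify([0.2, 2.1, 2.0])

print("AI output:", label)
for name, p in probs.items():
    print(f"  {name}: {p:.2f}")
```

Only the top label is reported, so a near tie between two labels is hidden from the reader unless the probabilities are shown as well.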

A
45
Q

2.2 Lung nodule classification

Lung nodule classification is another well-studied topic in AI.5

In clinical practice, this task is currently guided by the BTS/Fleischner guidelines,6 as well as the Brock and Herder risk models.7

AI has the potential to improve this pathway by reducing the number of follow-up scans, biopsies, and/or lung resections needed.

A
46
Q

2.3 Stroke detection

The example shown here is of an ischaemic stroke diagnosed via urgent initial CT head scan. An ASPECTS (Alberta Stroke Program Early CT Score) value has been provided.8

The AI model:

Defines a narrow, specific task

Describes the intended clinical role

Describes the patient population

Describes the outputs.

This helps to streamline the diagnostic process, thus benefiting both clinician and patient.

A
47
Q

2.4 Image segmentation: nnU-Net

At the time of publication of this resource, the best-performing AI method for image segmentation is nnU-Net, an out-of-the-box deep learning solution with proven performance in various biomedical imaging applications.

Key facts:

It has won multiple international image segmentation grand challenges.

Various model architecture configurations (2D, 3D low resolution, 3D full resolution, 3D ensemble, etc.) are available.

A
48
Q

3.1 Origin of concept

Radiomics was conceptualised in the 1970s, when satellite images were classified based on image texture or regional heterogeneity.

In 1973, Robert Haralick was able to use AI to automatically classify parts of satellite images that represented swamp, marsh or urban areas. Image texture was computed by counting how often each combination of neighbouring pixel values occurred and recording these counts in a new matrix called the right-neighbour Grey Level Co-occurrence Matrix (GLCM), in which the order of the neighbouring pixels matters. The counts were then normalised.

Texture features such as homogeneity and contrast are then calculated based on the GLCM in accordance with their corresponding equations.
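
The counting, normalisation and feature steps described above can be sketched as follows. This is a minimal illustration on a toy 4 x 4 image with four grey levels. Definitions of texture features vary between toolkits: here contrast weights each pair by (i - j)^2 and homogeneity by 1 / (1 + (i - j)^2), which are common choices but may differ from the equations used in the source graphic.

```python
def right_neighbour_glcm(image, levels):
    """Count (pixel, right-neighbour) grey-level pairs, then normalise to probabilities."""
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1  # ordered pair: (left, right) is not the same as (right, left)
    total = sum(sum(r) for r in counts)
    return [[c / total for c in r] for r in counts]

def contrast(glcm):
    """Large when neighbouring pixels often differ strongly in grey level."""
    return sum(p * (i - j) ** 2 for i, r in enumerate(glcm) for j, p in enumerate(r))

def homogeneity(glcm):
    """Large when neighbouring pixels often have similar grey levels."""
    return sum(p / (1 + (i - j) ** 2) for i, r in enumerate(glcm) for j, p in enumerate(r))

# Toy image: each number is a grey level between 0 and 3
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

glcm = right_neighbour_glcm(image, levels=4)
for r in glcm:
    print([round(p, 3) for p in r])
print("contrast:", round(contrast(glcm), 3))
print("homogeneity:", round(homogeneity(glcm), 3))
```

Because only the right-hand neighbour is counted, the matrix is not symmetric: the pair (0, 1) occurs in this image but the pair (1, 0) does not.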

This concept was later applied to oncology through the work of Robert Gillies and Philippe Lambin (where the term ‘radiomics’ was introduced).

A

3.2 Radiomics pipeline

The development pipeline for a radiomics model begins with data preparation, which includes the region of interest segmentation (i.e. which area of the image is of particular interest).

Following feature extraction, the model can be developed using one of the machine learning methods introduced earlier and validated by testing with independent external data.

See the image below, which shows a typical radiomics pipeline.12

48
Q
  3. Radiomics

Radiomics is an advanced technique for analysing medical images in minute detail to characterise the underlying properties of the image data. It picks up on phenotypic characteristics that are generally indiscernible to the human eye but can be captured non-invasively on medical imaging. Radiomics forms a high-throughput data space (where large datasets are processed very quickly) that is amenable to machine learning.

A

A note on mesoscopic significance in radiomics
Radiomics bridges the gap between the macroscopic (radiological) and microscopic (histological) domains and is notable for its “mesoscopic significance”.

On the left are microscopic features of non-small-cell lung cancer on histology. On the right are their macroscopic features on radiology. Radiomics forms the middle area connecting the two domains.11

48
Q

3.3 Radiomics applications

There are numerous ways in which radiomics can be applied across both oncological and non-oncological practices.

A

Bone mineral density assessment (metabolic bone diseases).
Patient prognostication.
Virtual biopsy (tumour histology, genotype, and staging).
Treatment response, recurrence, and adverse events prediction.
Cancer biology advancements.

49
Q

Patient prognostication

In this study, patients with non-small cell lung cancer were stratified into high- and low-risk groups for three-year survival based on their radiomics signature.

https://pubmed.ncbi.nlm.nih.gov/36773776/

A

Virtual biopsy

Radiomics can also facilitate a non-invasive prediction of PD-L1 positivity in determining suitability of patients for receiving checkpoint inhibitor immunotherapy.14

50
Q

Clinical outcome prediction

Radiomics can be used to predict clinical outcomes such as the response to treatment or the risk of treatment-associated adverse events such as pneumonitis.15

A

Cancer biology

Radiomics signatures can also be used to advance our understanding of cancer biology.

In this example, gene set enrichment analysis identified the cellular pathways most correlated to the radiomics features in the predictive model.

Most are inflammation- and hypoxia-associated, supporting their hypothesised roles in the body’s successful immune attack on the tumour cells.16

51
Q

Single-cell RNA analysis

Single-cell RNA analysis identified the myeloid and T cells associated with radiomic features relating to high expression of the PD-L1-encoding gene, CD274.

This could help in better understanding the cellular interactions underlying the expression of this crucial gene and the body’s response to immunotherapy.17

A
52
Q

3.7 Radiomics in literature

There has been a rapid expansion in radiomics-related literature in recent years, as shown in the graph below.

This has been driven by open-source toolkits such as PyRadiomics, public-domain testing datasets such as The Cancer Imaging Archive (TCIA), and mounting evidence supporting the use of radiomics.

A

In this part of Section 4, you have learned about the various applications of AI in radiology and healthcare by looking at Generative Adversarial Networks (GANs), image classification including specific use cases, and radiomics.

53
Q

3.5 Non-oncological applications

Radiomics can also see use in non-oncological areas.

In this study, which used radiomic features of lumbar spine CT images to differentiate osteoporosis from normal bone density, a signature was developed from features extracted from segmented vertebrae to predict reduced bone mineral density.18

The lumbar vertebrae are first segmented on sagittal CT slices. Radiomic features are extracted from these areas and used to construct three predictive models (using a support vector machine, a random forest and k-nearest neighbours (KNN)) for predicting osteoporosis and osteopenia. The best performance was seen in the KNN-based model, as shown in the bottom row of figures.
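
As a rough illustration of how a KNN model turns extracted features into a prediction, the sketch below classifies a vertebra by majority vote among its nearest labelled neighbours. It is not the cited study's implementation: the two features per vertebra, their values, the labels and the choice of k = 3 are all hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_features, train_labels, query, k=3):
    """Predict the majority label among the k training samples closest to the query."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical, already-scaled radiomic features for six segmented vertebrae
train_features = [
    (0.9, 0.8), (1.0, 0.7), (0.8, 0.9),   # normal bone density
    (0.2, 0.3), (0.1, 0.2), (0.3, 0.1),   # osteoporosis
]
train_labels = ["normal"] * 3 + ["osteoporosis"] * 3

print(knn_predict(train_features, train_labels, (0.25, 0.20)))
print(knn_predict(train_features, train_labels, (0.85, 0.80)))
```

KNN has no training step beyond storing the labelled examples, so its results depend heavily on how the features are scaled and on the value of k.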

A

https://bmcmusculoskeletdisord.biomedcentral.com/articles/10.1186/s12891-022-05309-6

54
Q

1. Image adapted from: Ryu JY, Chung HY, Choi KY. Potential role of artificial intelligence in craniofacial surgery. Arch Craniofac Surg. 2021 Oct;22(5):223-231. doi: 10.7181/acfs.2021.00507. Epub 2021 Oct 20. PMID: 34732033; PMCID: PMC8568494.

2. Maklin, C. (2024). Fast Fourier Transform Explained | Built In. [online] builtin.com. Available at: https://builtin.com/articles/fast-fourier-transform.

3. Image: Li, X., Zhang, H., Yang, H. and Li, T.-Q. (2023). CS-MRI Reconstruction Using an Improved GAN with Dilated Residual Networks and Channel Attention Mechanism. Sensors, 23(18), pp.7685–7685. doi:https://doi.org/10.3390/s23187685.

4. Image: Le Dinh T, Lee S-H, Kwon S-G, Kwon K-R. COVID-19 Chest X-ray Classification and Severity Assessment Using Convolutional and Transformer Neural Networks. Applied Sciences. 2022; 12(10):4861. https://doi.org/10.3390/app12104861.

5. Zhang, B., Qi, S., Monkam, P., Li, C., Yang, F., Yao, Y.-D. and Qian, W. (2019). Ensemble Learners of Multiple Deep CNNs for Pulmonary Nodules Classification Using CT Images. IEEE Access, 7, pp.110358–110371. doi:https://doi.org/10.1109/access.2019.2933670.

6. British Thoracic Society (n.d.). Pulmonary Nodules | British Thoracic Society | Better lung health for all. [online] www.brit-thoracic.org.uk. Available at: https://www.brit-thoracic.org.uk/quality-improvement/guidelines/pulmonary-nodules/.

7. NICE (2022). 2 The diagnostic tests | EarlyCDT Lung for assessing risk of lung cancer in solid lung nodules | Guidance | NICE. [online] www.nice.org.uk. Available at: https://www.nice.org.uk/guidance/dg46/chapter/2-The-diagnostic-tests#:~:text=2.7%20The%20Brock%20model%20is.

8. Osamah Alwalid (2019). MCA - Alberta stroke program early CT score (ASPECTS) illustration. Radiopaedia.org. doi:https://doi.org/10.53347/rid-72706.

9. Image: Song Q, Zhao L, Luo X, Dou X. Using Deep Learning for Classification of Lung Nodules on Computed Tomography Images. J Healthc Eng. 2017;2017:8314740. doi: 10.1155/2017/8314740. Epub 2017 Aug 9. PMID: 29065651; PMCID: PMC5569872.

10. Image: Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. 2021 Feb;18(2):203-211. doi: 10.1038/s41592-020-01008-z. Epub 2020 Dec 7. PMID: 33288961.

11. Image: Chen M, Copley SJ, Viola P, Lu H, Aboagye EO. Radiomics and artificial intelligence for precision medicine in lung cancer treatment. Semin Cancer Biol. 2023 Aug;93:97-113. doi: 10.1016/j.semcancer.2023.05.004. Epub 2023 May 19. PMID: 37211292.

12. Image: Chen M, Copley SJ, Viola P, Lu H, Aboagye EO. Radiomics and artificial intelligence for precision medicine in lung cancer treatment. Semin Cancer Biol. 2023 Aug;93:97-113. doi: 10.1016/j.semcancer.2023.05.004. Epub 2023 May 19. PMID: 37211292.

13. Image: Chen M, Lu H, Copley SJ, Han Y, Logan A, Viola P, Cortellini A, Pinato DJ, Power D, Aboagye EO. A Novel Radiogenomics Biomarker for Predicting Treatment Response and Pneumotoxicity From Programmed Cell Death Protein or Ligand-1 Inhibition Immunotherapy in NSCLC. J Thorac Oncol. 2023 Jun;18(6):718-730. doi: 10.1016/j.jtho.2023.01.089. Epub 2023 Feb 10. PMID: 36773776.

14. Image: Chen M, Lu H, Copley SJ, Han Y, Logan A, Viola P, Cortellini A, Pinato DJ, Power D, Aboagye EO. A Novel Radiogenomics Biomarker for Predicting Treatment Response and Pneumotoxicity From Programmed Cell Death Protein or Ligand-1 Inhibition Immunotherapy in NSCLC. J Thorac Oncol. 2023 Jun;18(6):718-730. doi: 10.1016/j.jtho.2023.01.089. Epub 2023 Feb 10. PMID: 36773776.

15. Image: Chen M, Lu H, Copley SJ, Han Y, Logan A, Viola P, Cortellini A, Pinato DJ, Power D, Aboagye EO. A Novel Radiogenomics Biomarker for Predicting Treatment Response and Pneumotoxicity From Programmed Cell Death Protein or Ligand-1 Inhibition Immunotherapy in NSCLC. J Thorac Oncol. 2023 Jun;18(6):718-730. doi: 10.1016/j.jtho.2023.01.089. Epub 2023 Feb 10. PMID: 36773776.

16. Image: Chen M, Lu H, Copley SJ, Han Y, Logan A, Viola P, Cortellini A, Pinato DJ, Power D, Aboagye EO. A Novel Radiogenomics Biomarker for Predicting Treatment Response and Pneumotoxicity From Programmed Cell Death Protein or Ligand-1 Inhibition Immunotherapy in NSCLC. J Thorac Oncol. 2023 Jun;18(6):718-730. doi: 10.1016/j.jtho.2023.01.089. Epub 2023 Feb 10. PMID: 36773776.

17. Image: Chen M, Lu H, Copley SJ, Han Y, Logan A, Viola P, Cortellini A, Pinato DJ, Power D, Aboagye EO. A Novel Radiogenomics Biomarker for Predicting Treatment Response and Pneumotoxicity From Programmed Cell Death Protein or Ligand-1 Inhibition Immunotherapy in NSCLC. J Thorac Oncol. 2023 Jun;18(6):718-730. doi: 10.1016/j.jtho.2023.01.089. Epub 2023 Feb 10. PMID: 36773776.

18. Xue, Z., Huo, J., Sun, X., Sun, X., Ai, S.T., Zhang, L. and Liu, C. (2022). Using radiomic features of lumbar spine CT images to differentiate osteoporosis from normal bone density. BMC Musculoskeletal Disorders, 23(1). doi:https://doi.org/10.1186/s12891-022-05309-6.

19. Chen, M., Copley, S.J., Viola, P., Lu, H. and Aboagye, E.O. (2023). Radiomics and artificial intelligence for precision medicine in lung cancer treatment. Seminars in Cancer Biology, [online] 93, pp.97–113. doi:https://doi.org/10.1016/j.semcancer.2023.05.004.

20. PubMed (2014). (radiomics) AND ((‘2012’[Date - Publication] : ‘3000’[Date - Publication])) - Search Results - PubMed. [online] PubMed. Available at: https://pubmed.ncbi.nlm.nih.gov/?term=%28radiomics%29+AND+%28%28%222012%22%5BDate+-+Publication%5D+%3A+%223000%22%5BDate+-+Publication%5D%29%29&sort= [Accessed 24 Sep. 2024].

A
55
Q

You have now reached the end of Section 4: Artificial Intelligence Applications in Radiology and Healthcare. The key learning points for this section are recapped below:

* AI has many use cases across many different domains in radiology and healthcare, from assistance with administrative tasks like appointment booking to clinical support with tasks like diagnostic inference.

* Data analytics and mining are important in the context of AI use in radiology and healthcare because these techniques provide the meaningful data that AI models need in order to perform optimally.

* Generative Adversarial Networks (GANs) consist of two neural networks that compete with one another. The generator generates new cases (fake cases). The discriminator differentiates between the fake cases and real-world cases. The discriminator and generator are then updated to get better at their tasks.

* AI can be utilised in a variety of use cases in radiology including chest x-ray classification, stroke detection, and lung lesion detection.

* Radiomics is an advanced technique used to analyse medical images in minute detail, which provides detailed analysis of underlying characteristics of image data. This technique picks up on phenotypic characteristics that are generally indiscernible to the human eye but can be captured non-invasively on medical imaging.

A