Building an AI model
Building an AI model requires a solid understanding of key terms, as well as the differences between AI models, learning and algorithms. After you have learned these key definitions, you will move on to the steps of building an AI model: data collection, model selection, training, validation and testing, and finally, model implementation. In the final part of this section, you will get to grips with the common pitfalls of building an AI algorithm.
Data-driven algorithm
An AI model is a computational structure that learns patterns from data. Instead of being explicitly programmed to perform a task, it uses training data to learn how to perform that task.
Generalisability
Once trained on a subset of data, the model can make predictions or decisions about new, previously unseen data. The goal is for the model to generalise well, meaning its predictions are accurate even on data it hasn’t seen before.
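To make generalisation more concrete, here is a minimal Python sketch using invented toy data (a single numeric measurement with a binary label; all names and values are hypothetical). It contrasts a “model” that simply memorises its training cases with one that learns a simple threshold rule. Both score perfectly on the training data, but only the rule still works on unseen cases.

```python
# Hypothetical toy data: (measurement, label) pairs with labels 0 or 1
train = [(2, 0), (4, 0), (6, 0), (12, 1), (14, 1), (16, 1)]
unseen = [(3, 0), (5, 0), (13, 1), (15, 1)]

def memoriser(train_data):
    """Stores every training case verbatim; has no answer for new inputs."""
    lookup = dict(train_data)
    return lambda x: lookup.get(x)  # returns None for any value not seen before

def threshold_learner(train_data):
    """Learns a single decision threshold halfway between the two classes."""
    largest_negative = max(x for x, y in train_data if y == 0)
    smallest_positive = min(x for x, y in train_data if y == 1)
    cut = (largest_negative + smallest_positive) / 2
    return lambda x: 1 if x > cut else 0

def accuracy(model, data):
    """Fraction of cases where the model's output matches the label."""
    return sum(model(x) == y for x, y in data) / len(data)

for name, learn in [("memoriser", memoriser), ("threshold", threshold_learner)]:
    model = learn(train)
    print(name, accuracy(model, train), accuracy(model, unseen))
```

Running it prints the training and unseen-data accuracy for each approach: the memoriser scores 1.0 and then 0.0, while the threshold rule scores 1.0 on both. Real models are far more complex, but the same check (comparing performance on seen and unseen data) applies.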
Diverse applications
AI models can be applied to a wide range of tasks, from image and speech recognition to language translation, game playing, and medical diagnosis. The architecture and training data will vary based on the specific application.
Learning versus algorithms
You’ll hear a lot about ‘learning’ and ‘algorithms’ when it comes to Artificial Intelligence, but these terms are not interchangeable.
Algorithms are the mathematical mapping methods used to learn or uncover the underlying patterns embedded in the data. You can think of algorithms as the instructions that guide the learning.
Learning is the process of a computer or system improving its performance on a task through experience.
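The distinction can be illustrated with a minimal Python sketch. Gradient descent, one widely used algorithm (chosen here purely as an illustration), is the set of instructions: repeatedly adjust a weight in the direction that reduces the error. Learning is what happens when those instructions are applied to data: the model’s error falls as it gains more “experience” (iterations). The toy data, generated from the rule y = 3x, is hypothetical.

```python
# Hypothetical data generated by the rule y = 3x; the model must learn the weight w
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

def mean_squared_error(w):
    """How far the model's predictions w * x are from the true values."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

# The algorithm: gradient descent, i.e. repeatedly nudge w "downhill" on the error
def gradient_descent(steps, learning_rate=0.01, w=0.0):
    history = [mean_squared_error(w)]  # error before any learning
    for _ in range(steps):
        w -= learning_rate * gradient(w)
        history.append(mean_squared_error(w))
    return w, history

# The learning: performance improves with experience (more iterations)
w, history = gradient_descent(steps=100)
print(round(w, 3))              # learned weight, close to 3
print(history[0], history[-1])  # error before vs after learning
```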
The following information gives you an overview of the steps to building an AI model, from data collection to model selection, training, validation and testing, and finally, model implementation.
Step 1: Data Collection.
Representative data is crucial for training, validation and testing, and robust independent datasets are needed for testing.
Step 2: Model Selection.
The AI algorithm is selected. In medical imaging, the algorithm chosen is most often a deep learning algorithm such as a convolutional neural network (CNN).
Step 3: Train, Validate, Test.
Training, validation and testing are the stages typically used in deep learning model development. During training, iterative processes are used to adjust the algorithm’s weights.
Ideally, robust independent evaluation should be performed on external data.
Step 4: Model Implementation.
The AI algorithm is implemented into clinical practice.
Building an AI model is a complex process.
However, with proper data management and robust validation, it can lead to improved healthcare outcomes.
Learn more in the remainder of this lesson.
Step 1: Data collection
Data collection is a crucial step in building an AI model as it underpins the whole process.
In deep learning model development, vast amounts of data are needed for each stage of the process (training, validation and testing), and several separate datasets may be needed to carry out these tasks.
It is important that the data is representative of the study question and that varied datasets are used to mitigate bias. For testing, robust independent external datasets should ideally be used. Data, and how to access it, are covered in more detail in Section 2 of this module.
A robust ground truth (reference standard) is needed against which to validate the AI’s performance; this, too, requires data. Further details about ground truth are provided in Section 2 of Module 1.
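As a sketch of how performance is measured against a ground truth, the Python below compares a model’s outputs with reference-standard labels and computes sensitivity and specificity, two measures commonly reported in medical imaging. The labels are invented for illustration, and the choice of these two metrics is an assumption; a real study would pick metrics suited to its question.

```python
def confusion_counts(predictions, ground_truth):
    """Count true/false positives and negatives (1 = disease present, 0 = absent)."""
    pairs = list(zip(predictions, ground_truth))
    tp = sum(p == 1 and g == 1 for p, g in pairs)
    tn = sum(p == 0 and g == 0 for p, g in pairs)
    fp = sum(p == 1 and g == 0 for p, g in pairs)
    fn = sum(p == 0 and g == 1 for p, g in pairs)
    return tp, tn, fp, fn

def sensitivity_specificity(predictions, ground_truth):
    tp, tn, fp, fn = confusion_counts(predictions, ground_truth)
    sensitivity = tp / (tp + fn)  # share of true disease cases the model detects
    specificity = tn / (tn + fp)  # share of disease-free cases correctly cleared
    return sensitivity, specificity

# Hypothetical example: reference-standard labels vs the model's outputs
ground_truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions  = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(sensitivity_specificity(predictions, ground_truth))
```

Here the model finds 3 of the 4 disease cases (sensitivity 0.75) and correctly clears 5 of the 6 disease-free cases (specificity of about 0.83). Without a trustworthy ground truth, neither number would mean anything.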
Step 2: Model selection
The AI algorithm can be implemented in various programming languages (such as Python, JavaScript and C++). In medical imaging, the model is most often a deep learning algorithm such as a convolutional neural network (CNN).
See the graph below for an overview of AI/ML-enabled medical devices (by date of final decision) up to the end of 2023.
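The operation that gives a convolutional neural network its name can be sketched in a few lines of plain Python. A small grid of weights (the kernel) slides across the image, and each output value is a weighted sum of the pixels underneath it. This is a simplified illustration, not how deep learning libraries implement it: the 4x4 image and the edge-detecting kernel are hypothetical, and in a real CNN the kernel weights are learned during training rather than chosen by hand.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution as used in CNN layers. Strictly this is
    cross-correlation (the kernel is not flipped), the convention most
    deep learning libraries follow."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel
            total = sum(image[i + a][j + b] * kernel[a][b]
                        for a in range(kh) for b in range(kw))
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4x4 greyscale "image": dark left half, bright right half
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-picked vertical-edge kernel; a real CNN learns these weights in training
kernel = [
    [-1, 1],
    [-1, 1],
]
print(convolve2d(image, kernel))  # → [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The large values in the middle column mark the vertical edge between the dark and bright halves of the image. A CNN stacks many such learned filters, which lets it pick out increasingly complex features.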
Step 3: Train, validate, test
The third step of building an AI model involves training, validating, and testing the model.
Train
The AI algorithm is shown the data together with the correct result for each case. Through an iterative process, the algorithm learns and adjusts its weights so that it can classify the data. This often requires a vast amount of data.
Validate
Initial testing takes place on a small proportion of the dataset that the model has not seen, to confirm that it has learnt to classify the data correctly. When comparing multiple trained algorithms, the validation data is used to select the best one. It also helps to fine-tune the model’s “hyperparameters” (settings chosen before training rather than learnt from the data, such as the learning rate) so that the final model is optimal.
Test
In testing, the model is evaluated on an unseen dataset. Often this is data from a different, external institution, which is used to perform robust independent evaluation of the algorithm developed on the training dataset.
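The three stages can be sketched in Python under simplifying assumptions: a toy dataset with one numeric feature and a noisy binary label (all hypothetical), and a k-nearest-neighbours classifier standing in for a deep learning model. The number of neighbours, k, plays the role of a hyperparameter: candidate values are compared on the validation set, and the chosen model is evaluated once on a separate test set that stands in for external data.

```python
import random

def make_dataset(n, noise=0.1):
    """Hypothetical data: label is 1 when x > 5, with a fraction of labels flipped as noise."""
    data = []
    for _ in range(n):
        x = random.uniform(0, 10)
        label = int(x > 5)
        if random.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training cases closest to x."""
    nearest = sorted(train, key=lambda case: abs(case[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

random.seed(0)
internal = make_dataset(200)
random.shuffle(internal)  # shuffle before splitting
train, validation = internal[:150], internal[150:]
external_test = make_dataset(50)  # stands in for data from another institution

# Validate: choose the hyperparameter k using the validation set only
candidate_ks = [1, 3, 5, 7, 9, 15]
best_k = max(candidate_ks, key=lambda k: accuracy(train, validation, k))

# Test: one final evaluation on data never used for training or for choosing k
print("chosen k:", best_k)
print("test accuracy:", accuracy(train, external_test, best_k))
```

The key design choice is that the test set plays no part in training or in choosing k, so its accuracy is an honest estimate of performance on new data.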
Step 4: Model implementation
Once the AI algorithm has been evaluated and all the necessary approvals are in place, it may be implemented into clinical practice. Each hospital site, trust or programme may have different requirements that must be met before implementation. The NHS Buyer’s guide provides a framework to start asking the right questions before adoption.
Implementing AI brings its own challenges, which will be explored in future modules of this series. These include the need for technical expertise, post-market surveillance (the ongoing evaluation of an AI model after implementation), and ethical and legal responsibilities.
You have now gained an understanding of the steps of building an AI model. Next up, you will learn about some of the common obstacles that you may find when building an AI model.
NHS England. A buyer’s guide to AI in health and care. Available at: https://transform.england.nhs.uk/ai-lab/explore-all-resources/adopt-ai/a-buyers-guide-to-ai-in-health-and-care/
Center for Devices and Radiological Health. Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices, U.S. Food and Drug Administration. Available at: https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices [Accessed: 25 October 2024].
Common Pitfalls when Developing AI Algorithms
Handling of missing data
Data may be missing completely at random, where no particular factor has led to the data being missing.
Data may be missing due to a particular external factor (for example, survey participants choosing to skip a survey question).
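Here is a minimal Python sketch of two simple ways to handle missing values, using invented records in which None marks a missing entry: complete-case analysis (dropping incomplete rows) and mean imputation (filling each gap with the average of the observed values). These are illustrative choices, not a recommendation. Simple approaches like these are most defensible when no particular factor lies behind the gaps, and they can introduce bias when missingness is linked to a particular factor.

```python
from statistics import mean

# Hypothetical records; None marks a missing value
records = [
    {"age": 54, "tumour_size_mm": 12.0},
    {"age": 61, "tumour_size_mm": None},
    {"age": None, "tumour_size_mm": 8.0},
    {"age": 47, "tumour_size_mm": 15.0},
]

def missing_counts(rows):
    """Number of missing values per field."""
    return {field: sum(row[field] is None for row in rows) for field in rows[0]}

def complete_cases(rows):
    """Option 1: keep only rows with no missing values (discards information)."""
    return [row for row in rows if all(v is not None for v in row.values())]

def mean_impute(rows):
    """Option 2: fill each gap with the mean of the observed values in that field."""
    means = {f: mean(r[f] for r in rows if r[f] is not None) for f in rows[0]}
    return [{f: (means[f] if v is None else v) for f, v in r.items()} for r in rows]

print(missing_counts(records))       # {'age': 1, 'tumour_size_mm': 1}
print(len(complete_cases(records)))  # 2
print(mean_impute(records)[1])       # missing tumour size filled with the observed mean
```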
Data leakage
Data leakage occurs when data from the same patient “leaks” across the training, validation and test data. This can happen when there is overlap (e.g. of images or patients) between the stages, meaning the model memorises rather than learns from the data. See ‘3.1 Training’ below. Data leakage could occur in this scenario if images from the same patient are present in both the training data and the validation data.
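One common safeguard against this kind of leakage is to split data by patient rather than by image. The Python sketch below assumes each image record is a hypothetical (patient_id, image_name) pair. It assigns whole patients to either the training or the validation set, so two images of the same patient can never end up on opposite sides of the split, as they could with a naive random split of individual images.

```python
import random

def patient_level_split(images, val_fraction=0.25, seed=42):
    """Split image records so that each patient appears in only one subset.

    `images` is a list of (patient_id, image_name) pairs; this layout is a
    hypothetical example, not a required format.
    """
    patients = sorted({pid for pid, _ in images})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_val = max(1, round(len(patients) * val_fraction))
    val_patients = set(patients[:n_val])
    train = [img for img in images if img[0] not in val_patients]
    val = [img for img in images if img[0] in val_patients]
    return train, val

images = [
    ("P1", "p1_left.png"), ("P1", "p1_right.png"),
    ("P2", "p2_left.png"), ("P2", "p2_right.png"),
    ("P3", "p3_left.png"),
    ("P4", "p4_left.png"), ("P4", "p4_right.png"),
]
train, val = patient_level_split(images)
overlap = {p for p, _ in train} & {p for p, _ in val}
print("patients in both sets:", overlap)  # → patients in both sets: set()
```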
Handling imbalanced data
Data is imbalanced when the cases in a dataset are skewed toward a particular category.
For example, suppose you are measuring an algorithm’s accuracy at determining whether a mass is benign or malignant, and 95% of the cases in the dataset are benign. The algorithm is likely to perform poorly on malignant cases, and accuracy alone can be misleading: a model that labels every mass as benign would still be 95% accurate. There are, however, data science approaches to mitigate imbalance, such as resampling the data or weighting the classes during training.
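The benign-versus-malignant example can be reproduced in Python with hypothetical labels. A “model” that labels every mass benign reaches 95% accuracy yet detects no malignant cases. The sketch then computes class weights using one common heuristic (the total number of cases divided by the number of classes times the size of each class, the same formula as scikit-learn’s “balanced” option). This is one mitigation among several, alongside oversampling the minority class or undersampling the majority class.

```python
# Hypothetical dataset mirroring the example: 95 benign (0) and 5 malignant (1) masses
labels = [0] * 95 + [1] * 5

# A useless "model" that calls every mass benign
predictions = [0] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
malignant_found = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
sensitivity = malignant_found / labels.count(1)
print(accuracy, sensitivity)  # → 0.95 0.0

def class_weights(labels):
    """Weights inversely proportional to class frequency, so that errors on
    the rare class count more heavily during training."""
    n, classes = len(labels), sorted(set(labels))
    return {c: n / (len(classes) * labels.count(c)) for c in classes}

print(class_weights(labels))  # malignant weight is 10.0, benign about 0.53
```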
You have now reached the end of Section 1: Building an AI Model. The key learning points for this section are recapped below:
An AI model is a computational structure that learns patterns from data, and AI models can be applied to a diverse range of tasks. Once trained on a subset of data, the model can make predictions or decisions about new, previously unseen data.
Learning is the process of a computer or system improving its performance on a task through experience.
Algorithms are the mathematical mapping methods used to learn or uncover the underlying patterns embedded in the data. You can think of algorithms as the instructions that guide the learning.
The steps to building an AI algorithm in medical imaging are typically: data collection, model selection, training, validation and testing. Model implementation follows these steps.
Common pitfalls when building AI algorithms include the handling of missing and imbalanced data, and data leakage.