Bias Flashcards

1
Q

Bias can be the difference between an AI model that works well and one that doesn’t. In this section, you will learn about bias, related terms, and types of bias, as well as case studies on the consequences of bias. You will then explore mitigation strategies used to curtail bias, along with examples of these strategies in real-world applications. Finally, you will learn about intentional bias and how to build a bias-mitigation system.

Get started below by learning about what an AI algorithm should be.

A
2
Q

Before learning more about bias, it’s important first to recap what an algorithm should be:

Valid and reliable: accurate to ground truth with low variance

Generalisable: reflect the target population

Fair: not exacerbate existing bias in the population.

These points are all undermined by bias.

A
3
Q
Bias and related concepts

Bias is a tendency to deviate from the truth. Bias undermines our ability to make an unprejudiced consideration of a question.

A

Bias is a part of all scientific investigation, and bias mitigation is an important part of the scientific process. Bias mitigation attempts to address the fundamental question of ‘Are the results of the investigation true or could there be an alternative explanation?’.

The terms below are all fundamental concepts related to bias. The definitions below cover error and types of error, bias being one of them. Error and bias are often confused but should be seen as distinct.

4
Q

Error - Difference between model output and the truth

A

Bias - Systematic error favouring a particular outcome

5
Q

Noise - Random error that may be irreducible

A

Variance - Fluctuation of an estimator around its expected value, caused by the particular data sample used.
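
To make these four terms concrete, here is a minimal simulation sketch (NumPy; the true value, noise level, and the deliberately biased estimator are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 5.0

# Each trial draws a noisy sample (random error) and applies a
# deliberately biased estimator that systematically understates the mean.
estimates = np.array([
    (true_value + rng.normal(0.0, 1.0, size=30)).mean() * 0.9
    for _ in range(10_000)
])

bias = estimates.mean() - true_value   # systematic error: about -0.5
variance = estimates.var()             # spread caused by the data samples
print(f"bias = {bias:.3f}, variance = {variance:.4f}")
```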

6
Q
Types of bias

The definitions below cover types of bias specific to AI research.

Note that the source of these biases is almost always the data: the quality of the dataset is the most important concept.

A

Selection bias (also called sampling bias)
Collecting data that is not representative of the target population.

Exclusion bias
Deleting valuable data that was thought to be unimportant.

Measurement bias
A measurement process is biased if it systematically overstates or understates the true value of the measurement.
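
As a small illustration of selection bias (entirely invented numbers): estimating a population average from a sample that over-represents one subgroup produces a systematically wrong answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# A population made of two subgroups with different outcome levels.
young = rng.normal(120, 10, size=8_000)   # e.g. systolic blood pressure
old = rng.normal(140, 10, size=2_000)
population = np.concatenate([young, old])

# A convenience sample that mostly reaches the young subgroup is
# not representative of the target population.
sample = np.concatenate([rng.choice(young, 950), rng.choice(old, 50)])

print("population mean:", population.mean())   # ~124
print("biased sample mean:", sample.mean())    # ~121, systematically low
```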

7
Q

Recall bias
Recall bias occurs when participants in a research study or clinical trial do not accurately remember a past event or experience, or leave out details when reporting it.

Algorithmic bias
Algorithmic bias describes systematic and repeatable errors in a computer system that create incorrect outcomes. The term is often used in the context of outcomes that are unfair to certain social groups.

A

Prejudice bias
Training data includes (human) biases containing implicit racial, gender, or ideological prejudices.

Observer bias (also called confirmation bias)
Favouring information that confirms the researcher’s expectations or prior beliefs.

Survey bias
Bias introduced by the way survey questions are worded, ordered, or administered, which can steer respondents toward particular answers.

8
Q
Case studies on the consequences of bias

As outlined above, bias most often occurs due to issues with the data the AI algorithm is based upon. Learn about two examples of bias by reading through the case studies detailed here.

A
9
Q

Case study 1: racial bias
In the case study, ‘Dissecting racial bias in an algorithm used to manage the health of populations’, researchers used an AI algorithm to estimate a number of health conditions in a population.

The representation of ethnic minorities in training and test data was different, resulting in a systematic underestimation of health measures (hypertension, anaemia, diabetes and chronic health conditions) in Black participants.1

Bias occurred in this study because the commercial algorithm used to guide health decisions deemed health costs to be a proxy for health needs. However, because less money was spent on Black patients with the same level of need as White patients, the algorithm falsely concluded that Black patients are healthier than equally sick White patients.

https://www.science.org/doi/10.1126/science.aax2342
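
The mechanism can be sketched in a few lines (entirely synthetic data; this illustrates the proxy problem, not the study’s actual model): when cost stands in for need, the group that receives less spending at equal need looks healthier.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Synthetic illustration of the proxy problem described above.
need = rng.normal(50, 10, size=n)            # true (unobserved) health need
group_b = rng.random(n) < 0.5                # membership in the under-served group
cost = need * np.where(group_b, 0.7, 1.0)    # less spent on group B at equal need

# An algorithm trained with cost as its target "sees" group B as healthier.
print("mean true need, A vs B:", need[~group_b].mean(), need[group_b].mean())
print("mean cost proxy, A vs B:", cost[~group_b].mean(), cost[group_b].mean())
```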

A

Case study 2: skewed datasets

A team at Google developed an algorithm for the characterisation of dog species using an ImageNet database made up of human-labelled images. The dataset had a large number of dog images, which biased the algorithm toward recognising images that were not dogs as dogs.

When asked to reproduce an image of the famous Mona Lisa painting, the algorithm produced a distorted, dog-like image, thus showing the consequence of non-representative training data on algorithm performance.2

10
Q

Mitigation Strategies

Mitigation in data handling helps to reduce potential risks when working with data. A poor approach to data handling can lead to various issues with the performance of AI algorithms.

A

Step 1
Data collection
Issues can arise at this stage if you choose the wrong dataset for the AI algorithm, if you rely on just one dataset, or if there are issues (like low quality) with the dataset chosen.

11
Q

Step 2
Data investigation

If Exploratory Data Analysis (EDA) is insufficient, the data has not been analysed thoroughly enough to understand its characteristics. Conducting EDA without a good understanding of the subject area can also harm the AI algorithm, as you may struggle to interpret the data accurately. Finally, overlooking anomalies and outliers in the data could lead you to the wrong conclusions in your analysis.
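
A minimal EDA sketch (a hypothetical patient table; pandas assumed) showing the kinds of checks that address these issues: summary statistics, missingness counts, and a simple outlier flag.

```python
import pandas as pd

# Hypothetical patient table; 230 is an implausible age worth flagging.
df = pd.DataFrame({
    "age": [34, 71, 58, 45, 230],
    "tumour_mm": [12.0, 40.0, None, 22.0, 18.0],
})

print(df.describe())        # ranges and distributions per column
print(df.isna().sum())      # missing values per column

# Flag values more than 1.5 standard deviations from the mean for review
# (a deliberately loose cut-off for this tiny table).
z = (df["age"] - df["age"].mean()) / df["age"].std()
print(df[z.abs() > 1.5])
```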

A

Step 3
Data splitting

Data leakage (where information from the test data influences the training process) skews performance estimates because the same data is effectively used more than once. Datasets that do not accurately represent real-world data are also problematic. In addition, adjusting certain ‘settings’ in the algorithm too much can lead to overfitting, whereby the algorithm only performs well in a very specific set of circumstances. You can refer to Module 1, Section 2 for a reminder of overfitting.
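
A minimal sketch (synthetic data; scikit-learn assumed) of avoiding one common form of leakage: split first, then fit any preprocessing on the training set only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # synthetic features
y = rng.integers(0, 2, size=200)         # synthetic binary labels

# Split FIRST; fitting the scaler on all the data would leak
# test-set statistics into the training process.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)   # statistics from training data only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
print(X_train_s.shape, X_test_s.shape)   # (160, 5) (80, 5)
```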

12
Q

Step 4
Feature engineering

If you remove certain features (an individual measurable characteristic, such as tumour size) without sound analysis and a good reason for the removal, the performance of the AI algorithm will suffer. Features must also be appropriately scaled so that confounding features with large raw values do not skew the algorithm: for example, you would scale both patient age and tumour size so that neither carries disproportionate weight. Finally, it is paramount that missing data is handled appropriately. You cannot simply ignore missing data, so you need to decide whether to estimate missing values from the existing data or to remove the affected variable from the analysis; the right choice depends on what data is missing, and how much.
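
A minimal sketch of the scaling and missing-data points above (invented feature values; scikit-learn assumed): impute a missing tumour size, then standardise both features so neither dominates by raw magnitude.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features: [age in years, tumour size in mm].
X = np.array([[34.0, 12.0],
              [71.0, 40.0],
              [58.0, np.nan],    # missing tumour size
              [45.0, 22.0]])

# One option for missing data: impute with the column mean
# (the alternative discussed above is removing the variable).
col_means = np.nanmean(X, axis=0)
rows, cols = np.nonzero(np.isnan(X))
X[rows, cols] = col_means[cols]

# Standardise so age and tumour size carry comparable weight.
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.round(2))
```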

A

Now that you have gained an understanding of improper data handling, take a look at some examples of this in practice.

13
Q

Example 1: Feature selection

Feature selection is a common process used to reduce the number of input variables. Essentially, features that are not deemed important for the algorithm are removed, and only the necessary features are kept.

It is used to reduce the computational cost and improve the performance of the model.

See the graphic below which illustrates this process.1
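
As a hedged sketch of this process (synthetic data; scikit-learn’s univariate selection is used here purely for illustration): keep only the k most informative input variables.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# 20 input variables, of which only 5 are informative.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

selector = SelectKBest(f_classif, k=5).fit(X, y)
X_reduced = selector.transform(X)
print(X.shape, "->", X_reduced.shape)   # (200, 20) -> (200, 5)
```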

A

Example 2: Feature selection

The example below shows how improper feature removal from imaging data may lead to bias. If the cropped chest radiograph is fed to a subsequent classifier (a type of AI model) for detecting consolidations, the consolidation that is located behind the heart will be missed (see the red arrow on image A).

This occurs because primary feature removal using the segmentation model was not valid and unnecessarily removed the portion of the lung located behind the heart. The COVID-19 algorithm was trained on data segmented for lungs only.

What is the problem? How could this have been avoided?

Take a moment to reflect on these questions.

14
Q
Mitigation in model training

Mitigation in model training helps to improve the accuracy of the model. The information below highlights techniques used to optimise model training.

A
15
Q

Step 1
Data sampling and augmentation

Data should be well balanced, meaning it is not skewed toward one category over another. For instance, a disease dataset might hold far more data on common diseases than on rarer ones. Augmentation techniques (for example, transformations such as rotating or flipping images) may also be used to increase the size and diversity of a dataset.
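
A minimal sketch of such transformations (a random array stands in for a grayscale scan; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64))           # stand-in for a grayscale scan

# Each transform yields a new training example from the same image.
augmented = [
    np.fliplr(image),                  # horizontal flip
    np.flipud(image),                  # vertical flip
    np.rot90(image, k=1),              # 90-degree rotation
    np.rot90(image, k=3),              # 270-degree rotation
]
print(len(augmented), "augmented variants from one image")
```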

A

Step 2
Model and loss function

Choosing the right AI model, with the most appropriate architecture and structure for the circumstances, is paramount to its performance. Dropout techniques are a type of regularisation (see the Bias and Variance lesson in Module 1 for a reminder) that prevents overfitting by randomly ignoring some neurons during training so that the model works well on a wide range of data. Loss functions measure how well a model performs during training and help to identify where the model can be improved. Regularisation, as previously covered in the Bias and Variance lesson in Module 1, prevents overfitting and helps models perform well on new, unseen data.
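
A minimal sketch (PyTorch; the layer sizes and two-class output are invented) of a small classifier with a dropout layer and a loss function of the kind described above:

```python
import torch
import torch.nn as nn

# Small classifier with dropout regularisation.
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Dropout(p=0.5),           # randomly ignore ~half the neurons in training
    nn.Linear(32, 2),
)
loss_fn = nn.CrossEntropyLoss()  # measures how well the model performs

x = torch.randn(8, 10)               # a batch of 8 examples
targets = torch.randint(0, 2, (8,))  # dummy labels
loss = loss_fn(model(x), targets)
print(loss.item())
```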

16
Q

Step 3
Optimisation

Optimisers are algorithms used to update the parameters (settings) of a model during training so that the loss function is minimised. Hyperparameter optimisation is where certain settings of an algorithm are adjusted so that the model performs optimally on the data used to assess model performance (known as the validation set).
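
A minimal sketch of one optimiser at work (PyTorch; the toy loss is invented): the parameter is updated step by step so that the loss shrinks.

```python
import torch

w = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

for _ in range(100):
    loss = (w - 3.0) ** 2    # toy loss, minimised at w = 3
    opt.zero_grad()
    loss.backward()          # compute the gradient of the loss
    opt.step()               # move w in the direction that reduces the loss

print(w.item())              # approaches 3.0
```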

A

Step 4
Domain adaptation and ensembles

Domain adaptation is where a model trained on data from one domain is then adapted to be used on a different albeit related domain. For example, if you had trained a model on high-resolution chest x-rays, you may want to use domain adaptation so that the model would work well on lower resolution chest x-rays too.

Fine-tuning, used during transfer learning, takes a model built for one task and adapts it for another. It is one form of domain adaptation, achieved by taking a pre-trained model and training it further on new, unseen data.

Model ensembles are a different strategy: predictions from multiple different models are combined to improve performance.
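
A hedged sketch of fine-tuning (PyTorch/torchvision; the pre-trained weights and the two-class head are assumptions for illustration): freeze the pre-trained features and replace the final layer for the new, related task.

```python
import torch.nn as nn
from torchvision import models

# Start from a model pre-trained on a source domain (ImageNet here).
model = models.resnet18(weights="IMAGENET1K_V1")

for p in model.parameters():
    p.requires_grad = False                    # freeze pre-trained features

# Replace the final layer with a head for the new task (2 classes assumed),
# then train further on data from the new, related domain.
model.fc = nn.Linear(model.fc.in_features, 2)
```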

17
Q

Example 1: Data augmentation

As previously referenced under Step 1: Data sampling and augmentation, image re-scaling is one of the transformations used to augment a dataset. Augmentation helps to increase the size and diversity of a dataset and is particularly useful for small datasets.

Algorithms are trained using rotated, flipped, skewed, and cropped versions of the same image, which helps to reduce overfitting and makes the algorithm more robust.3

A

Example 2: Image re-scaling

The image below shows a cancer detection algorithm augmented with 50% rotated and 50% re-scaled lung tumours.4

Why did they do this?

Take a moment to reflect on this question.

18
Q

References
Image: Erfanian S, Zhou Y, Razzaq A, Abbas A, Safeer AA, Li T. Predicting Bitcoin (BTC) Price in the Context of Economic Theories: A Machine Learning Approach. Entropy. 2022;24(10):1487. https://doi.org/10.3390/e24101487

Image: Erickson BJ. Mitigating Bias in Radiology Machine Learning: 1. Data Handling. Radiol Artif Intell. 2022 Aug. https://pubmed.ncbi.nlm.nih.gov/36204544/

Image: Mutasa S, Sun S, Ha R. Understanding artificial intelligence-based radiology studies: What is overfitting? https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8150901/. Accessed November 1, 2023.

Image: Mutasa S, Sun S, Ha R. Understanding artificial intelligence-based radiology studies: What is overfitting? https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8150901/. Accessed November 1, 2023.

A
19
Q

Introducing Intentional Bias

A

Bias is sometimes introduced intentionally (through tuning, re-balancing, and augmenting) in order to enhance an AI algorithm’s performance.

Bias may be intentionally introduced in the following ways:

20
Q

Tuning to minimise false-negative cases (missed diagnoses) in breast screening can improve an algorithm’s sensitivity, meaning that fewer cases of breast cancer are missed.

Tuning to minimise false-positive cases (unnecessary intervention) helps to reduce the number of incorrect diagnoses by refining the algorithm’s criteria for identifying true cases of cancer.
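
A minimal sketch of what such tuning can look like (invented scores and labels): lowering the decision threshold raises sensitivity at the cost of more false positives.

```python
import numpy as np

# Invented model scores and ground-truth labels (1 = cancer present).
scores = np.array([0.10, 0.40, 0.45, 0.60, 0.80, 0.90])
labels = np.array([0,    0,    1,    0,    1,    1])

for threshold in (0.5, 0.3):
    preds = scores >= threshold
    sensitivity = (preds & (labels == 1)).sum() / (labels == 1).sum()
    false_positives = (preds & (labels == 0)).sum()
    print(f"threshold={threshold}: "
          f"sensitivity={sensitivity:.2f}, false positives={false_positives}")
```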

A

Datasets may be rebalanced (through oversampling) to increase the algorithm’s exposure to rare cases, so that it learns to spot these cases accurately (see the sketch below).

Dataset augmentation techniques (such as Generative Adversarial Networks) mitigate data scarcity by generating synthetic (fake) data, which is then used alongside the existing dataset. This enhances the algorithm’s ability to work on new, unseen data.

These techniques all help to create better performing AI algorithms. However, they require careful fine-tuning to ensure they don’t unintentionally introduce unforeseen biases.
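
A minimal sketch of rebalancing by oversampling, as mentioned above (synthetic labels): rare cases are resampled with replacement until the classes are balanced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic labels: 950 common cases (0) and 50 rare cases (1).
y = np.array([0] * 950 + [1] * 50)
X = rng.normal(size=(1000, 4))                        # stand-in features

rare_idx = np.flatnonzero(y == 1)
extra = rng.choice(rare_idx, size=900, replace=True)  # resample rare cases

X_balanced = np.vstack([X, X[extra]])
y_balanced = np.concatenate([y, y[extra]])
print(np.bincount(y_balanced))                        # [950 950]
```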

21
Q

Building a Bias-Mitigation System

Whether building or buying AI, a bias mitigation strategy is paramount.

1 AI development must be informed by a focused use-case using expert knowledge to ensure the algorithm is reliable and clinically relevant.

A

2 Data ‘curation’ (collection, organisation, and management of data) must be transparently reported and follow industry standards to avoid bias in data handling. Where unavoidable or intentional, bias must be ethically or technically justified.

3 Similarly, algorithm architecture and training must be transparent and allow peer review. A ‘black box’ approach to AI development and deployment does not meet the ethical standard expected by patients in decisions about their health.

22
Q

Consider this:

Would a physician use a drug if the chemical composition and significant adverse events were redacted?

Would a surgeon use a device if the mechanical properties and failure rates were not published?

Would you use an AI tool if its development and bias were not made clear?

A

It is critical that you hold all the necessary information about an AI algorithm before using it, including its functionality, performance metrics, the datasets used, and any potential biases.

Now that you have reached the end of the core content of this section on bias, try testing your knowledge in the quiz up next.

23
Q

You have now reached the end of Section 4: Bias. The key learning points for this section are recapped below:

An algorithm should be valid and reliable, generalisable, and fair.

Bias mitigation attempts to address the fundamental question of ‘Are the results of the investigation true, or could there be an alternative explanation?’

To approximate the truth, we must define a strategy to mitigate bias using data handling, model training and model evaluation.

Mitigation in data handling helps to reduce potential risks when working with data.

Feature selection is a common process used to reduce the number of input variables. It is used to reduce the computational cost and improve the performance of the model. However, care must be taken during this step to prevent bias.

A

Image re-scaling is one of the transformations used to augment a dataset. Augmentation involves transforming images and is particularly useful for small datasets.

Bias may be intentionally introduced for numerous reasons. For example, tuning to minimise false-negative cases.

Building a bias-mitigation system involves a focused use case informed by expert knowledge, intentional data collection and manipulation, transparent algorithm and training design, and evaluation, ethics, and peer review.

24
Q
Module summary

Congratulations on reaching the conclusion of Module 2: Building AI: Key Concepts! You have now gained a fundamental understanding of Artificial Intelligence in relation to radiology and healthcare.

Section 1: Building an AI Model: In the first section, you considered the fundamentals of AI model development. You explored an overview of the four main steps when it comes to building an AI model and learnt about some of the common pitfalls to avoid.

Section 2: Data and Data Governance: In the second section, you learned about the importance of data governance when it comes to the application of AI in radiology.

A

Section 3: Data Ethics and Legislation: You examined the principles which underpin data governance, including ethics, and the applications and permissions that must be gained in order to meet current requirements.

Section 4: Bias: In the final section, you learned about bias and strategies to mitigate it. You discovered more about the potential sources of bias and explored methods such as re-sampling.

25
Q

Suggested reading:

Erickson BJ. Mitigating Bias in Radiology Machine Learning: 1. Data Handling. Radiol Artif Intell. 2022 Aug. https://pubmed.ncbi.nlm.nih.gov/36204544/

Zhang K et al. Mitigating Bias in Radiology Machine Learning: 2. Model Development. Radiol Artif Intell. 2022 Aug. https://pubmed.ncbi.nlm.nih.gov/36204532/

Erickson BJ et al. Magician’s Corner: 5. Generative Adversarial Networks. Radiol Artif Intell. 2020 Mar. https://pubmed.ncbi.nlm.nih.gov/33937820/

A

Willemink MJ et al. Preparing Medical Imaging Data for Machine Learning. Radiology. 2020. https://pubmed.ncbi.nlm.nih.gov/32068507/

Warnat-Herresthal S et al. Swarm learning for decentralized and confidential clinical machine learning. Nature. 2021.

Xu J et al. Federated learning for healthcare informatics. J Healthc Inform Res. 2020.

Prevedello LM et al. Challenges related to AI research in medical imaging and the importance of image analysis competitions. Radiol Artif Intell. 2019 Jan. https://pubmed.ncbi.nlm.nih.gov/33937783/

26
Q

Case studies:

Flanders AE et al. Construction of a machine learning dataset through collaboration: the RSNA 2019 brain CT hemorrhage challenge. Radiol Artif Intell. 2020.

Cheung ATM et al. Methods and Impact for Using Federated Learning to Collaborate on Clinical Research. Neurosurgery. 2023.

A

Cushnan D et al. Towards nationally curated data archives for clinical radiology image analysis at scale: Learnings from national data collection in response to a pandemic. Digit Health. 2021 Nov.
