Data Ethics and Legislation Flashcards
This section follows on from the previous section on data and data governance, looking specifically at the data governance laws and data ethics principles that must be followed when handling data.
You will learn not only about specific data governance laws but also about applications and permissions, including who to consult about each area, data pipelines and flow, and how to obtain ethical approval.
Healthcare data are classed as a “special category” of personal data, and there are therefore numerous conditions under which they may be processed.
The main laws to be aware of when it comes to data protection are:
The Data Protection Act 2018, which sits alongside the UK General Data Protection Regulation (UK GDPR).
The common law duty of confidentiality and consent.
The graphic below shows the relationships between the stages of development (including deployment) of a medical AI algorithm and the regulatory and legal frameworks governing this process.
Key information rights stipulate that personal data is processed “lawfully, fairly, and transparently”.
Several excellent resources, which detail the steps to be taken to achieve approval for different types of AI projects using patient data, have been included below.
Health Research Authority (HRA)
Information Commissioner’s Office (ICO)
Transformation Directorate
MHRA
NHS AI Digital Regulations Service
The latest Ionising Radiation (Medical Exposure) Regulations (IR(ME)R) guidance should also be reviewed when undertaking any imaging work using AI. As noted in the recent IR(ME)R update, “Under IR(ME)R, a person must be involved in the clinical evaluation process for each exposure and AI cannot be used alone to perform image interpretation.”
Applications and Permissions
The questions highlighted so far in this module (see Section 2) are the key questions you will need to answer to decide whether a data ethics application is required and, where applicable, to provide the detail for that application.
It is always advised to contact your local research ethics team to determine what permissions are needed for each study. Key people to involve are your:
Data Protection Officer (DPO)
Caldicott Guardian
Information Governance lead.
Even if a formal data ethics application is not required, a Data Protection Impact Assessment (DPIA) should be completed and regularly reviewed. For more information, see the resources below:
NHS England’s Artificial Intelligence guidance
When do we need to do a DPIA? | ICO
AI and data protection risk toolkit | ICO
Data Protection Impact Assessments - Health Research Authority
Data Protection Impact Assessments Template (Word document)
Data pipelines / flow diagrams and data management plans (see: DMPonline and the MRC data management plan template – UKRI) are another key step alongside the DPIA; they help:
Illustrate where the data will be stored
Outline who will have access to the data at each step of the project.
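For illustration only (this is not an official template), the information a data flow diagram and data management plan capture can also be recorded in a simple, machine-readable form; the stage names, locations, and roles below are entirely hypothetical:

```python
# Hypothetical, machine-readable record of a data flow: where the data sits
# and who can access it at each stage of the project.
from dataclasses import dataclass

@dataclass
class DataFlowStage:
    name: str           # e.g. "Extraction from PACS"
    location: str       # where the data is stored at this stage
    identifiable: bool  # whether direct identifiers are still present
    access: list[str]   # roles with access at this stage

flow = [
    DataFlowStage("Extraction from PACS", "NHS trust network", True,
                  ["clinical team"]),
    DataFlowStage("De-identification", "NHS trust network", True,
                  ["clinical team", "data manager"]),
    DataFlowStage("Analysis", "Trusted Research Environment", False,
                  ["approved researchers"]),
]

for stage in flow:
    status = "identifiable" if stage.identifiable else "de-identified"
    print(f"{stage.name}: {stage.location} ({status}); "
          f"access: {', '.join(stage.access)}")
```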
The image below is a data flow diagram from Halling-Brown et al. [1], showing the flow of data used to build the OPTIMAM database, a large-scale mammography database in the UK.
Obtaining Ethical Approval
If ethical approval for research is required beyond your local research ethics committee, the HRA Research Ethics Committee (REC) is the body in England from which ethical approval is obtained. However, ethical approval is sometimes also required from other bodies, for example the Public Health England Research Advisory Committee in the case of screening trials.
Additionally, as AI research often requires data from a vast number of patients, and because retrospective datasets are also used, it may not be feasible to collect direct written consent from every patient.
Furthermore, the research may take place outside the direct care of the patient. In these circumstances, Section 251 approval may be needed from the HRA Confidentiality Advisory Group (CAG). The NHS England national data opt-out service should also be considered.
Applications to the HRA are made through the Integrated Research Application System (IRAS).
The Caldicott Principles
The eight Caldicott principles provide a framework to be addressed in any data ethics application.
A key principle in data ethics to keep in mind is ‘proportionality’, which is about minimisation: only use the information that is necessary.
An overview of the Caldicott Principles is given below.
The National Data Guardian’s document outlining the eight principles is also available below.
Principle 1: Justify the purpose(s) for using confidential information
Every proposed use or transfer of confidential information should be clearly defined, scrutinised and documented, with continuing uses regularly reviewed by an appropriate guardian.
Principle 2: Use confidential information only when it is necessary
Confidential information should not be included unless it is necessary for the specified purpose(s) for which the information is used or accessed. The need to identify individuals should be considered at each stage of satisfying the purpose(s) and alternatives used where possible.
Principle 3: Use the minimum necessary confidential information
Where use of confidential information is considered to be necessary, each item of information must be justified so that only the minimum amount of confidential information is included as necessary for a given function.
Principle 4: Access to confidential information should be on a strict need-to-know basis
Only those who need access to confidential information should have access to it, and then only to the items that they need to see. This may mean introducing access controls or splitting information flows where one flow is used for several purposes.
Principle 5: Everyone with access to confidential information should be aware of their responsibilities
Action should be taken to ensure that all those handling confidential information understand their responsibilities and obligations to respect the confidentiality of patients and service users.
Principle 6: Comply with the law
Every use of confidential information must be lawful. All those handling confidential information are responsible for ensuring that their use of and access to that information complies with legal requirements set out in statute and under the common law.
Principle 7: The duty to share information for individual care is as important as the duty to protect patient confidentiality
Health and social care professionals should have the confidence to share confidential information in the best interests of patients and service users within the framework set out by these principles. They should be supported by the policies of their employers, regulators and professional bodies.
Principle 8: Inform patients and service users about how their confidential information is used
A range of steps should be taken to ensure no surprises for patients and service users, so they can have clear expectations about how and why their confidential information is used, and what choices they have about this. These steps will vary depending on the use: as a minimum, this should include providing accessible, relevant and appropriate information - in some cases, greater engagement will be required.
Other key considerations regarding data ethics include:
Data sharing agreements (DSAs): for commercial and academic collaborations. A template is available at https://transform.england.nhs.uk/media/documents/Template_Data_Sharing_Agreement_-12April21_-_FINAL.odt.
A database access committee (DAC) is a group of information governance staff, patients, clinicians, and researchers who oversee the use of an institution’s datasets (see: Public governance of medical artificial intelligence research in the UK: an integrated multi-scale model, https://researchinvolvement.biomedcentral.com/articles/10.1186/s40900-022-00357-7).
Data security: as part of ethics applications, a Data Security and Protection Toolkit check is made (https://www.dsptoolkit.nhs.uk/).
Trusted Research Environments (TREs) / Secure Data Environments (SDEs): secure areas (‘data safe havens’) in which researchers can access data (see: Trusted Research Environment service for England - NHS Digital, https://digital.nhs.uk/coronavirus/coronavirus-data-services-updates/trusted-research-environment-service-for-england, and Machine learning models, trusted research environments and UK health data: ensuring a safe and beneficial future for AI development in healthcare, https://jme.bmj.com/content/medethics/early/2023/03/30/jme-2022-108696.full.pdf).
Patient and public involvement (PPI): a key stage of any data study, ensuring that the methods used are acceptable to patients and the public, and helping to make research materials (e.g. posters and information sheets) accessible (see: https://www.nihr.ac.uk/documents/ppi-patient-and-public-involvement-resources-for-applicants-to-nihr-research-programmes/23437). Understanding Patient Data (UPD) is a great resource for this type of work (https://understandingpatientdata.org.uk/).
FAIR principles: findability, accessibility, interoperability, and reusability (https://www.nature.com/articles/s41597-023-02298-6).
A research PACS can provide a mirror image of the clinical PACS and a safe environment in which to deploy and test AI algorithms (https://pubs.rsna.org/doi/abs/10.1148/rg.327115138?journalCode=radiographics).
The Five Safes framework: safe data, safe projects, safe people, safe settings, safe outputs (https://ukdataservice.ac.uk/help/secure-lab/what-is-the-five-safes-framework/).
Task 1: Imagine you are setting up a study to evaluate an AI algorithm for chest x-ray pneumothorax detection at your local site. Your task is to answer the five questions below, which were covered earlier in this module.
Why?
Why is each part of the data (e.g. personal identifiers, types of cancer, machine vendor) being collected, and is each part necessary?
Who?
Who will have access to the data in an identifiable and de-identified form? For example, medical physicists, PhD students, companies.
Who are the patients that will be included in this dataset (inclusion / exclusion criteria) and how many people will be included?
Where?
Where will the data be stored at each step? For example, on an NHS site, a university site, in a Trusted Research Environment (TRE), a research PACS, or cloud-based storage. Is the data being processed as part of routine care (for example, within the NHS firewall)?
Is the data going outside the UK or the EU? For example, if data is stored in the cloud, where is the cloud based? Will the data be shared with third parties, such as commercial companies or academics?
Where is the backup of the data located? Is there a backup?
Where will the systems (AI algorithm) be that need to access the data, and can the system access the data files?
What?
What is the data going to be used for? For example, training or testing or both.
What is the task the AI algorithm is carrying out?
What level of anonymisation / de-identification will be used? For example, anonymisation, pseudonymisation or synthetic data (and which parts of the data need de-identifying, e.g. dates, addresses, date of birth)? A sketch illustrating the difference between pseudonymisation and anonymisation appears after this list of questions.
What form of consent for data will be used? For example, opt in or opt out.
What data is being accessed and on what systems? For example, picture archiving and communication system (PACS), Electronic health record (EHR).
What route will be used to move the data if it needs to be moved off site?
What format does the data need to be in? For example, comma-separated values (CSV), Digital Imaging and Communications in Medicine (DICOM).
When?
What time frame is the data from? For example, 01/01/2010-31/12/2020.
Until when will the data be stored? For example, for 5 years.
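As referenced in the “What?” questions above, the key practical difference between pseudonymisation and anonymisation is whether a linking key is retained. A minimal, hypothetical sketch:

```python
# Hypothetical sketch of pseudonymisation: identifiers are replaced with study
# IDs, and the key linking them back is stored separately under strict access
# controls. Anonymisation would discard this key, making re-identification
# impossible.
import secrets

linking_key: dict[str, str] = {}  # hospital number -> study ID (held separately)

def pseudonymise(hospital_number: str) -> str:
    """Return a stable study ID for a given hospital number."""
    if hospital_number not in linking_key:
        linking_key[hospital_number] = f"STUDY-{secrets.token_hex(4).upper()}"
    return linking_key[hospital_number]

print(pseudonymise("RXH1234567"))  # e.g. STUDY-9F3A1C2B
print(pseudonymise("RXH1234567"))  # the same ID is returned for the same patient
```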
Task 1: Model answer
Who and when?
Local radiology department doctors (2x trainees and 1x consultant) will have access to both identifiable and non-identifiable patient information.
The dataset will include 1000 patients who attended A+E majors as part of a trauma call and had a chest x-ray. Patients will be aged > 18 and data will be collected between 01/01/2022 and 01/01/2024. The data will be stored for 5 years.
What and why?
The patient’s chest x-ray and radiology report will be extracted from PACS and will be de-identified by:
Removing any irrelevant information
Changing the dates to the first of the year
Replacing the patient’s hospital number with a study ID whilst the data remains within the hospital system
Taking the national data opt-out consent into account.
The data will be used to test an AI model. The images will be in DICOM format, and the reports will be extracted to Excel. Only the chest x-ray and radiology report will be collected, and only the necessary study identifiers will remain, ensuring that no data beyond what this study requires is accessed.
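As a minimal sketch of the de-identification step described above (assuming the pydicom library; the file names and study ID are illustrative, not the study’s actual pipeline):

```python
# Minimal de-identification sketch using pydicom; file names and study ID are
# illustrative. Note: full DICOM de-identification covers many more tags,
# including private tags and burned-in annotations (see the DICOM PS3.15
# confidentiality profiles).
import pydicom

def deidentify(path_in: str, path_out: str, study_id: str) -> None:
    ds = pydicom.dcmread(path_in)
    # Replace the hospital number and patient name with the study ID.
    ds.PatientID = study_id
    ds.PatientName = study_id
    # Remove the date of birth entirely.
    if "PatientBirthDate" in ds:
        ds.PatientBirthDate = ""
    # Change dates to the first of the year (DICOM dates are YYYYMMDD strings).
    for keyword in ("StudyDate", "SeriesDate", "AcquisitionDate", "ContentDate"):
        if keyword in ds:
            element = ds.data_element(keyword)
            element.value = str(element.value)[:4] + "0101"
    ds.save_as(path_out)

deidentify("chest_xray.dcm", "chest_xray_deid.dcm", "STUDY-0001")
```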
Where?
Data will initially be stored on internal NHS systems. Once de-identified, it will be transferred to a secure university server within the UK, which is backed up at the end of each day. The AI model will access the data from this university server.