Mutation Detection Flashcards
How many rare diseases have been described, and how many people do they affect in the UK?
Approximately 7,000 rare diseases have been described, affecting about 1 in 17 of the UK population (approximately 3.5 million individuals).
How many rare diseases are caused by highly penetrant single nucleotide variants (SNV), small indels, or CNVs?
Approximately 5,000.
Diagnosis relies on the identification of a disease-causing germline variant.
What is the primary challenge in interpreting whole exome and whole genome sequencing for inherited diseases?
Whittling down a list of candidate variants to identify the disease-causing one(s).
What guidelines are used for germline variant interpretation?
In 2015, the American College of Medical Genetics and Genomics (ACMG) published guidelines setting out a series of criteria for interpreting sequence variants in Mendelian disorders.
In 2016, these were adopted by the Association for Clinical Genomic Science (ACGS): ACGS Best Practice Guidelines for Variant Classification in Rare Disease
What evidence is used to classify germline variants using ACGS?
Collate evidence from population data, computational data, functional data, segregation data, literature evidence and de novo data.
Many software packages are available to aid interpretation (for example, Alamut, Congenica).
What classification system does ACGS use?
ACMG/ACGS guidelines use a five-class system:
5: Pathogenic >99% probability of a variant being disease-causing
4: Likely pathogenic >90%
3: Uncertain significance
2: Likely benign <10%
1: Benign <0.1%
Each piece of evidence relates to a criterion and is worth a specific number of points. Evidence points for pathogenicity: Very Strong = 8, Strong = 4, Moderate = 2, Supporting = 1. Evidence points for benignity: Strong = -4, Moderate = -2, Supporting = -1. Evidence point thresholds for the 5 classes are: ≥10 (Pathogenic), 6 to 9 (Likely Pathogenic), 0 to 5 (Uncertain Significance), -1 to -5 (Likely Benign), ≤-6 (Benign).
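The points system above can be sketched in a few lines of Python. This is a minimal illustration only: the function and dictionary names are hypothetical, the point values and thresholds are copied from this card (and should be checked against the current published guidance), and the 0 to 5 range is assumed to be Uncertain Significance because it is the gap left between the other classes.

```python
# Evidence points per strength level, as listed on the card above.
PATHOGENIC_POINTS = {"very_strong": 8, "strong": 4, "moderate": 2, "supporting": 1}
BENIGN_POINTS = {"strong": -4, "moderate": -2, "supporting": -1}


def total_points(pathogenic=(), benign=()):
    """Sum the points for lists of evidence strength labels."""
    return (sum(PATHOGENIC_POINTS[s] for s in pathogenic)
            + sum(BENIGN_POINTS[s] for s in benign))


def classify(points):
    """Map a points total to a class using the thresholds on this card."""
    if points >= 10:
        return "Pathogenic"
    if points >= 6:
        return "Likely pathogenic"
    if points >= 0:
        # Assumption: the range between the stated thresholds is class 3.
        return "Uncertain significance"
    if points >= -5:
        return "Likely benign"
    return "Benign"
```

For example, one Very Strong plus one Moderate pathogenic criterion gives 10 points, which reaches the Pathogenic threshold.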
What other germline interpretation guidelines are used?
Cancer-specific: CanVIG-UK, which adapts the ACGS guidelines to be suitable for cancer predisposition genes
Disease-specific, e.g. BRCA, Lynch syndrome
What somatic variant interpretation guidelines are used?
Association for Molecular Pathology (AMP) Guidelines (Li et al 2017)
SVIG has also been developed but is often not fully used in routine practice
What are the AMP guidelines?
4 tier system based on clinical actionability
Uses:
Population data, e.g. gnomAD
Functional data
Predictive data
Cancer hotspots, e.g. cancer databases such as COSMIC and My Cancer Genome
Drug approval guidelines: NICE, CDF, NCCN
How do the AMP guidelines classify variants?
Tier I, variants of strong clinical significance: made up of variants with level A & B evidence (clinically actionable)
Tier II, variants of potential clinical significance: made up of variants with level C & D evidence (clinically actionable)
Tier III, variants of unknown significance (VUS): evidence may be conflicting or absent
Tier IV, benign or likely benign variants: there is evidence that a variant does not have any actionability
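As a memory aid, the tier assignment above can be written as a simple lookup. This is a hedged sketch, not a clinical tool: the function name, the `benign_evidence` flag and the decision order are illustrative assumptions; only the level-to-tier mapping comes from the card.

```python
# Evidence level -> tier, following the four-tier summary on this card.
TIER_BY_EVIDENCE_LEVEL = {"A": "Tier I", "B": "Tier I", "C": "Tier II", "D": "Tier II"}


def amp_tier(evidence_level=None, benign_evidence=False):
    """Return the tier for a variant.

    evidence_level: "A", "B", "C", "D", or None when evidence is absent
    or conflicting. benign_evidence: True when there is evidence that
    the variant is benign or likely benign (hypothetical flag).
    """
    if benign_evidence:
        return "Tier IV"
    # Anything without level A-D evidence falls back to unknown significance.
    return TIER_BY_EVIDENCE_LEVEL.get(evidence_level, "Tier III")
```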
What is meant by a driver mutation?
Driver mutations confer a growth advantage on the cells carrying them and have been positively selected during the evolution of the cancer. Non-recurrent variants are unlikely to be drivers; otherwise they would most likely have been seen previously.
What is different about SVIG and AMP guidelines?
SVIG aims to help standardise somatic variant interpretation and bring it in line with the germline ACGS guidelines
Uses an evidence-based points system to help determine the oncogenicity of a variant
Has a list of known hotspot variants that are oncogenic
Can only be used on SNVs; not suitable for structural or copy number variants
What variant nomenclature is used?
Human Genome Variation Society (HGVS) nomenclature
- Clinical reports should include the reference sequence to ensure unambiguous naming of the variant at the DNA level, as well as provide coding and protein nomenclature (e.g. “c.” for coding DNA sequence, “p.” for protein)
What are external bioinformatic databases?
External bioinformatic databases (DBs) are databases that store biological data. The data held in these databases/resources can be split into the following main areas:
- genome and sequence data (sequence alignment, variant databases, phylogenetic and splicing predictions)
- transcriptomics data (e.g. full length cDNAs or mRNAs),
- proteomics data (e.g. protein databases, protein structure, family and domain classification)
- other specialised databases (e.g. cancer and methylation databases).
What are primary and secondary databases?
Primary databases consist of experimentally derived data (e.g. nucleotide and protein sequences).
Secondary databases consist of data produced from the analysis of primary data. Secondary databases often include data from a combination of other databases (both primary and secondary) and other sources (e.g. literature).
Give some examples of genomic databases?
Primary
- EMBL (European Bioinformatics Institute) (Europe)
- GenBank (National Center for Biotechnology Information) (USA)
Secondary
- OMIM (Online Mendelian Inheritance in Man)
- RefSeq
- DECIPHER
- ClinVar
- The Cancer Genome Atlas (TCGA)
- COSMIC
- ClinGen
What is germline conversion rate?
(Number of pathogenic variants of true germline origin × 100) / total number of tumour-detected pathogenic variants
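The formula above is a straightforward percentage, sketched here in Python. The function and parameter names are hypothetical, and the counts in the usage note are made-up illustrative numbers, not published data.

```python
def germline_conversion_rate(germline_confirmed, tumour_detected):
    """Percentage of tumour-detected pathogenic variants of true germline origin."""
    if tumour_detected <= 0:
        raise ValueError("tumour_detected must be positive")
    if not 0 <= germline_confirmed <= tumour_detected:
        raise ValueError("germline_confirmed must be between 0 and tumour_detected")
    return germline_confirmed * 100 / tumour_detected
```

With illustrative counts, 15 germline-confirmed variants out of 60 detected in tumour gives `germline_conversion_rate(15, 60)`, i.e. 25%.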
What are Cancer Susceptibility Genes?
A term used to describe a gene that may increase a person’s risk of developing some types of cancer if it has certain mutations.
When should variants identified in tumour be investigated for germline significance?
Increased tumour testing has increased the detection of secondary/incidental findings, some of which are germline in origin, but it is not feasible to carry out paired tumour/germline testing on all samples to confirm this
ESMO Guidelines for Germline-focussed analysis of tumour-only sequencing
What genes should always be followed up for germline analysis after tumour testing?
The ESMO guidelines identified the 7 most actionable genes, which have a high germline conversion rate: BRCA1, BRCA2, MLH1, MSH2, MSH6, PALB2 and RET
What do the ESMO guidelines suggest about germline testing of tumour variants?
Four potential strategies are proposed for clinical labs. Intermediate/conservative approaches are suggested for the UK/Europe.
Permissive: germline follow-up for all 40 genes in all tumour types.
Intermediate-permissive: germline follow-up for all 23 most actionable/highly actionable genes (MA-CSGs/HA-CSGs) in all tumour types, but germline follow-up only in ‘associated’ tumour types for the 17 standard actionable genes (SA-CSGs).
Intermediate-conservative: germline follow-up in all tumour types for the 7 most actionable genes (MA-CSGs), but germline follow-up only in ‘associated’ tumour types for the other 33 HA-CSGs/SA-CSGs.
Conservative: germline follow-up only in ‘associated’ tumour types for all 40 genes.
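The four strategies differ only in which gene classes are followed up regardless of tumour type. A minimal sketch of that logic follows; the function name, strategy strings and class labels ("MA", "HA", "SA") are illustrative choices, and the gene-to-class assignment itself is not reproduced here.

```python
# Gene classes followed up in ALL tumour types under each strategy.
# Any other class is followed up only in 'associated' tumour types.
ALWAYS_FOLLOW_UP = {
    "permissive": {"MA", "HA", "SA"},
    "intermediate-permissive": {"MA", "HA"},
    "intermediate-conservative": {"MA"},
    "conservative": set(),
}


def needs_germline_follow_up(strategy, gene_class, tumour_associated):
    """Decide whether a tumour-detected variant needs germline follow-up.

    gene_class: "MA", "HA" or "SA" (most/highly/standard actionable).
    tumour_associated: True if the tumour type is 'associated' with the gene.
    """
    if strategy not in ALWAYS_FOLLOW_UP:
        raise ValueError(f"unknown strategy: {strategy}")
    if gene_class not in {"MA", "HA", "SA"}:
        raise ValueError(f"unknown gene class: {gene_class}")
    return gene_class in ALWAYS_FOLLOW_UP[strategy] or tumour_associated
```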
What is an incidental finding?
The ACMG defined incidental findings as “results that are not related to the indication for ordering the sequencing but that may nonetheless be of medical value or utility”
Now referred to as secondary findings, as “incidental” gives a sense of insignificance
Also described as off-target findings
What guidelines are used to determine when secondary/incidental findings should be reported?
The ACMG published a minimum list of 59 genes to be reported as incidental or secondary findings
The ACMG subsequently established the Secondary Findings Maintenance Working Group to develop a process for curating and updating the list over time
What is required for individuals undergoing clinical genomic sequencing regarding secondary findings?
Informed consent and an option to opt-out of receiving secondary findings.
On what basis were genes included in the ACMG secondary findings list?
Based on clinical features, likelihood of early diagnosis, molecular characteristics, clinical testing options, and medical actionability.
When should secondary findings be reported according to ACMG guidelines?
Clinically significant findings with potential health implications should be reported. Factors to consider include:
- the timing of the impact upon health, when this will come about, now or in the future
- its scope, who it affects, the individual, their offspring or other family members
- its scale, whether its impact upon health is significant or trivial
- The probability of impact, whether the variant is completely penetrant or only marginally so.
Why do some patients receive genetic testing through research?
The boundary between clinical and research activities is becoming increasingly blurred, particularly in the subspecialty of clinical genetics and is growing with the use of genomic technology.
This arises because, in a resource-scarce environment, some genetic tests (e.g. for rare diseases) can only be accessed through research protocols, as there are no relevant genetic tests validated for clinical use within the NHS.
When should variants identified in research be reported?
Beneficence - it has the potential for reducing the risk of, or preventing, disease
Non-maleficence - to do no harm
Justice
Respect for autonomy
When might secondary findings from WGS be reported?
Discussed at GTAB:
- The seriousness of the presenting problem and the nature of the other findings
- Whether the finding represents a known clinical entity or risk factor or a finding that requires further investigation (e.g. variants of uncertain clinical significance, VUCS)
- Whether the finding has been validated to an acceptable standard
- The availability of any treatment/prophylaxis, and its likely success
- Whether the finding is a risk factor for disease or represents a disease process
- The age of the patient and co-existing morbidities and conditions
- Prior knowledge of the patient’s wishes
What does the term “opportunistic screening” refer to in the context of secondary findings?
The intentional search for additional pathogenic variants during genomic sequencing.
What is a bioinformatics pipeline?
A pipeline is a term in computer science for chaining or connecting software tools/programs/scripts creating a stepwise workflow to execute the analytical steps necessary for a complete bioinformatic analysis. Each step in the pipeline is typically designed to take input from the previous step and generate output that is used as input for the next step.
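The chaining idea can be shown with a toy example in Python. Everything here is illustrative: the step functions are trivial stand-ins for real tools (which would normally be separate programs reading and writing files), and the names are hypothetical.

```python
from functools import reduce


def run_pipeline(steps, data):
    """Feed data through each step in order; each output is the next input."""
    return reduce(lambda current, step: step(current), steps, data)


# Toy stand-ins for real pipeline stages.
def quality_filter(reads):
    """Drop reads containing an uncalled base (N)."""
    return [r for r in reads if "N" not in r]


def to_upper(reads):
    """Normalise reads to upper case."""
    return [r.upper() for r in reads]


def count_reads(reads):
    """Final step: report how many reads survived."""
    return len(reads)


result = run_pipeline([quality_filter, to_upper, count_reads],
                      ["acgt", "acNt", "ggca"])
```

The design point is that each step only needs to agree with its neighbours on the format passed between them, which is why standard file formats matter.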
What are FASTQ files?
Text files containing raw, demultiplexed sequence reads together with a quality score for each base
Often used as the first input for bioinformatics pipelines to generate quality and alignment metrics
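To make the format concrete, here is a minimal FASTQ reader sketch. It assumes the common layout of four lines per record (header starting with `@`, sequence, separator starting with `+`, quality string), no line-wrapped sequences, and Phred+33 quality encoding; the function name and the example read are made up.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, phred_qualities) from an iterable of FASTQ lines."""
    lines = [ln.rstrip("\n") for ln in lines]
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Assumption: Phred+33 encoding (quality = ASCII code - 33).
        yield header[1:], seq, [ord(c) - 33 for c in qual]


example = ["@read1", "ACGT", "+", "IIII"]
records = list(parse_fastq(example))
```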
What are BAM files?
Binary alignment files generated from FASTQs.
Reads are mapped against the reference genome so that aligned reads can be viewed. Also provides some quality information, e.g. depth of sequencing at different locations
What are CRAM files?
Compressed BAM files to help with space saving
What is a VCF file?
A file containing information about positions in the genome, usually variants. May also include annotations
This is the output of the variant calling section of the BI pipeline
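A VCF data line is tab-separated, with eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The sketch below parses one such line; the function name is hypothetical, the example coordinates and values are invented, and header lines and per-sample genotype columns are deliberately ignored.

```python
def parse_vcf_line(line):
    """Parse the eight fixed columns of a single VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, var_id, ref, alt, qual, flt, info = fields[:8]
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, _, value = item.partition("=")
            # INFO flags have no '=value' part; store them as True.
            info_dict[key] = value if value else True
    return {
        "chrom": chrom,
        "pos": int(pos),          # 1-based position
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),    # multiple ALT alleles are comma-separated
        "qual": qual,
        "filter": flt,
        "info": info_dict,
    }


record = parse_vcf_line("chr1\t12345\t.\tC\tT\t50\tPASS\tDP=100;AF=0.5")
```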
What is a BED file?
Stores genomic regions and is usually used to define the regions of interest for an assay, e.g. variants will only be called within the regions defined by the BED file
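Restricting calls to BED regions comes down to an interval check, sketched below. One detail worth remembering is that BED start coordinates are 0-based and the end is exclusive, whereas VCF positions are 1-based. The function names and the example region are hypothetical.

```python
def parse_bed(lines):
    """Return a list of (chrom, start, end) tuples from BED lines."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def in_regions(chrom, pos_1based, regions):
    """True if a 1-based position (VCF-style) falls inside any BED region.

    BED intervals are 0-based, half-open: [start, end).
    A 1-based position p is inside when start < p <= end.
    """
    return any(c == chrom and start < pos_1based <= end
               for c, start, end in regions)
```

For a region `chr1 100 200`, positions 101 to 200 (1-based) are inside and position 100 is not.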
What components are normally included in a bioinformatics pipeline?
The input- FASTQ
Quality control- filter out poor quality reads and remove artefacts
Sequence alignment- map to reference genome and produce BAMs
Variant calling- Identify variations between sequence and reference
Variant filtering: filter out false positives or poor quality variants (e.g. due to poor mapping, strand bias)
Variant annotation: characterize variant with location, HGVS, VAF, databases
Variant prioritisation: some pipelines may only show variants that are thought to be clinically relevant and filter out polymorphisms; this filtering must be fully validated or an important variant could be missed
What are the advantages of using an in house BI pipeline?
Full control
- if commercial, an organisation must put their trust into the claims made by a commercial operator
- allows modification of filtering criteria, annotations and databases
Data security
- all in house so less concern than with a commercial operator
Cost
- often cheaper than regularly paying for commercial BI
What are the disadvantages of using an in house BI pipeline?
Cost
- Can be costly to set up at the beginning
Expertise
- Requires a high amount of expertise and validation to ensure pipelines are effective and that variants of interest are not filtered out
Time
- likely to take a considerable amount of time to work through bugs and test quality of outputs.
Give some examples of cases that should be used when validating a pipeline?
Low VAF, close to the limit of detection (LOD)
Low tumour content (TC) samples
Regions of poor quality/drop-out
Large deletions/insertions
Horizontally complex variant: 2 or more sequence alterations on same read in close proximity, so that they may resemble single variant.
Vertically complex variant: three or more alleles are represented by different sequence reads
What are the common causes of poor sequencing data?
GC rich regions
Homopolymeric tracts
Strand bias
Artefacts due to poor quality input DNA (e.g. deamination from FFPE)
Insufficient template DNA
Why might sequencing reads not always align correctly to the reference genome?
Variant present which affects alignment
Read maps to multiple locations in the reference genome (e.g. pseudogene)
Reference genome is incomplete so sequence is missing (e.g. centromeric regions)
Errors introduced during sequencing
Why might a sequencing read map to more than one location on the reference genome?
segmental duplications or pseudogenes can result in the same sequence being present in 2 or more locations in the genome.
Why do pseudogenes make NGS difficult?
NGS sequence reads that map to these duplicated regions (e.g. pseudogenes) will not have unique mapping and therefore may be removed from downstream analyses.
If clinically relevant genes have a pseudogene it may be difficult to get sufficient coverage of the gene for variant calling
Alternatively, called variants may be in the pseudogene and not the gene itself
What gene is difficult to analyse due to a pseudogene?
PMS2 (Lynch syndrome), which has the pseudogene PMS2CL
What is paired end sequencing and why is this better than single end?
Paired-end sequencing- sequence both ends of the DNA fragment producing forward and reverse reads
Paired-end sequencing can be useful for detecting structural variants (deletions, insertions or inversions), because read pairs that map at an unexpected distance or orientation point to a rearrangement
What is FISH, and what are its advantages?
FISH (Fluorescence in situ hybridization) is a technique using DNA probes with fluorophore-coupled nucleotides to detect complementary sequences in fixed cells or tissues.
Its advantages include high sensitivity, direct application to both metaphase chromosomes and interphase nuclei, and single-cell level visualization
Describe the basic principles of FISH and its technique
FISH involves denaturation of the probe and target DNA. The sample is treated with formamide, which lowers the melting temperature of the DNA and aids hybridization, in which the probe binds to complementary sequences on the slide.
The procedure includes slide preparation, co-denaturation, hybridization, stringent washing, counterstaining, and visualization using a fluorescence microscope
What are the three main types of FISH probes, and what are their applications?
The three main types of FISH probes are locus-specific probes, alphoid or centromeric repeat probes, and whole chromosome probes. They are used for detecting specific genetic regions, determining chromosome number, and examining complex chromosomal abnormalities, respectively.
What are the main clinical applications of FISH?
FISH is widely used in oncology for diagnosing and prognosticating various cancers such as sarcoma, CML, CLL, ALL, AML, NHL, and MCL. It helps in defining treatment options, disease monitoring, and treatment response assessment.
What are the advantages of FISH?
The advantages of FISH include high resolution, applicability to various cell types (including non-living, fixed, and paraffin-embedded cells), relatively fast results (usually within 24 hours), and the ability to count many cells.
What are the disadvantages of FISH?
The disadvantages of FISH include its limited scope (it will not detect abnormalities in regions not probed), lack of high throughput, and lower resolution compared to PCR techniques.
What are locus-specific probes, and what are their applications?
Locus-specific probes bind to particular regions of a chromosome and are used to determine chromosome location or copy number of specific genes. They can be break-apart probes for detecting translocations, dual-color dual-fusion probes for known translocations, or enumeration probes for detecting deletions, duplications, and chromosome ploidy.
What techniques can be used to detect copy number changes?
G banding
FISH
QF-PCR
RT-qPCR
MLPA
SNP arrays
NGS
What are the disadvantages of G-banding?
Low resolution (>5Mb), labor-intensive, slow turnaround time, unable to detect UPD, requires dividing cells and manipulation of the cell cycle, risk of culture artefacts, and some abnormalities not detected in cultured cells.
What are the key steps in the principle of MLPA?
Hybridisation of probes to DNA, ligation of probes, PCR amplification of ligated probes, separation by capillary electrophoresis, and analysis and quantification.
What is the purpose of the stuffer sequence in MLPA probes?
It allows for the generation of different sized products for electrophoretic resolution.
List the basic procedure steps for carrying out MLPA.
- Denaturation of genomic DNA
- Hybridisation of probes to the sample
- Ligation of probes
- PCR amplification of ligated probes using universal primers
- Separation of amplified products by capillary electrophoresis
- Analysis and quantification
What is MS-MLPA?
Allows copy number detection and methylation profiling
How does MS-MLPA detect methylation?
By using probes with a methylation-sensitive restriction site and the restriction enzyme HhaI; methylated DNA remains undigested and generates a signal during PCR.
How does RT-MLPA differ from standard MLPA in its procedure?
It requires reverse transcriptase to create cDNA from RNA before proceeding with the MLPA reaction.
What is RT-MLPA used for?
mRNA expression profiling.
What are the four main categories of copy number detection methods in NGS?
Split Read (SR)
Read Pair (RP)
Assembly-based (AS)
Read Depth (RD)
Why is the combined analysis (CA) method popular in NGS?
Because no single method is comprehensive enough to detect the full range of DNA variations on its own.
What are some limitations of the Split Read (SR) method in NGS?
Limited by the length of reads and less reliable in regions with duplications.
What is a primary advantage of Read Depth (RD) in NGS?
It is reliable for detecting deletions and duplications and can count the number of CNVs, though it has poor breakpoint resolution.
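The read depth idea can be sketched as a ratio between sample depth and an expected (reference) depth over the same region. This is a simplified illustration: it assumes depths are already normalised for library size and GC content, the function name is hypothetical, and real callers segment across many bins rather than judging one region.

```python
import math


def estimate_copy_number(sample_depth, reference_depth, ploidy=2):
    """Return (log2 ratio, estimated integer copy number) for one region.

    sample_depth and reference_depth are mean depths over the region and
    are assumed to be normalised so that they are directly comparable.
    """
    if sample_depth <= 0 or reference_depth <= 0:
        raise ValueError("depths must be positive")
    ratio = sample_depth / reference_depth
    return math.log2(ratio), round(ratio * ploidy)
```

Half the expected depth gives a log2 ratio of -1 and an estimated copy number of 1 (heterozygous deletion); 1.5 times the expected depth gives an estimated copy number of 3.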
What is a SNP?
Single nucleotide polymorphism (SNP) - a DNA sequence variation occurring commonly within a population (e.g. >1%) in which there is a single nucleotide change e.g. C to T.
On average, SNPs occur every 1000bp meaning that there are roughly 4 to 5 million SNPs per person’s genome.
What are the types of SNPs?
Synonymous SNPs (sSNP) do not cause a change in amino acid.
Non-synonymous SNPs (nsSNP) alter the amino acid sequence and are either missense or nonsense.
Outline the principle of SNP arrays
Patient’s DNA is fragmented and amplified
A BeadChip carries thousands of probes targeting common SNPs
Patient’s DNA is hybridized to the probes, followed by single base extension with fluorescently labelled dNTPs
Beads are scanned and the detection system interprets the hybridization signal and converts intensity into a genotype
DNA binds if the patient carries the SNP. This leads to some heterozygous calls
What are the two main types of SNP arrays?
Illumina
- 850K SNPs
- 3262 target regions
- Uses single base extension with a fluorescently labelled dNTP (green if one base and red if another)
Affymetrix array
- 1.8 million markers. 946,000 for CNV detection, 906,600 SNPs
- All targets bind to all probes, but the presence of a SNP reduces binding affinity, giving a weaker fluorescent signal
What is the B allele frequency?
By chance, people will be homozygous for some SNPs and heterozygous for other SNPs. The SNPs selected in the bead-chip type arrays are carefully chosen for variability within a population.
In the B-Allele Chart, BB homozygotes have a data value of 1.0, AA homozygotes have a data value of 0.0 and AB heterozygotes have a data value of 0.5. This results in three clusters on the BAF plot
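The three clusters follow from treating BAF as the B-allele share of the total signal. The sketch below is a simplification: real array software normalises intensities against reference genotype clusters, and the function names and the tolerance value are illustrative assumptions.

```python
def b_allele_frequency(a_signal, b_signal):
    """Fraction of the total allele signal contributed by the B allele."""
    total = a_signal + b_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return b_signal / total


def genotype_cluster(baf, tolerance=0.15):
    """Assign a BAF value to the AA (0.0), AB (0.5) or BB (1.0) cluster.

    tolerance is an arbitrary illustrative cut-off, not a validated value.
    """
    if baf <= tolerance:
        return "AA"
    if baf >= 1 - tolerance:
        return "BB"
    if abs(baf - 0.5) <= tolerance:
        return "AB"
    return "unclear"
```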
How can the BAF plot show a copy number gain or a loss?
Starting from a heterozygous AB genotype:
Normal
- A/B: BAF 0.5
Gain
- Duplication (AAB or ABB): BAF 0.33 or 0.67
Deletion
- A or B only: BAF 0 or 1
Can also be used to estimate clone level, based on how far the BAF bands have separated
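The BAF values above, and the link between band separation and clone level, follow from a weighted average of abnormal and normal cells. A minimal sketch, assuming a simple two-population mixture in which normal cells are heterozygous (one B allele out of two copies); the function and parameter names are hypothetical.

```python
def expected_baf(abnormal_b, abnormal_total, clone_fraction=1.0,
                 normal_b=1, normal_total=2):
    """Expected BAF at a SNP that is heterozygous (AB) in normal cells.

    abnormal_b / abnormal_total: B-allele copies and total copies in the
    abnormal clone (e.g. 2 and 3 for ABB, 0 and 1 for a deletion leaving A).
    clone_fraction: proportion of cells carrying the abnormality (0 to 1).
    """
    if not 0 <= clone_fraction <= 1:
        raise ValueError("clone_fraction must be between 0 and 1")
    b = clone_fraction * abnormal_b + (1 - clone_fraction) * normal_b
    total = clone_fraction * abnormal_total + (1 - clone_fraction) * normal_total
    return b / total
```

In a pure sample, a duplication gives 1/3 or 2/3 and a deletion gives 0 or 1. If only half the cells carry a deletion that removes the B allele, the band sits at about 0.33 instead of 0, which is why the degree of separation reflects clone size.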
How can BAF be used to identify copy neutral loss of heterozygosity ?
When both copies of a chromosome region come from the same parent (either inherited or acquired in cancer), there are still two copies of each SNP, but because both copies are identical, every SNP is homozygous. This looks the same as a deletion on the B-allele frequency chart, but no copy number change is visible on the Log R ratio chart.
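The distinction between a deletion and copy-neutral LOH comes from reading the two plots together, which can be sketched as a small decision rule. The function name, the inputs and the tolerance are illustrative assumptions for a non-mosaic segment, not validated calling criteria.

```python
def interpret_segment(log_r_ratio, has_heterozygous_band, log_r_tolerance=0.1):
    """Combine Log R ratio and BAF pattern for one segment.

    log_r_ratio: mean Log R ratio across the segment (about 0 when copy neutral).
    has_heterozygous_band: True if a band of AB calls is present on the BAF plot.
    log_r_tolerance: arbitrary illustrative cut-off around zero.
    """
    if log_r_ratio < -log_r_tolerance:
        return "loss"
    if log_r_ratio > log_r_tolerance:
        return "gain"
    if not has_heterozygous_band:
        # No copy number change, but no heterozygous SNPs either.
        return "copy-neutral LOH"
    return "normal"
```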
What can cause CN-LOH?
LOH as a cancer mechanism
Uniparental disomy (inherited or acquired)
Consanguinity
What are the advantages of SNP arrays?
Can detect CN-LOH
Higher resolution than G banding
More accurate estimate of clone level than G-banding or FISH, but limited below 20%
Less subjective analysis
High throughput
What are the disadvantages of SNP arrays?
Can’t detect balanced translocations
5% limit of detection
Can’t detect SNVs
High level of artefacts
Quite laborious
Clonality limited to approximately 20%, so can’t be used for MRD
What diseases regularly use SNP arrays?
MDS
ALL
MM
CNS (methylation array)
What can SNP arrays be used to detect in cancer?
LOH
CNVs
Methylation
What is the difference between array CGH and SNP arrays?
Array CGH: It focuses on detecting variations in DNA copy number by comparing the fluorescence intensity of a test sample against a reference sample.
SNP Array: In addition to detecting copy number changes, SNP arrays assess the presence of single nucleotide polymorphisms (SNPs) and use in silico tools for CNV calling.
What is array comparative genomic hybridisation (array CGH)?
Patient and control DNA are labelled with different fluorescent dyes and applied to an array carrying thousands of probes for regions of interest
Patient and control DNA compete to hybridise, and the array is then scanned by a microarray scanner
If there is a gain, patient DNA will be more abundant than control at that probe; if there is a loss, control DNA will be more abundant
What are the advantages of oligo array CGH?
Can detect genomic imbalances as small as 100bp, depending on coverage
More cost effective
better reproducibility and less batch-to-batch variation
Multiple consecutive probes indicating the same copy-number change are required to determine a gain or loss. This enhances the accuracy of the interpretation.
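The requirement for multiple consecutive probes can be sketched as a run-length check on per-probe calls. This is an illustrative simplification: the log2 ratio threshold and the minimum probe count are arbitrary placeholder values, the function names are hypothetical, and real software uses statistical segmentation rather than fixed cut-offs.

```python
from itertools import groupby


def call_probe(log2_ratio, threshold=0.3):
    """Call one probe from its patient/control log2 intensity ratio."""
    if log2_ratio >= threshold:
        return "gain"
    if log2_ratio <= -threshold:
        return "loss"
    return "normal"


def call_segments(log2_ratios, min_probes=3, threshold=0.3):
    """Return (call, first_index, last_index) for each run of at least
    min_probes consecutive probes that agree on a gain or a loss."""
    calls = [call_probe(r, threshold) for r in log2_ratios]
    segments = []
    index = 0
    for call, run in groupby(calls):
        length = len(list(run))
        if call != "normal" and length >= min_probes:
            segments.append((call, index, index + length - 1))
        index += length
    return segments
```

With the ratios `[0.0, 0.5, 0.6, 0.55, 0.0, -0.8, 0.0]`, the three adjacent gain probes are reported as one segment, while the single isolated loss probe is ignored.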
What are the disadvantages of oligo array CGH?
Low detection frequency of mosaicism (<30% of abnormal cells)
False positives
Can’t detect LOH or UPD
What are Bacterial Artificial Chromosome (BAC) arrays?
Type of CGH
BAC clones are propagated in vectors in bacteria, purified, amplified and then spotted onto a glass slide using ultra fine needles. Multiple copies of each BAC are spotted onto the array and distributed across the array
Due to the large size, BACs are very stable and hybridisation is specific
What are the advantages of BAC arrays?
Fewer CNVs of uncertain significance detected
High signal to noise ratio
accurate copy number information
What are the disadvantages of BAC arrays?
Abnormalities may be missed, as unable to distinguish gains/losses <85kb or those which fall in 600kb gaps
LOH/UPD not detected
Fewer sub-arrays per slide, therefore expensive
Not really used in routine practice
What are expression arrays?
Expression arrays allow the simultaneous investigation of the expression of thousands of genes and typically involve comparing two or more highly related cellular or tissue sources that differ in an informative way.
Used to investigate expression profiling in tumours and expression profiling of microRNAs.
What kind of methylation arrays are there?
Bisulphite modification followed by microarray hybridisation: CpG islands in promoters of specific genes are hypermethylated in cancer genomes. Sodium bisulphite converts cytosine to uracil but leaves 5-methylcytosine (methylated cytosine) unchanged. Oligonucleotide probes on the microarray hybridise specifically to either the converted or unconverted sequence.
ChIP-on-chip, which locates protein binding sites that may help identify functional elements in the genome. For example, in the case of a transcription factor as a protein of interest, one can determine its transcription factor binding sites throughout the genome.
DamID (DNA adenine methyltransferase identification): an alternative to ChIP-on-chip, in which transcription factors or chromatin-binding proteins of interest are fused to DNA adenine methyltransferase.
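The bisulphite conversion step described above can be illustrated with a toy string conversion. The function name, the 0-based position convention and the example sequence are made up for illustration, and only a single strand is modelled.

```python
def bisulphite_convert(sequence, methylated_positions=()):
    """Convert unmethylated C to U; leave methylated C unchanged.

    methylated_positions: 0-based indices of methylated cytosines.
    """
    methylated = set(methylated_positions)
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(sequence.upper())
    )
```

For the sequence `ACGCGT` with the first C methylated (index 1), the result is `ACGUGT`; with no methylation it is `AUGUGT`. The sequence difference between the two outcomes is what lets a probe distinguish methylated from unmethylated DNA.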
Give an example of expression arrays in tumour profiling
Microarrays are now widely applied to the study of human cancer, for delineating molecular subtypes, and for assessing disease progression and treatment response
Breast
- 70-gene prognostic signature (MammaPrint) developed on the Agilent platform, found to be a strong predictor of metastasis-free survival
- not yet used in diagnostic setting
Briefly define RNA interference (RNAi) and how recent array technologies have contributed in related research studies.
RNA interference (RNAi) is a post-transcriptional method of gene silencing.
The technique involves printing RNAi reagents onto a standard glass microarray slide, which is then placed in tissue culture dishes and cultured cells in medium are added to the arrays.
Cells that adhere to the spots internalise the printed material and become transfected, leading to silencing of a specific gene, and the remaining cells form a non-transfected lawn between spots.
Microarrays are then fixed and prepared for immunofluorescence, staining for DNA and F-actin, in situ hybridisation, apoptosis detection or other assays.
What are copy number variants?
A copy number variant (CNV) is a loss (deletion) or gain (duplications and triplications) of genomic material relative to the reference genome. CNVs can be intragenic, typically deletions/duplications larger than 50bp, or intergenic, involving multiple genes.
Name some of the current best practice guidelines used to interpret CNVs
Intergenic CNVs: The ACMG technical standards for the interpretation and reporting of constitutional copy-number variants- these guidelines are used for both rare disease and inherited cancer but not for acquired neoplasia
Intragenic CNVs: ACGS Best Practice Guidelines for Variant Classification in Rare Disease 2023 and CanVIG-UK Consensus Specification for Cancer Susceptibility genes (CSGs) of ACGS Best Practice Guidelines for Variant Classification 2023.
Acquired CNVs: The 2020 ACMG technical standards do not apply to acquired CNVs in neoplasia and there are currently no consensus guidelines adopted in the UK.