Lecture 12: Principles of genome and PCR based diagnostics Flashcards
What does the Sanger sequencing reaction require?
Template DNA to sequence
Primer
DNA polymerase
Nucleotides
What is the primer in Sanger sequencing for?
Short DNA sequence complementary to template DNA
Anneals to a template
How does Sanger sequencing work?
Incorporation of ddNTPs into the new strands
Causes chain termination
What is Sanger sequencing also called?
Chain termination or dideoxy method
What is the difference between ddNTP and dNTP?
ddNTPs are missing a hydroxyl group on the 3’ carbon
Cannot form a phosphodiester bond with the 5’ phosphate of the next nucleotide
What is the ratio of ddNTP to dNTP?
About 100 dNTP : 1 ddNTP (dNTPs are in large excess, so termination only happens occasionally)
What is the process of Sanger sequencing?
ddNTPs are fluorescently labelled in 4 different colours
A single reaction gives information on each base
Reactions are run on capillary gels and read by lasers and detectors as the sample passes the detection window
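The readout described in this card can be sketched in Python. This is a toy model, not instrument software: the template, the `DYE_COLOURS` mapping and the function names are assumptions made for illustration. It generates every possible chain-terminated fragment, sorts the fragments by length as a capillary would separate them, and reads the terminal base of each one.

```python
# Toy model of a Sanger readout; all names and dye colours are illustrative assumptions.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DYE_COLOURS = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}  # hypothetical mapping


def synthesised_strand(template: str) -> str:
    """Return the newly synthesised strand, 5'->3' (reverse complement of the template)."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def terminated_fragments(new_strand: str) -> list[str]:
    """One fragment per position at which a labelled ddNTP could end the chain."""
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]


def read_capillary(fragments: list[str]) -> tuple[str, list[str]]:
    """Shortest fragments reach the detector first; the last base of each carries the dye."""
    ordered = sorted(fragments, key=len)
    bases = "".join(fragment[-1] for fragment in ordered)
    colours = [DYE_COLOURS[base] for base in bases]
    return bases, colours


template = "TACGGT"  # made-up template, written 5'->3'
strand = synthesised_strand(template)
sequence, colours = read_capillary(terminated_fragments(strand))
print(sequence)  # ACCGTA
```

Because one fragment ends at every position, reading the terminal dye colours in order of fragment length recovers the synthesised strand base by base.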
What is the run time of Sanger sequencing?
1-3 hours
What is the average read length of Sanger Sequencing?
700 base pairs
What sequencing method is considered the ‘gold standard’ and what is the accuracy?
Sanger sequencing
99.99%
What is the 2024 Sequencing Landscape?
Short read sequencers (less than 1000 bases):
Thermo Fisher
Illumina
Long read sequencers (greater than 1000 bases):
PacBio
Oxford Nanopore
What is high-throughput sequencing the pinnacle of?
Strain characterization
Epidemiological analysis
Conventional typing approaches
Resistance gene detection
Identification
What is Illumina sequencing?
Sequencing by synthesis method
A primer anneals to the DNA template and is extended by polymerase
Fluorescently labelled nucleotides
How is the base chemically modified in Illumina sequencing?
Remove the fluorophore
Allow another fluorescent base to be added
What is the process of Illumina sequencing?
- Library preparation (adapters)
*added sequences hybridise to complementary primers on the surface of the sequencing flow cell
- PCR: bridge amplification (enhances signal)
*makes copies of all library fragments
- Sequencing by synthesis (copy template strands)
*DNA binds to the hybridised primers
- Signal detection (1 image per cycle)
What is base call accuracy?
The probability of a correct base call
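Base call accuracy is commonly reported as a Phred quality score, Q = -10 x log10(probability of an incorrect call). The sketch below converts between the two; the function names are assumptions for illustration.

```python
import math


def phred_from_accuracy(accuracy: float) -> float:
    """Convert base call accuracy (between 0 and 1) to a Phred quality score."""
    error_probability = 1.0 - accuracy
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("accuracy must be at least 0 and below 1")
    return -10.0 * math.log10(error_probability)


def accuracy_from_phred(quality: float) -> float:
    """Convert a Phred quality score back to base call accuracy."""
    return 1.0 - 10 ** (-quality / 10.0)


for accuracy in (0.99, 0.999, 0.9999):
    print(f"{accuracy:.2%} accuracy is about Q{phred_from_accuracy(accuracy):.0f}")
```

On this scale an accuracy of 99.9% corresponds to Q30 and 99.99% to Q40.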
What is the run time, average read length and accuracy of Illumina sequencing?
Run time: 1-3 days
Read length: 2x75 - 2x300 bases
Accuracy: 99.9%
What is the disadvantage of short read sequencing?
DNA needs to be fragmented for sequencing
Produces millions of 150 to 300 nucleotide long fragments
What are contigs?
Longer sequences built by putting together overlapping reads
What is genome assembly?
Search for sequence homologies among reads
Put together overlapping reads
Build longer sequences
Compare contigs to a closely related genome to determine their order and produce a scaffold
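The overlap-and-merge idea behind assembly can be sketched as a greedy toy assembler. This is a simplification under stated assumptions: reads are error-free, all come from the same strand, and the function names and `min_overlap` parameter are hypothetical. Real assemblers use more robust graph-based methods.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair with the longest overlap; return the remaining contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # no overlaps left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGCGT", "CGTACC", "ACCTGA", "GGGTTT"]  # made-up reads
contigs = greedy_assemble(reads)
print(contigs)
```

Here the first three reads overlap and merge into one contig, while the last read shares no overlap and stays as a separate contig. Ordering such contigs against a related genome is the scaffolding step.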
What are the long read sequencers?
PacBio
Oxford Nanopore
What is the process of Oxford nanopore technology?
- Construct library
- Sequence library
- Analyse data
- NO AMPLIFICATION
How does Oxford Nanopore technology work?
Naturally occurring membrane proteins that form pores are modified and incorporated into artificial membranes
Added ionic buffer
Measure current passing through the membrane; anything passing through a pore changes the current
Sensors for each pore measure the change in current as a molecule passes through
Signal decoded using base-calling algorithms to determine DNA or RNA sequence in real time
What is the average read length of Oxford Nanopore technology?
Very long reads
Up to 4 Mb (megabases)
What is the accuracy of Oxford Nanopore technology?
95-99.9%
Why is typing needed?
Greatly enhances insight into microbial populations, epidemiology and microbial diversity
Widely used in diagnostics, genomics, and pathogenesis-related microbiology research
Gives the ability to accurately distinguish among strains of infectious pathogens
What are the advantages of studying genomics?
Can query the whole genome or just a gene
Precision
Sensitivity
Speed
Personalisation
Why should genomic data be used in an outbreak?
Monitor trends at national level
- Emergence of new infectious agents
- Predicting severity and size
- Trends after interventions
- Understand historical trends
Better understand epidemiology at local level
- Identify source of transmission
- Investigate clusters of transmission
- Provide evidence for and against suspected transmission
- Reveal important, unsuspected clusters
What is the definition of molecular and genomic epidemiology?
Use of pathogen molecular and genomic data to determine the distribution and spread of an infectious disease in a specified population and application of this information to control health problems
Molecular data: information on specific genes
Genomic data: information on the whole genome
What is a single nucleotide polymorphism (SNP)?
A genomic variant at a single base position in the DNA
What are principles of genomic epidemiology?
Replication and repair processes (stochastic errors, DNA polymerase proofreading, DNA damage response)
Genomic factors (transcription, chromatin state, replication time)
Environmental exposures (UV light, chemotherapy agents)
What does genomic epidemiology sequencing allow for?
Generate genomic fingerprint reflective of the organism
Detect all accumulating mutations in genome
Compare genomic fingerprints
How does genomic epidemiology work?
Using sequencing data, determine and visualize relationships
SNPs provide basic unit of difference
Tree is indicative of relatedness of different isolates
What is SNP calling?
Identification of SNPs that vary among strains
Detected by mapping sequence reads against reference genome
Relatedness based on number of SNP differences between isolates
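Counting SNP differences between isolates can be sketched as below. The sketch assumes the sequences are already aligned to the same reference and have equal length (real pipelines map reads to the reference first). The isolate names, sequences and function names are made up for illustration.

```python
from itertools import combinations


def snp_positions(reference: str, sample: str) -> list[int]:
    """0-based positions where an aligned sample differs from the reference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


def snp_distance(first: str, second: str) -> int:
    """Number of SNP differences between two aligned sequences."""
    return len(snp_positions(first, second))


reference = "ATGCCGTA"  # made-up reference
isolates = {
    "isolate_A": "ATGCCGTA",
    "isolate_B": "ATGCCGTT",
    "isolate_C": "ATGACGTT",
}

distances = {
    (x, y): snp_distance(isolates[x], isolates[y])
    for x, y in combinations(sorted(isolates), 2)
}
for pair, distance in distances.items():
    print(pair, distance)
```

A pairwise distance table like this is the input from which a relatedness tree can be built: the fewer SNP differences, the more closely related two isolates are.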
Why is a choice of reference crucial in SNP calling?
High-quality, closed reference genome
Ideally closely related to the set of isolates under investigation
What are the key strength and weakness of conventional MLST?
Strength: high robustness - good for phylogenetic analysis
Weakness: low sensitivity - not good enough for outbreak investigation
What is the definition of the core genome?
Set of genes that is found in every member of biological species
How does Core genome MLST work?
Genome data is aligned to a scheme
Comparison will yield either the allele number of an allele sequence already present in the scheme, or a new allele number
Allele profile produced
Total number of differently assigned allele numbers between a pair of samples determines relatedness with each other
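The allele-profile comparison can be sketched as follows. This is a minimal illustration, not the behaviour of any particular cgMLST tool: the scheme layout, the locus names, numbering new alleles sequentially and skipping loci with a missing call are all assumptions.

```python
from typing import Dict, Optional

Profile = Dict[str, Optional[int]]  # locus -> allele number, None if not called


def assign_allele(scheme: Dict[str, Dict[str, int]], locus: str, sequence: str) -> int:
    """Return the allele number for a sequence, adding it to the scheme if it is new."""
    known = scheme.setdefault(locus, {})
    if sequence not in known:
        known[sequence] = len(known) + 1  # assumed numbering rule for novel alleles
    return known[sequence]


def allele_distance(first: Profile, second: Profile) -> int:
    """Count shared loci where both samples have a call and the allele numbers differ."""
    shared = first.keys() & second.keys()
    return sum(
        1
        for locus in shared
        if first[locus] is not None
        and second[locus] is not None
        and first[locus] != second[locus]
    )


scheme = {"locus_1": {"ATG": 1}}  # made-up scheme
print(assign_allele(scheme, "locus_1", "ATG"))  # existing allele
print(assign_allele(scheme, "locus_1", "ATA"))  # novel allele gets a new number

sample_1 = {"locus_1": 1, "locus_2": 4, "locus_3": 2, "locus_4": 7}
sample_2 = {"locus_1": 1, "locus_2": 5, "locus_3": 2, "locus_4": None}
sample_3 = {"locus_1": 2, "locus_2": 5, "locus_3": 3, "locus_4": 7}
print(allele_distance(sample_1, sample_2), allele_distance(sample_1, sample_3))
```

Note that a locus counts as one difference no matter how many mutations separate the two alleles.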
What is the difference between core genome MLST and whole-genome MLST?
(cgMLST vs wgMLST)
wgMLST is an extension of cgMLST
wgMLST uses core genome loci and accessory loci
wgMLST provides higher resolution as a larger set of loci is used
wgMLST is more useful when comparing closely related isolates
What is an accessory genome?
Set of genes present in only one or some strains
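The core versus accessory split maps directly onto set operations. The sketch below assumes each strain has already been reduced to a set of gene names; the strain names and gene content are made-up examples.

```python
def split_pan_genome(strain_genes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (core, accessory): genes in every strain vs genes in only some strains."""
    gene_sets = list(strain_genes.values())
    if not gene_sets:
        raise ValueError("at least one strain is required")
    pan_genome = set().union(*gene_sets)      # every gene seen in any strain
    core = set.intersection(*gene_sets)       # genes found in all strains
    accessory = pan_genome - core             # genes missing from at least one strain
    return core, accessory


strains = {  # hypothetical gene content
    "strain_1": {"gyrB", "recA", "blaOXA-23"},
    "strain_2": {"gyrB", "recA"},
    "strain_3": {"gyrB", "recA", "tetA"},
}
core, accessory = split_pan_genome(strains)
print(sorted(core), sorted(accessory))
```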
What is the difference between SNP and cgMLST?
SNP and MLST analyses are congruent
Both approaches are well suited and commonly applied for outbreak analyses
SNP analysis has higher discriminatory power: cgMLST reports allele differences between 2 strains but doesn’t provide information on the number of mutations
SNP calling best for analysis of detailed evolutionary models
What 6 categories can virulence factors be differentiated into?
Adherence and colonization factors
Type I to VI secretion systems
Immune evasion factors
Toxins
Siderophores for iron absorption
Invasion genes
Why is whole genome sequencing detecting virulence factors important?
Makes it possible to identify new genes or gene variants that confer virulence to bacteria
It is a relevant indication for the pathogenicity of most bacteria
What is the mobilome?
Set of mobile genetic elements that strongly influence bacterial phenotypes by their presence or absence
What are the different types of mobile genetic elements and what do they force?
MGEs force exchange of genetic material between different bacteria
Different types: insertion sequences, bacteriophage, integrative and conjugative elements, plasmids
Why can mobile genetic elements of unrelated bacteria be easily detected?
They often exhibit a G+C content that differs to some extent from that of their hosts
Indicating earlier events of lateral gene transfer
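Flagging regions with atypical G+C content can be sketched with a simple window scan. This is an illustrative toy: the non-overlapping windows, the `threshold` parameter and the synthetic genome are assumptions, and real tools use more careful statistics.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a sequence."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def atypical_windows(genome: str, window: int, threshold: float) -> list[tuple[int, float]]:
    """Return (start, GC) for windows whose GC differs from the genome-wide GC by more than threshold."""
    overall = gc_content(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, window):
        window_gc = gc_content(genome[start:start + window])
        if abs(window_gc - overall) > threshold:
            flagged.append((start, window_gc))
    return flagged


# Synthetic genome: AT-rich host sequence with a GC-rich insert in the middle
genome = "ATATATATAT" * 4 + "GCGCGCGCGC" + "ATATATATAT" * 4
flagged = atypical_windows(genome, window=10, threshold=0.3)
print(flagged)
```

Only the window covering the GC-rich insert is flagged, which is the kind of signal that points to a horizontally acquired element.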
How can antimicrobial resistance be identified in whole-genome sequencing data?
Using assembly-based approaches
Using BLASTN-based tools comparing the derived draft genome to AMR reference gene databases
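The idea of screening a draft assembly against reference genes can be sketched with exact k-mer matching. This is a deliberately simplified stand-in for BLASTN alignment, not how BLASTN works internally: it ignores reverse complements and mismatches, and the gene names, sequences and parameters are hypothetical rather than taken from any real AMR database.

```python
def kmers(sequence: str, k: int) -> set[str]:
    """All substrings of length k in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def screen_assembly(
    contigs: list[str],
    reference_genes: dict[str, str],
    k: int = 8,
    min_fraction: float = 0.9,
) -> dict[str, float]:
    """Report reference genes whose k-mers are mostly present in the assembly."""
    assembly_kmers: set[str] = set()
    for contig in contigs:
        assembly_kmers |= kmers(contig.upper(), k)

    hits = {}
    for name, gene in reference_genes.items():
        gene_kmers = kmers(gene.upper(), k)
        if not gene_kmers:
            continue  # gene shorter than k, cannot be screened
        fraction = len(gene_kmers & assembly_kmers) / len(gene_kmers)
        if fraction >= min_fraction:
            hits[name] = fraction
    return hits


reference_genes = {  # made-up sequences standing in for database entries
    "hypothetical_gene_1": "ATGGCTAAGCTTGACCGTTAGC",
    "hypothetical_gene_2": "ATGTTTCCCGGGAAATTTCCCGGG",
}
contigs = ["CCCC" + "ATGGCTAAGCTTGACCGTTAGC" + "TTTT", "GGGGAAAACCCCTTTT"]
hits = screen_assembly(contigs, reference_genes)
print(hits)
```

A gene that is absent from the reference set can never be reported, whatever the assembly contains, which is why database coverage matters for this kind of screen.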
Why is it a problem that antimicrobial resistance databases are heavily biased towards easy-to-cultivate pathogens?
When using current antimicrobial resistance databases, novel or remotely homologous AMR genes from less well-studied bacteria might be missed
Why is complete replacement of phenotypic AMR measurement by molecular AMR prediction not advised?
Bacterial strains continue to evolve and new resistance mechanisms emerge that are not represented in AMR databases or in the datasets used to train machine learning models
Why does phenotypic testing of a set of representative diverse strains need to be maintained in antimicrobial resistance testing?
Ensures genotypic AMR results do not diverge from the true AMR phenotype over time
What is an overview of whole-genome sequencing?
The process of sequencing and assembling the microbial genome of an organism of interest
Requires culture and isolation of the organism
Limitation for organisms that are difficult to grow or unable to grow in culture
What are the key points from the whole genome study on Acinetobacter baumannii?
Nosocomial outbreak of Acinetobacter baumannii
Typed by PHE, designated unique pulsotype 27
Defined 7 major outbreak genotypes
Able to follow genotypes spreading between wards, consistent with the timeline of patients in the wards
What can be done for organisms that are difficult to grow or unable to grow in culture?
Targeted sequencing
Shotgun sequencing
What is targeted sequencing?
Selection or enrichment process performed for an organism or a group of organisms of interest either prior to or after the library preparation process
Selection via PCR or probe hybridisation
Minimises human DNA interference and provides higher sensitivity of detection in sample types
What is shotgun sequencing?
Sequencing of all nucleic acid in a sample to identify ALL bacteria, fungi, viruses, and parasites in a test
Used to detect infection from specimen types that are normally sterile such as CSF, blood and joint fluid
Can detect infectious agents in specimen types with a documented microbiome
Limitation: background noise or interference of human nucleic acid or the resident microbiome
What is a limitation of shotgun metagenomics?
Only small proportion of reads are non-human
Only subset may correspond to potential pathogens
Sensitivity is critically dependent on the level of background and distinguishing from normal microbiome
How has shotgun metagenomics been applied to the management and treatment of C. difficile?
Studies characterised the microbiomes of patients and led to the development of bacterial probiotic mixtures
These can be administered as pills for prophylaxis or treatment of C. difficile-associated disease
What is the future of genomics in a clinical setting?
Only for select few clinical microbiology laboratories with budget and personnel to make it possible
Dependent on price falling to make it more readily available to regional labs
Turnaround time needs to beat culture-dependent methods
Improving sample processing to overcome issues with contaminating human DNA and the low input inherent to clinical samples
Need improvements in understanding of genotype
What do current shotgun sequencing bioinformatics do and what needs improving?
Detect multiple pathogens that are present within a given sample
Find it difficult to assemble the full genome of a pathogen
Improvements leading to confident assembly of a complete pathogen genome will give the ability to predict virulence and antimicrobial resistance markers