NGS Next Generation Sequencing Flashcards
Sanger sequencing
But, for example, to sequence the human genome with Sanger sequencing:
Human genome = 3 Gb = 3,000,000,000 bases
Sanger sequencing = 800 bases/reaction
=> 3,000,000,000 bases / 800 bases/reaction = 3,750,000 reactions.
If we assume £5/reaction
=> 3,750,000 reactions x £5/reaction = £18,750,000/genome
(today’s price, no overlap, no depth. Also issues with time, amount of DNA, etc.)
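The arithmetic above can be checked with a short sketch. The figures are the ones on this card; the `depth` parameter is an added, hypothetical knob to show how the cost scales once each base is read more than once.

```python
# Back-of-the-envelope Sanger cost for one human genome (figures from the card).
GENOME_BASES = 3_000_000_000   # ~3 Gb human genome
BASES_PER_REACTION = 800       # bases read per Sanger reaction
COST_PER_REACTION_GBP = 5      # assumed price per reaction


def sanger_cost(genome_bases, bases_per_reaction, cost_per_reaction, depth=1):
    """Return (number of reactions, total cost) at a given sequencing depth.

    depth is a hypothetical extra parameter: depth=1 means every base is
    read exactly once, with no overlap between reads.
    """
    # Ceiling division: a partly used reaction still has to be paid for.
    reactions = -(-genome_bases * depth // bases_per_reaction)
    return reactions, reactions * cost_per_reaction


reactions, cost = sanger_cost(GENOME_BASES, BASES_PER_REACTION, COST_PER_REACTION_GBP)
print(f"{reactions:,} reactions, GBP {cost:,}")  # 3,750,000 reactions, GBP 18,750,000
```

Calling the same function with `depth=10` multiplies both numbers by ten, which is why the "no overlap, no depth" caveat matters.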
Sanger sequencing = First generation sequencing
NGS = Next generation sequencing (various technologies)
Third generation sequencing
Name the types of Next-Generation DNA Sequencing
(don't learn)
- Roche 454
- ABI SOLiD
- Ion Torrent
We focused on
* Illumina (Solexa) sequencing
* PacBio SMRT
* Oxford Nanopore MinION
NGS vs Sanger sequencing: what are the pros and cons?
Advantages of NGS including third gen sequencing
* high throughput: many (millions of) different molecules are sequenced in parallel, at the same time
* generally does not require cloning DNA into vectors to build clone libraries
* detection is cyclical and in parallel (“massively parallel sequencing”), producing a huge amount of data
* “low cost” per big volume of data
* “quicker”
* => “more data in less time”
Sanger is chain-termination sequencing
NGS uses different sequencing biochemistries
Disadvantages of NGS
* more error-prone (assembly)
* often under development
* +/- read length varies by platform, roughly from 50 to 20,000 base pairs (both short and long reads)
* much more computational-intensive
* needs more specialist knowledge, including analysis pipelines
* issues with data storage
How to sequence a genome
What are the general strategies?
Library:
A library in molecular biology refers to a collection of cloned DNA fragments that are inserted into a vector.
The vector allows for the replication and selection of DNA-containing cells, and the library is propagated in host cells.
The fragments of interest are isolated and ligated into a suitable vector, providing a method for studying specific DNA sequences.
A good library is one where the inserted DNA fragments are not rearranged, the source material is evenly represented, and there are no vectors with multiple inserts and no empty vectors.
Shotgun Sequencing:
Shotgun sequencing is a method of DNA sequencing where the genomic DNA is randomly sheared into small fragments.
These fragments are then sequenced independently, and the resulting reads are aligned to one another through their overlapping regions, which are often found using k-mers (short subsequences of length k shared between reads).
By aligning the sequences, one can identify repetitive fragments and reconstruct the original genome sequence.
The process involves determining how many times each position in the genome has been sequenced (the coverage, or depth), with higher coverage indicating greater likelihood of accuracy.
Care must be taken to account for gaps and regions with zero coverage, requiring further analysis.
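The coverage idea can be sketched as below. This is a minimal illustration, assuming each read is already placed on the genome as a `(start, length)` pair; the function names and the toy reads are made up for the example, and real tools work from alignment files instead.

```python
def coverage(genome_length, reads):
    """Count how many reads cover each position.

    reads is a list of (start, length) tuples with 0-based starts
    (a simplifying assumption for this sketch).
    """
    depth = [0] * genome_length
    for start, length in reads:
        for pos in range(start, min(start + length, genome_length)):
            depth[pos] += 1
    return depth


def zero_coverage_gaps(depth):
    """Return half-open (start, end) intervals where no read landed."""
    gaps = []
    gap_start = None
    for pos, d in enumerate(depth):
        if d == 0 and gap_start is None:
            gap_start = pos                  # a gap opens here
        elif d != 0 and gap_start is not None:
            gaps.append((gap_start, pos))    # the gap closed just before pos
            gap_start = None
    if gap_start is not None:                # gap runs to the end of the genome
        gaps.append((gap_start, len(depth)))
    return gaps


reads = [(0, 5), (3, 5), (12, 5)]            # toy reads on a 20-base genome
depth = coverage(20, reads)
print(depth)
print("mean coverage:", sum(depth) / len(depth))
print("gaps:", zero_coverage_gaps(depth))    # gaps: [(8, 12), (17, 20)]
```

Positions 3 and 4 are covered twice, while positions 8 to 11 and 17 to 19 are never sequenced; those are the regions that need further work.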
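Characteristics of Good Libraries:
- inserted DNA fragments are not rearranged
- even representation of the source material
- no vectors with multiple inserts
- no empty vectors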
How does Illumina work? (watch video on slides)
Library Preparation:
DNA Fragmentation: Genomic DNA is sheared into smaller fragments. The size of these fragments is critical for sequencing.
Adapter Ligation: Adapters, short DNA sequences needed for the sequencing process, are ligated to both ends of the DNA fragments.
Library Amplification:
PCR (Polymerase Chain Reaction): The DNA fragments with adapters attached at both ends are selectively amplified to enrich the library. The clusters of identical fragments are formed later, on the flow cell.
Cluster Generation:
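Bridge Amplification: The library fragments are immobilized on a solid surface, the flow cell, by binding to oligos complementary to their adapters. Bridge amplification then copies each bound molecule in place to form a cluster of identical fragments, so each cluster represents a single original DNA molecule.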
Sequencing:
Base Incorporation: The flow cell is flooded with fluorescently labeled nucleotides (A, T, G, C), each carrying a different fluorophore corresponding to the base. The nucleotides are reversible terminators, so DNA polymerase incorporates only one base per cycle onto each growing DNA strand.
Detection: A camera captures images of the flow cell after each nucleotide incorporation. The emitted fluorescence indicates the identity of the incorporated base. The fluorophore and the terminator block are then chemically removed, allowing the next round of nucleotide addition.
Image Processing:
Base Calling: The images are processed to determine the sequence of bases in each cluster. Fluorescence signals are converted into nucleotide sequence information.
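As a rough picture of base calling, the sketch below picks the brightest of four colour channels in each cycle. The intensity values and the channel order are invented for illustration; real base callers also correct for things like signal cross-talk and attach a quality score to every call.

```python
CHANNELS = "ACGT"  # assumed order of the four fluorescence channels


def call_bases(intensities):
    """Turn per-cycle channel intensities for one cluster into a read.

    intensities holds one (A, C, G, T) tuple per sequencing cycle.
    """
    read = []
    for cycle in intensities:
        # The channel with the strongest signal names the incorporated base.
        brightest = max(range(4), key=lambda i: cycle[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Hypothetical intensities for one cluster over four cycles.
cluster = [
    (0.90, 0.10, 0.00, 0.10),
    (0.10, 0.10, 0.80, 0.20),
    (0.00, 0.10, 0.10, 0.95),
    (0.20, 0.70, 0.10, 0.10),
]
print(call_bases(cluster))  # AGTC
```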
Data Analysis:
Data Processing: Raw sequencing data is processed to remove errors, filter out low-quality reads, and convert it into usable formats.
Alignment: The processed reads are aligned to a reference genome or assembled de novo.
Variant Calling: Differences between the sequenced DNA and the reference are identified.
Annotation: The identified variants and genomic features are annotated for biological relevance.
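The "filter out low-quality reads" part of data processing can be sketched as follows. It assumes quality strings in the common Phred+33 encoding used in FASTQ files; the threshold of 20 and the two toy reads are arbitrary choices for the example, not recommended settings.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def passes_filter(quality_string, min_mean_q=20):
    """Keep a read only if its mean Phred score reaches the threshold."""
    scores = phred_scores(quality_string)
    return bool(scores) and sum(scores) / len(scores) >= min_mean_q


# Toy (sequence, quality) pairs: 'I' decodes to Q40, '!' to Q0, '#' to Q2.
reads = [
    ("ACGTACGT", "IIIIIIII"),
    ("ACGTTTGA", "!!!!####"),
]
kept = [seq for seq, qual in reads if passes_filter(qual)]
print(kept)  # ['ACGTACGT']
```

A mean-quality cut-off is only one possible rule; trimming low-quality ends of reads is another common approach.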
Data Interpretation:
Biological Insights: The final sequenced data is interpreted in the context of the biological question or hypothesis. This may involve identifying genes, regulatory regions, or understanding genomic variations.
PacBio SMRT: Single molecule real time sequencing (third gen sequencing)
How does this work?
You shear the genome and ligate hairpin adapters (linkers) to both ends of each fragment, which makes the template circular. You then add a primer and a polymerase to the circular template and load it into a SMRT cell, which contains tiny wells called zero-mode waveguides (ZMWs). As the polymerase incorporates labelled nucleotides, light is emitted, so incorporation can be measured in real time. Every time a new nucleotide is added there is a new flash of light, whose colour depends on the nucleotide.
You can sequence much longer fragments. The lag time between one base and the next also tells you whether that base was modified, which allows you to read epigenetic marks.
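The lag-time idea can be illustrated with a toy sketch. Everything here is an assumption made for the example: the delays, the baseline, and the simple "twice as slow as expected" rule. It is not how the instrument software decides, only a picture of the principle.

```python
def flag_slow_bases(pulses, baseline, ratio_threshold=2.0):
    """Flag positions where the polymerase paused unusually long.

    pulses is a list of (base, delay_in_seconds) in read order.
    baseline is the delay expected for an unmodified base (hypothetical).
    """
    flagged = []
    for position, (base, delay) in enumerate(pulses):
        if delay / baseline >= ratio_threshold:
            flagged.append((position, base))  # candidate modified base
    return flagged


# Hypothetical pulses: the third base shows a much longer lag.
pulses = [("A", 0.10), ("C", 0.12), ("A", 0.45), ("G", 0.09)]
print(flag_slow_bases(pulses, baseline=0.10))  # [(2, 'A')]
```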
How does Oxford Nanopore work?
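It is tiny and portable, so you can sequence your samples in the field. A single DNA strand is pulled through a protein nanopore set in a membrane, and the changes in electrical current as the bases pass through are used to identify them.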
What do you need to know to choose the correct platform?
- detection method and chemistry
- length of reads
- number of samples per run (reads)
- (massively) parallel sequencing => high throughput
- multiplexing
- simplicity of sample preparation (addition of adaptors, amplification)
- amount of material required
- error rate
- sequencing depth
- run speed
- development of bioinformatic analysis pipelines
- initial investment of hardware, total cost for sequencing projects
Genome sequencing and annotation
- Genome sequencing, content, function, etc.
- Sequencing many more genomes, more quickly, cheaply
- Compare genomes across different organisms
- Compare genomes across cells of the same organism
- Comparative genomics in evolution, disease, etc.
- Detection of variants and higher precision
- Epidemiology
- Whole transcriptome sequencing (-omics)
- Epigenetics
- Species identification
- Localisation and dispersal (of organisms, populations, species, higher taxa, disease vectors, …)
- Habitat extension
- SNPs to look at population structure/kinship relationships
- Phylogeography
- Comparison of pathogenic/non-pathogenic organisms (bacteria, viruses, protists, metazoans, etc.)
- Host-parasite evolution
- Discovery and characterisation of novel markers (cheaper and better than existing ones)
- Synthetic biology/bioengineering, to discover new functions
How does Sanger sequencing work?
DNA Template Preparation:
A DNA sample containing the region of interest to be sequenced is denatured (melted) to separate the two strands of the double helix.
Primer Annealing:
A short DNA primer complementary to the sequence flanking the target region is annealed (bound) to the single-stranded DNA template.
DNA Synthesis with Dideoxynucleotides (ddNTPs):
DNA synthesis is initiated by DNA polymerase, and in each reaction, normal deoxynucleotides (dNTPs) are incorporated into the growing DNA chain.
However, the reaction also includes small amounts of dideoxynucleotides (ddNTPs), which lack the 3’ hydroxyl group needed for further chain extension.
Chain Termination:
When a ddNTP is incorporated into the growing DNA chain, it terminates further extension because it lacks the 3’ hydroxyl group to which the next nucleotide would be added.
As a result, the growing DNA strands are randomly terminated at positions corresponding to the incorporation of ddNTPs.
Fragment Separation by Gel Electrophoresis:
The resulting mixture of DNA fragments, terminated at different positions, is separated by size using gel electrophoresis. The gel allows smaller fragments to migrate more quickly than larger ones.
Visualization and Analysis:
The separated DNA fragments are visualized using fluorescence or radioactivity, depending on the labeling method used.
The sequence is then determined by reading the positions of the terminated fragments from the bottom to the top of the gel.
Base Calling:
The sequence is determined by identifying the bands on the gel corresponding to each terminated fragment and reading the order in which they appear. Each band represents fragments ending in a specific nucleotide.
Data Interpretation:
The resulting sequence data is analyzed by computer software, and the final DNA sequence is generated.
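The chain-termination logic above can be sketched in a few lines. This is a simplified model, assuming one terminated fragment for every position of the newly synthesised strand and ignoring the primer; the function names are made up for the example.

```python
import random


def terminated_fragments(new_strand):
    """Model the reaction products: one fragment stopped at each position.

    Each fragment is a prefix of the newly synthesised strand whose last
    base stands for the ddNTP that terminated extension.
    """
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]


def read_gel(fragments):
    """Read the sequence from the gel.

    Shortest fragments run furthest (bottom of the gel), so sort by length
    and read the terminal base of each fragment in turn.
    """
    return "".join(frag[-1] for frag in sorted(fragments, key=len))


fragments = terminated_fragments("GATTACA")
random.shuffle(fragments)      # in the tube the fragments are in no order
print(read_gel(fragments))     # GATTACA
```

Sorting by length plays the role of electrophoresis here: it restores the order that was lost when the fragments were mixed.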
How would you pick between shotgun and libraries?
Comprehensively Sequence an Entire Genome:
Shotgun Sequencing, because it is more effective for large, complex genomes. It provides a broad overview but may require more resources and computational effort.
Target Specific Genomic Regions or Conduct Functional Studies:
Library-Based Approach, as it is suitable for focused studies on specific genes, regulatory regions, or functional elements, offering higher accuracy for targeted regions.
Considerations for Decision:
Cost and Resources: Shotgun can be resource-intensive; Library may be more cost-effective for smaller projects.
Research Goals: Shotgun for de novo sequencing; Library for targeted analyses.
Bioinformatics Capabilities: Shotgun requires robust computational tools; Library may be more straightforward in analysis.
Technology: Shotgun uses NGS; Library can use various sequencing techniques.
Reference Genome: Shotgun in the absence of a reference; Library when a high-quality reference is available.
How can you identify genes after genome sequencing?
- using cDNA/transcriptome sequences from the same species
- using cDNA/transcriptome sequences from similar taxa
- using similarity to known proteins
- ab initio, using mathematical/probabilistic gene finders
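As a very rough illustration of the last option, the sketch below scans a sequence for open reading frames (ATG to the next in-frame stop codon). It is only a toy stand-in: it looks at the three forward frames of one strand, ignores introns, and uses no probabilistic model, all of which real gene finders have to handle. The `min_codons` cut-off and the example sequence are arbitrary.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, orf) for ATG...stop ORFs in the 3 forward frames.

    start and end are 0-based, end exclusive; the stop codon is included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None                   # look for the next ATG
    return orfs


print(find_orfs("CCATGAAATTTTAGGC"))  # [(2, 14, 'ATGAAATTTTAG')]
```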