High Throughput Sequencing Flashcards
massively parallel sequencing technologies
clonal amplification
physically immobilise clusters of template molecules
monitor sequence optically (CCD)
decode sequence by computational analysis (snapshots)
map short reads to reference genome
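A minimal sketch of the last step (mapping short reads to a reference), assuming a simple k-mer seed index followed by a mismatch count. The function names, the toy reference and the parameters `k` and `max_mismatches` are illustrative assumptions; real aligners use far more compact indexes and also handle indels.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def map_read(read, reference, index, k, max_mismatches=1):
    """Return sorted reference start positions where the read fits with few mismatches."""
    hits = set()
    # Try every k-mer of the read as a seed, so a mismatch inside one seed
    # can still be rescued by another seed that matches exactly.
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for seed_pos in index.get(seed, []):
            start = seed_pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add(start)
    return sorted(hits)

# Toy reference (hypothetical sequence, for illustration only)
reference = "ACGTTGCATGCCGATAGCTAGGCTTAACG"
index = build_kmer_index(reference, k=4)
print(map_read("CCGATAGC", reference, index, k=4))  # exact match
print(map_read("CCGTTAGC", reference, index, k=4))  # same locus, one mismatch
```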
454 - roche
pyrosequencing chemistry
~1.4 million reads per experiment
200-400bp per read
single base accuracy: >99.5%
1- fragment genomic DNA
2- ligate adapters - ssDNA
3- capture DNA on 28 µm beads (1 fragment/bead)
4- amplification by emulsion phase PCR
5- break water in oil emulsion
6- centrifuge beads into picolitre reactors (1 bead/well)
7- add packing and enzyme beads (ATP sulphurylase/luciferase)
8- load into sequencing flow cell
9- reagents flow across reaction wells (picolitre scale, 1 x 10^-12 L)
10- incorporation of nt in each well - light
11- captured by CCD
12- sequence of flashes reveals sequence of DNA in wells (pyrosequencing) - each well generates a flowgram
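A minimal sketch of how a flowgram encodes a sequence, assuming an idealised noise-free signal and a cyclic flow order of T, A, C, G (the flow order and function names are assumptions for illustration). Each flow gives a signal equal to the number of identical bases incorporated; a zero means that nucleotide was not next in the sequence.

```python
FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order

def simulate_flowgram(read, n_cycles, flow_order=FLOW_ORDER):
    """Return (nucleotide, signal) per flow for the strand being synthesised."""
    flows = []
    pos = 0
    for i in range(n_cycles * len(flow_order)):
        nt = flow_order[i % len(flow_order)]
        run = 0
        # All consecutive copies of this nucleotide are incorporated in one flow
        while pos < len(read) and read[pos] == nt:
            run += 1
            pos += 1
        flows.append((nt, run))
    return flows

def decode_flowgram(flows):
    """Rebuild the sequence: repeat each flowed nucleotide by its rounded signal."""
    return "".join(nt * round(signal) for nt, signal in flows)

flows = simulate_flowgram("TTACGGGA", n_cycles=2)
print(flows)
print(decode_flowgram(flows))
```

Note that the homopolymer GGG appears as a single flow with signal 3, which is why signal intensity (not just presence of light) matters.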
454 roche - applications
genome sequencing, transcriptomics, epigenetics, palaeogenomics, metagenomics
2007- fragmented human gDNA into 3kb pieces, circularised, captured end junctions, sequenced - >30 million paired end reads, validation
ended support in 2016 changing to iontorrent semiconductor sequencing
iontorrent semiconductor sequencing
optical detection replaced by direct H+ sensing on a silicon chip - detects H+ ions released during nt incorporation, allowing real time monitoring of DNA synthesis
long reads (400bp), homopolymer errors, cheap
1- DNA fragmentation
2- linker ligation
3- emPCR
4- physical bead loading
smaller beads (3um), more density
simpler incorporation detection (10000s of tiny pH meters)
wells addressable by electrode array
unlabelled nt flow across wells containing primed template and polymerase
proton release - current (proportional to number of nt incorporated)
flowgram (current detected vs nucleotide flow)
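A minimal sketch of why homopolymers are error-prone when signal is proportional to the number of nt incorporated. It assumes, purely for illustration, that the measurement error is a fixed fraction of the signal; this is not a model of the real instrument noise, and the function names are hypothetical.

```python
def call_homopolymer_length(signal):
    """Round a normalised flow signal (1.0 = one nucleotide) to a run length."""
    return int(signal + 0.5)

def shortest_miscalled_run(relative_error, max_length=50):
    """Smallest run length n whose signal n * (1 + relative_error) is miscalled."""
    for n in range(1, max_length + 1):
        if call_homopolymer_length(n * (1 + relative_error)) != n:
            return n
    return None  # no miscall up to max_length

for err in (0.30, 0.15, 0.06):
    print(err, shortest_miscalled_run(err))
```

Under this assumption the absolute error grows with run length, so single bases are called correctly while longer runs of the same base are the first to be over- or under-counted.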
Ion torrent proton
Ion torrent proton: $1000 genome - PI proton chip delivered 10Gb (70-80M reads, +150bp) of sequence data in an overnight run; PII (60Gb; 2 human genomes at 20x coverage in a single run) never released; Ion GeneStudio S5 - 130 million reads
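A quick arithmetic check of the PI figures, as a sketch: yield is reads x read length, and mean coverage is yield / genome size. The human genome size of 3.1 Gb and the midpoint of 75M reads are assumptions introduced here, not figures from the notes.

```python
HUMAN_GENOME_BP = 3.1e9  # assumed approximate haploid human genome size

def total_bases(n_reads, read_length_bp):
    """Total sequence yield in bp."""
    return n_reads * read_length_bp

def mean_coverage(total_bp, genome_size_bp):
    """Average number of times each genome position is sequenced."""
    return total_bp / genome_size_bp

# Midpoint of 70-80M reads at 150 bp each (assumed midpoint)
pi_yield = total_bases(75_000_000, 150)
print(pi_yield / 1e9, "Gb")
print(round(mean_coverage(10e9, HUMAN_GENOME_BP), 1), "x coverage from 10 Gb")
```

The read count and read length are consistent with the quoted ~10Gb, which on this assumption is only about 3x of a human genome.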
illumina sequencing
immobilise > many billions template molecules/flow cell
amplify in situ (solid phase PCR)
sequence by synthesis (SBS), optically monitor reaction
4billion reads/ experiment, 2x300bp paired ends
1- prepare DNA: randomly fragment and ligate adapters to both ends of fragments
2- attach DNA to surface: bind ss fragments randomly to the inside surface of flow cell channels
3- bridge amplification: add unlabeled nt and enzyme to initiate solid phase bridge amplification
4- fragments ds: enzyme incorporates nt to build ds bridges on solid phase substrate
5- denature ds: denaturation leaves ss anchored to substrate
6- complete amplification: several million dense clusters of dsDNA are generated in each channel of flow cell
7- determine 1st base: add all 4 labelled reversible terminators, primers and DNA polymerase
8- image 1st base: after laser excitation, capture image of emitted fluorescence from each cluster, record
9- determine 2nd base and image
10- sequence reads over multiple chemistry cycles: repeat cycles of sequencing to determine the sequence of bases in the fragment one base at a time
11- align data
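A minimal sketch of the imaging steps for one cluster, assuming each of the 4 reversible terminators carries a distinct fluorescent label and that the base with the brightest channel is called at each cycle. The channel order and intensity values are made up for illustration; real base callers also correct for channel cross-talk and phasing.

```python
CHANNELS = "ACGT"  # assumed channel order, one fluorescent label per base

def call_bases(intensities):
    """intensities: one (A, C, G, T) signal tuple per cycle for a single cluster."""
    read = []
    for cycle in intensities:
        # Pick the channel with the strongest fluorescence in this cycle
        brightest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        read.append(CHANNELS[brightest])
    return "".join(read)

# Hypothetical intensities for one cluster over four cycles
cluster = [
    (0.05, 0.10, 0.92, 0.03),  # cycle 1
    (0.88, 0.04, 0.06, 0.07),  # cycle 2
    (0.02, 0.09, 0.05, 0.95),  # cycle 3
    (0.10, 0.81, 0.12, 0.08),  # cycle 4
]
print(call_bases(cluster))
```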
Pacific Biosciences
Novel technology (SMRT-single molecule, real time) and capability – long reads (average 4kb!)
accuracy low – initial applications involved supplementary illumina sequencing (to correct errors!)
SMRT: Sequencing of single template molecules by single, natural, polymerase molecules (bacteriophage enzyme), Incorporation of fluorescently labelled dNTPs, captured in real time, real time output of 1000s of polymerases captured simultaneously (fast computing!), 450 Mb in minutes
how does it work…?
* ZMW contains single polymerase (complexed with template)
* Only bottom of well illuminated (microwaves…?)
* Diffusion – microseconds, incorporation milliseconds (1000X slower)
* Release of label – natural DNA
* Real time data collection…
Pros
-Single molecule (no amplification) – simple sample preparation
-Long templates (rolling circle amplification)
-Speed, low cost (reagents – instruments ~$1M!)
Cons
- DNA only (so far)
- Error rate (mis-incorporation / indels) versus consensus of 1000s (circularisation helps…)
- Relatively small number of “features”: 75-150,000 versus 1 million (454), billions (illumina)
Applications / successes
- enable RAPID (~days) complete microbial genome sequencing: Sproutgate, Haiti cholera outbreak…
- Error rate still high (~1/10)
- Novelty: Full length cDNA sequencing (lncRNA, alternative splicing, alternative initiation and polyadenylation sites)
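A minimal sketch of why a consensus helps with a high per-read error rate: if the same molecule is read several times (e.g. repeated passes around a circularised template), a per-position majority vote removes errors that occur in only a minority of passes. It assumes the passes are already aligned, equal in length and contain substitutions only; real data also contains indels and needs proper alignment first. The sequences are invented for illustration.

```python
from collections import Counter

def consensus(passes):
    """Majority vote per column across equal-length, pre-aligned passes."""
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*passes)
    )

# Hypothetical passes over the same template, each with at most one error
passes = [
    "ACGTTAGC",
    "ACGATAGC",  # error at position 3
    "ACGTTAGG",  # error at position 7
    "TCGTTAGC",  # error at position 0
    "ACGTTAGC",
]
print(consensus(passes))
```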
Nanopore sequencing
MinION, protein nanopore to read each DNA base
assembled yeast genome using MinION
nanopore reads to scaffold 120X MiSeq reads
99% consensus accuracy
35% error rate, 65% did not align to anything
Nanopore now being used for human genomes - 30,000bp
Exome capture / sequencing
Solid or liquid phase hybridisation used to capture genomic DNA containing exons
Shear, linker, hybridise, reamplify sequence…
Map to genome, discard common variants
Identify amino acid changing mutations, diagnose disease
Cheap (er) ~$200 / exome!
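A minimal sketch of the filtering steps (discard common variants, keep amino acid changing mutations). The `Variant` record, the gene names, the effect labels and the 1% frequency cut-off are all hypothetical choices for illustration, not part of any specific pipeline.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str                    # hypothetical gene label
    population_frequency: float  # allele frequency in a reference population
    effect: str                  # e.g. "missense", "synonymous", "nonsense"

# Assumed set of effect labels that change the encoded protein
AMINO_ACID_CHANGING = {"missense", "nonsense", "frameshift"}

def candidate_variants(variants, max_frequency=0.01):
    """Keep rare variants that alter the protein sequence."""
    return [
        v for v in variants
        if v.population_frequency < max_frequency
        and v.effect in AMINO_ACID_CHANGING
    ]

variants = [
    Variant("GENE_A", 0.25, "missense"),     # common: discarded
    Variant("GENE_B", 0.0001, "synonymous"), # rare but silent: discarded
    Variant("GENE_C", 0.0002, "nonsense"),   # rare and protein changing: kept
]
print([v.gene for v in candidate_variants(variants)])
```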
RNA-Seq “transforming transcriptomics”
- EST and SAGE: only economic to sample transcriptomes (RNA > cDNA)
– Sanger sequencing ~ 0.5p / bp (expensive!)
- Tagging methods (SAGE, CAGE, MPSS)
– Not all tags uniquely mappable
– Only fragments of transcripts analysed – isoforms indistinguishable
- Next Generation Sequencing of cDNA (RNA-Seq)
– Dramatically lower cost / bp (0.00005p / bp)
– Massively parallel = faster generation of much more DATA!
– Quantitative and qualitative (sequence all the cDNA)
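A quick worked comparison of the per-base costs quoted above (0.5p / bp for Sanger, 0.00005p / bp for NGS). The 1 Mb example size and the function name are assumptions chosen only to make the scale concrete.

```python
SANGER_PENCE_PER_BP = 0.5
NGS_PENCE_PER_BP = 0.00005

def cost_in_pounds(n_bp, pence_per_bp):
    """Sequencing cost in pounds for n_bp bases at a given price per base."""
    return n_bp * pence_per_bp / 100

n_bp = 1_000_000  # assumed example: 1 Mb of cDNA sequence
print("Sanger:", round(cost_in_pounds(n_bp, SANGER_PENCE_PER_BP), 2), "pounds")
print("NGS:   ", round(cost_in_pounds(n_bp, NGS_PENCE_PER_BP), 2), "pounds")
print("fold difference:", round(SANGER_PENCE_PER_BP / NGS_PENCE_PER_BP))
```

On these figures the cost per base differs by four orders of magnitude, which is what makes sequencing all the cDNA (rather than sampling it) practical.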
Advantages of RNA-Seq vs previous transcriptomic technologies…
reduced cost, time and RNA requirements
technical improvements: base resolution, dynamic range, isoform detection, novel transcripts…
RNA-Seq: biological advantages…
* Not limited to known transcripts (or species!)
– Sequencing, assembly and characterisation of transcriptomes of non-model species (9000 genes, 6.5X coverage)
* Novel isoforms of known genes (alternative splicing / TSS / polyadenylation)
R(NA)evolutionary nanopore sequencing
Nanopores (in principle) can sequence RNA (just another NA)
Full length RNAs reveal transcription start sites, alternative splicing, alternative polyadenylation (but so can RNAseq…)
Direct RNA sequencing – any size RNA, RNA modification
Single Cell RNA-seq (scRNA-seq)
In metazoans (multicellular organisms), cell type determined by transcriptome
Individual cells contain enough RNA to identify cell type
Technical advances (fluidics) enable automated single cell sorting… (10X Genomics, but also Drop-seq, Seq-well)
– What cell types are present (normal / disease)?
– Are there novel cell types, distinguished by their transcriptome?
– Are diseased cells (tumour cells) transcriptomically different?
conclusions
illumina will likely continue to dominate cost efficient, accurate NGS (DNA and RNA-seq)
Life Technologies (Thermo) trying to compete with iontorrent / proton / S5 (speed)
Long read, low accuracy platforms (PacBio / nanopore) offer “post-Sanger” capabilities (genomes / transcriptomes / epigenomes*)