Chaudhuri Flashcards
What 1st gen sequencing platforms are there?
- Maxam Gilbert (but no longer used)
- Sanger dideoxy seq
Why did Sanger become dominant in the field?
- indiv, long (approx 1kb), high quality reads
What 2nd gen sequencing platforms are there?
- Illumina (most popular)
- also Helicos, SOLiD, 454, IonTorrent (still used but infreq)
What 3rd gen sequencing platforms are there?
- PacBio
- Oxford Nanopore
- NABsys (no longer used)
- more to come?
What do PacBio and Oxford Nanopore have in common?
- both these are single mol seq, fewer reads than Illumina but v long (over 10kb), individual reads have quite high error rate
What key properties of sequencing reads determine the approp applications?
- no. of reads (related to data output and running costs) and the read length
Can you get good read lengths and read depth?
- originally was trade off between these, eg. Illumina has many short reads, vs Sanger produced few long reads
- now there are technologies which can produce many long reads (>10kb), eg. Nanopore PromethION and PacBio Sequel II
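Worked example (illustrative sketch only): the trade-off between no. of reads and read length can be seen from mean coverage = no. of reads x read length / genome size. The 5Mb genome and the read counts below are assumed round numbers, not platform specs.

```python
def mean_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Mean depth of coverage = total sequenced bases / genome size."""
    return num_reads * read_length_bp / genome_size_bp


# Assumed example genome: a 5 Mb bacterial chromosome.
GENOME_SIZE_BP = 5_000_000

# Many short reads (Illumina-like): 1 million reads of 150 bp.
short_read_cov = mean_coverage(1_000_000, 150, GENOME_SIZE_BP)

# Few long reads (long-read-like): 10,000 reads of 15 kb.
long_read_cov = mean_coverage(10_000, 15_000, GENOME_SIZE_BP)

print(f"short reads: {short_read_cov:.0f}x, long reads: {long_read_cov:.0f}x")
```

Both runs give the same mean depth, so depth alone does not separate the platforms. Read length decides whether repeats can be spanned.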
What is the read length for Sanger?
- up to 1kb
What is the read length for Illumina?
- 50-300bp
What is the read length for PacBio?
- up to 50kb
What is the read length for Oxford Nanopore?
- can be >2Mb (theoretically unlimited)
How many reads can Sanger prod per run?
- 1 (some machines up to 96)
How many reads can Illumina prod per run?
- millions (MiSeq), billions (HiSeq)
How many reads can PacBio prod per run?
- approx 500k (Sequel)
How many reads can Oxford Nanopore prod per run?
- up to 1 mil (MinION)
What is the accuracy of Sanger?
- highly accurate (>99.9%)
What is the accuracy of Illumina?
- highly accurate (>99.9%)
What is the accuracy of PacBio?
- raw reads approx 85% accurate, can be improved to >99.8% w/ CCS (circular consensus sequencing)
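Worked example (crude sketch, not the real CCS algorithm): why repeated passes over the same circular mol improve accuracy. Assumes errors in each pass are independent and that the consensus is only wrong when more than half the passes are wrong at a position. The function name and the pass counts are illustrative assumptions.

```python
from math import comb


def majority_error_probability(per_pass_error: float, passes: int) -> float:
    """Probability that more than half of the passes are wrong at one position.

    Simplified binomial model: each pass is wrong independently with
    probability per_pass_error.
    """
    threshold = passes // 2 + 1  # smallest number of wrong passes that is a majority
    return sum(
        comb(passes, k) * per_pass_error**k * (1 - per_pass_error) ** (passes - k)
        for k in range(threshold, passes + 1)
    )


# Raw accuracy of approx 85% corresponds to a per-pass error rate of 0.15.
for n_passes in (1, 3, 9, 15):
    error = majority_error_probability(0.15, n_passes)
    print(f"{n_passes:2d} passes: consensus error approx {error:.5f}")
```

The consensus error falls quickly as passes are added, which is the intuition behind CCS. Real error rates depend on chemistry and on errors not being fully independent.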
What is the accuracy of Oxford Nanopore?
- raw reads now approx 95% accurate
What are the applications of Sanger?
- PCR products
What are the applications of Illumina?
- draft genome seqs (w/ gaps)
- resequencing and variant detection
- functional genomics (RNA-seq, ChIP-seq)
- metagenomics
What are the applications of PacBio?
- complete genome sequencing (ie. finished genomes, as reads are longer than repeats)
- detection of DNA meth (ie. base mods, epigenetics)
What are the applications of Oxford Nanopore?
- complete genome sequencing
- epigenetics
- direct RNA-seq
- metagenomics
How does Illumina work?
- cut gDNA into 200-600bp fragments
- add adapters (of known seq, so can be used to amplify genome fragments of unknown seq)
- DNA fragments which bind adapters are made ss
- adapters able to bind to oligos on flow cell surface
- unlabelled nt bases and DNA pol added to lengthen and join DNA seqs
- adapter seq at other end binds another type of oligo on surface and creates ‘bridges’ of ds DNA on flow cell surface (by seqs folding over and hybridising to oligos)
- in situ PCR = bridge amplification
–> amplify original DNA to form small clusters of DNA w/ same seq
- dsDNA bridges broken down to ssDNA w/ heat
- primers and fluorescently labelled bases added to flowcell
- primer binds DNA being seq and allows DNA pol to bind
- DNA pol adds bases to DNA
- lasers used to activate fluorescent label and camera detects this fluorescence
- each base gives off diff colour
What are the clusters in Illumina flow cells and how are they distrib?
- each cluster derived from single initial mol and corresponds to separate read
- clusters distrib randomly on flow cell surface
What limits the max no. of reads it is poss to obtain from a single run of Illumina?
- density of clusters determines total yield of reads, but if adj clusters too close seq cannot be resolved (ie. want them as close as poss whilst still being able to resolve)
What has helped improve the no. of reads can obtain from a single Illumina run?
- software improvements have allowed increased cluster density
- also technical improvements –> eg. higher res cameras in machines, meaning clusters can be smaller and closer together
How have patterned flow cells allowed increased cluster density?
- instead of flat surface, flow cell covered w/ tiny nanowells
- primers for bridge amp only present w/in each well, so get single cluster gen in each well from a single starting mol
- amp rapid, so well fills up, preventing other mols from entering
- know exact position of each well, so cluster can be identified unambiguously
- cluster cannot spread outside well, so no overlapping clusters, meaning clusters can be packed tightly
What recent devs have there been for Illumina?
- Illumina X ten released in 2015
- new system targeted exclusively at seq human genomes
- machines cost $1 mil and have to buy 10
What is the significance of $1000 human genome, and what achieved this goal?
- long standing goal of genomics, as it is the point at which it becomes feasible to offer genome sequencing as a routine service in healthcare
- Illumina X ten can achieve this, inc consumables, labour and depreciation
What other systems are available for patterned flow cells?
- HiSeq X Five and HiSeq 4000
What is the main problem w/ Illumina, and what is this due to?
- reads are limited in length as the quality of base calls reduces later in read, resulting in more errors
- due to problems of phasing
What is phasing?
- by chance a base will fail to incorporate into 1 of the strands in a cluster, so this strand lags behind by 1 base and the cluster starts to give a mixed signal
- this can happen repeatedly and as this develops become less confident in the colour of the cluster, as more mixed
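Worked example (simplified sketch): how phasing erodes the signal over cycles. Assumes every strand fails to incorporate w/ the same fixed probability each cycle and never catches up. The 0.5% phasing rate is an assumed illustrative figure, not a measured value.

```python
def in_phase_fraction(cycle: int, phasing_rate: float) -> float:
    """Fraction of strands in a cluster still in step after a number of cycles.

    A strand stays in phase only if it incorporated a base in every cycle,
    so the fraction decays geometrically: (1 - phasing_rate) ** cycle.
    """
    return (1.0 - phasing_rate) ** cycle


# Assumed per-cycle phasing rate of 0.5%.
PHASING_RATE = 0.005

for cycle in (1, 50, 100, 200, 300):
    fraction = in_phase_fraction(cycle, PHASING_RATE)
    print(f"cycle {cycle:3d}: {fraction:.1%} of strands still in phase")
```

As the in-phase fraction drops, more of the cluster's light comes from neighbouring bases, so the colour becomes mixed and base calls less confident.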
What is pre-phasing?
- early incorp of bases
- essentially the opp problem to phasing
How can the problem of phasing be solved?
- often reads will be trimmed to remove low quality seq prior to analysis (gets v low quality after around 100bp)
- recent software improvements on Illumina MiSeq allowed dynamic correction of phasing problems, increasing read length to 300bp
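Worked example (minimal sketch): trimming low quality seq from the 3' end of a read before analysis. The function name and the Phred cut-off of 20 are illustrative assumptions. Real trimming tools typically use more robust schemes, eg. sliding windows.

```python
def trim_low_quality_tail(sequence: str, qualities: list[int], min_quality: int = 20):
    """Remove bases from the 3' end until one meets the quality threshold.

    qualities holds one Phred score per base of sequence.
    Returns the trimmed sequence and its matching quality scores.
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must be the same length")

    end = len(sequence)
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    return sequence[:end], qualities[:end]


# Quality typically falls towards the end of the read.
read = "ACGTACGT"
phred = [38, 37, 36, 35, 30, 18, 12, 8]
trimmed_seq, trimmed_qual = trim_low_quality_tail(read, phred)
print(trimmed_seq, trimmed_qual)
```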
Why can corrections to solve problems of phasing not be used for the HiSeq?
- correction computationally intensive, so can only be used on a small scale
How can 2 colour Illumina seq be carried out?
- 4 bases sequenced using only 2 colours, rather than 1 for each base
- allows simpler optics in machine, therefore lower costs
- T = green, C = red, A = green and red, G = no colour
- used in NextSeq 500 and MiniSeq
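Worked example (minimal sketch): decoding the 2 colour scheme above, where each cycle gives a green on/off and red on/off signal per cluster. The names TWO_COLOUR_CODE, call_base and call_read are hypothetical, and real base callers work on intensities rather than clean on/off values.

```python
# (green signal present, red signal present) -> base, as listed in the notes.
TWO_COLOUR_CODE = {
    (True, False): "T",   # green only
    (False, True): "C",   # red only
    (True, True): "A",    # green and red
    (False, False): "G",  # no colour
}


def call_base(green: bool, red: bool) -> str:
    """Call one base from the two channel signals of a single cycle."""
    return TWO_COLOUR_CODE[(green, red)]


def call_read(signals: list[tuple[bool, bool]]) -> str:
    """Call a whole read from per-cycle (green, red) signals."""
    return "".join(call_base(green, red) for green, red in signals)


cycles = [(True, True), (False, True), (False, False), (True, False)]
print(call_read(cycles))
```

Note that G is called from the absence of signal, so a dark cycle from a failed cluster cannot be told apart from a true G in this simple model.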
What is PacBio RSII and its apps?
- single molecule real time sequencing (SMRT)
- can gen v long reads (>10kb)
- apps inc finishing small genomes, microbial epigenetics, targeted seqs
How has PacBio advanced?
- PacBio Sequel allowed more reads at lower cost
- now Sequel II released
- becoming more practical to get such long reads at a lower cost, mainly through improvements in optics and chemistry
Can the read length from Oxford Nanopore be improved?
- already at theoretical max –> limit is length of DNA mol, w/ some reads reported of >2Mb
What Oxford Nanopore developments have happened?
- highly portable MinIONs now commercially available, w/ a few early publications
- scale of sequencing has been improved by the release of the larger PromethION and GridION systems
What is the PromethION system?
- essentially lots of MinIONs together (25) –> potential to prod lots of high quality long reads
What is the GridION system?
- 5 MinION flow cells at a time
What is VolTRAX?
- automated Oxford Nanopore library prep, goal is to take any biological sample (eg. blood, bacterial culture), deposit straight on machine, will extract DNA suitable to pass straight onto MinION sequencer
In 1983 at the dawn of bacterial genomics what was known in the field?
- only 2.3 x 10^6 bp had been seq, less than the size of most bacterial genomes
- largest genome sequenced was phage lambda, around 50,000 bp
- next aim was to seq bacterial genomes
- and eventually humans
What was the 1st bacterial genome sequencing project to be initiated, and how was this done?
- E. coli K-12
- sequencing ordered clones based on a genetic map
Why was E. coli K-12 not the 1st bacterium to be sequenced, despite being the 1st to be initiated?
- method was slow and laborious
What were the 1st bacterial genomes to be sequenced, and how was this done?
- Venter adopted a shotgun sequencing approach and sequenced Haemophilus influenzae and Mycoplasma genitalium in 1995 (both have small genomes, <2Mb)
What does shotgun sequencing rely on?
- computational assembly of seq from random clone libs
How is whole genome shotgun sequencing carried out?
- take (usually) circular bacterial chromosome and use sonication/enzymatic methods to randomly shear DNA into small fragments
- size select, so all about same size
- clone indiv fragments into plasmid vectors
- pick colonies to create shotgun lib
- plasmid preps
- seq each insert w/ 2 primers (using Sanger)
- assemble
- get whole chunks of genome which are representative, but also gaps
- PCR over gap regions, to fill them in (can be most time consuming part)
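Worked example (toy sketch): the computational assembly step, using a simple greedy merge of the best suffix/prefix overlap. This is not the assembler used in the original projects. It assumes error-free reads on one strand from a linear seq, ignores repeats, and the function names and min_overlap value are illustrative.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of left that equals a prefix of right."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge; >1 contig means gaps remain
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Overlapping reads assemble into one contig.
full = greedy_assemble(["ATGCGTA", "CGTACGTT", "CGTTAGC"])

# Without the middle read there is a gap, so two contigs remain.
gapped = greedy_assemble(["ATGCGTA", "TTAGC"])
print(full, gapped)
```

The second case mirrors the gaps left after shotgun assembly, which then have to be closed, eg. by PCR over the gap regions.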
What did Kimelman et al (2012) do in the field of bacterial genomics?
- reinvestigated Sanger seq data of bacterial genomes
- tested hypothesis that assembly gaps correspond to seqs toxic to E. coli
- identified many genes toxic to E. coli
- found novel toxins and restriction enzs, and new classes of small noncoding RNAs that reproducibly inhibit E. coli growth
- suggests new modes of antimicrobial intervention
How was E. coli K-12 originally iso?
- from convalescent diphtheria patient in 1922