Shane 2: Sanger Sequencing vs NGS Flashcards
Give the read length, no of reads/run, throughput, SNP error rate, Indel error rate and costs of Sanger Sequencing.
Read Length: 800bp
No of reads/run: 96 [<1 day]
Throughput: 6 Mb/day
SNP error rate: low
Indel error rate: low
Costs: 500 euro/Mb
Give the read length, no of reads/run, throughput, SNP error rate, Indel error rate and costs of Illumina
Read Length: 2x150bp
No of reads/run: 400,000,000 [<1 day]
Throughput: 120 Gb/day
SNP error rate: high (approx. 0.5%)
Indel error rate: low
Costs: <0.05 euro/Mb
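The figures on these two cards can be cross-checked with a little arithmetic. The Python sketch below is a hypothetical helper written for these notes: it multiplies reads/run by bases per read, then prices one pass over a ~3 billion bp haploid human genome at each cost per Mb. It ignores coverage depth. Real projects read each base many times, so actual costs are higher.

```python
import math

HAPLOID_GENOME_BP = 3_000_000_000  # ~3 billion bp in one set of human chromosomes

def run_yield_bp(reads_per_run: int, bases_per_read: int) -> int:
    """Bases produced by one run: reads x bases per read."""
    return reads_per_run * bases_per_read

def cost_euro(bases: int, euro_per_mb: float) -> float:
    """Cost of sequencing `bases` at a given price per megabase."""
    return bases / 1_000_000 * euro_per_mb

# Figures taken from the Sanger and Illumina cards
sanger_run = run_yield_bp(96, 800)                 # 76,800 bp per run
illumina_run = run_yield_bp(400_000_000, 2 * 150)  # 1.2e11 bp = 120 Gb per run

# Sanger runs needed for a single (1x) pass over the haploid genome
sanger_runs_needed = math.ceil(HAPLOID_GENOME_BP / sanger_run)

sanger_cost = cost_euro(HAPLOID_GENOME_BP, 500)     # 1.5 million euro
illumina_cost = cost_euro(HAPLOID_GENOME_BP, 0.05)  # 150 euro (upper bound)

print(f"Sanger: {sanger_run:,} bp/run, {sanger_runs_needed:,} runs, {sanger_cost:,.0f} euro")
print(f"Illumina: {illumina_run / 1e9:.0f} Gb/run, {illumina_cost:,.0f} euro")
```

The Illumina yield of 120 Gb per run lines up with the card's 120 Gb/day throughput for a run of under a day. Tens of thousands of Sanger runs for even a single pass helps explain why the first human genome took so long.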
Talk about the efforts to sequence the first human genome
The human genome was first sequenced in 2003
Many different institutes worked in parallel, each sequencing different chromosomes
It took about 10 years and about a billion euros to do
Sanger sequencing was used -> many Sanger sequencers ran at once, all day every day
Private industry (looking to patent the human genome) vs the publicly funded effort
What is meant by SNP error rate?
This is the rate at which a system miscalls single bases, i.e. how reliably it correctly identifies SNPs/incorrect bases
A low error rate means a high likelihood that the sequence is correct
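To see what an error rate means for a single read, assume (as a simplification) that errors happen independently at each base with probability e. A read of length L then contains L x e errors on average, and is completely error-free with probability (1 - e)^L. A minimal Python sketch using the ~0.5% Illumina figure and a 150 bp read:

```python
def expected_errors(read_length: int, error_rate: float) -> float:
    """Average number of miscalled bases in one read."""
    return read_length * error_rate

def prob_error_free(read_length: int, error_rate: float) -> float:
    """Probability that every base in the read is correct,
    assuming errors are independent from base to base."""
    return (1 - error_rate) ** read_length

e = 0.005  # ~0.5% SNP error rate (Illumina card)
L = 150    # one 150 bp read

print(f"expected errors per read: {expected_errors(L, e):.2f}")  # 0.75
print(f"P(read has no errors): {prob_error_free(L, e):.3f}")
```

At 0.5% roughly half of 150 bp reads contain at least one error, which is why each base is normally covered by several independent reads before a variant is called.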
What are indels?
Insertions and deletions
Everyone has these; they are part of the normal variation of the human genome. The majority are harmless, but some can be pathogenic
How large is the human genome?
About 3 billion base pairs of DNA in one (haploid) set of chromosomes -> x2 copies (diploid) => about 6 billion base pairs in the whole genome
Why is Illumina NextSeq sequencing not done anymore and what is used instead?
Illumina can only do short reads
Paired-end sequencing is now done -> both ends of each fragment are sequenced (a forward and a reverse read)
-> if you used Sanger you would have to design separate forward and reverse primers to do this
What are the four main next gen sequencing technologies available?
Illumina -> most prevalent
SOLiD (Life Technologies)
Ion Torrent (Life Technologies)
Pacific Biosciences
What are the two main 3rd generation approaches to sequencing?
Oxford Nanopore (commercially available)
Illumina Nanopore (licensed an alternative Nanopore technology)
What is the basis of ion torrent technology?
Uses semiconductor sensors to detect the hydrogen ions released when nucleotides are incorporated
Why are there so many different sequencing technologies?
Companies keep developing new sequencing technologies because you can't get a patent for established methods like Sanger
What are the two main sequencing template approaches to sequencing?
Clonal amplification of single molecules
Single DNA molecule as a sequencing template
What is meant by the clonal amplification of single molecules as an approach to sequencing? Give two examples
Single molecule only briefly needed as a template
— Thousands of identical molecules boost signal
— Two different methods:
• Bridge amplification of molecules immobilized on surface - Illumina
• Emulsion PCR — Ion Torrent
What is meant by the single DNA molecule as a sequencing template as an approach to sequencing? Give two examples
— Challenge of keeping single molecules stable during sequencing
— Avoid amplification biases
— Pacific Biosciences, Oxford Nanopore
How does clonal amplification signaling work?
Uses amplification (by PCR) to copy a sequence many times before its signal is detected
Used in Ion Torrent (emulsion PCR) and Illumina (bridge amplification) methods
-> generally if you detect fluorescence you have to amplify first; similarly, in Ion Torrent you have to increase the number of DNA molecules to boost the signal so that enough hydrogen ions are released to be detected
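The size of the boost from clonal amplification can be estimated with the ideal PCR formula: starting from one molecule, n cycles give 2^n copies. Real PCR is less than 100% efficient, so this is an upper bound. A small Python sketch; the ~1,000-copy target is an illustrative assumption matching "thousands of identical molecules":

```python
import math

def copies_after(cycles: int, start: int = 1, efficiency: float = 1.0) -> float:
    """Molecules after `cycles` rounds of PCR.
    efficiency = 1.0 means every molecule is copied each cycle (ideal doubling)."""
    return start * (1 + efficiency) ** cycles

def cycles_needed(target: int, start: int = 1) -> int:
    """Minimum number of ideal PCR cycles to grow `start` molecules to at least `target`."""
    return math.ceil(math.log2(target / start))

print(copies_after(10))                  # 1024.0 with ideal doubling
print(cycles_needed(1_000))              # 10
print(copies_after(10, efficiency=0.9))  # fewer copies if each cycle is only 90% efficient
```

Because growth is exponential, a handful of extra cycles makes a large difference to the signal, but it also amplifies any bias or early copying error, which is one reason single-molecule methods avoid amplification.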
What is the main benefit of single DNA molecule sequencing
This allows us to sequence much longer continuous strands of DNA
What is the history behind the Illumina: Flow Cell method of sequencing?
Illumina: Flow Cells with “Molecular Colonies”
Originally developed by researchers in the chemistry department at the University of Cambridge; the company was known as Solexa
Sold to Illumina in 2006
How does the Illumina flow cell sequencing method work?
Makes use of flow cells, a type of glass slide
Short oligonucleotides (complementary to the adapter sequences added to the DNA) are spread across the entire surface of the flow cell
These are used to bind DNA onto the flow cell
Each cluster on the flow cell is made of copies of the same DNA sequence -> each cluster started off as one DNA molecule that bound to the surface and was then amplified by PCR (bridge amplification) to produce many copies in close proximity to it
What are the main pros and cons of Illumina sequencing?
Can sequence millions of cluster reactions at once i.e. on the same flow cell
You have to measure (take an image) every cycle i.e. the sequence is built up one nucleotide at a time
It requires specially designed chemistry using reversible dye-terminators and a polymerase
Termination is a reversible process, unlike in Sanger -> this allows us to stop the reaction after each nucleotide is added, image it, and then continue the reaction
What kind of nucleotides are used in Illumina sequencing
Fluorescently labelled reversible terminator nucleotides
How are the fluorescent nucleotides used in Illumina reversible?
You can chemically cleave off the fluorescent group of the nucleotide and wash it away -> this has to be done to prevent background fluorescence when the next nucleotide is added
The 3’OH group of each added nucleotide is temporarily blocked until you're ready to add the next nucleotide - this gives time to image the nucleotide that was just added
When the 3’OH group is unblocked you can add the next nucleotide
What is the main con of Illumina sequencing?
Takes a lot of time due to the many washing steps
To sequence 150bp there will be 150 cycles, each with washes -> add reversible nucleotide (3’OH blocked), read fluorescent signal, cleave off/wash away the dye, unblock 3’OH, add next nucleotide
A 150bp read can take 12 hours or longer
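The cycle on these cards (add a reversible terminator, image, cleave the dye and 3’ block, repeat) can be sketched as a loop over one cluster's template. This is a toy Python model, not Illumina's software: the function name, the dye labels and the ~4.8 min per cycle (12 hours / 150 cycles, from the card) are assumptions for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sequence_by_synthesis(template: str, cycles: int,
                          minutes_per_cycle: float = 4.8) -> tuple[str, list[str], float]:
    """Toy model of one Illumina cluster; returns (read, images, hours).
    Each cycle adds exactly one base because the incoming nucleotide carries a
    3'-OH block; the block and the dye are removed before the next cycle."""
    read = []
    images = []                 # one "image" (observed dye) per cycle
    three_prime_blocked = False
    for position in range(min(cycles, len(template))):
        # 1. Incorporate one dye-labelled reversible terminator (needs a free 3'-OH)
        assert not three_prime_blocked
        base = COMPLEMENT[template[position]]
        three_prime_blocked = True      # terminator stops further extension
        # 2. Image the cluster: the dye colour identifies the base just added
        images.append(f"dye_{base}")
        read.append(base)
        # 3. Cleave off and wash away the dye, then unblock the 3'-OH
        three_prime_blocked = False
    hours = len(images) * minutes_per_cycle / 60
    return "".join(read), images, hours

read, images, hours = sequence_by_synthesis("TACGGA", cycles=6)
print(read, len(images), round(hours, 2))  # ATGCCT 6 0.48
```

Run time scales linearly with read length: one image per cycle, so 150 cycles at this assumed rate come to about 12 hours.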
What kind of elongation is used on the illumina?
Illumina Paired End Sequencing
What is illumina paired end sequencing?
A method of sequencing both ends of each DNA fragment (a forward and a reverse read) in the same run
Genomic DNA is purified, denatured with heat and fragmented into short pieces using enzymes or sonication
Adapter sequences are ligated onto both ends of the short fragments using DNA ligase
Each fragment now has an adapter 1 site (A1) and a sequencing primer site (SP1) at one end, and A2 and SP2 sites at the other
The A1 site allows binding onto the flow cell
The primer sites are sequences for which primers can be designed, i.e. they allow you to prime the sequencing reaction: primer 1 (SP1) primes the forward read and primer 2 (SP2) primes the reverse read
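Which bases end up in the two reads of a pair can be sketched in a few lines of Python. Read 1 starts at SP1 and runs into the fragment from one end. Read 2 starts at SP2 and runs in from the other end on the opposite strand, so it is the reverse complement of the fragment's far end. The fragment and read length are hypothetical, and the adapter sequences themselves are left out.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_end_reads(fragment: str, read_length: int) -> tuple[str, str]:
    """Return (read1, read2) for one fragment given 5'->3' on the top strand.
    read1: primed from SP1, the first `read_length` bases of the top strand.
    read2: primed from SP2 on the bottom strand, i.e. the reverse complement
    of the last `read_length` bases of the top strand."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2

frag = "ATGCGTACCTTAGGCA"  # hypothetical 16 bp insert
r1, r2 = paired_end_reads(frag, read_length=5)
print(r1, r2)              # ATGCG TGCCT
```

Because the two reads come from the same fragment, their relative orientation and approximate distance are known, which helps place reads in repetitive regions.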
Give a brief run down of Illumina paired-end sequencing
Fragment genomic DNA
Ligate adapters
Generate clusters - bind to flow cell
Sequence first end
Regenerate clusters and sequence paired end
Talk a little about the illumina flow cells
There are several hundred million clusters on a single flow cell
Hundreds of millions of reactions on the one plate
Clusters are only about a micron in size - smaller than a bacterial cell
Sequencing one nucleotide at a time -> creates a read about 150bp long => 150 images are put together to determine the sequence
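Turning the stack of per-cycle images into reads amounts to reading each cluster's colour across all the images in order. A minimal Python sketch, assuming each image has already been reduced to a hypothetical mapping of cluster ID to called base (real instruments do this step with image analysis and quality scoring):

```python
def reads_from_images(images: list[dict[int, str]]) -> dict[int, str]:
    """images[c][cluster_id] is the base called for that cluster in cycle c.
    Returns one read per cluster, built up one base per cycle."""
    reads: dict[int, list[str]] = {}
    for cycle_image in images:  # cycle 1, 2, 3, ...
        for cluster_id, base in cycle_image.items():
            reads.setdefault(cluster_id, []).append(base)
    return {cid: "".join(bases) for cid, bases in reads.items()}

# Three cycles, two clusters (hypothetical base calls)
images = [
    {1: "A", 2: "G"},
    {1: "C", 2: "G"},
    {1: "T", 2: "A"},
]
print(reads_from_images(images))  # {1: 'ACT', 2: 'GGA'}
```

With 150 cycles the list holds 150 images and every cluster's read is 150 bases long.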
Give some examples of illumina sequencers
MiniSeq System -> for targeted sequencing
MiSeq series -> small genome and targeted sequencing
HiSeq X Series -> population and production scale whole genome sequencing
NovaSeq Series -> population and production scale genome, exome, transcriptome sequencing and more -> can cost up to a million euros -> none in Ireland
Illumina Sequencing
(2 pros + 3 cons)
Pros:
- Very high throughput - can do millions of clusters per flow cell
- Best price/bp, but machines can be very expensive (some are affordable)
Cons:
- relatively long run time - 12 hours plus for a 150bp run
- Sequencing quality decreases towards the end -> polymerase struggles to incorporate large fluorescent molecules near 150bp
- Imaging interference in low diversity libraries -> fluorescence interference in highly repetitive sequences
When was ion torrent sequencing developed, and what amplification does it use?
Developed in 2010 by Life Technologies
Uses emulsion PCR on magnetic beads to amplify the DNA
How does Ion Torrent work?
Uses magnetic beads coated in short oligonucleotides
Each bead is in an oil droplet along with DNA polymerase and nucleotides
Chips have millions of wells, each with an ion sensor
Each bead fits into a well
Semiconductor sensors detect the hydrogen ions released whenever a nucleotide is incorporated by the polymerase
Why do we need to amplify the signal for ion torrent?
This increases the amount of H+ produced so that its release can be detected by an ion sensor
How do ion sensors work?
They detect H+ released upon incorporation of nucleotides by polymerase
They borrow the technology used in semi-conductors
Ion torrent sequencing is based on what kind of sequencing?
Semi-conductor sequencing
The sequencing is carried out on the chip, no imaging is required
How does H+ detection work in ion torrent sequencing?
H+ is released with the formation of phosphodiester bonds
This brings about a pH change
-> the solution becomes slightly more acidic
pH is measured with a sensor
Explain in your own way how ion torrent sequencing works?
Cycle 1: add an A; if the A is not incorporated, i.e. no H+ is released, wash it away
Cycle 2: add a G; if not incorporated, wash it away
Cycle 3: add a C; if not incorporated, wash it away
Cycle 4: add a T; if incorporated, H+ is released
Therefore we know there is a T at position 1; then we go on to the next cycle and keep repeating until we reach the desired length
If the signal detected is twice as strong upon adding of a nucleotide in ion torrent sequencing, what does this indicate?
If the signal is twice as strong you know two of the same nucleotide have been added on in a row
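The cycle-by-cycle logic on the last two cards can be sketched as a flowgram simulator. Nucleotides are flowed in a fixed order, and the signal for each flow is proportional to how many bases of that type are added in a row (0 if none, 2 for two in a row, and so on). This Python toy uses the card's flow order A, G, C, T and ideal, noise-free signals; real instruments use their own flow order and noisy measurements.

```python
FLOW_ORDER = "AGCT"  # order used on the card (an assumption; instruments differ)

def flowgram(sequence: str, flow_order: str = FLOW_ORDER) -> list[tuple[str, int]]:
    """Ideal signal for each nucleotide flow while synthesizing `sequence`.
    A flow incorporates every consecutive matching base, so the signal equals
    the length of the run (homopolymer) of that base; 0 means nothing added."""
    unknown = set(sequence) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    flows = []
    pos = 0
    while pos < len(sequence):
        for nucleotide in flow_order:
            run = 0
            while pos < len(sequence) and sequence[pos] == nucleotide:
                run += 1
                pos += 1
            flows.append((nucleotide, run))
    return flows

def call_bases(flows: list[tuple[str, int]]) -> str:
    """Rebuild the sequence: each flow contributes `signal` copies of its base."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)

flows = flowgram("TGGA")
print(flows[:4])          # A, G, C give 0; T gives 1, as in cycles 1-4
print(call_bases(flows))  # TGGA
```

The GG in "TGGA" shows up as a single flow with signal 2, which is exactly the "twice as strong" case on the card.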
What are two examples of ion torrent sequencers?
Ion PGM
Ion Proton
Talk about the Ion PGM Ion torrent sequencer
Personal Genome Machine
3 different types of chips
Can do 200 or 400bp reads
Can run up to 5.5 million reads per Ion 318 chip, i.e. over 5 million wells per chip
4-7 hour run time
Much quicker than Sanger, no need for fluorescence etc
Talk about the Ion proton ion torrent sequencer
Newer ion torrent sequencer
Up to 200bp reads
Up to 60-80 million reads (way more than ion PGM)
2-4 hour run time -> the short run time is why it is used in hospital labs
Useful for targeted gene sequencing e.g. for cancers or genetic disorders
What are the three main pros and the two main cons of ion torrent sequencing?
Pros:
- fast
- relatively cheap
- scalable (can buy different chips depending on need)
Cons:
- relatively high error rate
- emulsion PCR
Talk about the high error rate of ion torrent sequencing
A high error rate is seen in runs of the same nucleotide (homopolymers)
e.g. 3 As in a row -> you would expect the signal to be three times higher, but this is not always the case
There was a very high error rate associated with this in the beginning, but it has since improved
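One way to see why long homopolymers are hard: the signal grows in proportion to run length, so the step from n to n + 1 identical bases is a shrinking fraction of the signal. Going from 1 to 2 bases doubles the signal, but going from 7 to 8 raises it by only about 14%, which is easily lost in measurement noise. A short Python check, assuming an ideal proportional signal and rounding as a simple (hypothetical) length caller:

```python
def relative_step(n: int) -> float:
    """Fractional signal increase from a run of n bases to a run of n + 1,
    assuming the signal is exactly proportional to run length."""
    return ((n + 1) - n) / n

for n in range(1, 8):
    print(f"{n} -> {n + 1} bases: signal up {relative_step(n):.0%}")

def call_run_length(signal: float) -> int:
    """Call the homopolymer length by rounding the (noisy) flow signal."""
    return round(signal)

# Hypothetical noisy reading of a true 7-base run: an error of +0.6
# (under 9% of the signal) is enough to call 8 bases, an insertion error
print(call_run_length(7.6))  # 8
```

This is why homopolymer errors on this platform appear mainly as insertions and deletions rather than wrong bases.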
Talk about the difficulties of emulsion PCR in Ion torrent sequencing
Emulsion PCR is very technically challenging and can take a while to get it to work in the lab
What is PacBio Sequencing and who set it up?
Single Molecule Real Time Sequencing (SMRT)
Commercially launched in 2011 by Pacific Biosciences
Technology originally developed at Cornell University; the company is based in California
How does PacBio Sequencing Work
Single Molecule Real Time Sequencing SMRT
Chips have individual wells
One copy of DNA sequence in each well
DNA in single strand form
Inside each well there is an immobilised DNA polymerase, i.e. stuck to the bottom of the well
There are fluorescently labelled nucleotides floating around in the well
The polymerase incorporates complementary nucleotides
A fluorescent pulse occurs every time a nucleotide is added
The fluorescent pulse is measured in real time - this happens very quickly
The polymerase cannot move, which is how we know the exact position where each nucleotide is being added
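Because each incorporation gives one coloured pulse at a fixed polymerase, a read is simply the order of the pulses in time. A toy Python sketch: the template and pulse times are hypothetical, each base is assumed to have its own dye colour, and the template is assumed to be written in the order the polymerase reads it.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def simulate_pulses(template: str) -> list[tuple[float, str]]:
    """Pulses the polymerase produces while copying `template`:
    one (time, base) pulse per incorporated complementary nucleotide."""
    return [(0.1 * i, COMPLEMENT[b]) for i, b in enumerate(template)]

def read_from_pulses(pulses: list[tuple[float, str]]) -> str:
    """Order the pulses by time and read off the base each one reports."""
    return "".join(base for _, base in sorted(pulses))

pulses = simulate_pulses("GATTACA")
print(read_from_pulses(pulses))  # CTAATGT
```

No cycling or washing is needed: the read grows as fast as the polymerase works, which is why the run time is short.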
Why is the DNA polymerase immobilised in PacBio sequencing, and how?
Immobilised by fixing it to the bottom of the well
This stops the polymerase randomly inserting nucleotides
Allows us to know the start point of DNA synthesis
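Talk about the wells used in PacBio sequencing; why are they designed this way?
The wells in the chips are tiny, and only a very shallow layer at the bottom, where the polymerase sits, is illuminated
So only nucleotides being incorporated by the polymerase give a strong pulse; fluorescent nucleotides floating around in the rest of the well add little background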
Talk about the pros and cons of PacBio Sequencing
Pros:
- No PCR, i.e. you don't need to amplify DNA prior to sequencing
- Can do very long reads - read lengths averaging 10-15kb and a max of 40kb
- Can be used to observe DNA modifications
- Run time is short
Cons:
- Throughput per run is low -> approximately 1 million reads
- Error rate is high - same nucleotide repeats cause issues
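Multiplying the card's figures gives a rough per-run yield: about 1 million reads at an average of 10 to 15 kb is roughly 10 to 15 Gb, far fewer reads than Illumina but each one much longer. A quick Python check using averages only (real read-length distributions are wide); the Illumina line reuses the 400 million 2x150bp reads from the earlier card:

```python
def run_yield_gb(reads_per_run: int, mean_read_bp: int) -> float:
    """Approximate bases per run, in gigabases."""
    return reads_per_run * mean_read_bp / 1e9

for mean_kb in (10, 15):
    print(f"PacBio, {mean_kb} kb average: {run_yield_gb(1_000_000, mean_kb * 1_000):.0f} Gb/run")

print(f"Illumina, 2x150bp: {run_yield_gb(400_000_000, 300):.0f} Gb/run")
```

Lower total yield is the price of long reads, which are what make PacBio useful for spanning repeats and structural variants.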
Talk about Oxford Nanopore; what is the principle behind it?
Makes use of bacterial nanopore proteins which drag DNA through small perforations in a chip ‘nanopore’
DNA is ‘sequenced’ as it is dragged through the nanopore
How does PacBio sequencing allow for observation of DNA modifications?
Any time a cytosine is methylated, a different pulse pattern is seen (the polymerase kinetics change)
This allows us to identify any points of DNA cytosine methylation
Explain how we identify base pairs using Oxford Nanopore sequencing
An ion current flows through the nanopore constantly
Each different nucleotide blocks the ion current by a very specific amount, so the base can be identified from the drop in current
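The idea that each base blocks the current by a characteristic amount can be sketched as nearest-level classification. The current levels below are made-up placeholders, not real Oxford Nanopore values. This is also a simplification: in practice several bases sit in the pore at once, so the signal reflects a short stretch of sequence and base calling uses trained models.

```python
# Hypothetical mean current (pA) while each base sits in the pore; placeholders only
LEVELS_PA = {"A": 80.0, "C": 95.0, "G": 70.0, "T": 105.0}

def call_base(current_pa: float) -> str:
    """Pick the base whose expected blockade level is closest to the measurement."""
    return min(LEVELS_PA, key=lambda base: abs(LEVELS_PA[base] - current_pa))

def call_read(trace_pa: list[float]) -> str:
    """Call one base per current segment as the strand moves through the pore."""
    return "".join(call_base(level) for level in trace_pa)

trace = [81.2, 69.5, 104.1, 94.0, 79.3]  # hypothetical segmented current trace
print(call_read(trace))  # AGTCA
```

Noise that pushes a measurement closer to another base's level produces a miscall, which is one source of the relatively high per-base error rate of nanopore reads.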
Talk about nanopore proteins and explain how we use them
Nanopore proteins play a role in bacterial cells - normally they take up DNA
They transport DNA into the bacterial cell from outside the cell
They take up double stranded DNA
A motor protein (enzyme) feeds the DNA through the nanopore one strand at a time - forward strand first, then the reverse strand -> both strands can be sequenced this way
What are two commercially available oxford nanopore sequencers?
MinION
GridION
Talk about some of the portable Oxford Nanopore technologies
Flongle and SmidgION
For use in the field
Often used in microbiology e.g. for COVID detection