Lecture 10: Genome sequencing technology Flashcards
What? First generation sequencing:
one sequence at a time
- eg. Sanger sequencing
WHAT? Second generation sequencing:
massively parallel sequencing of fragments of different sequences
eg. Illumina sequencing
WHAT? Third generation sequencing:
long read massively parallel sequencing
eg. Pacific Biosciences (PacBio) and Oxford Nanopore
The cost of sequencing has fallen 10,000x in the past decade - MOORE’S LAW
Moore’s Law: the number of transistors on a chip doubles every two years while the costs are halved. …
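Quick check of the two figures on this card, taken at face value: a minimal sketch that compares a 10,000x drop over ten years with the "halved every two years" rate. The ten-year window and the variable names are assumptions for illustration, not lecture data.

```python
import math

# Figures taken from the card: a 10,000x cost drop over roughly 10 years.
observed_fold_drop = 10_000
years = 10

# Moore's Law as stated on the card: costs halve every two years.
moore_fold_drop = 2 ** (years / 2)

# Halving time (in years) implied by the observed drop.
implied_halving_time = years / math.log2(observed_fold_drop)

print(f"Moore's Law over {years} years: {moore_fold_drop:.0f}x cheaper")
print(f"Observed over {years} years: {observed_fold_drop}x cheaper")
print(f"Implied halving time: {implied_halving_time:.2f} years")
```

On these numbers the sequencing cost halves in under a year, so the drop is steeper than the two-year halving in the Moore's Law statement.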
SEQUENCE OF TECHNOLOGY DISCOVERY
2005:
- Automation of first generation sequencing,
‘Next generation sequencing’ and Pacific Biosciences
2007:
Illumina
2008:
SOLiD/454
2010:
Ion Torrent
2015:
Nanopore
2022:
Ultima ($100 genome)
Cost per Genome from $100M to $100 (Moore’s Law)
slide 3
First generation sequencing: the Sanger method (1970’s on)
- what is required? What occurs?
- Based on ACTION of DNA POLYMERASE
- Requires TEMPLATE DNA
- DNA PRIMER
- POLYMERASE
- NUCLEOTIDES
- SMALL AMOUNT OF NUCLEOTIDE ANALOG included.
– The INCORPORATION OF THE ANALOGUE TERMINATES SYNTHESIS
Historical note: the first human genome was sequenced with Sanger at great cost!
SANGER SEQUENCING REACTION:
What is it? What is the process? What is needed? - 5
1 * Chain-termination method
2 * Uses ‘dideoxy nucleotides’
3 * When ADDED IN THE RIGHT AMOUNT,
the CHAIN IS TERMINATED AT EVERY POSITION WHERE THAT BASE APPEARS (somewhere in the pool of copies)
4 * Need a reaction for each
base: A, T, C, and G
- EXAMPLE OF FIRST GEN. SEQUENCING
Sanger Sequencing reaction: EXAMPLE
deoxyribose - 3’ OH
dideoxyribose - 3’ H
- cannot form a bond with the next base
Template
3’ ATCGGTGCATAGCTTGT 5’
Sequence reaction products
5’ TAGCCACGTATCGAACA* 3’
5’ TAGCCACGTATCGAA* 3’
5’ TAGCCACGTATCGA* 3’
5’ TAGCCACGTA* 3’
5’ TAGCCA* 3’
5’ TA* 3’
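The products listed on this card can be reproduced with a short sketch of the ddATP reaction. It is idealised: in a real reaction termination is random, so the pool contains a mix of these chains rather than one neat list. The function names are made up for illustration.

```python
def complement(base):
    return {"A": "T", "T": "A", "C": "G", "G": "C"}[base]

def sanger_products(template_3_to_5, dd_base):
    """Return every chain that ends in dd_base, shortest first."""
    # The new strand is built 5'->3' as the complement of the template read 3'->5'.
    full_strand = "".join(complement(b) for b in template_3_to_5)
    return [full_strand[:i + 1] for i, b in enumerate(full_strand) if b == dd_base]

template = "ATCGGTGCATAGCTTGT"  # written 3'->5', as on the card
for chain in sanger_products(template, "A"):
    print(f"5' {chain}* 3'")
```

Running the same function with "C", "G" and "T" gives the products of the other three reactions.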
SEQUENCE SEPARATION:
- Gel electrophoresis = 7
- TERMINATED chains need to be SEPARATED
- Requires ONE-BASE-PAIR RESOLUTION
- See difference between chain of ‘X and X+1 base pairs’
- Gel electrophoresis
5 * Very THIN GEL
6 * HIGH VOLTAGE
7 * WORKS WITH RADIOACTIVE OR FLUORESCENT LABELS
figure on slide 6
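Why one-base-pair resolution matters: a minimal sketch of reading a four-lane gel from the shortest band to the longest. The band lengths are illustrative, worked out from the Sanger example sequence (TAGCCACGTATCGAACA) and counted from the first added base, ignoring the primer; they are not values from the slide.

```python
def read_gel(lanes):
    """lanes maps each dideoxy base to the fragment lengths seen in its lane."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    bands.sort()  # shortest fragments run furthest, so they are read first
    return "".join(base for _, base in bands)

# Illustrative band positions, one lane per reaction.
lanes = {
    "A": [2, 6, 10, 14, 15, 17],
    "C": [4, 5, 7, 12, 16],
    "G": [3, 8, 13],
    "T": [1, 9, 11],
}
print(read_gel(lanes))
```

If the bands for lengths 14 and 15 could not be told apart, one of the two adjacent A bases would be lost from the read.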
Sanger Sequencing reaction:
‘Capillary electrophoresis’ (1998) - what is it and what does it possess? 4
- AUTOMATED SEQUENCERS used very THIN CAPILLARY TUBES
- USED FLUORESCENCE, NO RADIOACTIVITY
3 * Run all 4 FLUORESCENTLY TAGGED REACTIONS in the SAME CAPILLARY
4 * Can have 384 CAPILLARIES RUNNING at the SAME TIME.
FIGURE ON SLIDE 7:
- Robotic arm and syringe
- load bar
- 96 glass capillaries
- 96-well plate
Sanger Sequencing reaction
How are FLUORESCENTLY LABELED REACTIONS READ? = 4
1 * Fluorescently labeled reactions SCANNED BY LASER as a PARTICULAR POINT IS PASSED
- COLOUR PICKED UP by DETECTOR
3 * OUTPUT sent DIRECTLY to COMPUTER
- NB. BIG INCREASE IN SEQUENCING EFFICIENCY AND DECREASE IN COST
- figure on slide 8
1. Dye-labeled dideoxynucleotides are used to generate DNA fragments of different lengths
- Graph
PROS = 2
CONS = 3
FOR SANGER SEQUENCING
- Cons
1. Requires MANY COPIES OF TEMPLATE (plasmid, or amplified PCR product)
- Requires a KNOWN SEQUENCE AT THE 5’ or 3’ END (to design a primer against)
- LIMITED LENGTH for each SEQUENCE RUN (usually max ‘1 kb’ sequenced)
PROS
1. CHEAP
2. QUICK
Best applications for Sanger: 2
- ‘SEQUENCE INSERTS’ contained WITHIN PLASMIDS AND AMPLICONS
- CHECK for SUCCESSFUL MUTAGENESIS OF KNOWN INSERT.
Second generation Sequencing Technologies
(early 2000’s on): 8
- Massively PARALLEL SEQUENCING OF DNA FRAGMENTS
2 * Many DIFFERENT STRATEGIES – MOST USE DNA POLYMERASE PRIMER EXTENSION (similar to Sanger)
- DIFFERENCES to Sanger:
- TEMPLATE PREPARATION
- SEQUENCING CHEMISTRY
- DETECTION OF NUCLEOTIDES
- ‘Illumina’ is the MAIN PLATFORM USED
- Many other platforms have been and gone, and new ones still emerging
Second generation Seq Workflow: 5
1 * Next gen sequencing
does NOT REQUIRE DNA TO BE ‘CLONED’
- you can use DNA, or RNA converted to DNA (RNA–>DNA)
2 * DNA is FRAGMENTED
3 * ADAPTERS are ADDED
TO EACH END
4 * PCR is used to make a
LIBRARY
- …SEQUENCE LIBRARY INSERT…READ ALIGNMENT/ASSEMBLY
5 * Massive parallel sequencing of the library
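The fragmentation and adapter steps of this workflow can be sketched as below. This is a toy model only: the adapter sequences, fragment sizes and random "genome" are hypothetical, and the PCR step is not modelled.

```python
import random

# Hypothetical adapter sequences, for illustration only (not real kit adapters).
ADAPTER_5 = "AAAACCCC"
ADAPTER_3 = "GGGGTTTT"

def fragment(dna, min_len, max_len, rng):
    """Cut dna into consecutive pieces of random length (last piece may be shorter)."""
    pieces, start = [], 0
    while start < len(dna):
        size = rng.randint(min_len, max_len)
        pieces.append(dna[start:start + size])
        start += size
    return pieces

def add_adapters(pieces):
    """Attach one adapter to each end of every fragment."""
    return [ADAPTER_5 + p + ADAPTER_3 for p in pieces]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(200))
library = add_adapters(fragment(genome, 30, 50, rng))
print(len(library), "library molecules")
```

Because every molecule carries the same known adapter ends, one primer pair can amplify and sequence all of them, which is why no cloning is needed.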
Illumina – massively parallel
sequencing in a flow cell… libraries? = 3
- DNA libraries are loaded
onto a ‘flow cell’
- Individual DNA molecules are dispersed
- These sequences are
amplified to form clusters
(each cluster contains
identical DNA)
Illumina - Template Prep
(cluster generation) - process = 7
- Illumina/Solexa SOLID PHASE AMPLIFICATION
- ONE DNA MOLECULE PER CLUSTER
- Sample preparation DNA (5 µg)
- TEMPLATE, dNTPs and POLYMERASE
- BRIDGE AMPLIFICATION
- 100-200 MILLION MOLECULAR CLUSTERS
- CLUSTER GROWTH
Figure on slide 13.
THE FULL PROCESS: Illumina sequencing: step by step = 7
- The sequencing occurs as SINGLE-NUCLEOTIDE ADDITION REACTIONS because of a BLOCKING GROUP AT THE 3’-OH POSITION OF THE DEOXYRIBOSE SUGAR.
- STEP 1 the nucleotide is added by polymerase
- STEP 2 unincorporated nucleotides are washed away
- STEP 3 the flow cell is imaged to identify each cluster that is reporting a fluorescent signal
- STEP 4 the fluorescent groups are chemically cleaved
- STEP 5 the 3’-OH is ‘chemically deblocked’
- This series of steps is repeated for up to ‘150 NUCLEOTIDE ADDITION REACTIONS’
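The five steps above can be sketched as a loop over cycles for one cluster. This is a simplified model under stated assumptions: the template is given in the order the polymerase meets it, every cycle works perfectly, and the function name is made up.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sequence_by_synthesis(template, cycles=150):
    """Simulate one cluster: one blocked, dye-labelled nucleotide is added per cycle."""
    read = []
    for cycle in range(min(cycles, len(template))):
        # STEP 1: polymerase adds the single complementary, 3'-blocked nucleotide.
        added = COMPLEMENT[template[cycle]]
        # STEP 2: unincorporated nucleotides are washed away (nothing to model here).
        # STEP 3: imaging records which dye, and so which base, the cluster shows.
        read.append(added)
        # STEPS 4-5: dye cleaved and 3'-OH deblocked, so the next cycle can extend.
    return "".join(read)

print(sequence_by_synthesis("GTAGCA"))
```

The cycle limit is why read length is capped: a template longer than 150 bases still gives only a 150-base read.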
Illumina – how the sequencing works with reversible terminators = 4
- INCORPORATE ALL 4 NUCLEOTIDES, EACH LABELLED WITH A DIFFERENT DYE
- WASH, 4 COLOUR IMAGING
- CLEAVE DYE AND TERMINATING GROUPS, WASH
- ….. REPEAT CYCLES
- LOOK AT FIGURE ON SLIDE 15 FOR PROPER PROCESS
Illumina – reversible terminators Detection
Imaging of fluorescent tags over cycles
Cycle 1, Cycle 2, Cycle 3, Cycle 4, Cycle 5, Cycle 6 -> CATCGT
CCCCCC
FIGURE ON SLIDE 16 SHOWS
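How imaging over cycles becomes a read: a minimal sketch that calls the brightest of four colour channels in each cycle for one cluster. The channel order and all signal values are made up for illustration; real base callers also correct for cross-talk and signal decay.

```python
CHANNELS = "ACGT"  # assumed order of the four dye channels

def call_bases(intensities):
    """intensities: one [A, C, G, T] list of signal values per cycle for one cluster."""
    read = []
    for cycle_signal in intensities:
        brightest = max(range(4), key=lambda i: cycle_signal[i])
        read.append(CHANNELS[brightest])
    return "".join(read)

# Made-up signal values for six cycles of one cluster.
cluster = [
    [0.1, 0.9, 0.0, 0.1],  # C
    [0.8, 0.1, 0.1, 0.0],  # A
    [0.0, 0.1, 0.1, 0.9],  # T
    [0.1, 0.7, 0.1, 0.1],  # C
    [0.0, 0.1, 0.9, 0.1],  # G
    [0.1, 0.0, 0.1, 0.8],  # T
]
print(call_bases(cluster))
```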
ILLUMINA CAPACITY -
Different machines have different capacity flowcells = 4
- Different machines have different capacity flowcells.
- Simplest models can do ‘25 million per flowcell’
3 * Intermediate models up to ‘250 million per flowcell’
4 * High end ‘20 billion reads’
Capacity, multiplexing:
Illumina Multiplexing: ‘What is Multiplexing? What is so special about Illumina?’ = 5
- Multiplexing: Combining different samples in the same flow cell
- You probably don’t need 250 million reads for 1 sample!
- Mix 10 samples and get 25 million reads for each
- Multiplexing involves adding a unique sequence ‘barcode’ in each library preparation.
- This marks each read to indicate which sample it belongs to.
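Sorting reads back into samples can be sketched as below. The barcodes, reads and sample names are hypothetical, and putting the barcode at the start of the read is a simplification (on real instruments the barcode is usually read separately).

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Sort reads into samples using the barcode at the start of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])  # strip the barcode
                break
        else:
            by_sample["unassigned"].append(read)  # no barcode matched
    return dict(by_sample)

# Hypothetical 4-base barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_1", "TTAG": "sample_2"}
reads = ["ACGTGGGCCC", "TTAGAAATTT", "ACGTCCCGGG", "GGGGAAAAAA"]
print(demultiplex(reads, barcodes))

# The card's arithmetic: 250 million reads shared equally by 10 samples.
reads_per_sample = 250_000_000 // 10
print(reads_per_sample)
```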
Single end vs paired end sequencing
WHAT IS SINGLE-ENDED SEQUENCING?
Each species of DNA is represented by one read
- Linear DNA sequenced towards flow cell
diagram on slide 18
Single end vs paired end sequencing
WHAT IS PAIRED-ENDED SEQUENCING?
Permits sequencing of each end of one species of DNA
- Arc shape into flow cell
- cut into seq1 and seq 2 (linear) in flow cell
‘Insert length will be equal to the length of the strand between site A1 and A2’
diagram on slide 19
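What the two reads of a pair look like: a minimal sketch, assuming read 1 comes from one end of the insert and read 2 from the other end on the opposite strand. The insert sequence and read length are made up.

```python
def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def paired_end_reads(insert, read_length):
    """Return (read 1, read 2) for one fragment, sequenced inwards from each end."""
    read_1 = insert[:read_length]
    # Read 2 starts at the other end, on the opposite strand.
    read_2 = reverse_complement(insert)[:read_length]
    return read_1, read_2

insert = "ATGGCCTTAGGCAACGTTCA"  # made-up 20-base insert
r1, r2 = paired_end_reads(insert, 6)
print(r1, r2, len(insert))
```

Knowing the insert length tells the aligner how far apart the two reads should land on the reference.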
Pros/cons of Illumina
PROS = 3
CONS = 3
PRO:
1. Massively parallel sequencing
- Low error rate
- Multiplexing libraries
CONS:
1. Expensive to buy and run (partly due to lack of competitors)
- Limited sequence length
- Library preparation needed
Best applications for Illumina = 3
- Genomic DNA sequencing of known organisms (eg. Human genomes)
2 * RNA-seq (ie human transcriptome)
3 * Small scale sequencing (MiSeq) to thousands of human genomes at once (NovaSeq)
Second generation seq - Ion Torrent
‘pH change detection- Life Technologies Inc’
SET UP =
- Prepared Library
- Library loaded onto Ion Spheres
- Ion Semiconductor Sequencing Chip
- ION SPHERES CAPTURED IN WELLS ON SEMICONDUCTOR CHIP
- Ion PGM
- Sequencing: SEQUENCE LIBRARY by pH CHANGE DETECTION
Ion Torrent sequencing procedure = 4
- The 4 bases are flooded into the wells, ONE AT A TIME
- Polymerase Integrates a Nucleotide
3. If the base is incorporated, a pH change is recorded:
- Hydrogen ion (H+) and pyrophosphate are RELEASED
DIAGRAM ON SLIDE 22
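The one-base-at-a-time flow can be sketched for a single well as below. Assumptions beyond the card: the flow order "TACG" is hypothetical, and a run of identical bases is taken to incorporate in one flow and give a proportionally larger signal.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def ion_torrent_flows(template, flow_order="TACG", max_flows=40):
    """Return (base flowed, number incorporated) for each flow over one well."""
    signals, pos = [], 0
    for flow in range(max_flows):
        if pos >= len(template):
            break
        base = flow_order[flow % len(flow_order)]
        count = 0
        # Every consecutive template position that pairs with this base is filled
        # in the same flow, each releasing one H+ (a bigger pH change).
        while pos < len(template) and COMPLEMENT[template[pos]] == base:
            count += 1
            pos += 1
        signals.append((base, count))  # count 0 means no pH change in this flow
    return signals

def read_from_flows(signals):
    return "".join(base * count for base, count in signals)

signals = ion_torrent_flows("ATTGC")
print(signals)
print(read_from_flows(signals))
```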
Third generation seq –
‘Pacific biosciences’
WHAT IS IT? WHAT DOES IT DO? = 6
- No need to use PCR to make a LIBRARY (single molecule sequencing)
- As with second generation, ADAPTERS ADDED TO EACH END of DNA fragments
- Specialising in LONG SEQUENCES (~5 kb long)
- A SINGLE LONG MOLECULE is COPIED BY A POLYMERASE ANCHORED AT THE BOTTOM OF A WELL.
- EACH WELL has ONE POLYMERASE within it.
- After a BASE IS ADDED a FLUOROPHORE IS CLEAVED OFF THE BASE, AND THAT IS DETECTED WITHIN THE WELL.
How does ‘Pac Bio’ get low sequence error rate? = 3
- Because of the unique nature of the ‘SMRTbell’ end adapters…
- Each INDIVIDUAL MOLECULE gets sequenced an AVERAGE OF 30 TIMES…
- SO A CONSENSUS SEQUENCE CAN BE BUILT UP
How does Pac Bio get low sequence error rate? DIAGRAM = 5
- Double stranded DNA from ‘SMRT Bell cDNA Library’
- PacBio Sequencing
- Template DNA with DNA Polymerase
- LOW ACCURACY Raw Reads
- High Consensus accuracy (>98%)
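How repeated low-accuracy passes give a high-accuracy consensus: a minimal sketch using a majority vote at each position. It assumes the passes are already aligned and contain substitution errors only; real reads also have insertions and deletions, which need alignment first. The sequences are made up.

```python
from collections import Counter

def consensus(passes):
    """Majority vote at each position across equal-length reads of one molecule."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))

# Made-up passes over the same molecule, three of them with one substitution error.
passes = [
    "ACGTTGCA",
    "ACGATGCA",  # error at position 4
    "ACGTTGCA",
    "TCGTTGCA",  # error at position 1
    "ACGTTGGA",  # error at position 7
]
print(consensus(passes))
```

Each error sits at a different position, so it is outvoted by the other passes; more passes make this more reliable.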
Pros/cons of PacBio
PROS = 2
CONS = 3
PROS
1. No Need to Make a Library (Single Molecule Reads)
- Generates Long Sequences
CONS
1. SLOW
- NOT WIDELY AVAILABLE
- EXPENSIVE
Best applications for PacBio = 3
1 * Sequence DNA or RNA of organism with no known reference genome (‘de novo’ sequencing)
2 * Identify new alternative splice forms of transcripts
3 * Sequence transcribed pseudogenes (with only tiny sequence differences to parent gene)
Third generation seq - Oxford Nanopore
WHAT IS IT? WHAT DOES IT DO? = 6
- World’s first MOBILE DNA SEQUENCER
2 * Plugs into your LAPTOP and runs off a USB
3 * Thousands of PROTEIN PORES IN THE MEMBRANE,
EACH WITH A HOLE IN THE MIDDLE
4 * SINGLE DNA molecules, WITH ADAPTERS at the
end, PASS THROUGH PORE IN MEMBRANE
- EACH NUCLEOTIDE DISRUPTS THE CURRENT THROUGH THE PORE SLIGHTLY DIFFERENTLY
6 * CHANGE IN CURRENT DETECTED AS NUCLEOTIDES PASS THROUGH THE PORE
Pros/cons of Nanopore
PROS = 3
CONS = 2
PROS
1. CHEAP
- QUICK
- PORTABLE
CONS
1. HIGH ERROR RATE
- LOW THROUGHPUT
Best applications for Nanopore = 3
1 * Detection of alternative spliced isoforms of transcripts
2 * Detect large structural rearrangements
3 * DNA sequencing ‘in the field’
Current state of Sequencing technology = 3
1 * Illumina is used by most people, it is quick, accurate and easy. BUT it is expensive – the machines and the reagents.
2 * PacBio is used by people who need very long reads.
- This is basically for genomics of organisms with nothing close to a reference genome
- Oxford Nanopore/MinION has great promise.
- Its portability means it could bring DNA sequencing to the masses