DNA Sequencing Flashcards
Minimum requirements for DNA synthesis in vitro
- most methods of DNA sequencing are based on DNA synthesis
- DNA synthesis proceeds in the 5’ to 3’ direction
Formation of phosphodiester bond
- catalyzed by DNA polymerase
- nucleophilic attack by the 3’-OH of the growing strand on the innermost (alpha) phosphorus atom of the incoming dNTP
- dideoxyribonucleoside triphosphate (ddNTP) terminates DNA synthesis (because it lacks the 3’-OH needed to form the next bond)
- when ddGTP is incorporated instead of dGTP there is no further extension of the strand
- this gives DNA daughter strands of varying length
Spiking DNA
- the DNA polymerization cocktail can be spiked with small amounts of ddATP, ddCTP, ddTTP, and ddGTP
- in this case we would get a subset of DNA elongation products terminating with a ddNTP base at every position in the DNA sequence
- to keep track of which bases are terminating we attach different fluorescent colours to each type of ddNTP so that we can see which colour/ddNTP comes next in sequence
- to sort the fragments by size (to identify correct order) we use gel electrophoresis
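The spiking idea above can be sketched as a small simulation. This is a toy model, not a description of any real instrument: the dye colours, the 5% termination probability, and the function names are all assumptions made for illustration. Each simulated molecule is extended until a ddNTP happens to be incorporated, and the sequence of the newly synthesized strand is then read by sorting the terminated fragments by length.

```python
import random

# Hypothetical dye assignment; real dye sets differ between kits.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE.items()}


def terminated_fragments(new_strand, n_molecules=5000, dd_fraction=0.05, seed=0):
    """Simulate extension of many molecules; return (length, dye colour) pairs."""
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for position, base in enumerate(new_strand, start=1):
            # With a small probability a ddNTP is incorporated instead of the dNTP.
            if rng.random() < dd_fraction:
                fragments.append((position, DYE[base]))
                break  # no 3'-OH, so this molecule cannot be extended further
    return fragments


def read_sequence(fragments):
    """Sort fragments by size (as the gel does) and read one colour per length."""
    colour_at_length = {}
    for length, colour in fragments:
        colour_at_length[length] = colour
    ordered_lengths = sorted(colour_at_length)
    return "".join(COLOUR_TO_BASE[colour_at_length[n]] for n in ordered_lengths)


synthesized = "ATGCGTACCTGA"  # the strand being built, written 5'->3'
fragments = terminated_fragments(synthesized)
print(read_sequence(fragments))
```

Because the ddNTPs are only a small fraction of the nucleotide pool, most molecules extend past any given position, so every fragment length ends up represented in the mixture.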
Fluorescent dideoxy sequencing
- usually automated
- gel electrophoresis uses denaturing polyacrylamide gel (contains urea) to separate fragments by size
- this type of gel gives very fine resolution: it can distinguish fragments that differ in size by 1 base
- as ddNTP-terminated fragments migrate in the gel, they pass a laser beam that excites the fluorescent dyes and a CCD camera that records the flash of coloured light that results
- software converts raw data to electropherogram and DNA sequence
Before fluorescent sequencing…
- before fluorescent sequencing technology was invented, radioactive labeling was used to detect bands on DNA sequencing gels
- 4 separate sequencing reactions, each containing a different ddNTP
- primers labelled radioactively
- x-ray film was used to prepare autoradiograph
Sanger dideoxy sequencing pros
- very accurate (low rate of sequencing error)
- relatively long sequencing reads (up to 1000 bases, but 650 bases is more common)
- easy and can be automated
- low cost (for small number of samples)
Sanger dideoxy sequencing cons
- too slow for many applications, such as genome sequencing
- costly when scaled up to acquire lots of data
- requires purification and preparation of each individual DNA sequence that is being studied
- these limitations led to invention of next generation methods
Sequencing capacity then and now
- Human Genome Project consortium in 2000: 8.64x10^7 bases per day
- Marine Gene Probe Lab, Dalhousie, in 2015: 1.5x10^10 bases per day
- Human Genome Project was completed using dideoxy sequencing; it took 10 years and $3 billion
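To put the two throughput figures side by side, here is a rough back-of-the-envelope calculation. It assumes a human genome of about 3x10^9 bases, reads the 2015 figure as 1.5x10^10 bases per day, and ignores coverage depth and assembly, so the numbers are only an order-of-magnitude comparison.

```python
SECONDS_PER_DAY = 24 * 60 * 60
GENOME_SIZE_BASES = 3e9  # assumption: approximate human genome size

# Bases per day, taken from the card above.
RATES = {
    "HGP consortium, 2000": 8.64e7,
    "Marine Gene Probe Lab, 2015": 1.5e10,  # assumes the figure reads 1.5x10^10
}


def days_for_one_pass(bases_per_day, genome_size=GENOME_SIZE_BASES):
    """Days needed to read genome_size bases once at the given daily rate."""
    return genome_size / bases_per_day


for name, rate in RATES.items():
    per_second = rate / SECONDS_PER_DAY
    days = days_for_one_pass(rate)
    print(f"{name}: {per_second:,.0f} bases/s, {days:.2f} days per single pass")

fold_increase = RATES["Marine Gene Probe Lab, 2015"] / RATES["HGP consortium, 2000"]
print(f"fold increase in daily capacity: {fold_increase:.0f}")
```

The 2000 figure of 8.64x10^7 bases per day corresponds to 1000 bases per second across the whole consortium.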
Pyrosequencing
- developed by Pål Nyrén and Mostafa Ronaghi at the Royal Institute of Technology in 1996
- now commonly known as 454 sequencing
- depends on detection of the pyrophosphate released when dNTPs are added to the growing DNA chain
Pyrosequencing step 1
- a sequencing primer is hybridized to a single-stranded DNA template
- strand extension occurs in the presence of the enzymes DNA polymerase, ATP sulfurylase, luciferase, and apyrase, and the substrates adenosine 5’ phosphosulfate (APS) and luciferin
Pyrosequencing step 2
- 1st dNTP is added to the reaction
- DNA polymerase catalyzes incorporation of dNTP into growing DNA strand IF it is complementary to the base in the DNA template strand
- each incorporation event releases pyrophosphate (PPi) in a quantity equimolar to the amount of incorporated nucleotide
Pyrosequencing step 3
- ATP sulfurylase converts PPi + APS -> ATP
- ATP + luciferase + luciferin -> oxyluciferin + visible light (in proportion to amount of ATP)
- light is detected by a camera and seen as a peak in the raw data output (pyrogram)
- height of each peak is proportional to the number of nucleotides incorporated
Pyrosequencing step 4
- Apyrase degrades unincorporated nucleotides and ATP
- when degradation is complete another nucleotide is added
Pyrosequencing step 5
- addition of dNTPs performed sequentially
- as the process continues, the complementary DNA strand is built up and the nucleotide sequence is determined from the signal peaks in the pyrogram trace
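Steps 2 to 5 can be illustrated with a toy simulation. It is a minimal sketch under idealized assumptions: every peak height is exactly proportional to the number of bases incorporated, there is no noise, and the dispensation order A, C, G, T is simply chosen for the example. The function names are made up for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_pyrogram(template_3to5, dispensation_order="ACGT", max_cycles=20):
    """Return (dispensed nucleotide, peak height) pairs for an idealized run."""
    to_synthesize = [COMPLEMENT[base] for base in template_3to5]
    position = 0
    pyrogram = []
    for _ in range(max_cycles):
        for nucleotide in dispensation_order:
            incorporated = 0
            # The polymerase adds the dispensed nucleotide for as long as it
            # matches; a homopolymer run therefore gives a taller peak.
            while position < len(to_synthesize) and to_synthesize[position] == nucleotide:
                incorporated += 1
                position += 1
            pyrogram.append((nucleotide, incorporated))
            if position == len(to_synthesize):
                return pyrogram
    return pyrogram


def decode_pyrogram(pyrogram):
    """Rebuild the synthesized strand: repeat each nucleotide by its peak height."""
    return "".join(nucleotide * height for nucleotide, height in pyrogram)


template = "TACGGGAT"  # template strand, written 3'->5'
pyrogram = simulate_pyrogram(template)
print(pyrogram)
print(decode_pyrogram(pyrogram))
```

A peak height of 0 means the dispensed nucleotide was not complementary to the next template base, so no PPi was released and no light was produced.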
Semiconductor sequencing
- Ion Torrent
- similar to pyrosequencing but instead of pyrophosphate, H+ ion is detected
- like pyrosequencing, begins with emulsion PCR
- DNA templates are on microscopic beads
- sequencing occurs on a modified computer chip
- hydrogen ions are detected in a layer of the chip below the bead wells (world’s smallest pH meter)
- no modified chemistry
- no camera needed
- very fast (a couple of hours)
Illumina DNA sequencing
- Prepare genomic DNA sample
- randomly fragment genomic DNA and ligate adapters to both ends of the fragments
- Attach DNA to surface
- bind single stranded fragments randomly to inside surface of the flow cell channels
- Bridge amplification
- add unlabelled nucleotides and enzyme to initiate solid-phase bridge amplification
- Fragments become double stranded
- the enzyme incorporates nucleotides to build double-stranded bridges on the solid-phase substrate
- Denature the double stranded molecules
- leaving single stranded templates anchored to the substrate
- Complete amplification
- several million dense clusters of double stranded DNA are generated in each channel of the flow cell
Illumina DNA sequencing Part 2
- Determine first base
- to initiate the sequencing cycle, add all four labeled reversible terminators, primers, and DNA polymerase enzyme to the flow cell
- Image first base
- after laser excitation, capture the image of emitted fluorescence from each cluster on the flow cell. Record the identity of the first base of each cluster
- Determine 2nd base
- repeat the "Determine first base" step (step 7 of the workflow)
- Image 2nd chemistry cycle
- repeat the "Image first base" step (step 8 of the workflow)
- Sequence reads over multiple chemistry cycles
- repeat cycles of sequencing to determine the sequence of bases in a given fragment, a single base at a time
- Align data
- align the reads, compare them to a reference, and identify sequence differences
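The cycle-by-cycle base calling and the final alignment step can be sketched as follows. This is a deliberately simplified illustration: the cluster contents and the reference are made-up strings, strand complementarity is ignored, and the alignment is a naive ungapped scan by number of mismatches rather than what real read aligners do.

```python
def call_bases_by_cycle(cluster_templates, n_cycles):
    """Extend every cluster by exactly one base per cycle and record it."""
    reads = {cluster: "" for cluster in cluster_templates}
    for cycle in range(n_cycles):
        for cluster, sequence in cluster_templates.items():
            if cycle < len(sequence):
                # One reversible terminator is added, imaged, then unblocked.
                reads[cluster] += sequence[cycle]
    return reads


def align_ungapped(read, reference):
    """Return (offset, mismatches) for the placement with the fewest mismatches."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = [
            (offset + i, ref_base, read_base)
            for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base
        ]
        if best is None or len(mismatches) < len(best[1]):
            best = (offset, mismatches)
    return best


# Hypothetical reference and cluster contents, for illustration only.
reference = "ACGTTGCAAGGCTTACGATC"
clusters = {"cluster_1": "TTGCAAGG", "cluster_2": "GGCTAACG"}

reads = call_bases_by_cycle(clusters, n_cycles=8)
for cluster, read in reads.items():
    offset, differences = align_ungapped(read, reference)
    print(cluster, read, "offset", offset, "differences", differences)
```

Each reported difference is a tuple of reference position, reference base, and read base, which is the kind of information used to identify sequence variants.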
Performance/cost comparisons of sequencing instruments
Cost per Mb (megabase), from most to least expensive:
- dideoxy ($2308)
- 454 FLX Titanium, pyrosequencing ($12)
- Ion Torrent ($7)
- Illumina MiSeq ($0.20)
- Illumina HiSeq 1000 ($0.04)
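To make the per-Mb prices concrete, the snippet below scales them to a human-sized genome. It assumes 3000 Mb read once (no extra coverage) and takes "MB" on the card to mean megabases; both are simplifying assumptions, so the totals are illustrative only.

```python
# Cost per megabase in US dollars, as listed on the card above.
COST_PER_MB_USD = {
    "dideoxy": 2308,
    "454 FLX Titanium": 12,
    "Ion Torrent": 7,
    "Illumina MiSeq": 0.20,
    "Illumina HiSeq 1000": 0.04,
}
GENOME_SIZE_MB = 3000  # assumption: ~3x10^9 bases, sequenced once


def cost_usd(platform, megabases):
    """Cost of sequencing the given number of megabases on one platform."""
    return COST_PER_MB_USD[platform] * megabases


for platform in sorted(COST_PER_MB_USD, key=COST_PER_MB_USD.get, reverse=True):
    ratio = COST_PER_MB_USD["dideoxy"] / COST_PER_MB_USD[platform]
    total = cost_usd(platform, GENOME_SIZE_MB)
    print(f"{platform:20s} ${total:>12,.0f}   dideoxy costs {ratio:,.0f}x as much")
```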
3rd gen: nanopore sequencing
- single molecule at a time (no pre-amplification by PCR)
- enzyme unwinds DNA; a single strand is pulled by an electrical current through a pore in a membrane
- each base produces a characteristic disturbance in the electrical current, which can be used to read the bases as the strand travels through the pore
Nanopore pros and cons
- pros:
- long reads (up to 100kb)
- no PCR step
- small, highly portable DNA sequencer connects to USB port
- can be used in the field
- cons:
- low accuracy compared to other methods but getting better
3rd gen: PacBio
- an older single-molecule DNA sequencing method than nanopore
- read lengths 20-60 kb
- not as accurate as Illumina or Ion Torrent
- often used in combination with Illumina when sequencing genomes