Next Generation Sequencing Flashcards
Describe the human genome project:
How long is it?
How is it done?
Cost?
Nowadays?
- Human Genome Project (1990 - 2003)
- 3 billion base pairs long
- All done with traditional Sanger Sequencing
- Unravelled the first Human Genome Sequence to drive genetics research
- Cost 3 billion dollars
- We can now achieve this amount of sequencing in as little time as one day!
What is the fundamental principle of DNA sequencing?
What does PCR achieve?
How many copies does it produce?
When is enough produced?
- The polymerase chain reaction (PCR) is the fundamental principle behind any DNA sequencing application
- PCR is used to amplify a specific region of DNA; primers flank the region you want to amplify.
- Each cycle doubles the number of DNA copies of your target sequence
- Enough is produced when there is sufficient material to sequence or to use in other DNA applications
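The doubling described above can be written as copies = starting copies x 2^cycles. Below is a minimal sketch of that arithmetic; the function name and the `efficiency` parameter are illustrative assumptions, with `efficiency=1.0` standing for the idealised perfect doubling.

```python
def pcr_copies(starting_copies, cycles, efficiency=1.0):
    """Expected number of copies after `cycles` rounds of PCR.

    efficiency=1.0 is the idealised case where every cycle exactly
    doubles the target; the parameter itself is an assumption added
    for illustration.
    """
    return starting_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Copies made from a single template molecule, ideal doubling.
    for n in (1, 10, 20, 30):
        print(n, int(pcr_copies(1, n)))
```

In the ideal case a single template gives just over a billion copies after 30 cycles (2^30).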
Describe Sanger Sequencing
• Invented by Fred Sanger in 1977
• Cycle Sequencing
• Based on PCR
• Modified nucleotides
  o Chain Terminators
  o Nucleotide specific colour tag
• A small proportion of the free nucleotides are modified this way to allow every base in the sequence to be read
• One reaction = one sequence
• Up to 800 bp per reaction
• Accurate (99.99%), Slow and low-throughput
• Used predominantly until late 2000s
• Costly ££££
- Identify single nucleotide polymorphisms (SNPs), or mutations
- We can identify monogenic disease-causing mutations
- Usually for single gene tests
- E.g. CFTR in cystic fibrosis
What are the benefits of Next Generation DNA Sequencing (NGS)?
- Technological advances since the end of the human genome project
- Decrease in the cost of DNA sequencing
- Since the end of 2007, the cost has dropped at a rate faster than that of Moore’s law
Describe the history of Next Generation DNA Sequencing (NGS) II
- Development of new NGS methods began 13 years ago with 454 pyrosequencing
- DNA sequencing throughput jumped 10 orders of magnitude
- Solexa sequencing-by-synthesis (SBS) developed end of 2005
- The sequencing market is still dominated by Illumina SBS sequencing
- Next Generation Sequencing has replaced Sanger sequencing for almost all sequencing tests in the lab:
Whole genome sequencing
Whole exome sequencing
What are the four steps in NGS Sequencing?
Four step process
- DNA library Construction
- Cluster Generation
- Sequencing-by-synthesis
- Data analysis
What occurs in the first step: DNA library Construction 1?
What is a DNA library?
Where is it normally derived from?
What occurs in this stage?
A DNA library is a collection of random DNA fragments of a specific sample to be used for further study; in our case next generation sequencing
The DNA can come from just about anywhere, but in human genetic research it is generally derived from patients’ blood.
- In the wet lab – first we need to prepare the DNA sample for sequencing
- Essentially the DNA is chopped into small fragments (typically 300 bp). This is called shearing
- This can be achieved chemically, enzymatically or physically (sonication)
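As a rough illustration of shearing, the sketch below cuts one sequence into consecutive pieces of about 300 bases. The function name, the normal distribution of fragment sizes and its spread are all assumptions made for the example, not a model of any particular shearing method.

```python
import random
from typing import List


def shear(sequence: str, mean_len: int = 300, sd: int = 30, seed: int = 0) -> List[str]:
    """Cut `sequence` into consecutive fragments of roughly `mean_len` bases.

    Fragment sizes are drawn from a normal distribution (an assumption
    for illustration); `seed` makes the result reproducible.
    """
    rng = random.Random(seed)
    fragments = []
    pos = 0
    while pos < len(sequence):
        size = max(1, int(rng.gauss(mean_len, sd)))  # never a zero-length cut
        fragments.append(sequence[pos:pos + size])
        pos += size
    return fragments


if __name__ == "__main__":
    dna = "ACGT" * 1000  # toy 4,000-base input
    pieces = shear(dna)
    print(len(pieces), [len(p) for p in pieces])
```

A real sample contains many copies of the genome, each broken at different places, so the fragments in a library overlap one another; this sketch cuts only a single copy.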
What occurs in the first step: DNA library Construction 2?
- We have to repair the end of the sheared DNA fragments
- Adenine (A) nucleotide overhangs are added to the ends of the fragments
- Adapters with Thymine (T) overhangs can be ligated to the DNA fragments
- The end result is the DNA library of literally billions of small, stable random fragments representative of our original DNA sample
What occurs in the first step: DNA library Construction 3?
- Adapters contain the essential components to allow the library fragments to be sequenced
- Sequencing Primer binding sites
- P5 and P7 anchors for attachment of library fragments to the flow cell
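One way to picture a finished library fragment is as the sheared insert with an adapter on each side. The sketch below joins the parts as plain strings. The `Adapter` class, the ordering of the parts and every sequence shown are placeholders chosen for illustration; they are not real adapter sequences.

```python
from dataclasses import dataclass


@dataclass
class Adapter:
    anchor: str       # P5 or P7 region that attaches to the flow cell
    index: str        # sample barcode
    primer_site: str  # where the sequencing primer binds


# Placeholder sequences only (hypothetical, NOT real adapter sequences).
P5_ADAPTER = Adapter(anchor="AATTAATT", index="CAGT", primer_site="ACACAC")
P7_ADAPTER = Adapter(anchor="GGCCGGCC", index="TGCA", primer_site="GTGTGT")


def build_library_fragment(insert: str, left: Adapter, right: Adapter) -> str:
    """Return the insert flanked by both adapters (simplified layout)."""
    return (left.anchor + left.index + left.primer_site
            + insert
            + right.primer_site + right.index + right.anchor)


if __name__ == "__main__":
    print(build_library_fragment("ACGTACGTACGT", P5_ADAPTER, P7_ADAPTER))
```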
What occurs in the second step: Cluster Generation 1?
Step 2: cluster generation
- Hybridise DNA library fragments to the flowcell
- Hybridisation to the flowcell is a random process
- But we can’t measure individual single molecules of our DNA library: they are too small
- We need to amplify each fragment into a cluster of copies that is big enough to measure
Cluster Generation II
• Perform bridge amplification to generate clusters
• Many billions of clusters originating from single DNA library molecules
• Clusters are now big enough to be visualised
• Flow cell is now ready to be loaded on to the sequencing platform to perform the sequencing
How do we sequence the library?
DNA libraries deposited on flowcell
-> bridge amplification
-> amplified to form ‘clusters’
• Sequencing machine processes a flowcell containing lanes
• Each lane may contain multiple samples (indexed with a DNA barcode contained in adapters)
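Because each sample in a lane carries its own barcode, reads can be sorted back into samples afterwards. Here is a minimal sketch of that sorting step. The function name, the `(barcode, sequence)` input shape and the "undetermined" bucket are assumptions for the example.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using the barcode attached to each read.

    reads: iterable of (barcode, sequence) pairs (assumed input shape).
    barcode_to_sample: dict mapping barcode -> sample name.
    Reads with an unknown barcode go to an 'undetermined' bucket.
    """
    by_sample = defaultdict(list)
    for barcode, sequence in reads:
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(sequence)
    return dict(by_sample)


if __name__ == "__main__":
    lane = [("CAGT", "ACGTAC"), ("TGCA", "GGATCC"), ("CAGT", "TTAGGC"), ("NNNN", "AAAAAA")]
    samples = {"CAGT": "patient_1", "TGCA": "patient_2"}
    print(demultiplex(lane, samples))
```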
What occurs in Step 3: sequencing-by-synthesis?
• All 4 bases (A, T, C, G) are modified with:
Chain terminators
Different fluorescent colour dye
• Sequence one nucleotide per cycle in a controlled manner
Sequencing-By-Synthesis II
• Single nucleotide incorporation (DNA polymerase)
• Flowcell wash
• Image the 4 bases (digital photograph)
• Cleave terminator chemical group and dye with enzyme
• REPEATED N NUMBER OF TIMES
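The repeated cycle above (incorporate, wash, image, cleave) can be sketched as a loop that reads one base per cycle from a single template. The dye colours, names and the decision to ignore strand direction are simplifying assumptions for illustration only.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical dye colours; the real assignment depends on the chemistry.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE.items()}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Simulate reading `cycles` bases from one template strand.

    Strand direction (5'/3') is ignored to keep the sketch short.
    """
    read = []
    for cycle in range(min(cycles, len(template))):
        # 1. Polymerase adds exactly one terminated, dye-labelled base.
        base = COMPLEMENT[template[cycle]]
        # 2. After the wash, only the incorporated dye is left to image.
        colour = DYE[base]
        # 3. The image is turned into a base call.
        read.append(COLOUR_TO_BASE[colour])
        # 4. Terminator and dye are cleaved, so the next cycle can start.
    return "".join(read)


if __name__ == "__main__":
    print(sequence_by_synthesis("ATCGGATC", cycles=5))
```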
Sequencing-By-Synthesis III
• Camera sequentially images all 4 bases on the surface of the flowcell each cycle
• Each cycle image is converted to a nucleotide base call (ACGT)
• The number of cycles sets the read length: anywhere between 50 and 600 bases
How do we analyse NGS Data?
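Converting each cycle's image into a base call can be sketched as picking the brightest of the four channels for a cluster. The function names and the intensity values are made up for the example, and real base callers do considerably more than this (for example, they attach a quality score to each call).

```python
def call_base(intensities: dict) -> str:
    """Return the base whose channel is brightest in one cycle.

    intensities: dict like {"A": 0.9, "C": 0.05, "G": 0.03, "T": 0.02}
    (hypothetical per-channel signal for a single cluster).
    """
    return max(intensities, key=intensities.get)


def call_read(cycle_intensities) -> str:
    """Call one base per cycle and join the calls into a read."""
    return "".join(call_base(cycle) for cycle in cycle_intensities)


if __name__ == "__main__":
    cluster = [
        {"A": 0.90, "C": 0.05, "G": 0.03, "T": 0.02},
        {"A": 0.10, "C": 0.05, "G": 0.80, "T": 0.05},
        {"A": 0.02, "C": 0.03, "G": 0.05, "T": 0.90},
    ]
    print(call_read(cluster))
```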
- Short read sequences from the sequencing machine need to be re-assembled like a jigsaw
- Map our sequence reads to their locations on the reference genome sequence
- Generate a consensus sequence of our original DNA sample library
- Compare this consensus sequence against the human genome reference and look for genetic variants
- Dedicated software and bioinformatics tools will achieve this
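The mapping, consensus and comparison steps above can be illustrated on a toy scale. The sketch below places each read at the reference position with the fewest mismatches, takes a majority vote per position, and lists the differences from the reference. All names are assumptions, and the brute-force search stands in for the indexed aligners that dedicated tools use.

```python
from collections import Counter


def best_position(read: str, reference: str) -> int:
    """Return the reference start with the fewest mismatches (brute force)."""
    best_pos, best_mismatches = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos


def consensus(reads, reference: str) -> str:
    """Majority base at each reference position; 'N' where no read covers it."""
    piles = [Counter() for _ in reference]
    for read in reads:
        start = best_position(read, reference)
        for offset, base in enumerate(read):
            piles[start + offset][base] += 1
    return "".join(p.most_common(1)[0][0] if p else "N" for p in piles)


def variants(cons: str, reference: str):
    """List (position, reference_base, sample_base) where the two differ."""
    return [(i, r, c) for i, (r, c) in enumerate(zip(reference, cons))
            if c != "N" and c != r]


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGCT"
    # Three overlapping reads from a sample carrying G>T at position 10.
    reads = ["ACGTACGTTATC", "ACGTTATCCGAT", "TATCCGATAGCT"]
    cons = consensus(reads, reference)
    print(cons)
    print(variants(cons, reference))
```

Because every position is called from several overlapping reads, a single misread base is outvoted, which is the practical advantage of a consensus over a single read.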
Compare NGS v Sanger Sequencing
- NGS (left) produces a digital readout. Sanger (right) produces an analogue readout
- Sanger is one sequence read
- NGS is a consensus sequence of many reads
What occurs in target enrichment?
- Target enrichment
- Capture target regions of interest with baits
- Potential to capture several Mb genomic regions of interest
- The exome would be about 50 Mb in size
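To put the target sizes in perspective, the sketch below works out what share of the 3 billion base pair genome a 50 Mb exome represents, and checks whether a read falls on a target region. The function names and the half-open interval convention are assumptions for the example.

```python
GENOME_BP = 3_000_000_000  # genome length quoted on the Human Genome Project card
EXOME_BP = 50_000_000      # 50 Mb


def fraction_of_genome(target_bp: int, genome_bp: int = GENOME_BP) -> float:
    """Share of the genome covered by a set of target regions."""
    return target_bp / genome_bp


def overlaps_target(start: int, end: int, targets) -> bool:
    """True if the half-open interval [start, end) overlaps any target.

    targets: iterable of (target_start, target_end) pairs, also half-open.
    """
    return any(start < t_end and t_start < end for t_start, t_end in targets)


if __name__ == "__main__":
    print(f"{fraction_of_genome(EXOME_BP):.1%} of the genome")
    print(overlaps_target(90, 110, [(100, 200)]))
```

So capturing the exome means sequencing under 2% of the genome, which is why enrichment saves so much sequencing effort.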