Next Gen Sequencing Flashcards
When did the Human Genome Project run?
1990 - 2003
How many base pairs are in the human genome?
About 3 billion base pairs
Which form of sequencing was used in the Human Genome Project?
Traditional Sanger Sequencing
How much did the Human Genome Project cost?
About 3 billion dollars
What is PCR and why is it used?
PCR (polymerase chain reaction) is used to amplify a specific region of DNA; primers flank the region to be amplified.
It is fundamental to almost any DNA sequencing application.
How does PCR work?
Each cycle doubles the number of copies of the target sequence. This amplifies enough DNA molecules to give sufficient material for sequencing or other applications.
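As a rough illustration of the doubling described above, the sketch below assumes ideal PCR, where every template is copied in every cycle (real reactions only approximate this). The function name `pcr_copies` and the starting count are made up for this example.

```python
def pcr_copies(starting_copies: int, cycles: int) -> int:
    """Copies of the target after `cycles` rounds of ideal PCR.

    Assumes 100% efficiency: every copy is duplicated in every cycle.
    """
    if starting_copies < 0 or cycles < 0:
        raise ValueError("starting_copies and cycles must be non-negative")
    return starting_copies * 2 ** cycles


# Starting from a single template molecule (an illustrative value).
for n in (1, 10, 30):
    print(n, pcr_copies(1, n))
```

Under this idealised model, 30 cycles turn one template into just over a billion copies.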
Briefly explain Sanger sequencing
- Invented by Fred Sanger in 1977.
- Cycle sequencing
- One reaction = one sequence
- Accurate (99.99%)
- Slow and low throughput
- Used predominantly until late 2000s
- Costly
Why is Next Gen sequencing a preferred way of sequencing?
- Sequencing technology has advanced considerably since the end of the Human Genome Project.
- The cost of DNA sequencing has decreased
- Since the end of 2007, the cost has dropped faster than Moore’s law would predict
What is Next Generation Sequencing used for?
- It has replaced Sanger Sequencing for almost all sequencing tests in the lab
- Whole genome sequencing
- Whole exome sequencing
What are the four steps in next gen sequencing?
- DNA library construction
- Cluster generation
- Sequencing-by-synthesis
- Data analysis
What is step 1 - DNA library construction?
- In the wet lab, prepare the DNA sample for sequencing
- DNA is chopped into small fragments (typically 300 bp). This is called shearing.
- This can be achieved chemically, enzymatically or physically (sonication).
- Repair the ends of the sheared DNA fragments and add adenine (A) nucleotide overhangs
- Adapters with thymine (T) overhangs can then be ligated to the DNA fragments
- The end result is the DNA library: billions of small, stable, random fragments representative of the original DNA sample
What is shearing?
The process of chopping DNA into smaller fragments by chemical, enzymatic or physical (sonication) means.
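To picture shearing, here is a toy simulation that cuts a sequence into pieces of roughly 300 bases. It is only a sketch: the function `shear`, its `spread` parameter and the random test sequence are assumptions for illustration, and real shearing acts on many copies of the DNA at once, not on a single string.

```python
import random


def shear(dna: str, mean_size: int = 300, spread: int = 50, seed: int = 0) -> list[str]:
    """Cut `dna` into consecutive fragments of about `mean_size` bases.

    Fragment sizes are drawn uniformly from mean_size +/- spread (an assumed model).
    The last fragment may be shorter than the rest.
    """
    rng = random.Random(seed)
    fragments = []
    start = 0
    while start < len(dna):
        size = max(1, rng.randint(mean_size - spread, mean_size + spread))
        fragments.append(dna[start:start + size])
        start += size
    return fragments


# A made-up 3000-base sample, only for demonstration.
rng = random.Random(1)
sample = "".join(rng.choice("ACGT") for _ in range(3000))
fragments = shear(sample)
print(len(fragments), [len(f) for f in fragments[:3]])
```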
What is a DNA library?
A collection of random DNA fragments of a specific sample to be used for further study; for example, next gen sequencing.
Why are adapters important in step 1 (DNA library construction)?
- Adapters contain the essential components to allow the library fragments to be sequenced
Give examples of components contained in the adapters
- Sequencing primer binding sites
- P5 and P7 anchors for attachment of library fragments to the flow cell
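One way to picture an adapter-ligated library fragment is as the sample insert flanked by the components listed above. The sketch below uses placeholder labels, not real adapter sequences, and the order of the components is an assumption made for illustration; the class name `LibraryFragment` is also made up.

```python
from dataclasses import dataclass

# Placeholder labels, NOT real adapter sequences.
P5_ANCHOR = "[P5]"
P7_ANCHOR = "[P7]"
PRIMER_SITE_1 = "[primer-site-1]"
PRIMER_SITE_2 = "[primer-site-2]"


@dataclass
class LibraryFragment:
    insert: str  # the sheared piece of sample DNA

    def layout(self) -> str:
        """Return the insert with adapter components on both ends (assumed order)."""
        return "".join(
            [P5_ANCHOR, PRIMER_SITE_1, self.insert, PRIMER_SITE_2, P7_ANCHOR]
        )


print(LibraryFragment("ACGTTGCA").layout())
```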
What is step 2 - cluster generation?
- Hybridise DNA library fragments to the flow cell. This is a random process.
- Amplification is needed because a single library fragment is too small to measure; each fragment is copied into a larger cluster that can be detected.
- Perform bridge amplification to generate clusters
- Clusters are now big enough to be visualised and the flow cell is ready to be loaded onto the sequencing platform
What is step 3 - sequencing by synthesis?
- All four bases (A, T, C, G) are modified with chain terminators.
- Each base carries a different-coloured fluorescent dye, so one nucleotide is read per cycle in a controlled manner.
- DNA polymerase incorporates a single nucleotide; the flow cell is washed, then the four bases are imaged (digital photograph).
- The terminator group and dye are then cleaved with an enzyme so the next cycle can begin
- Camera sequentially images all 4 bases on the surface of the flow cell each cycle.
- Each cycle image is converted to the nucleotide base call (ACGT).
- The number of cycles, and therefore the read length, is anywhere between 50 and 600 bases.
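The cycle described above (incorporate one base, image, call, cleave) can be sketched as a loop. This is a simplified model, not how a real instrument works: the dye colours are hypothetical, strand orientation is ignored, and `sequence_by_synthesis` is a made-up name.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical dye colours; real platforms use their own dye chemistry.
DYE_COLOUR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE_COLOUR.items()}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Read the strand complementary to `template`, one base per cycle."""
    read = []
    for cycle in range(min(cycles, len(template))):
        # Polymerase adds exactly one terminated, dye-labelled base.
        incorporated = COMPLEMENT[template[cycle]]
        # The camera images the cluster and records the dye colour.
        colour = DYE_COLOUR[incorporated]
        # The image is converted to a base call.
        read.append(COLOUR_TO_BASE[colour])
        # Cleaving the terminator and dye would happen here, allowing the next cycle.
    return "".join(read)


print(sequence_by_synthesis("ATGC", 3))  # TAC
```

The read length equals the number of cycles run, which is why cycle number and read length are quoted together.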
Compare NGS and Sanger sequencing
- NGS produces a digital readout whereas Sanger sequencing produces an analogue readout.
- Sanger gives one sequence read whereas NGS gives a consensus sequence built from many reads
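A consensus sequence can be illustrated with a simple majority vote at each position. The sketch assumes the reads are already aligned and of equal length, which real pipelines do not get for free; the function name `consensus` and the example reads are made up.

```python
from collections import Counter


def consensus(aligned_reads: list[str]) -> str:
    """Majority-vote base at each column of equal-length, pre-aligned reads.

    Ties are resolved in favour of the base seen first in that column.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must all have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


# Two of the four reads carry a single (different) sequencing error.
reads = ["ACGTA", "ACGTA", "ACCTA", "ACGTT"]
print(consensus(reads))  # ACGTA
```

Because each error appears in only one read, the majority vote removes it, which is the advantage of having many reads per position.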
Why is exome sequencing preferred to genome sequencing?
- Often only the protein-coding exons of genes are of interest; this ‘exome’ represents 1-2% of the genome.
- It is more efficient to sequence only these regions: a whole genome costs about £1000, but an exome only £200-300.
- Target enrichment
- Capture target regions of interest with baits
- Potential to capture several Mb genomic regions of interest
- The exome is about 50 Mb in size
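As a quick check on the numbers above, a 50 Mb exome in a genome of about 3 billion base pairs works out to a little under 2%. The helper name `exome_fraction` is made up for this example.

```python
GENOME_BP = 3_000_000_000  # about 3 billion base pairs
EXOME_BP = 50_000_000      # about 50 Mb


def exome_fraction(exome_bp: int, genome_bp: int) -> float:
    """Fraction of the genome covered by the exome."""
    return exome_bp / genome_bp


print(f"{exome_fraction(EXOME_BP, GENOME_BP):.1%}")  # 1.7%
```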
What percentage of pathogenic mutations are protein coding?
About 80%
What are the different exome data analysis techniques?
- Sequence Read Alignment
- Target Coverage Reporting
- Variant Annotation
- Variant Calling
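Target coverage reporting can be illustrated with a toy depth calculation. The sketch assumes reads have already been aligned and are given as (start, end) positions on the target, with the end exclusive; the function names and the example alignments are made up.

```python
def depth_per_position(target_length: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Count how many reads cover each position of the target.

    Each alignment is (start, end) in target coordinates, end exclusive.
    """
    depth = [0] * target_length
    for start, end in alignments:
        for pos in range(max(start, 0), min(end, target_length)):
            depth[pos] += 1
    return depth


def fraction_covered(depth: list[int], min_depth: int) -> float:
    """Fraction of target positions covered by at least `min_depth` reads."""
    if not depth:
        return 0.0
    return sum(d >= min_depth for d in depth) / len(depth)


depth = depth_per_position(10, [(0, 5), (3, 8), (4, 10)])
print(depth)                       # [1, 1, 1, 2, 3, 2, 2, 2, 1, 1]
print(fraction_covered(depth, 2))  # 0.5
```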
Why is exome sequencing done?
- Collect samples from disease-affected individuals and their families
- Use of NGS in disease gene identification
- Perform exome sequencing
- Compare variant profiles of affected individuals
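Comparing variant profiles can be pictured as a set operation: keep the variants every affected individual shares. The sketch below also removes variants seen in unaffected relatives, which is an added assumption and not something stated above. The variant tuples (chromosome, position, reference base, alternate base) and the name `shared_candidates` are invented for illustration.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def shared_candidates(
    affected: list[set[Variant]], unaffected: list[set[Variant]]
) -> set[Variant]:
    """Variants present in every affected profile and in no unaffected profile."""
    if not affected:
        return set()
    shared = set.intersection(*affected)
    for profile in unaffected:
        shared -= profile
    return shared


# Made-up variant profiles for two affected individuals and one unaffected relative.
patient_1 = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")}
patient_2 = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")}
relative = {("chr2", 200, "C", "T")}

print(shared_candidates([patient_1, patient_2], [relative]))
```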
What is being looked for in exome sequencing?
Mutations in the protein-coding exons
NGS is not just for DNA. What else can it be used for?
RNA sequencing using the total RNA (or mRNA) from a collection of cells or tissues.