Next Gen sequencing Flashcards
PCR
- Fundamental principle for any DNA sequencing application
- PCR is used to amplify a specific region of DNA; primers flank the region you want to amplify.
- Each cycle doubles the number of copies of your target sequence
- Amplify enough DNA molecules so that we have sufficient material to sequence or for other DNA applications
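The doubling rule is easy to check with a little arithmetic. The following is a minimal Python sketch, assuming an idealised reaction; the function name `pcr_copies` and the `efficiency` parameter are illustrative and not part of these notes.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Return the expected number of target copies after PCR.

    efficiency=1.0 is the ideal case in which every cycle doubles the copies.
    Real reactions fall short of this, so treat the result as an upper bound.
    """
    return initial_copies * (1 + efficiency) ** cycles


# One starting molecule after 30 ideal cycles gives 2**30 copies (about a billion).
print(pcr_copies(1, 30))
```

Lowering `efficiency` below 1.0 shows how quickly the yield drops when cycles do not double perfectly.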
Sanger sequencing
- Invented by Fred Sanger in 1977
- Cycle Sequencing
- Based on PCR
- Modified nucleotides
Next Generation of DNA sequencing (NGS1)
- Technological advances since the end of the human genome project
- Decrease in the cost of DNA sequencing
- Since the end of 2007, the cost has dropped at a rate faster than that of Moore’s law
NGS 2
- Development of new NGS methods began 13 years ago with 454 pyrosequencing
- DNA sequencing throughput jumped 10 orders of magnitude
- Solexa sequencing-by-synthesis (SBS) developed end of 2005
- The sequencing market is still dominated by Illumina SBS sequencing
NGS process
• Four-step process
- DNA library Construction
- Cluster Generation
- Sequencing-by-synthesis
- Data analysis
Step 1 DNA library construction
- In the wet lab, we first need to prepare the DNA sample for sequencing
- Essentially the DNA is chopped into small fragments (typically ~300 bp). This is called shearing
- This can be achieved chemically, enzymatically or physically (sonication)
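Shearing can be pictured as cutting a long string at random points. The sketch below is a toy Python model, not a description of any real protocol: the function name `shear`, the normal distribution of fragment sizes and the `spread` value are all assumptions made for illustration.

```python
import random


def shear(sequence, mean_size=300, spread=50, seed=None):
    """Cut one sequence into consecutive fragments of roughly mean_size bases.

    Fragment lengths are drawn from a normal distribution (an assumption of
    this sketch); the last fragment is simply whatever is left over.
    """
    rng = random.Random(seed)
    fragments = []
    pos = 0
    while pos < len(sequence):
        size = max(1, int(rng.gauss(mean_size, spread)))
        fragments.append(sequence[pos:pos + size])
        pos += size
    return fragments


# Build a random 10 kb "genome" and shear it.
genome_rng = random.Random(0)
genome = "".join(genome_rng.choice("ACGT") for _ in range(10_000))
fragments = shear(genome, seed=1)
print(len(fragments), "fragments")
```

A real sample contains many copies of the genome, each broken at different places, so the resulting fragments overlap. This sketch cuts a single copy only.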
What is a DNA library?
A DNA library is a collection of random DNA fragments of a specific sample to be used for further study; in our case next generation sequencing
The DNA can come from just about anywhere, but in human genetic research it is generally derived from patients' blood.
Part 2 of DNA library
- We have to repair the ends of the sheared DNA fragments
- Adenine (A) nucleotide overhangs are added to the ends of the fragments
- Adapters with Thymine (T) overhangs can then be ligated to the DNA fragments
- The end result is the DNA library of literally billions of small, stable random fragments representative of our original DNA sample
Part 3 of DNA library
- Adapters contain the essential components to allow the library fragments to be sequenced
- Sequencing Primer binding sites
- P5 and P7 anchors for attachment of library fragments to the flow cell
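The layout of a finished library fragment can be sketched as a small data structure. This is a simplified Python illustration: the class name `LibraryFragment` is hypothetical, the strings are placeholders rather than real adapter sequences, and optional elements such as sample indexes are left out.

```python
from dataclasses import dataclass

# Placeholder labels only: these are NOT real adapter sequences.
P5_ANCHOR = "<P5>"
READ1_PRIMER_SITE = "<read 1 primer site>"
READ2_PRIMER_SITE = "<read 2 primer site>"
P7_ANCHOR = "<P7>"


@dataclass
class LibraryFragment:
    """One sheared DNA insert flanked by the adapter components."""
    insert: str

    def layout(self):
        # Anchors sit at the outer ends so the fragment can bind the flow cell;
        # primer binding sites sit next to the insert so sequencing starts there.
        return [P5_ANCHOR, READ1_PRIMER_SITE, self.insert,
                READ2_PRIMER_SITE, P7_ANCHOR]


fragment = LibraryFragment(insert="ACGTTGCA")
print(" | ".join(fragment.layout()))
```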
Step 2 - Cluster Generation 1
- Hybridise DNA library fragments to the flowcell
- Hybridization to the flowcell is a random process
- But we can't measure individual single molecules of our DNA library: the signal from one molecule is too small
- We need to amplify each fragment into many copies to give a signal big enough to measure
Part 2 - Cluster Generation 2
- Perform bridge amplification to generate clusters
- Many billions of clusters originating from single DNA library molecules
- Clusters are now big enough to be visualised
- Flow cell is now ready to be loaded on to the sequencing platform to perform the sequencing
Step 3 - Sequencing by synthesis
• Four modified bases (A, T, C, G), each with:
- A chain terminator
- A different coloured fluorescent dye
• Sequence one nucleotide per cycle in a controlled manner
Part 2 - Sequencing by synthesis
- Single nucleotide incorporation (DNA polymerase)
- Flowcell wash
- Image the 4 bases (digital photograph)
- Cleave the terminator chemical group and the dye (a chemical cleavage step in Illumina SBS) so the next base can be added
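The four steps of one cycle can be written as a loop. This is a toy Python sketch of the logic only, with no chemistry in it; the function name `sequence_by_synthesis` is illustrative, and the template is assumed to be given in the order the polymerase reads it.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sequence_by_synthesis(template, cycles):
    """Return the bases observed when reading `template` one base per cycle."""
    observed = []
    for cycle in range(min(cycles, len(template))):
        # 1. Polymerase incorporates exactly one terminated, dye-labelled base.
        base = COMPLEMENT[template[cycle]]
        # 2. Wash away unincorporated nucleotides (nothing to model here).
        # 3. Image the flowcell: record which dye is present.
        observed.append(base)
        # 4. Cleave terminator and dye so the next cycle can extend the strand.
    return "".join(observed)


print(sequence_by_synthesis("TACGGA", cycles=4))  # ATGC
```

Because the terminator blocks further extension, the read can never grow by more than one base per cycle, which is what keeps the process controlled.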
Part 3 - Sequencing by synthesis
- Camera sequentially images all 4 bases on the surface of the flowcell each cycle
- Each cycle image is converted to a nucleotide base call (ACGT)
- The number of cycles sets the read length, anywhere between 50 and 600 bases
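Turning images into base calls can be sketched as picking the brightest of four colour channels for each cluster in each cycle. The Python below is a deliberately naive illustration; the intensity values are made up, and the names `call_base` and `call_read` are hypothetical. Real base callers do considerably more, such as correcting for overlap between dye signals and assigning a quality score to each call.

```python
CHANNELS = "ACGT"  # order of the four intensity values in each tuple


def call_base(intensities):
    """Call the base whose channel is brightest in one cycle's image."""
    brightest = max(range(4), key=lambda i: intensities[i])
    return CHANNELS[brightest]


def call_read(cycles):
    """Join the per-cycle base calls for one cluster into a read."""
    return "".join(call_base(c) for c in cycles)


# Made-up intensities (A, C, G, T) for one cluster over three cycles.
cluster_signal = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.0, 0.8, 0.2),
    (0.0, 0.1, 0.1, 0.7),
]
read = call_read(cluster_signal)
print(read)  # AGT
```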
Part 4 - Sequencing by synthesis
- The machine outputs DNA base calls
- Millions of short-read sequences representing our original DNA library
Analysis of NGS data
- Short read sequences from the sequencing machine need to be re-assembled like a jigsaw
- Mapping locations of our sequence reads on the reference genome sequence
- To generate a consensus sequence of our original DNA sample library
- Compare this consensus sequence against the human genome reference and look for genetic variants
- Dedicated software and bioinformatics tools will achieve this
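The mapping, consensus and variant steps can be sketched end to end on a toy example. This Python is a brute-force illustration under strong assumptions: reads contain substitutions only (no insertions or deletions), every read is placed at the offset with the fewest mismatches, and the reference and reads are invented. The function names are hypothetical; production aligners and variant callers use indexed, statistically informed methods instead.

```python
from collections import Counter, defaultdict


def best_position(read, reference):
    """Slide the read along the reference; return the offset with fewest mismatches."""
    offsets = range(len(reference) - len(read) + 1)
    return min(
        offsets,
        key=lambda o: sum(a != b for a, b in zip(read, reference[o:o + len(read)])),
    )


def consensus(reads, reference):
    """Pile up mapped reads and take the most common base at each position."""
    pileup = defaultdict(Counter)
    for read in reads:
        start = best_position(read, reference)
        for i, base in enumerate(read):
            pileup[start + i][base] += 1
    # Positions with no coverage are reported as "N".
    return "".join(
        pileup[p].most_common(1)[0][0] if pileup[p] else "N"
        for p in range(len(reference))
    )


def variants(consensus_seq, reference):
    """List (position, reference base, sample base) wherever the two differ."""
    return [
        (p, r, c)
        for p, (r, c) in enumerate(zip(reference, consensus_seq))
        if c != "N" and r != c
    ]


reference = "GATTACAGGCTTAACG"
# Reads from a sample that carries T instead of G at position 7 (0-based).
reads = ["GATTACAT", "TTACATGC", "ACATGCTT", "GCTTAACG"]
sample = consensus(reads, reference)
print(sample)                       # GATTACATGCTTAACG
print(variants(sample, reference))  # [(7, 'G', 'T')]
```

Three of the four reads cover position 7 and all agree on T, which is why a consensus of many reads is more trustworthy than any single read.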
NGS v Sanger sequencing
- NGS (left) produces a digital readout. Sanger (right) produces an analogue readout
- Sanger is one sequence read
- NGS is a consensus sequence of many reads
Whole-exome sequencing part 1
- There are ~21,000 genes in the human genome
- Often, we are only interested in the protein-coding exons of genes, or 'exome', which represents 1-2% of the genome
- ~80% of pathogenic mutations are in protein-coding regions
- More efficient to only sequence the bits we are interested in, rather than the entire genome
- Costs £1,000 for a genome, but only £200-£300 for an exome
Whole-exome sequencing part 2
- Target enrichment
- Capture target regions of interest with baits
- Potential to capture several Mb genomic regions of interest
- An exome would be about 50 Mb in size
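The outcome of target enrichment can be modelled as keeping only the library fragments that overlap a region of interest. The Python sketch below works on coordinates alone and says nothing about the hybridisation chemistry; the coordinates are invented and the function names `overlaps` and `capture` are hypothetical.

```python
def overlaps(fragment, target):
    """True if two half-open (start, end) intervals share at least one base."""
    return fragment[0] < target[1] and target[0] < fragment[1]


def capture(fragments, targets):
    """Keep the fragments that overlap any target region."""
    return [f for f in fragments if any(overlaps(f, t) for t in targets)]


# Invented exon coordinates and library fragments on one chromosome.
targets = [(1000, 1200), (5000, 5300)]
fragments = [(0, 300), (900, 1100), (1200, 1500), (5250, 5550), (8000, 8300)]
captured = capture(fragments, targets)
print(captured)  # [(900, 1100), (5250, 5550)]
```

Note that captured fragments can extend beyond the target, so some sequence flanking each exon is read as well.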
Application of exome Sequencing
- Collecting disease affected individuals and their families
- Use of NGS in disease gene identification
- Perform exome sequencing
- Compare variant profiles of affected individuals
- Try to identify the variant or mutation shared by the affected individuals
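Comparing variant profiles comes down to set operations. The following Python sketch assumes each individual's variants are stored as a set of (chromosome, position, reference base, alternate base) tuples; the data are invented, the function name `shared_variants` is hypothetical, and removing variants seen in unaffected relatives is an added assumption about how family data might be used.

```python
def shared_variants(affected, unaffected=()):
    """Variants present in every affected individual and in no unaffected one.

    `affected` must contain at least one profile.
    """
    candidates = set.intersection(*(set(profile) for profile in affected))
    for profile in unaffected:
        candidates -= set(profile)
    return candidates


# Invented variant profiles for two affected individuals and one unaffected relative.
affected = [
    {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T"), ("chr7", 42, "G", "A")},
    {("chr1", 1000, "A", "G"), ("chr7", 42, "G", "A"), ("chr9", 77, "T", "C")},
]
unaffected = [{("chr1", 1000, "A", "G")}]

print(shared_variants(affected, unaffected))  # {('chr7', 42, 'G', 'A')}
```

In practice the candidate list is narrowed further, for example by discarding variants that are common in the general population.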
RNA sequence 1
- NGS is not just for studying DNA. RNA-seq experiments use the total RNA (or mRNA) from a collection of cells or tissue
- RNA is first converted to cDNA prior to library construction
- NGS of RNA samples determines which genes are actively expressed
- Single experiment can capture the expression levels of thousands of genes
RNA sequence 2
- The number of sequencing reads produced from each gene can be used as a measure of that gene's transcript abundance
- Quantification of the expression levels
- Calculation of the differences in gene expression of all genes in the experimental conditions
- With appropriate analysis, RNA-seq can be used to discover distinct isoforms of genes that are differentially regulated and expressed
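Going from read counts to expression differences can be sketched in two steps: scale each sample to counts per million so that sequencing depth cancels out, then compare conditions as a log2 fold change. The Python below is a minimal illustration with invented counts and hypothetical gene names; the pseudocount is an assumption added to avoid dividing by zero. A real analysis would use replicates and a statistical model rather than a single ratio.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so each sample sums to one million."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2(treated / control) per gene, computed on depth-normalised counts."""
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2(
            (cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount)
        )
        for gene in cpm_control
    }


# Invented read counts; the treated sample was sequenced twice as deeply.
control = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}
treated = {"GENE_A": 400, "GENE_B": 300, "GENE_C": 1300}

for gene, lfc in log2_fold_change(control, treated).items():
    print(f"{gene}: {lfc:+.2f}")
```

GENE_B has the same raw count in both samples, yet after normalisation it comes out as halved, which shows why raw counts cannot be compared directly between samples of different depth.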