W3 - Next Generation Sequencing Flashcards
What is Next Generation Sequencing?
Technological advances since the end of the Human Genome Project
* Decrease in the cost of DNA sequencing
* Since the end of 2007, the cost has dropped at a rate faster than that of Moore’s law
* Development of new NGS methods began in the mid-2000s with 454 pyrosequencing
* DNA sequencing throughput jumped 10 orders of magnitude
* Solexa sequencing-by-synthesis (SBS) was developed at the end of 2005
* To this day, the sequencing market is dominated by Illumina SBS sequencing (Illumina acquired Solexa in 2007)
* Next Generation Sequencing has replaced Sanger sequencing for almost all sequencing tests in the lab, including:
- Whole genome sequencing
- Whole exome sequencing
What are the steps of NGS?
A four-step process:
1. DNA library construction
2. Cluster Generation
3. Sequencing-by-synthesis
4. Data analysis
Step 1 - What is DNA library construction?
- In the wet lab, we first need to prepare the DNA sample for sequencing
- The DNA is chopped into small fragments (typically ~300 bp). This is called shearing
- Shearing can be achieved chemically, enzymatically or physically (sonication)
- The ends of the sheared DNA fragments are then repaired (end repair)
- A single adenine (A) nucleotide overhang is added to the 3' ends of the fragments (A-tailing)
- Adapters with a complementary thymine (T) overhang are ligated to the DNA fragments
- The end result is a DNA library of literally billions of small, stable, random fragments representative of our original DNA sample
- Adapters contain the essential components that allow the library fragments to be sequenced:
- Sequencing Primer binding sites
- P5 and P7 anchors for attachment of library fragments to the flow cell
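The library-construction steps above can be mimicked on a toy sequence. This is a minimal sketch, not a lab protocol: it fragments a single copy of the input at random points around a target size, models A-tailing as appending an "A", and wraps each insert in adapter strings. End repair is not modelled. `ADAPTER_P5`, `ADAPTER_P7`, `shear`, `a_tail`, `ligate_adapters` and `build_library` are hypothetical names, and the adapter sequences are placeholders, not real Illumina adapters.

```python
import random

# Hypothetical placeholder adapters (NOT real Illumina sequences).
# Real adapters carry the P5/P7 anchors and sequencing-primer binding sites.
ADAPTER_P5 = "AAAACCCC"
ADAPTER_P7 = "GGGGTTTT"


def shear(dna, mean_size=300, rng=None):
    """Cut one copy of `dna` at random points into fragments of roughly mean_size bp."""
    rng = rng or random.Random()
    fragments = []
    start = 0
    while start < len(dna):
        # Fragment length scatters around the target size (assumed 10% spread).
        length = max(1, round(rng.gauss(mean_size, mean_size * 0.1)))
        fragments.append(dna[start:start + length])
        start += length
    return fragments


def a_tail(fragment):
    """A-tailing, simplified: add a single A overhang to the 3' end."""
    return fragment + "A"


def ligate_adapters(fragment):
    """Ligation, simplified: join adapters to both ends of the A-tailed fragment."""
    return ADAPTER_P5 + fragment + ADAPTER_P7


def build_library(dna, mean_size=300, seed=0):
    """Shear, A-tail and add adapters to every fragment."""
    rng = random.Random(seed)
    return [ligate_adapters(a_tail(f)) for f in shear(dna, mean_size, rng)]


genome = "".join(random.Random(42).choice("ACGT") for _ in range(2_000))
library = build_library(genome)
print(len(library), "library fragments")
```

In a real library many genome copies are sheared independently, so fragments from different copies overlap; that overlap is what later lets the reads be pieced together.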
Step 2 - What is Cluster Generation?
- Hybridise DNA library fragments to the flowcell
- But a single molecule of our DNA library is too small to visualise: its signal is far too weak
- Each fragment must be amplified into many identical copies to give a strong enough signal
This all happens on the surface of the flow cell:
- Perform bridge amplification to generate clusters
- Many billions of clusters, each originating from a single DNA library molecule
- Clusters are now big enough to be visualised
- The flow cell is now ready to be loaded onto the sequencing platform to perform the sequencing
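Why amplification gives a visible cluster can be shown with a little arithmetic. This sketch assumes idealised bridge amplification in which every strand is copied once per cycle, so a cluster grown from one molecule doubles each cycle; real amplification is less efficient, and strand orientation is ignored here. The function names and the 1,000-copy target are illustrative assumptions.

```python
import math


def amplify(molecule, cycles):
    """Idealised bridge amplification: every copy is duplicated each cycle."""
    cluster = [molecule]
    for _ in range(cycles):
        cluster = cluster + cluster  # perfect doubling (an assumption)
    return cluster


def cycles_needed(target_copies):
    """Smallest number of perfect doubling cycles giving at least target_copies."""
    return math.ceil(math.log2(target_copies))


library = ["ACGTTGCA", "TTGACCAG"]
# Each library molecule seeds its own cluster, so every cluster is clonal.
clusters = [amplify(m, cycles_needed(1_000)) for m in library]
print([len(c) for c in clusters])
```

Because each cluster starts from one molecule, all of its copies carry the same sequence, so their signals add up instead of blurring together.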
Step 3 - How does Sequencing By Synthesis work?
- Uses the 4 bases (A, T, C, G), each modified with:
  - A reversible chain terminator
  - A different fluorescent colour dye
- Sequences each single nucleotide, 1 cycle at a time, in a controlled manner:
  1. Single nucleotide incorporation (DNA polymerase)
  2. Flowcell wash
  3. Image the 4 bases (digital photograph)
  4. Cleave off the terminator chemical group and the dye
- [All 4 of these steps are repeated (n) times to give the full-length sequence]
- Each cycle, the camera sequentially images all 4 bases on the surface of the flowcell
- Each cycle's image is converted to a nucleotide base call (A, C, G or T)
- The number of cycles (one base per cycle) is anywhere between 50 and 600
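The four-step cycle can be simulated for a single cluster. This is a simplified model under stated assumptions: each dye is read in its own channel with Gaussian noise, and the base call is simply the brightest channel. Real instruments use more elaborate base callers, and some Illumina platforms use two-colour rather than four-colour chemistry. All names here are hypothetical.

```python
import random

BASES = "ACGT"


def image_cycle(incorporated_base, rng, noise):
    """Simulated intensity in each of the 4 dye channels after one incorporation."""
    return {
        b: (1.0 if b == incorporated_base else 0.0) + rng.gauss(0.0, noise)
        for b in BASES
    }


def call_base(intensities):
    """Call the base whose dye channel is brightest."""
    return max(intensities, key=intensities.get)


def sequence_by_synthesis(new_strand, n_cycles, noise=0.1, seed=0):
    """Read up to n_cycles bases of the strand being synthesised in a cluster."""
    rng = random.Random(seed)
    calls = []
    for cycle in range(min(n_cycles, len(new_strand))):
        # 1. Polymerase adds ONE terminated, dye-labelled base (the terminator blocks a 2nd).
        base = new_strand[cycle]
        # 2. Wash away unincorporated nucleotides (implicit in this model).
        # 3. Image all 4 channels and convert the image to a base call.
        calls.append(call_base(image_cycle(base, rng, noise)))
        # 4. Cleave terminator and dye: no signal carries over into the next cycle here.
    return "".join(calls)


print(sequence_by_synthesis("GATTACACGT", n_cycles=8))
```

Raising `noise` in this toy model makes channels overlap and produces miscalls, which is why real base callers also report a quality score per call.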
Step 4 - How are NGS data analysed?
- Short read sequences from the sequencing machine need to be pieced together like a jigsaw
- Sequence reads are mapped to their locations on the reference genome sequence
- The mapped reads are combined into a consensus sequence of our original DNA sample library
- This consensus sequence is compared against the human reference genome to look for genetic variants
- Dedicated software and bioinformatics tools carry out these steps
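The analysis steps above (map reads, build a consensus, compare with the reference) can be sketched on toy data. This assumes a tiny reference and error-free reads and uses a brute-force mismatch scan; real aligners such as BWA or Bowtie2 index the reference instead, and real variant callers model sequencing errors. `map_read`, `build_consensus` and `find_variants` are hypothetical names.

```python
from collections import Counter, defaultdict


def map_read(read, reference, max_mismatches=2):
    """Return the start position with the fewest mismatches (None if none is good enough)."""
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(a != b for a, b in zip(read, window))
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos


def build_consensus(reads, reference):
    """Pile mapped reads up per position and take the majority base ('N' = no coverage)."""
    pileup = defaultdict(Counter)
    for read in reads:
        pos = map_read(read, reference)
        if pos is None:
            continue  # unmapped read
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1
    return "".join(
        pileup[i].most_common(1)[0][0] if pileup[i] else "N"
        for i in range(len(reference))
    )


def find_variants(consensus, reference):
    """List (position, reference base, sample base) wherever they differ."""
    return [
        (i, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, consensus))
        if alt != "N" and alt != ref
    ]


reference = "GATTACACGTTGCAAGCT"
sample = "GATTACACATTGCAAGCT"  # toy sample: G>A at position 8
reads = [sample[i:i + 6] for i in (0, 3, 6, 9, 12)]  # overlapping short reads
consensus = build_consensus(reads, reference)
print(consensus, find_variants(consensus, reference))
```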
What is the difference between NGS and Sanger Sequencing?
NGS (left) produces a digital readout. Sanger (right) produces an analogue readout
* Sanger gives a single sequence read
* NGS gives a consensus sequence built from many reads
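The "digital" nature of an NGS readout is that each read contributes one discrete base call at a position, so the result at that position is a count. A minimal sketch with made-up read data (the function name is hypothetical):

```python
from collections import Counter


def summarise_position(bases):
    """Count base calls from all reads at one position; return consensus, its fraction, counts."""
    counts = Counter(bases)
    base, n = counts.most_common(1)[0]
    return base, n / len(bases), dict(counts)


# Toy data: 20 reads cover one position; 18 read C and 2 read T.
calls = ["C"] * 18 + ["T"] * 2
print(summarise_position(calls))
```

A Sanger trace instead gives one analogue signal per position from the whole template population, so a minor allele appears only as a smaller overlapping peak.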
What is whole exome sequencing?
- There are ~21,000 genes in the human genome
- Often, we are only interested in the protein-coding exons of genes, known collectively as the 'exome', which make up 1-2% of the genome
- An estimated ~80% of pathogenic mutations lie in protein-coding regions
- It is more efficient to only sequence the bits we are interested in, rather than the entire genome
- Costs £1,000 for a genome, but only £200-£300 for an exome
* Target enrichment
- Capture target regions of interest with baits (oligonucleotide probes complementary to the targets)
- Can capture genomic regions of interest totalling several Mb
- An exome is ~50 Mb in size
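A quick calculation shows why target enrichment saves sequencing. This sketch assumes a haploid human genome of ~3.1 Gb and uses the ~50 Mb exome figure above; the depths in the example (30x for a genome, 100x for an exome) are illustrative assumptions, and off-target and duplicate reads are ignored.

```python
GENOME_BP = 3_100_000_000  # approximate haploid human genome size (assumption)
EXOME_BP = 50_000_000      # exome target size quoted above (~50 Mb)


def fraction_of_genome(target_bp, genome_bp=GENOME_BP):
    """Fraction of the genome covered by the target regions."""
    return target_bp / genome_bp


def bases_required(target_bp, mean_depth):
    """Sequenced bases needed so that, on average, each target base is read mean_depth times."""
    return target_bp * mean_depth


print(f"Exome = {fraction_of_genome(EXOME_BP):.1%} of the genome")
print(f"Genome at 30x: {bases_required(GENOME_BP, 30) / 1e9:.0f} Gb")
print(f"Exome at 100x: {bases_required(EXOME_BP, 100) / 1e9:.0f} Gb")
```

The exome comes out at ~1.6% of the genome, consistent with the 1-2% figure, and even at a much higher depth it needs far fewer sequenced bases than a whole genome.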
How is the exome data analysed?
- We are looking for protein coding mutations in the exons
- A patient DNA sample is subjected to exome sequencing
- The example on the right shows a snippet of the consensus sequence of that sequenced sample
- This reveals a heterozygous mutation in the CFTR gene
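Calling a mutation heterozygous, as in the CFTR example, rests on the fraction of reads carrying the variant: roughly half of the reads for a heterozygote. A simple threshold rule is sketched below; the thresholds and minimum depth are hypothetical, and production callers such as GATK HaplotypeCaller use probabilistic genotype likelihoods instead.

```python
def call_genotype(ref_reads, alt_reads, het_range=(0.2, 0.8), min_depth=10):
    """Classify a site from read counts using a variant allele fraction (VAF) threshold rule."""
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return "no call"  # too few reads to decide
    vaf = alt_reads / depth
    if vaf < het_range[0]:
        return "homozygous reference"
    if vaf <= het_range[1]:
        return "heterozygous"
    return "homozygous variant"


print(call_genotype(ref_reads=21, alt_reads=19))
```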
What are the applications of exome sequencing?
Use of NGS in disease gene identification:
- Collect disease-affected individuals and their families
- Perform exome sequencing
- Compare the variant profiles of the affected individuals
- Try to identify the variant or mutation shared by the affected individuals
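The final comparison step amounts to set operations on each person's variant list. This sketch assumes a dominant model in which the causal variant is carried by every affected relative and by no unaffected relative; recessive or incompletely penetrant disorders need different filters. The variant labels and the function name are hypothetical.

```python
def shared_candidate_variants(affected, unaffected=()):
    """Variants present in all affected individuals and absent from all unaffected ones."""
    if not affected:
        raise ValueError("need at least one affected individual")
    shared = set.intersection(*(set(v) for v in affected))
    in_unaffected = set().union(*(set(v) for v in unaffected))
    return shared - in_unaffected


affected = [
    {"GENE1:var1", "GENE2:var7", "GENE3:var2"},
    {"GENE2:var7", "GENE3:var2", "GENE4:var5"},
    {"GENE2:var7", "GENE3:var2"},
]
unaffected = [{"GENE3:var2"}]
print(shared_candidate_variants(affected, unaffected))
```

Including unaffected relatives is what narrows the shortlist: without them, every variant common to the affected individuals remains a candidate.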
What is RNA-seq?
- NGS is not just for studying DNA: RNA-seq experiments use the total RNA (or mRNA) from a collection of cells or tissue
- RNA is first converted to cDNA prior to library construction
- NGS of RNA samples determines which genes are actively expressed
- A single experiment can capture the expression levels of thousands of genes
- The number of sequencing reads produced from each gene can be used as a measure of its expression level (transcript abundance)
- Quantification of the expression levels
- Calculation of the differences in expression of all genes between the experimental conditions
- With appropriate analysis, RNA-seq can also reveal whether distinct isoforms of genes are differentially regulated and expressed
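Using read counts as a measure of expression, and comparing conditions, can be sketched with counts-per-million (CPM) normalisation and a log2 fold change. This assumes one sample per condition and made-up counts; it ignores gene length and biological replicates, which dedicated tools such as DESeq2 or edgeR handle with statistical models. The function names are hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: n / total * 1_000_000 for gene, n in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2(treated / control) per gene on CPM values; the pseudocount avoids log(0)."""
    c, t = counts_per_million(control), counts_per_million(treated)
    return {g: math.log2((t[g] + pseudocount) / (c[g] + pseudocount)) for g in c}


control = {"geneA": 100, "geneB": 900}
treated = {"geneA": 400, "geneB": 600}
print(log2_fold_change(control, treated))
```

Normalising first matters because two samples rarely yield the same total number of reads; comparing raw counts would confuse sequencing depth with expression.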