Lecture 07: DNA sequencing Flashcards
Human genome facts
Human genome length (nucleotides)?
- ~3.2 Gb (haploid)
Human genome length (metres)?
- ~2 m
Human genome mass?
- 3 pg (3 × 10^-12 g)
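The length and mass figures can be roughly cross-checked from the number of base pairs. This is a back-of-the-envelope sketch, not part of the lecture: it assumes a rise of 0.34 nm per base pair (B-form DNA), an average of about 650 g/mol per base pair, and a haploid genome of about 3.2 Gb. The function names are made up for illustration.

```python
AVOGADRO = 6.022e23        # molecules per mole
RISE_PER_BP_M = 0.34e-9    # assumed: metres per base pair in B-form DNA
MASS_PER_BP_G_MOL = 650.0  # assumed: approximate molar mass of one base pair


def dna_length_m(n_bp):
    """Stretched-out length in metres of n_bp base pairs of double-stranded DNA."""
    return n_bp * RISE_PER_BP_M


def dna_mass_g(n_bp):
    """Approximate mass in grams of n_bp base pairs of double-stranded DNA."""
    return n_bp * MASS_PER_BP_G_MOL / AVOGADRO


haploid_bp = 3.2e9           # assumed haploid genome size
diploid_bp = 2 * haploid_bp  # a body cell carries two copies

print(f"diploid length: {dna_length_m(diploid_bp):.2f} m")
print(f"haploid mass:   {dna_mass_g(haploid_bp) * 1e12:.1f} pg")
```

Under these assumptions the ~2 m figure corresponds to the diploid genome of one cell, and the ~3 pg figure to a single haploid copy.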
What can be sequenced?
- Whole genome (de novo sequencing or re-sequencing)
- Targeted (SNP, RAD, exome)
- Single individual vs. multiple individuals (pool-seq)
- Multiple taxa (metabarcoding, metagenomics)
- RNA → DNA (transcriptome)
Exome: all the exons, i.e. the sequences that remain after splicing -> all sections that potentially code for proteins
De-novo sequencing - vocab
- Reads = the raw sequences from the sequencer; software compares them and arranges them by similarity (overlap) -> the longer the reads, the better for the analysis
- Contigs = contiguous sequences built by joining multiple overlapping reads
- Scaffold = contigs placed in order, with information on how far apart the sequences are from each other
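How reads become contigs can be illustrated with a toy greedy overlap merge. This is only a sketch under strong assumptions (error-free reads, one strand only, no repeats); real assemblers use overlap or de Bruijn graphs and are far more involved. The function names and the example reads are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCGT", "GCGTACC", "TACCTGA", "GGGTTTAA"]
print(sorted(greedy_assemble(reads)))
```

The first three reads overlap and collapse into one contig; the fourth shares no overlap and stays separate. Placing such unconnected contigs relative to each other is the job of scaffolding.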
Notable achievements
- 2008: first human genome sequenced by parallel DNA sequencing
- 2010: 185 low-coverage human genomes, 697 exomes
- 2010: first Pleistocene human genome
- 2014: 48 bird genome assemblies
-> Conclusion: learn bioinformatics & programming because the data sets get bigger and bigger
Technology timeline
- 1975: Sanger sequencing
- Automated Sanger sequencing
- 2005: “Next generation” Sequencing (NGS)
Sanger Sequencing
- The first DNA sequencing method; used for 30 years
- Like most sequencing methods, it is template based
- Start with a single strand of DNA, which is produced by using a single primer
- You set up four “sequencing reactions”, each of which contains:
- DNA template
- Primers
- Nucleotides
- A small proportion of one 32P-labelled dideoxy nucleotide (A,T,G or C)
- The dideoxy nucleotides stop extension of the DNA chain
- Different chains have different lengths -> they can be separated by gel electrophoresis
- Separate your four reactions on four lanes of a large gel
(Later, four different fluorophores were used, meaning you could run one sample per lane)
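The logic of reading a Sanger gel can be sketched in a few lines. This is a simplified model, not a lab protocol: it ignores the primer length and assumes every possible termination product is present. The template is written 3'->5' so that the new strand grows left to right; the function names are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_lengths(template):
    """Return, for each ddNTP reaction, the chain lengths it can produce.

    A chain ends at position n when a dideoxy version of the n-th base
    of the newly synthesised strand is incorporated.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes):
    """Read the gel from the shortest to the longest fragment."""
    base_at_length = {n: base for base, lengths in lanes.items() for n in lengths}
    return "".join(base_at_length[n] for n in sorted(base_at_length))


lanes = terminated_lengths("TACGGCAT")
print(lanes)            # one list of fragment lengths per lane
print(read_gel(lanes))  # sequence of the newly synthesised strand
```

Each lane shows only the fragments that end in its own base, so ordering all bands by length across the four lanes spells out the sequence.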
Pre- versus post-NGS
- Sanger sequencing: 384 reads, up to ~300,000 bp in total
- Roche 454 sequencing (2005): 300,000 reads, up to 20,000,000 bp in total
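The jump in throughput can be put into numbers with a quick calculation. This sketch assumes that both figures describe the total output of one run, which the card does not state explicitly; the names are made up for illustration.

```python
# (number of reads, total bases), assumed to be per run
PLATFORMS = {
    "Sanger": (384, 300_000),
    "Roche 454 (2005)": (300_000, 20_000_000),
}


def fold_change(old, new):
    """How many times larger new is than old."""
    return new / old


sanger_reads, sanger_bp = PLATFORMS["Sanger"]
ngs_reads, ngs_bp = PLATFORMS["Roche 454 (2005)"]

print(f"reads per run: {fold_change(sanger_reads, ngs_reads):.0f}x more")
print(f"bases per run: {fold_change(sanger_bp, ngs_bp):.0f}x more")
```

The number of reads grows much faster than the number of bases, which reflects that the early NGS reads were far shorter than Sanger reads.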
Current sequencing technologies: Illumina
- Sample prep
- Bind DNA to flowcell, generate clusters
- Sequencing by synthesis
- Data analysis
Illumina sequencing in more detail
- Ligation of adapters to each end of the DNA molecule
- Single strands are coupled to a glass slide (the flowcell) via the adapters
- Bridge amplification produces “PCR colonies” (“polonies”), i.e. clusters
- For the subsequent sequencing, nucleotides are blocked, so no more than one can be incorporated per cycle
- Four fluorescent dyes, one for each base, allow detection via pictures
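The detection step can be sketched as follows: in every cycle each cluster lights up in one of four colours, and the brightest channel gives the base. This is a toy model with invented intensity values and an assumed channel order (A, C, G, T); real base callers also correct for channel cross-talk and for clusters drifting out of phase.

```python
CHANNELS = "ACGT"  # assumed order of the four dye channels


def call_bases(intensities):
    """Call one base per cycle for a single cluster.

    intensities: one 4-tuple per cycle with the signal measured in the
    A, C, G and T channels.
    """
    read = []
    for cycle in intensities:
        brightest = max(range(len(CHANNELS)), key=lambda k: cycle[k])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Three sequencing cycles of one cluster (made-up values)
cluster = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.2, 0.8, 0.1),
    (0.0, 0.1, 0.1, 0.7),
]
print(call_bases(cluster))
```

Because only one blocked nucleotide is added per cycle, the number of cycles sets the read length.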
Illumina: considerations
- Cluster density: under-clustered, optimally clustered, over-clustered
- Read length and quality:
* High throughput
* High sequencing quality
* Limited read length (to some extent: up to 2 x 300 bp now possible)
* Assembly is a problem
-> Lowest cost per base, but a full run costs $10,000
Advantages: high throughput and high sequencing quality, relatively cheap
Disadvantages: limited read length, quality declines with higher read lengths
Hi-C sequencing
- Based on Illumina sequencing
- Uses chromatin conformation information
- Allows better scaffolding
Example: Chinese mitten crab
Newer technologies
- PacBio
- Oxford Nanopore
- (Bionano)
Primary focus: increase read length -> improved genome assemblies
Pacific Biosciences (PacBio)
- Library preparation: a circular template is made by ligating adapters onto dsDNA
- Add primer and polymerase to the sample
- SMRT Cell with zero-mode waveguides (ZMWs)
- Each template molecule sits in one ZMW
- Every labelled nucleotide incorporated by the polymerase -> light is emitted
Real time sequencing
Pacific Biosciences
- Strictly, single molecule reaction monitoring
- No washing: cheap on reagents
- No stop-and-go synthesis as in other systems
- Recent upgrade to 8x more reads per run
- Read length is up to ~40,000 bases
- Initially high error rate
- Highly competitive for long reads
Pacific Biosciences HiFi
- Strictly, single molecule reaction monitoring
- No washing: cheap on reagents
- No stop-and-go synthesis as in other systems
- Repeated sequencing of circularized molecule
- Read length is “only” ~15,000 bases on average
- Very high accuracy
- Excellent for de-novo assembly
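Why repeated sequencing of the circularised molecule gives very high accuracy can be shown with a majority vote across passes. This is a minimal sketch, assuming the passes are already aligned, have equal length and contain only substitution errors; the actual consensus calling also has to handle insertions and deletions. The function name and the example passes are made up for illustration.

```python
from collections import Counter


def consensus(passes):
    """Majority vote per position across equal-length, aligned passes."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Five passes over the same molecule, each with at most one random error
passes = [
    "ACGTTGCA",
    "ACGATGCA",  # error at position 4
    "ACGTTGCA",
    "ACCTTGCA",  # error at position 3
    "ACGTTGCA",
]
print(consensus(passes))
```

Random errors rarely hit the same position in most passes, so they are outvoted, and the consensus read is more accurate than any single pass.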