L3: Genome Sequencing Flashcards
What were the objectives of the genome project
- whole genome sequence
- establish an interface
- identify + annotate genes
- characterize DNA diversity
- more resources
Go through the evolution of DNA sequencing technologies
- 1977: Sanger sequencing, developed by Frederick Sanger
- Next Generation: Roche 2005, Illumina 2007
- Third Generation: from 2010 includes PacBio
What is Sanger sequencing
- Use a single primer so that only one DNA strand is synthesized
- single-stranded molecules are made from the template using dNTPs and randomly terminated by incorporating dideoxynucleotides (ddATP, ddTTP, ddCTP, ddGTP)
==> fragments are separated by size on polyacrylamide gels - the ddNTPs are labelled with fluorescent dyes
- sequenced with a capillary sequencer
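The chain-termination idea on this card can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the function names, `n_molecules`, and `p_terminate` are made up for illustration, the input string stands for the strand being synthesized, and the "gel" is just a sort by fragment length.

```python
import random

def sanger_fragments(new_strand, n_molecules=2000, p_terminate=0.05, seed=0):
    """Simulate randomly terminated synthesis products.

    Each molecule is extended one base at a time. At every position there is
    a small chance (p_terminate, an assumed value) that a dye-labelled ddNTP
    is incorporated instead of a dNTP, which stops extension.
    Returns a list of (fragment_length, terminal_base) pairs.
    """
    rng = random.Random(seed)
    products = []
    for _ in range(n_molecules):
        for length, base in enumerate(new_strand, start=1):
            if rng.random() < p_terminate:
                products.append((length, base))
                break
    return products

def read_gel(products):
    """Order fragments by length and read the dye (terminal base) at each length."""
    dye_at_length = {}
    for length, base in products:
        dye_at_length[length] = base
    return "".join(dye_at_length[n] for n in sorted(dye_at_length))

strand = "ATGCGTACCTGA"
print(read_gel(sanger_fragments(strand)))
```

With enough molecules every fragment length is represented, so reading the terminal dye from shortest to longest fragment recovers the sequence.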
How does high throughput Next Gen sequencing work and PROS
Make millions of short sequences (reads) in a single run ==> overlapping reads can be joined into stretches up to ~100K base pairs long
PROS
- gives an almost complete genome, but is pricey
- can sequence mixed populations to measure biodiversity
- can sequence an RNA population for gene expression analysis
Illumina Solexa
- producing single-stranded DNA
- ligate the adapter oligos to DNA fragments
- use a microfluidic cluster station to add fragments to the surface of a glass flow cell; each flow cell is separated into 8 lanes
- interior surface of each lane has oligos covalently attached
- these surface oligos are complementary to the adapters ligated onto the fragments
Compare Illumina vs 454 Sequencing
454
* makes longer reads than Illumina; many reads are sequenced in parallel
Illumina
* DNA/RNA fragments are shorter
* adapters are added
* fragments amplified by PCR w/ adapter primers
What is third generation sequencing
- PacBio SMRT (single molecule real-time)
- Illumina TruSeq
- Oxford Nanopore
- produces very long reads
How does PacBio work (easy then hard)
Generate amplicon => ligate adapters => sequence => data analysis
1. SMRTbell adapters ligated to each amplicon
2. Sequencing primer annealed to the SMRTbell template and polymerase bound to the complex
3. complex loaded into zero-mode waveguides (ZMWs), where replication produces nucleotide-specific fluorescence
4. circular consensus sequencing: the circular template lets the polymerase replicate it repeatedly => one long read
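Because the polymerase goes around the circular template several times, the same insert is read more than once, and the repeated passes can be combined. The sketch below shows the simplest possible version, a per-position majority vote. It assumes the passes are already aligned and of equal length (real data contain insertions and deletions and need proper alignment), and the function name and example passes are invented for illustration.

```python
from collections import Counter

def circular_consensus(passes):
    """Majority vote at each position across repeated passes of one insert.

    Assumes all passes are pre-aligned and the same length.
    """
    consensus = []
    for column in zip(*passes):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)

# Made-up passes over the same insert, each with at most one random error.
passes = [
    "ACGTTAGC",
    "ACGTAAGC",  # error at position 5
    "ACCTTAGC",  # error at position 3
    "ACGTTAGC",
    "ACGTTAGG",  # error at position 8
]
print(circular_consensus(passes))  # ACGTTAGC
```

Errors that occur at different positions in different passes are outvoted, which is why repeated passes improve accuracy.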
What is nanopore sequencing
- determines the sequence of DNA fragments by passing the DNA through a protein pore in a membrane
Shotgun sequencing
- Contigs are built by overlapping reads
- There are always gaps between contigs where software cannot extend any further
- There are always many reads left over that cannot be assigned to a contig
Why: when reads contain parts of repetitive sequences, they may overlap thousands of other reads, making it impossible to uniquely determine overlaps. Consequently: Very few eukaryotic genomes are ever completely sequenced.
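To make the contig-building idea concrete, here is a toy greedy assembler. It is a sketch only, not how production assemblers work: the function names, the `min_len` threshold, and the example reads are assumptions chosen for illustration, and reads are assumed error-free and all from the same strand.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap.

    Stops when no pair overlaps by at least min_len; whatever is left
    is the set of contigs plus any unassigned reads.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)

reads = ["ATGGCGT", "GCGTGCA", "TGCAATC", "CCCGGGA"]
print(greedy_assemble(reads))  # ['CCCGGGA', 'ATGGCGTGCAATC']
```

Three reads chain into one contig, while the fourth overlaps nothing and is left over. If several reads shared the same repeat, many pairs would tie for the best overlap and the merge order would be arbitrary, which is the ambiguity described on this card.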
What are key DNA sequencing NGS steps
- fragment genomic DNA into small fragments of a few hundred bases
- immobilize individual DNA molecules onto a solid surface
- amplify each molecule by PCR many thousands of times
- perform DNA synthesis using nucleotides that emit a characteristic wavelength of light each time a base is added
- read the sequence by imaging the emission of light in real time
- from the thousands or millions of reads, assemble overlapping reads into long contiguous segments known as “contigs”
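The "read the sequence by imaging the emission of light" step can be pictured as picking the brightest colour channel at each synthesis cycle. The sketch below assumes four channels in the order A, C, G, T and uses invented intensity values; real instruments have their own dye assignments and signal processing, so treat `CHANNELS`, `call_bases`, and the numbers as hypothetical.

```python
# Hypothetical channel order; real instruments assign dyes differently.
CHANNELS = "ACGT"

def call_bases(intensities):
    """For each synthesis cycle, call the base whose channel gave the brightest signal."""
    read = []
    for cycle in intensities:
        brightest = max(range(len(CHANNELS)), key=lambda ch: cycle[ch])
        read.append(CHANNELS[brightest])
    return "".join(read)

# Invented per-cycle intensities for one immobilized, amplified molecule.
cluster_signal = [
    (0.91, 0.03, 0.04, 0.02),  # brightest in channel A
    (0.05, 0.02, 0.88, 0.05),  # brightest in channel G
    (0.02, 0.04, 0.03, 0.91),  # brightest in channel T
    (0.06, 0.85, 0.05, 0.04),  # brightest in channel C
]
print(call_bases(cluster_signal))  # AGTC
```

Each molecule on the surface yields one such read; the overlapping reads are then assembled into contigs.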