1. Sequencing intro Flashcards
What is the biggest genome project currently running in the UK?
Darwin Tree of Life (part of the Earth BioGenome Project) - aims to sequence the genomes of all eukaryotic species in Britain and Ireland
- so far ~500 species - mostly insects, because their genomes are smaller and easier to sequence
What are the uses of DNA sequencing?
Why sequence DNA:
- detect new species
- genotype individuals
- identify the presence of organisms (e.g. take air / water samples and detect organisms from the DNA found in them)
- determine epigenetic patterns (e.g. DNA methylation) involved in gene expression regulation
- determine gene expression patterns
How is sequencing classified by read length? Which methods are used for each class?
Short-read:
- Illumina: 150-300 bp, read from both ends (paired-end)
Long-read:
- Sanger sequencing: up to ~1000 bp
- Oxford nanopore technology (ONT)
- Pacific Biosciences (PacBio)
Explain the classical sequencing mechanism
Sanger sequencing:
- based on synthesis + base-specific termination
- ssDNA template + a sequence-specific primer - the sequence next to the target region must be known to design the primer
- radioactively / **fluorescently** labelled bases added - synthesis terminated by ddNTPs (ddATP/ddGTP/ddCTP/ddTTP), which **lack a 3’ OH** - no further nucleotide can be added
- for random incorporation: ~99% dNTPs + ~1% ddNTP of one specific base (A/T/C/G) - produces fragments of different lengths - the length of each fragment gives the position of that ddNTP's base
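A minimal Python sketch of the chain-termination logic, assuming that at every position where the ddNTP's base would be added there is a ~1% chance the polymerase takes the ddNTP and stops. The template is written in the order the polymerase reads it (strand orientation simplified), and the function names are hypothetical. Each base's reaction corresponds to one lane of a manual Sanger gel.

```python
import random
from typing import Dict, Optional, Set

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_one_strand(template: str, ddntp_base: str,
                      ddntp_fraction: float, rng: random.Random) -> Optional[int]:
    """Synthesize one new strand; return its length if a ddNTP terminated it."""
    for position, template_base in enumerate(template, start=1):
        new_base = COMPLEMENT[template_base]
        # A ddNTP lacks the 3' OH, so incorporating it ends synthesis here.
        if new_base == ddntp_base and rng.random() < ddntp_fraction:
            return position
    return None  # ran off the end of the template without termination


def sanger_reactions(template: str, n_molecules: int = 2000,
                     ddntp_fraction: float = 0.01, seed: int = 0) -> Dict[str, Set[int]]:
    """One reaction (gel lane) per ddNTP: the set of terminated fragment lengths."""
    rng = random.Random(seed)
    lanes: Dict[str, Set[int]] = {}
    for base in "ACGT":
        lengths = set()
        for _ in range(n_molecules):
            length = extend_one_strand(template, base, ddntp_fraction, rng)
            if length is not None:
                lengths.add(length)
        lanes[base] = lengths
    return lanes


def read_gel(lanes: Dict[str, Set[int]]) -> str:
    """Read fragments from shortest to longest; the lane gives the base."""
    base_at_length = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(base_at_length[length] for length in sorted(base_at_length))
```

The sequence read off the gel is the newly synthesized strand, i.e. the complement of the template: `read_gel(sanger_reactions("TACGGATCCATG"))` should give `"ATGCCTAGGTAC"`.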
What are manual and automated Sanger sequencing?
Manual: radioactively labelled fragments - four separate reactions (one per ddNTP) run in four gel lanes - sequence read manually from the distance each fragment travelled (shortest fragments travel furthest)
Automated: fluorescently labelled ddNTPs (a different colour for each base) - a detector records the fluorescence of each fragment - output is a sequencing chromatogram (peaks for each base)
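Reading an automated chromatogram amounts to asking, at each peak, which dye is brightest. The sketch below assumes peaks have already been located and each is given as a dictionary of dye intensities per base; the intensity values and the `min_ratio` cut-off for calling an ambiguous `N` are hypothetical, not taken from any real base caller.

```python
from typing import Dict, List

# Hypothetical intensities at four successive peaks of a chromatogram.
EXAMPLE_PEAKS: List[Dict[str, float]] = [
    {"A": 900, "C": 40, "G": 30, "T": 20},
    {"A": 30, "C": 50, "G": 800, "T": 60},
    {"A": 400, "C": 350, "G": 20, "T": 10},  # two overlapping peaks
    {"A": 10, "C": 5, "G": 5, "T": 700},
]


def call_bases(peaks: List[Dict[str, float]], min_ratio: float = 2.0) -> str:
    """Call the brightest dye at each peak; 'N' if it does not clearly dominate."""
    calls = []
    for signal in peaks:
        ranked = sorted(signal.items(), key=lambda kv: kv[1], reverse=True)
        (best_base, best), (_, second) = ranked[0], ranked[1]
        calls.append(best_base if best >= min_ratio * second else "N")
    return "".join(calls)
```

`call_bases(EXAMPLE_PEAKS)` returns `"AGNT"`: the third peak has two similar signals, so no confident call is made.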
Compare all sequencing technologies by read length and error rate
- Illumina: short reads, low error rate - uses universal adapters (no sequence-specific primer)
- Sanger: medium-length reads, low error rate - requires a sequence-specific primer
- ONT: very long reads, high error rate - portable MinION sequencer - uses universal adapters (no sequence-specific primer)
- PacBio: long reads, low error rate (because of HiFi) - uses universal adapters (no sequence-specific primer)
What measure evaluates error rates in sequencing?
Q value - Phred quality score
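The Phred score is defined as Q = -10 log10(P), where P is the probability that a base call is wrong, so Q20 means a 1 in 100 error chance and Q30 a 1 in 1000 error chance. A short Python sketch of the conversion, plus decoding of a FASTQ quality string assuming the common Phred+33 ASCII encoding:

```python
import math
from typing import List


def phred_from_error(p: float) -> float:
    """Error probability -> Phred quality score."""
    return -10 * math.log10(p)


def error_from_phred(q: float) -> float:
    """Phred quality score -> error probability."""
    return 10 ** (-q / 10)


def decode_fastq_quality(qual_string: str, offset: int = 33) -> List[int]:
    """Each FASTQ quality character encodes Q as ord(char) - offset."""
    return [ord(ch) - offset for ch in qual_string]
```

For example, `decode_fastq_quality("I5+")` gives `[40, 20, 10]`, i.e. error probabilities of 0.0001, 0.01 and 0.1.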
Explain Illumina sequencing
Illumina NGS:
1) Sample preparation: DNA fragmented by sonication to generate a DNA library
2) Cluster generation: fragments ligated to 2 adapters - ‘bridge amplification’ (cluster amplification) - once enough copies are made, one strand is removed by denaturation => high-density clusters
3) Sequencing by synthesis: universal primer annealed - DNA pol adds dNTPs (dATP, dGTP, dTTP, dCTP) carrying a fluorescent dye + reversible 3’ terminator - all sites sequenced at once - imaging records the fluorescent colour at each position - dye cleaved after imaging => cycle repeated until all bases are sequenced
4) Data analysis: overlapping reads aligned and data analysed
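To illustrate how overlapping reads can be joined, here is a greedy overlap-merge sketch in Python: repeatedly merge the pair of reads with the longest suffix/prefix overlap. This is a simplified stand-in chosen for illustration; real pipelines usually align reads to a reference genome or use dedicated assemblers, and sequencing errors and reverse-complement reads are ignored here. `min_len` is a hypothetical minimum overlap.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Merge the best-overlapping pair until no overlaps remain; return contigs."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: keep the remaining reads as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for n, r in enumerate(reads) if n not in (best_i, best_j)] + [merged]
    return reads
```

For example, `greedy_assemble(["ATGGCGTA", "GCGTACCT", "ACCTTAG"])` merges the three reads into `["ATGGCGTACCTTAG"]`.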
How are next-generation sequencing (NGS) technologies different from the original DNA sequencing methods?
Differences of NGS technologies compared to the original methods:
- DNA sequenced directly (no cloning step needed)
- DNA cut into small fragments of ~200 bp (e.g. by sonication)
- DNA fragments immobilised on a solid support - DNA molecules physically separated
Describe the physical platform used in Illumina NGS
Illumina NGS uses a glass flowcell - a dense lawn of short ss oligonucleotides (P5, P7) is bound to the surface (or in nanowells) - the adapters ligated to the library fragments base-pair to these oligos
The bound oligonucleotides act as primers for DNA polymerisation (extended from their free 3’ OH end) - the bound fragment with its adapters acts as the template strand
Explain the process of sonication
Sonication: using high-frequency sound waves to break DNA into smaller fragments
Explain cluster generation in Illumina NGS
When sonicated DNA added:
1) sonicated DNA fragments with ligated adapters bind to the embedded oligonucleotides
2) density of attached DNA adjusted - a single DNA molecule in each separate well
3) Initial extension: DNA pol adds dNTPs from the 3’ end to make dsDNA - oligonucleotides P5 and P7 act as primers - sonicated DNA is the template strand
4) Denaturation - original sonicated DNA washed off - ss copy left bound to the flowcell
5) Cluster generation: renaturation conditions created - the free adapter end of the copy binds to another embedded oligonucleotide - bridge formed - DNA pol performs another round of DNA synthesis = bridge amplification
=> at each step the two strands are separated and act as templates for the next round of synthesis
Steps 3)-5) repeated ~35 times to create a cluster of identical sequences in close proximity
Explain sequencing part in Illumina NGS
Illumina sequencing (all DNA fragments sequenced at once):
- universal sequencing primer annealed to the adapter sequences
- DNA pol uses dNTPs (dATP, dGTP, dTTP, dCTP), each with a different fluorescent group + a 3’ reversible block
- incorporation of one fluorescent dNTP temporarily blocks further extension - detector reads the fluorescence at each DNA fragment
- fluorophore + block removed (the nucleotide stays) - new 3’ OH free for the next polymerisation step - next fluorescent nucleotide added => repeated in cycles until the whole fragment is read
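The incorporate, image, cleave cycle can be sketched in Python as below. It assumes one distinct dye per base (the four-colour scheme described above; the colour names are made up), perfect chemistry with no phasing errors, and templates written in the order they are read; all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}  # hypothetical colours
COLOUR_TO_BASE = {colour: base for base, colour in DYE.items()}


@dataclass
class GrowingStrand:
    template: str
    synthesized: List[str] = field(default_factory=list)
    blocked: bool = False        # True while the 3' reversible block is attached
    dye: Optional[str] = None    # fluorophore on the last added nucleotide

    def incorporate(self) -> None:
        """Add exactly one fluorescent, 3'-blocked nucleotide."""
        assert not self.blocked, "3' end is blocked; cleave first"
        base = COMPLEMENT[self.template[len(self.synthesized)]]
        self.synthesized.append(base)
        self.blocked, self.dye = True, DYE[base]

    def cleave(self) -> None:
        """Remove dye + block; the nucleotide stays, 3' OH is free again."""
        self.blocked, self.dye = False, None


def sequence_by_synthesis(templates: List[str], n_cycles: int) -> List[str]:
    """Run n_cycles (must not exceed template length) on all clusters in parallel."""
    strands = [GrowingStrand(t) for t in templates]
    reads: List[List[str]] = [[] for _ in strands]
    for _ in range(n_cycles):
        for s in strands:
            s.incorporate()
        for read, s in zip(reads, strands):      # imaging step: record each colour
            read.append(COLOUR_TO_BASE[s.dye])
        for s in strands:
            s.cleave()
    return ["".join(r) for r in reads]
```

For example, `sequence_by_synthesis(["TACG", "GGCA"], 4)` reads `["ATGC", "CCGT"]`, one base per cycle for every cluster at once.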
Explain adapter ligation to sample DNA fragments in Illumina
Adapter ligation: adapters ligated to both ends of each DNA fragment (a different adapter on each end) -> on the glass flowcell the adapters base-pair to oligonucleotides P5 and P7, which act as primers for DNA polymerisation (making the bound ss sequence double-stranded)
What is the difference between the primers and the oligonucleotides bound to glass flowcells in Illumina?
Primers: attached to the sonicated DNA fragments (as part of the ligated adapters) - allow binding to the oligos embedded on the glass flowcell
Oligonucleotides (P5 and P7): embedded in the glass flowcell surface - after the fragments bind, they act as primers for dsDNA synthesis
Why is the density adjusted for only one DNA molecule to bind to glass flowcell well in Illumina?
When sonicated DNA is added, the density of attached DNA is adjusted so that each separate well holds a single DNA molecule => each cluster then contains identical copies of one sequence - ‘bridge amplification’ makes enough copies of each sequence for the fluorescence to be detectable
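Why the density must be tuned can be seen with a simple statistical model. Assuming molecules land on wells randomly and independently (a Poisson model, which is an assumption here; real instruments may use other loading strategies), the number of molecules per well with mean λ follows P(k) = λ^k e^(-λ) / k!. The sketch below shows the trade-off: at low λ almost every occupied well holds one molecule, but many wells stay empty.

```python
import math
from typing import Dict


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a well receives exactly k molecules."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def loading_summary(lam: float) -> Dict[str, float]:
    """Fractions of wells that are empty, single-molecule, or mixed."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {"empty": empty, "single": single, "multiple": 1 - empty - single}
```

At λ = 0.1 about 90% of wells are empty and under 0.5% contain more than one molecule; at λ = 1 the single-molecule fraction peaks at about 37% (1/e), with about 26% of wells mixed.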
How are clusters prepared for sequencing in Illumina?
Cluster preparation for sequencing - before sequencing, dsDNA -> ssDNA:
- free 3’ ends blocked with ddNTPs to prevent further binding to the oligonucleotides
- bridges cleaved at oligo P5 - one strand removed by denaturation
- a sequencing primer bound
What is used to separate different DNA clusters on glass flowcell in Illumina?
Distance - each cluster of identical sequences is physically separated from the other clusters on the glass flowcell
Explain the structure of the glass flowcell in Illumina
What are the main differences between Sanger and Illumina sequencing?
- Sanger uses permanent terminators (ddNTPs) while Illumina uses reversible terminator dNTPs (dye and 3’ block cleaved off after reading - 3’ OH end freed for the next nucleotide to be added)
- Sanger needs a sequence-specific primer - Illumina uses universal primers that bind to the adapters
Explain PacBio sequencing method
PacBio uses single molecule real-time (SMRT) sequencing on Sequel II / Revio instruments:
- uses fluorescently labelled dNTPs - when a dNTP is added by a polymerase fixed at the bottom of a tiny well (zero-mode waveguide), the fluorescent label is released
- colour of the fluorescence read - added base determined -> real-time monitoring of nucleotide incorporation (raw data is a movie, not still images)
- single-pass reads have a high error rate - fixed with HiFi libraries
Explain what HiFi libraries are in the PacBio sequencing method
HiFi: the same circularised fragment is read many times (many passes) - a consensus of the passes gives much more reliable data => high accuracy for long reads
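Why many passes raise accuracy can be shown with a majority-vote consensus. This Python sketch assumes the passes are already aligned to equal length and contain only substitution errors; PacBio's actual consensus algorithm also handles insertions and deletions with a statistical model, so treat this as an illustration only.

```python
from collections import Counter
from typing import List


def consensus(passes: List[str]) -> str:
    """Majority vote at each position across pre-aligned passes."""
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be aligned to equal length")
    # zip(*passes) yields one column (all passes at the same position) at a time
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))
```

With five passes each carrying a different random error, e.g. `["ACGTAC", "ACCTAC", "ACGTTC", "AGGTAC", "ACGTAC"]`, the errors are outvoted and the consensus is `"ACGTAC"`.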
Explain ONT sequencing technology
ONT technology:
- uses a synthetic membrane with very small protein pores (nanopores) - only single-stranded DNA fits through, not proteins
- as DNA passes through the pore, the current flow changes - different bases change the current in characteristic ways - the current deviations are converted into bases by modelling the signal with neural networks
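A toy Python sketch of turning a current trace into bases. It swaps the neural-network basecaller for a much simpler nearest-level classifier and assumes, unrealistically, that each single base gives one fixed mean current (the `LEVELS` values are hypothetical) and stays in the pore for a fixed number of samples. Real signals depend on several bases in the pore at once, which is why neural networks are used.

```python
from typing import List

# Hypothetical mean current (pA) for each base; real pores respond to k-mers.
LEVELS = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}


def call_event(mean_current: float) -> str:
    """Assign the base whose reference level is closest to the measured mean."""
    return min(LEVELS, key=lambda base: abs(LEVELS[base] - mean_current))


def basecall(signal: List[float], samples_per_base: int) -> str:
    """Split the trace into equal events, average each, and classify it."""
    bases = []
    for i in range(0, len(signal) - samples_per_base + 1, samples_per_base):
        chunk = signal[i:i + samples_per_base]
        bases.append(call_event(sum(chunk) / len(chunk)))
    return "".join(bases)
```

For example, `basecall([79, 82, 81, 111, 108, 109, 124, 127, 126], 3)` gives `"AGT"`.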
What are the two sequencers offered by ONT?
- MinION: portable (for field work) - ~500 pores - ~500 molecules read simultaneously
- PromethION: stationary (in labs) - ~3000 pores - ~3000 molecules read simultaneously