Sanger And Next Generation Sequencing Flashcards
How does Sanger sequencing exploit the structure of DNA?
Sanger sequencing exploits the phosphodiester bond between the 3’ and 5’ carbons.
In normal DNA synthesis, the formation of the phosphodiester bond requires a hydroxyl group on the 3’ carbon to react with the phosphate on the 5’ carbon of the incoming nucleotide.
If a dideoxyribonucleotide (ddNTP) is incorporated, which has just an H rather than the OH at the 3’ position, no further nucleotides can be added and the chain terminates.
How is Sanger Sequencing performed?
- The first step in sample prep is to purify the sample (e.g. a PCR product or plasmid), cleaning it up and removing excess nucleotides and contaminating DNA.
- Amplify the sample of interest with ddNTPs in a PCR-style cycling reaction, roughly 20-30 cycles.
- Run another clean-up to remove excess ddNTPs.
- Load the sample onto an automated capillary gel electrophoresis sequencer. This separates fragments of different lengths, depending on where a ddNTP was incorporated. Each ddNTP is labelled with a fluorophore corresponding to its base.
- The fragments are detected after excitation of the labelled ddNTPs by a laser.
- Computer analysis performs the base calling.
- The chromatogram shows each fluorescent peak and the base (nucleotide) it corresponds to, allowing you to analyse the sequence.
What are the components of the reaction mixture?
Target DNA template (3’-5’)
Primer that is complementary to the template (5’-3’)
Taq polymerase
Mixture of dNTPs and ddNTPs
MgCl2 buffer
Each ddNTP carries a different fluorophore depending on its base.
Termination product will be a 5’-3’ complementary sequence to the original template strand
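The chain-termination idea can be sketched in a few lines of Python. This is only an illustrative toy model: the template string and the function names are made up for the example, the primer is ignored, and it is assumed that termination happens at least once at every position.

```python
# Toy model of Sanger chain termination (illustrative only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template_3to5: str) -> list[str]:
    """Return every chain-terminated product, written 5'->3'.

    The template is written 3'->5', so the new strand is its
    base-by-base complement read left to right. Each prefix stands
    for a strand that stopped when a ddNTP was incorporated at its
    last position.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]


def read_sequence(fragments: list[str]) -> str:
    """Order fragments by length (as electrophoresis does) and read the
    labelled terminal base of each one."""
    return "".join(frag[-1] for frag in sorted(fragments, key=len))


template = "TACGGATC"  # hypothetical template, written 3'->5'
fragments = terminated_fragments(template)
print(read_sequence(fragments))  # ATGCCTAG
```

Reading the last base of each fragment from shortest to longest recovers the 5’-3’ strand complementary to the template.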
What are Phred scores?
The sequencing machine creates a text file containing the base sequence and the quality.
The quality is defined by the Phred score, which is derived from the peaks and states the probability that a base has been called wrongly.
Want a score of >20 (an error probability below 1 in 100).
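The Phred score Q and the error probability P are related by Q = -10 × log10(P). A minimal sketch of the conversion (the function names are chosen for this example):

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability that a base with Phred score q was called wrongly."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred score for a base with error probability p."""
    return -10 * math.log10(p)


for q in (10, 20, 30):
    print(q, phred_to_error_probability(q))
```

So Q10 is a 1 in 10 chance of a wrong call, Q20 is 1 in 100 and Q30 is 1 in 1000.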
What problems may you encounter with Sanger sequencing?
Weak profiles may be caused by:
- the presence of lots of different microbes, or a plasmid that hasn’t been isolated correctly, so there are lots of competing templates.
- inadequate quantification of the DNA, or DNA of insufficient quality.
- an inefficient primer, so the sequencing reaction isn’t occurring well.
Presence of a second sequence that confuses the trace:
- due to a contamination
- poor quality primers
- residual primers from the original mix
Good sequence dropping to a poor one:
- GC-rich regions are more liable to fold back on themselves and form a hairpin loop, so the polymerase can’t access this section of the sequence (addition of DMSO will relax the secondary structure).
- contaminants that affect the sequencing reaction.
What are the guidelines for Sanger sequencing?
Primers should be checked for SNPs
Raw data should be stored
Minimum noise levels
Positive (and negative) controls
Visual inspection of variant bases
Positive results confirmed by a second test
Results should be described in HGVS nomenclature
What is Next Generation Sequencing?
Utilises a high-throughput parallel approach to sequence large numbers of different DNA sequences (a library) in a single reaction.
Based on the detection of specific bases as they are added to the complementary strand, rather than the chain-termination technology used in Sanger.
The amount of DNA required is much lower and the quality doesn’t have to be as high, which is an advantage for patients such as those with cancer.
What is the process of NGS?
The first step is to fragment genomic DNA to a uniform size, e.g. using a Covaris (an acoustic-based machine).
The adapters needed for sequencing are then ligated to both ends of the fragments.
Adapters are made up of two parts: the outer part allows binding to the sequencing machine, and the inner part (bound to the fragment) is the barcode adapter, which allows samples to be told apart when lots of samples are run at once.
The library is loaded onto the machine, generating clusters. Each cluster contains clones of an individual DNA fragment (clonal amplification).
The sequencing reaction is performed using DNA polymerase, and the incorporation of fluorophore-labelled bases is measured.
Each fluorescent dot corresponds to a library cluster.
Algorithms in the sequencing machine convert the image files to a sequence file (same as Sanger).
What is involved in NGS sample preparation?
DNA fragmentation
End repair
Phosphorylation of 5’ ends
A-tailing of 3’ ends
Ligation of oligonucleotide adapters
PCR cycles to enrich product ligated to adapters
- barcodes introduced to aid multiplexing
Illumina’s Nextera XT does this in a single tube reaction.
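How barcodes aid multiplexing can be sketched as a simple demultiplexing step. This is a simplified illustration: it assumes the barcode is the first few bases of each read and must match exactly, and the barcodes, sample names and function name are hypothetical. Real instruments typically read the index separately and tolerate mismatches.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by sample using the barcode at the start of each read.

    Assumption for this sketch: the barcode is a fixed-length prefix
    and must match exactly; anything else goes to "undetermined".
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        insert = read[barcode_length:]
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical barcodes and reads
barcodes = {"ACGT": "patient_1", "TTAG": "patient_2"}
reads = ["ACGTGGGCCCAT", "TTAGCATCATGG", "ACGTTTTGGGAA", "GGGGAAAACCCC"]
for sample, inserts in demultiplex(reads, barcodes, 4).items():
    print(sample, inserts)
```

Each pooled read is assigned back to the sample it came from, which is what allows many samples to share one run.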
What is the aim of using adaptors for library preparation?
Addition of adapters allows cluster generation on sequencing machine.
For Illumina machines the two common adapters are P5 and P7, which allow the fragment to bind to the machine during sequencing. Next comes an index sequence that acts as a barcode, telling you which sample the sequence corresponds to. Then there are two sequences where the primers bind to initiate the reaction.
Having sequences on either end of the DNA fragment you want to sequence allows you to perform paired end sequencing, which improves the accuracy of the read because you have double confirmation of the sequence (one from each end).
How do the processes of cluster generation, denaturation of dsDNA, bridge amplification and linearisation occur?
Cluster generation allows you to hybridise fragmented and adapted sample to the flow cell (a thin glass slide sandwich containing channels coated with oligonucleotides, corresponding to adapter regions you have put on sample).
The fragment passes through the flow cell and hybridises to it through the complementarity of the adapter and the oligonucleotides.
DNA polymerase and dNTPs are added and the ss DNA is extended from the 3’ end.
Gives a ds DNA bound to flow cell, complementary to original strand.
The original template strand is removed by chemical denaturation and washed away, leaving a copy covalently bound to flow cell surface.
Bridge amplification occurs, whereby the strand flips over, finds another complementary adapter sequence bound to the slide and hybridises to it.
DNA polymerase and dNTPs are added and extension occurs again, giving a double stranded bridge structure.
Denaturation occurs again, giving two covalently bound copies of single stranded DNA. These flip over again and hybridise to an adjacent oligonucleotide.
Bridge amplification occurs multiple times, allowing clonal amplification and multiple copies across the whole slide.
After the final bridge amplification, all amplified strands are denatured, linearised and blocked so they can’t bridge again.
The reverse strands are cleaved and washed away, leaving a sequencing-ready slide with lots of copies of the original templates bound to it, ready for the sequencing reaction.
How is the sequencing part of NGS performed?
The 3’ ends are blocked to stop unwanted binding of oligonucleotides.
A primer is then hybridised to the adapter sequence, allowing sequencing by synthesis to begin.
Reagents are added: fluorescently labelled, reversibly terminated nucleotides and polymerase. Strand extension pauses after each labelled nucleotide is added because its 3’ end is blocked.
Any unincorporated bases are removed and the fluorescent signals are recorded.
The bases are deblocked and the cycle is repeated until we reach the end of the read length, usually 150 base pairs (shorter than Sanger).
What is paired end sequencing?
Sequence strand is stripped off and the 3’ ends of template strands and primers are unblocked.
As in cluster generation, the template strand can bridge over, hybridising with a primer, and the previous process occurs as before.
The bridges are linearised. Then the original forward strand is cleaved and sequencing of the reverse strand can begin.
How does NGS data analysis workflow determine variants?
Algorithms call a base at each position and give a quality score associated with that base, telling us how accurately it has been called.
Secondary analysis then maps the individual reads back to a reference sequence. Again, look at quality to see how well individual reads match the reference genome.
Further analysis may follow, e.g. looking at variants and the locations where our sequence doesn’t match the reference, etc.
Visualise the alignment against the reference and compare with other databases to see how prevalent that variant is and what clinical phenotypes are associated.
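The step of finding where reads disagree with the reference can be sketched as a naive pileup. This is a toy illustration, not how production variant callers work: it assumes reads are already aligned without gaps, ignores base qualities, and the thresholds, names and sequences are made up for the example.

```python
from collections import Counter


def call_variants(reference, aligned_reads, min_depth=2, min_fraction=0.5):
    """Report positions where the most common read base differs from
    the reference.

    aligned_reads is a list of (start, sequence) pairs, with start
    being the 0-based reference position of the first base. Assumes
    gap-free alignments that lie entirely within the reference.
    """
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pileup[start + offset][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to trust a call here
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            variants.append((pos, reference[pos], base, depth))
    return variants


reference = "ACGTACGTAC"  # hypothetical reference
reads = [(0, "ACGTTC"), (2, "GTTCGT"), (4, "TCGTAC")]
print(call_variants(reference, reads))  # [(4, 'A', 'T', 3)]
```

All three reads show T where the reference has A at position 4, so that position is reported as a candidate variant together with its depth.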
What is coverage?
The average number of reads covering a genomic location.
E.g. in the shown reference sequence, take each base across the 20 base sequence: 6 have been sequenced once, 8 twice and 6 three times. Averaged across all 20 bases this gives (6×1 + 8×2 + 6×3) / 20 = 40 / 20 = 2-fold coverage, i.e. the average across a defined genomic region of interest.
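The same arithmetic as a short sketch, using the per-base depths from the example above (the function name is chosen for this illustration):

```python
def mean_coverage(depths):
    """Average number of reads per base across a region."""
    return sum(depths) / len(depths)


# 6 bases sequenced once, 8 twice and 6 three times (20 bases in total)
depths = [1] * 6 + [2] * 8 + [3] * 6
print(mean_coverage(depths))  # 2.0
```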