8 - DNA Sequencing Flashcards
Sanger sequencing
based on Dideoxynucleotides
provides the exact order of the sequence for any synthetic biology plasmid construct
provides confirmation of mutants, insertions, deletions in a gene
generates fragments of different lengths, where the last base of each fragment is fluorescently labeled.
Synthesis of different length fragments:
1) sequencing primer (short nt that is complementary to the starting point of the target DNA) is added. this provides the 3’ OH group the DNA pol uses to start assembling complementary DNA based upon the target DNA sequence. the primer also defines the start site for decoding the target DNA.
2) during the sequencing reaction, DNA pol uses dNTPs (deoxy nt, same as in DNA), aka dATP, dTTP, dCTP, dGTP. these all have an OH at the 3’ C. DNA pol adds new nt to the 3’ C via this OH group coupled with the P-group on the 5’ C of the incoming nt. The H+ of the OH is released along with the two outer P-groups.
3) ddNTPs are used to stop the synthesis. These don’t have the OH on 3’, and therefore can’t be joined to the next nt.
The ratio of dNTPs and ddNTPs is established so that by chance, DNA pol will incorporate a ddNTP often enough to create the subsets of fragments that differ in length by one nt, but not so often that all fragments will be too short.
the fluorescent marking of the ddNTP allows the identification of the nt at this location.
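A minimal sketch of the chain-termination idea above, assuming a toy model where each extension step picks a ddNTP with a fixed probability. The names sanger_fragments, read_sequence, and dd_fraction are hypothetical, and the template is written in the order the polymerase copies it.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sanger_fragments(template, dd_fraction=0.1, n_molecules=5000, seed=0):
    """Simulate chain termination on many copies of one template.

    Returns a dict mapping fragment length to the labeled terminal base.
    dd_fraction is an assumed per-step chance of picking a ddNTP.
    """
    rng = random.Random(seed)
    terminal_base = {}
    for _ in range(n_molecules):
        for length, base in enumerate(template, start=1):
            # A ddNTP has no 3' OH, so extension stops at this position
            # and the fragment carries the fluorescent label of this base.
            if rng.random() < dd_fraction:
                terminal_base[length] = COMPLEMENT[base]
                break
    return terminal_base

def read_sequence(terminal_base):
    """Order fragments by length (as the capillary does) and read the labels."""
    return "".join(terminal_base[length] for length in sorted(terminal_base))

fragments = sanger_fragments("TACGGATC")
print(read_sequence(fragments))
```

With enough molecules every fragment length is sampled, so the labels read in order of length give the complement of the template. Raising dd_fraction close to 1 makes almost every fragment stop after the first few nt, which is the "too short" case described above.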
DNA polymerases for sequencing DNA
elongate a primer that is annealed to a ssDNA template
must have high processivity; must move a long way along the DNA before dissociation.
must incorporate nt rapidly and accurately
many also possess exonuclease activity, which can create sequencing errors. If 5’->3’ exonuclease, a strand of DNA ahead of the synthesis point can be removed (very bad). If 3’->5’ exonuclease (proofreading), it removes nt from the 3’ end of the growing strand; this normally corrects incorrect bases, but in sequencing it can also remove the terminating ddNTP (also bad).
No perfect match. Klenow polymerase (a fragment of DNA pol I from E. coli) was used first, but has low processivity. Another common one is Sequenase from bacteriophage T7 (fits all criteria). Taq is also a good alternative.
automated sequencing
the machines have capillary tubes. Each individual sequencing reaction containing the different fluorescently labeled subfragments is loaded into a single capillary tube using high voltage electricity and high pressure. the tubes are filled with a polymer matrix that helps separate the fragments by size. the polymers are optimized for size differentiation, as the fragments differ by only one nt. Each band will pass a laser and detector assembly (this detects which base it is).
Next generation sequencing
NGS
key characteristic: use of massively parallel methods.
Main steps:
- genomic DNA (gDNA) isolation
- NGS library construction
* gDNA fragmentation into small ds pieces of approx equal size
* modifying the fragments so they are compatible with the seq platform
- partitioning the library fragments into separate locations on a solid surface and creating clusters of identical copies.
- sequencing each cluster
- analysing the data
Illumina Library prep
1) need pure DNA
2) fragment the human chromosomes into small pieces of dsDNA. First, DNA is sheared using ultrasonic disruption (sound waves that disrupt the structure). The sample is at a constant temp to prevent thermal degradation. Fragments are random, and each end has a different DNA seq.
3) the uneven ends of the fragments must be made blunt. T4 DNA pol + Klenow pol + T4 polynucleotide kinase + dNTPs are added, making the ends equal. 5’ ends are then phosphorylated, and 3’ ends receive a single A (due to Klenow). This single A overhang facilitates the binding of the adapters.
2 and 3 can also be done in another way, using Tn5 transposase (enzyme responsible for moving a transposon, a seg of DNA that can move from one location to another in a genome). This enzyme makes ds cuts in order to move the transposon to a new location on the DNA, and inserts the transposon in the cuts. By binding two adapters to this enzyme (instead of a transposon), the enzyme cuts the chromosomes into pieces of approx 500 bp and adds adapters on the ends. This process is called tagmentation. Completely random cuts.
4) the adapter has three key features:
- a seq that is complementary to the oligo-nt that are bound to the flow cell
- an index seq that is unique to the genomic fragments from one sample
- a seq that is complementary to the seq primer
Once the fragments have adapters, they are ready for an optional PCR amplification step. this can add a unique tag (index seq, aka barcode seq) if this element was not already in the adapter.
multiplexing = mixing multiple samples into one reaction.
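A small sketch of how the index seq lets multiplexed reads be sorted back into samples after sequencing. The function name demultiplex, the index seqs, and the sample names are made up for illustration.

```python
from collections import defaultdict

def demultiplex(reads, index_to_sample):
    """Group (index_seq, insert_seq) pairs by sample using the index read."""
    by_sample = defaultdict(list)
    for index_seq, insert_seq in reads:
        # Reads whose index matches no known sample are set aside.
        sample = index_to_sample.get(index_seq, "undetermined")
        by_sample[sample].append(insert_seq)
    return dict(by_sample)

# Hypothetical indexes and reads from two samples mixed in one run.
index_to_sample = {"ACGT": "sample_1", "GGCA": "sample_2"}
reads = [
    ("ACGT", "TTGACC"),
    ("GGCA", "CCATGA"),
    ("ACGT", "GGGTTT"),
    ("TTTT", "AAAAAA"),
]
print(demultiplex(reads, index_to_sample))
```

This sketch requires an exact index match; how real pipelines treat index reads that contain sequencing errors is not covered here.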
Illumina partitioning the library
next step is to separate (partition) the library of gDNA fragments to discrete locations on a solid surface. Illumina uses a flow cell, a complex apparatus that looks like a glass slide. it has channels (microfluidic chambers) that allow different liquid reagents to flow at a defined rate, and a bottom surface that is coated with oligo-nt complementary to the adapter ends.
after each DNA fragment has been attached to the flow cell, the next step is to create a cluster by PCR. essential for seq. PCR primer = the oligo-nt on the flow cell surface (complementary to the adapters). occurs when the fragment is bent into a bridge so both adapters are attached. continued cycles of denaturation, annealing, and extension amplifies the DNA. It ends with denaturation, leaving one strand attached to the flow cell at one end.
Illumina sequencing by synthesis
Add sequencing primer, DNA pol, and fluorescently labeled nt with each base connected to a different fluorophore, so they can be identified. As these ingredients move through the flow cell, the seq primer anneals to its complementary location on the adapter, and then DNA polymerase starts making a copy of the gDNA fragment using the fluorescently labeled nt. Happens so quickly that it cannot be detected/analyzed in real time, so the nt are designed so that the fluorophore acts like a blocking group. The fluorophore is positioned in the nt so DNA pol cannot add another nt until the fluorophore is removed. This allows the fluorescence of the nt to be recorded for every cluster before the next one is added.
The final seq for each cluster = a read. since there are tens of millions of clusters in a single flow cell, there are tens of millions of reads as well.
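A toy sketch of the cycle structure described above: in every cycle each cluster adds exactly one blocked, labeled nt, the color is recorded, and the block is removed. The function name sequence_by_synthesis is hypothetical, and templates are written in the order the polymerase copies them.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sequence_by_synthesis(cluster_templates, n_cycles):
    """Run n_cycles of one-base-per-cycle synthesis on every cluster.

    Returns one read per cluster.
    """
    reads = ["" for _ in cluster_templates]
    for cycle in range(n_cycles):
        # Incorporation: the blocking fluorophore limits every cluster
        # to exactly one new nt in this cycle.
        signals = [
            COMPLEMENT[template[cycle]] if cycle < len(template) else None
            for template in cluster_templates
        ]
        # Imaging: record the color (here, the base letter) per cluster.
        for i, base in enumerate(signals):
            if base is not None:
                reads[i] += base
        # Removal of the fluorophore would happen here, unblocking the
        # 3' end for the next cycle.
    return reads

print(sequence_by_synthesis(["TACG", "GGCA"], n_cycles=3))
```

All clusters advance in lockstep, which is why one image per cycle is enough to read tens of millions of clusters at once.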
Illumina seq reads the different seg of the gDNA frag at different points. Typically multiple reads per cluster.
1) flow cell is flooded with DNA pol, fluorescently labeled dNTP, and a seq primer that anneals to the left side of the gDNA. This combination decodes the DNA from the 5’ side. This is read one.
2) the ingredients are washed away from the flow cell and added again, this time with a primer complementary to the right side adapter. This decodes the index in the right adapter. (= read 2)
3) new mix of DNA pol, fluorescently labeled nt and a third seq primer is added, producing read 3 (decodes the left side adapter index).
4) Finally, read 4 seq the gDNA from the opposite side.
Summary:
4 reads per cluster.
Illumina data analysis
1) combine the different reads from each cluster. 2 of the four are the index seq of the adapters. If the flow cell is simply loaded with one gDNA sample, the info from the adapters is less important, but in the case of multiplexing it is important, as it allows sorting of each cluster into different samples.
2) reads from the gDNA are aligned into continuous seq info by either comparison to a previously seq genome, or by comparing one read to another looking for overlapping seq. the first (comparing to a reference genome) is most common. each read is individually aligned with the reference. differences between the read and ref genome = variant.
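A minimal sketch of reference-based alignment and variant detection, assuming no insertions or deletions (the read is only slid along the reference and mismatches are counted). The function names and the example seqs are made up for illustration.

```python
def count_mismatches(read, reference, start):
    """Number of positions where the read differs from the reference."""
    return sum(
        1 for offset, base in enumerate(read) if reference[start + offset] != base
    )

def best_alignment_start(read, reference):
    """Ungapped alignment: pick the start with the fewest mismatches."""
    starts = range(len(reference) - len(read) + 1)
    return min(starts, key=lambda s: count_mismatches(read, reference, s))

def call_variants(read, reference, start):
    """List (position, reference base, read base) for every difference."""
    variants = []
    for offset, read_base in enumerate(read):
        ref_base = reference[start + offset]
        if read_base != ref_base:
            variants.append((start + offset, ref_base, read_base))
    return variants

reference = "ACGTACGTTAGC"
read = "ACGCTAG"
start = best_alignment_start(read, reference)
print(start, call_variants(read, reference, start))
```

Positions are 0-based here. A single read with a mismatch could also be a sequencing error, so in practice a variant is more convincing when many overlapping reads agree on it.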
de novo sequencing, in contrast, is decoding the order of nt bases in a genome from an organism that has not yet been sequenced, and uses alignment of overlapping sequences to order the reads.
contig = a length of decoded DNA seq that is continuous with no gaps. This is what the de novo seq tries to achieve.
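A toy sketch of how two overlapping reads can be joined into a contig, assuming error-free reads and a simple longest suffix/prefix match. The names overlap_length, merge_reads, and min_overlap are hypothetical; real assemblers are far more involved.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of left that equals a prefix of right."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def merge_reads(left, right, min_overlap=3):
    """Join two reads into one contig, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]

print(merge_reads("ATGCGTAC", "GTACCTTA"))  # overlap is GTAC
```

The min_overlap threshold is there because very short overlaps happen by chance and would join reads that are not actually neighbors in the genome.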
consensus seq = idealized base seq consisting of the bases most often found at each position.
Genome coverage is not equal throughout the genome. the number of cluster reads that decoded a given position varies. This is called read depth (= the number of reads that were overlapping in their decoding of one specific location in the genome during NGS).
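A short sketch of read depth and consensus seq computed from reads that are already aligned, each given as a (start position, seq) pair. The function names and the example reads are made up for illustration.

```python
from collections import Counter

def pileup(aligned_reads, genome_length):
    """Count which bases were observed at each genome position."""
    columns = [Counter() for _ in range(genome_length)]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            columns[start + offset][base] += 1
    return columns

def read_depth(columns):
    """Number of reads covering each position."""
    return [sum(column.values()) for column in columns]

def consensus(columns, gap="N"):
    """Most frequent base at each position; gap where no read covers it."""
    return "".join(
        column.most_common(1)[0][0] if column else gap for column in columns
    )

aligned_reads = [(0, "ACGT"), (2, "GTTA"), (2, "GATA")]
columns = pileup(aligned_reads, genome_length=6)
print(read_depth(columns))
print(consensus(columns))
```

In this example positions 2 and 3 are covered by all three reads, and at position 3 the consensus takes T because two of the three reads agree on it.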
Ion torrent sequencing technology
1) Library prep is similar to Illumina, small fragments with adapters at the ends. As before, the adapters provide a known DNA seq at the end. the seq has binding sites for the different seq primers, PCR primers, and for the binding of the gDNA fragment to the various partitions/wells.
2) partitioning each individual fragment to a separate location. the fragments are attached to tiny microbeads that are coated with an oligo-nt complementary to the fragment's adapter seq. annealing of the fragment to this bead is controlled, so only one fragment is attached.
The fragments on the beads are amplified with PCR. to avoid cross-contamination, emulsion PCR is used to “physically” separate the beads. the beads are separated by encapsulating the bead in a small drop of liquid suspended in solution of oil (beads are in a water droplet of an emulsion). The PCR reagents are mixed into the emulsion, and bc they are water soluble, wander into the water drops. the surrounding oil prevents cross-contamination.
After each cycle of PCR, the copies of the original frag anneal to complementary oligo-nt that are attached to the beads, leading to each bead being coated by thousands of fragments by the end.
3) sequencing by synthesis
after the PCR, the oil is removed and each of the beads is partitioned into a separate microwell in a dense array on the semiconductor chip. After this separation, DNA pol and a seq primer are added. the temp is adjusted so the primer can anneal to the adapter region. No dNTPs are present yet. Each copy on each bead ends up with a DNA pol ready to get going.
a single type of dNTP is flooded over the surface of the chip. If the dNTP is not complementary to the first base on the ss fragment, nothing happens. If it is complementary, the dNTP is added in a reaction that releases an H+ (proton). This changes the pH of the well. a sensor records this change, converting it to a voltage change. Below the pH meter is a semiconductor that converts voltage into 1 or 0, which is recorded by the computer. Say the first dNTP was dGTP. All wells with C as the first base will send an electric signal, which the computer records. the others do nothing. The excess dGTP is removed, and another dNTP is added.
much faster than Illumina, but what happens if a frag has two or more identical bases in a row? The DNA pol will add both (or all), releasing two (or three, four, etc) times as many H+ and giving a double (or triple, quadruple, etc) change in pH, which is recorded by the computer. However, in the case of four or more in a row, the pH and voltage measures do not vary as much, and the recorded pH/voltage changes can be misinterpreted.
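A toy sketch of the flow-by-flow readout in one well, assuming an ideal signal that is exactly proportional to the number of nt added. The function names and the flow order "GATC" are assumptions for illustration, and the template is written in the order the polymerase copies it.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def ion_torrent_flows(template, flow_order="GATC", max_flows=100):
    """Return (flowed dNTP, number of nt incorporated) for each flow."""
    signals = []
    position = 0
    flow = 0
    while position < len(template) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        incorporated = 0
        # A run of identical bases is extended within a single flow,
        # releasing one H+ per nt added.
        while (
            position < len(template)
            and COMPLEMENT[template[position]] == nucleotide
        ):
            incorporated += 1
            position += 1
        signals.append((nucleotide, incorporated))
        flow += 1
    return signals

def call_bases(signals):
    """Turn per-flow signal strengths back into a base seq."""
    return "".join(nucleotide * count for nucleotide, count in signals)

signals = ion_torrent_flows("CCATTTG")
print(signals)
print(call_bases(signals))
```

Here the run of three T in the template shows up as a single flow with signal 3. The homopolymer problem described above arises because a real measurement of 4 versus 5 is much harder to tell apart than 0 versus 1, which this idealized sketch does not model.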
Targeted sequencing
targeted sequencing = isolating a series of genes or regions of interest from a whole genomic DNA sample before sequencing using next gen techniques.
Very useful for studying exons, e.g. in relation to diseases; maybe not as useful for noncoding parts.
Two different methods to create targeted seq library:
1) use highly multiplex PCR to amplify the regions of the genome that are of interest.
- start with whole genomic DNA.
- mix with a set of PCR primers that amplify target genes.
- each primer pair amplifies one region of interest from the genome. PCR performed as usual.
- the final PCR reaction is cleaned up to remove the excess PCR primers, and then collection of PCR amplicons are attached to adapters that are compatible with NGS platform. Proceed as in any NGS.
2) use biotinylated oligo-nt probes that have seq complementary to the chosen targeted genes/exons.
step 1: isolate the whole genome, fragment the whole chromosomes into small pieces, and add adapters onto the ends.
2: mix and anneal with biotinylated probes. Panel = set of probes, can have a few different or hundreds and thousands of different probes. The seq of each probe is unique so it binds to a different location on the selected target genes.
After annealing to the probes, the probe-bound fragments must be separated from the remaining fragments. As mentioned in chap 5, biotin groups bind tightly to streptavidin. The sample is mixed with magnetic beads covered by streptavidin, and the probe-bound fragments are easily separated from the unbound fragments by magnets.
the fragments are released from the beads by heating them, causing the probe and DNA fragment to denature, releasing the DNA fragment for NGS analysis.
great for use in medicine and cancer treatment.
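A minimal sketch of the probe-capture idea (method 2), assuming a fragment is pulled down whenever it contains a stretch that is perfectly complementary to one probe in the panel. The function names, the probe, and the fragments are made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Sequence of the strand that would anneal to seq, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def capture(fragments, probes):
    """Keep single-stranded fragments that can anneal to at least one probe."""
    targets = [reverse_complement(probe) for probe in probes]
    return [
        fragment
        for fragment in fragments
        if any(target in fragment for target in targets)
    ]

panel = ["GATTC"]  # hypothetical one-probe panel
fragments = ["TTGAATCAA", "CCCCGGGG"]
print(capture(fragments, panel))
```

Only the first fragment contains GAATC, the reverse complement of the probe, so it is the only one that would end up on the streptavidin beads in this toy model.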
third-generation sequencing
has the ability to decode a single copy of a gDNA fragment. Can decode a single strand of DNA.
NANOPORE DETECTORS FOR DNA
nanopore seq = determining the order of bases for one ssDNA as it passes through a small pore or channel in the membrane.
nanopore detector = detector that allows a single strand of DNA through a molecular pore and records its characteristics as it passes through.
As the strand passes through the pore (only one at a time), the different base structures block the current differently. The technology is rapid and can handle long DNA fragments, and many pores can be assembled into a small region, facilitating the sequencing of multiple DNA strands simultaneously.
the nanopore detector is a channel in a membrane that separates two aqueous compartments. When voltage is applied across the membrane, ions flow through the open channel. the DNA will be pulled towards the positive side, through the nanopore. As the DNA occupies the pore, the normal ionic current is reduced. the amount of reduction depends on the base seq (G>C>T>A); a computer can measure the current and decipher the sequence.
LONG READS FROM SMRT SEQUENCING
SMRT = single-molecule real-time.
uses zero-mode waveguides (ZMWs) or nanocontainers that are so small only a single piece of template DNA can occupy the space. As in several other sequencing methods, DNA pol extends a growing chain by adding nt tagged with 4 alternative fluorescent dyes. Incoming nt emit a flash of light as they are linked in place. the fluorescent tag is then washed away to allow for the next cycle. the sequence of the colors reveals the order of the bases.
two novel features are critical:
1) the reaction is carried out in a nanocontainer - reduces background light enough to be able to detect the flash from a single nt.
2) attaching the fluorescent tag. Instead of linking it to the part of the incoming nt that will be incorporated into the growing chain, it is attached to the pyrophosphate group that is discarded during the linking. Thus, the DNA does not accumulate tags. The fluorophore is washed away before the next cycle begins. This increases the length that can be sequenced.
DNA microarrays for sequence analysis
DNA chip = chip used to simultaneously detect and identify many short DNA fragments by DNA-DNA hybridization. aka DNA array or oligo-nt array detector
DNA chips make the simultaneous analysis of thousands of DNA sequences possible. they rely on hybridization between ssDNA permanently attached to the chip and DNA/RNA in solution. Many different DNA molecules are attached to a single chip, forming an array of spots on the solid support. The DNA/RNA to be analyzed has to be labeled, usually with fluorescent dyes. Hybridization at each spot is scanned and the signals are analyzed to generate colorful data arrays.
two major variants of the DNA chip exist:
1) earlier chips mostly used short oligo-nt. However, it is also possible to attach full-length cDNA molecules. prefabricated cDNA or oligo-nt may be attached to the chip.
2) alternatively, oligo-nt may be synthesized directly onto the surface of the chip by a modification of the phosphoramidite method (chap 5).