Lecture 1 - Sequencing Technologies Flashcards
What are nucleotides? What do they have on the 5’, 3’ and 1’ carbon? What are purines? What are pyrimidines?
-individual subunits of DNA
-5’ carbon has a phosphate group
-3’ carbon has a hydroxyl (OH) group
-1’ carbon has a nitrogenous base
-purines - adenine and guanine
-pyrimidines - cytosine and thymine
What biochemical enzyme partakes in DNA synthesis?
DNA polymerase
What biochemical enzyme carries out ligation, in which two pieces of DNA are joined together?
DNA ligase
What are protein nanopores?
nanometer-sized protein channels used in a single-molecule technique: a polymer is threaded electrophoretically through the nanopore, and a sensor measures changes in ionic current as the molecule moves through the pore, which are used to infer the sequence
How are nucleotides polymerized?
via phosphodiester bonds linking a 3’ OH to a 5’ phosphate group
What is DNA polymerase?
an enzyme that, given a template strand of DNA and a supply of nucleotides, makes the reverse-complement strand, so there is a template strand and a growing strand of DNA; it starts from a primer sequence, which is made from RNA and is later replaced with DNA by DNA polymerase
How does DNA polymerase work?
-forms a new phosphodiester bond between a dsDNA fragment and a free nucleotide for DNA replication
-the added nucleotide is complementary to the template base adjacent to the last base pair
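The extension rule above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: sequences are plain strings written 5’ to 3’, and the function names `reverse_complement` and `extend_primer` are made up for this example.

```python
# Watson-Crick base pairing
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the reverse complement of a 5'->3' DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def extend_primer(template, primer):
    """Extend a primer annealed to the 3' end of the template.

    Both strings are written 5'->3'. Each step adds the nucleotide
    complementary to the template base next to the last base pair.
    """
    site = reverse_complement(primer)
    if not template.endswith(site):
        raise ValueError("primer does not anneal to the 3' end of the template")
    growing = primer
    unpaired = template[: len(template) - len(primer)]
    for base in reversed(unpaired):  # polymerase reads the template 3'->5'
        growing += COMPLEMENT[base]
    return growing

print(extend_primer("ATGCCGTTAG", "CTA"))  # CTAACGGCAT
```

The finished growing strand is the reverse complement of the whole template.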
What is a primer?
a short piece of DNA (11-17 bases) that is complementary to part of a longer single-stranded DNA molecule; when added, it anneals to make a double-stranded stretch that can be extended by polymerase
What does DNA ligase do?
links two DNA fragments together
How do you melt DNA?
separate double-stranded DNA molecules by heating them in solution: the two strands are held together by hydrogen bonds (intermolecular forces), not covalent bonds (intramolecular forces), so when the DNA is heated in solution the two strands separate
How do you anneal DNA?
lower the temperature: complementary single-stranded DNA will anneal naturally, because dsDNA is more energetically stable once the temperature drops
What is PCR or polymerase chain reaction?
-exponentially creates copies of DNA using DNA polymerase and primers and cycles of heating and cooling
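A quick sketch of why PCR is called exponential. It assumes an idealized reaction; the `efficiency` parameter (the fraction of molecules copied per cycle) is a hypothetical knob added for illustration, not something from the lecture.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Expected copy number after a number of heat/cool cycles.

    With perfect efficiency every molecule is copied once per cycle,
    so the count doubles each cycle.
    """
    return initial_copies * (1 + efficiency) ** cycles

print(pcr_copies(1, 10))       # 1024.0
print(pcr_copies(1, 30))       # 1073741824.0
print(pcr_copies(1, 30, 0.9))  # fewer copies when some molecules fail to copy
```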
What is a nucleotide analog?
shares its core structure (deoxyribose and a base) with natural nucleotides
-the analog has a modification that adds another functional group; different added functional groups are key to multiple sequencing technologies
What has changed to improve sequencing?
-human genome can be sequenced in one day for $100-200
-high-throughput sequencing took over: it can sequence many DNA fragments in parallel, enabling hundreds of millions of DNA molecules to be sequenced at the same time
What is a high throughput sensing device we all have now?
the phone
What has enabled sequencing in hundreds of millions of fragments of DNA at a time?
-digital imaging (with the exception of certain sequencing technologies)
-digital devices measure sequencing reactions that have been shrunk down to pico-scale; this needs (1) a reaction that a digital sensor can detect and (2) small reactions so the camera can capture them
What are the three major steps in high-throughput shotgun sequencing?
- Break DNA from many copies of a genome into many small fragments
- select millions of fragments randomly
- read the sequences of fragments in parallel
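The three steps above can be mimicked with a toy simulation. This is a sketch under made-up assumptions (a short string stands in for the genome, fragment sizes are drawn uniformly, and `shotgun_reads` is a hypothetical helper name), not a model of a real library prep.

```python
import random

def shotgun_reads(genome, n_copies, n_reads, mean_len=8, seed=0):
    """Toy shotgun sequencing: fragment many copies, then sample fragments."""
    rng = random.Random(seed)
    fragments = []
    # Step 1: break each copy of the genome at random positions
    for _ in range(n_copies):
        pos = 0
        while pos < len(genome):
            size = rng.randint(mean_len // 2, mean_len * 3 // 2)
            fragments.append(genome[pos:pos + size])
            pos += size
    # Step 2: select fragments at random
    # Step 3: each selected fragment would then be read in parallel
    return rng.sample(fragments, min(n_reads, len(fragments)))

genome = "ATGCGTACGTTAGCCGATAGCTAGGCTTACGATCGGATCA"
for read in shotgun_reads(genome, n_copies=5, n_reads=10):
    print(read)
```

Because each copy breaks at different positions, the sampled reads overlap one another, which is what later allows them to be pieced back together.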
What are some other terms for high-throughput sequencing?
also called massively parallel sequencing, second gen sequencing, third gen sequencing, or next gen sequencing
What are the three properties of the analog which is the reversible dye terminator in illumina sequencing?
-there is a dye attached to the phosphate that can be detected by light
-the dye prevents incorporating an additional analog so only one analog at a time
-the dye may be cleaved, restoring a native nucleotide and allowing an additional polymerization reaction to happen
What happens in illumina sequencing?
-use DNA polymerase to add a reaction terminating analog then digital imaging to record what base was added
-run a reaction to remove terminator
-repeat
-add all four bases, take a picture after each cycle, and whichever dye fluoresces identifies the base that was added
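The add, image, cleave, repeat loop can be sketched as below. It is a toy model only: the dye colors are invented placeholders, every cycle is assumed to work perfectly, and the template is given in the order the polymerase meets it.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hypothetical dye colors, one per base (illustrative only)
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {color: base for base, color in DYE.items()}

def sequence_by_synthesis(template, n_cycles):
    """Toy sequencing by synthesis with reversible dye terminators.

    `template` lists the template bases in the order the polymerase
    encounters them. One base is called per cycle.
    """
    read = []
    for cycle in range(min(n_cycles, len(template))):
        # 1. polymerase adds exactly one terminated, dye-labeled analog
        incorporated = COMPLEMENT[template[cycle]]
        # 2. imaging: the camera records which dye fluoresces
        color = DYE[incorporated]
        read.append(BASE_FOR_DYE[color])
        # 3. cleavage removes the dye and terminator, so the next
        #    iteration of the loop can extend by one more base
    return "".join(read)

print(sequence_by_synthesis("TACG", 4))  # ATGC
```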
What is the reversible dye terminator analog used in illumina sequencing?
-chimeric molecule: one side is a normal nucleotide, and the other side has an added piece of molecular structure that does some work for sequencing the DNA
-a dye is often added onto the end; there are 4 different dyes, so each nucleotide has a specific dye
-DNA polymerase adds the dye-labeled molecule, then you zoom in closely with a camera and take a picture of that dye
-the dye has a terminator that prevents further polymerization; it is called a reversible dye terminator because you can use chemistry to remove the terminator once you have taken a picture of the dye
How do the fluorophores work?
by absorbing light at one frequency and emitting it at another frequency
What are some technological challenges to enable illumina sequencing?
- need to be able to take a picture of the molecule multiple times, but if the molecules were in solution they would be diffusing randomly
solution: chemically link the DNA to a glass slide and run the sequencing reaction on the slide
- one molecule fluoresces too little for digital imaging
solution: amplify the molecules using bridge PCR, so that about 1000 copies of the DNA sequence are mounted at the same position on the glass slide; this magnifies the light (the signal-to-noise ratio), so you can be sure of the light you saw or did not see
What is the first step in bridge PCR?
ligate the Y adapter which is a pair of sequences that are partially complementary but partially not
-the index 1 and index 2 are used to identify reads pooled from different experiments (barcodes)
What is the first step in the amplification of bridge PCR?
the DNA fragment with Y adapters is melted, added to the chip, and re-annealed to the turf
-float the DNA over a slide covered with adapters that are complementary to the Y adapter you just ligated onto the sequence; the DNA is only semi-fixed, meaning that if you melt it, it will go away, because the strand is not covalently bound to the slide
What is the second step in the amplification of bridge PCR?
-polymerase makes the reverse complement as a new copy of the DNA, extending from an adapter on the chip, so the new copy is covalently bound; if you heat up the fragments, only the original strands that are not covalently bound go away
What is the third step in the amplification of bridge PCR?
-the slide is heated to melt the double stranded DNA so only the covalently bound strand remains and the original strand is gone
What is the fourth step in bridge PCR?
-the DNA is cooled so that the free adapter end of the strand that is chemically attached to the chip bends over and anneals to another adapter on the chip, forming a bridge
What is the fifth step in bridge PCR?
-once polymerase is added, the single-stranded DNA is replicated starting from the red primer
What is the final step in bridge PCR?
-heat again to melt the non-covalently bound bridge, which leaves two strands attached to the slide; the next round yields four, then eight, and so on
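Since each round doubles the number of strands bound to the slide, a short loop shows how many rounds it would take to reach the roughly 1000 copies per position mentioned earlier. This assumes perfect doubling every round, which is an idealization.

```python
def rounds_to_reach(target_copies):
    """Count bridge PCR rounds until one bound strand becomes at least target_copies."""
    strands, rounds = 1, 0
    while strands < target_copies:
        strands *= 2  # every bound strand templates one new bound strand
        rounds += 1
    return rounds

print(rounds_to_reach(2))     # 1
print(rounds_to_reach(8))     # 3
print(rounds_to_reach(1000))  # 10, since 2**10 = 1024
```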
What is the illumina sequencing approach for one read?
-each spot holds a bunch of DNA molecules that are all copies of one DNA fragment on the chip; you add primers for the adapters back to the solution and polymerize with fluorophores to sequence the DNA
How many bases are run on the longest read of illumina sequencing?
300 bases, because 150 bp x 2 (the 0.3 Tb figure would be total output across all reads, not the length of one read)
-higher models can run more fragments at the same time
-both ends of the fragments may be sequenced
What are some probability considerations you need to take into account for illumina sequencing?
-each base is the result of an extension, wash, imaging, cleavage, and terminal modification
-reactions that are not 100% efficient on all copies of the template will create molecules that lag behind
-eventually all molecules are out of sync
-given the probability of not extending at a particular base, the plot shows the fraction of molecules not in sync after a certain number of iterations; the read is reliable while less than 50% of the molecules are out of sync, but out-of-sync molecules give different signals, and their fraction grows as the length of one read increases
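A simplified model of this dephasing, for intuition only. It assumes each molecule fails to extend with a fixed probability `p_fail` per cycle and that a molecule never recovers once it lags; the value 0.01 is a made-up example, not a measured Illumina number.

```python
def fraction_in_sync(p_fail, cycles):
    """Fraction of molecules in a cluster that have extended on every cycle so far."""
    return (1 - p_fail) ** cycles

def cycle_when_half_out_of_sync(p_fail):
    """First cycle at which no more than half of the molecules are still in sync."""
    n = 0
    while fraction_in_sync(p_fail, n) > 0.5:
        n += 1
    return n

for n in (10, 50, 100, 150):
    print(n, round(fraction_in_sync(0.01, n), 3))

print(cycle_when_half_out_of_sync(0.01))  # 69
```

Even a small per-cycle failure rate compounds, which is why read length is limited.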
After what base in illumina sequencing does fidelity drop off due to the error rate?
-after the 100th base added since they get out of sync
What are the quality values for DNA sequencing?
-a quality score assigned to each base for representing the predicted probability of error of the base
-the higher the phred score, the greater the accuracy; phred values are rounded to integers and capped at 60
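The standard phred definition is Q = -10 * log10(P), where P is the predicted probability that the base is wrong. A small sketch of the conversion in both directions, using the cap of 60 from the card above (the function names are made up for this example):

```python
import math

def phred_from_error(p_error, cap=60):
    """Convert an error probability to a rounded, capped phred score."""
    return min(cap, round(-10 * math.log10(p_error)))

def error_from_phred(q):
    """Convert a phred score back to an error probability."""
    return 10 ** (-q / 10)

print(phred_from_error(0.1))    # 10 -> 1 error in 10
print(phred_from_error(0.001))  # 30 -> 1 error in 1000
print(phred_from_error(1e-9))   # 60 (capped)
print(error_from_phred(20))     # 0.01
```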
What are the sequence file formats?
FASTA format - most widely used format - has header and sequence
FASTQ format - has the header, sequence, and phred score values
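A minimal sketch of reading FASTQ records. It assumes the common four-line layout (header starting with `@`, sequence, a `+` line, quality string) and the Phred+33 character encoding; the read in the example is invented, and real files are better handled with an established parser.

```python
def parse_fastq(text):
    """Parse four-line FASTQ records into (name, sequence, phred_scores) tuples."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ records must have four lines each")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Phred+33: each quality character encodes score = ASCII code - 33
        scores = [ord(ch) - 33 for ch in qual]
        records.append((header[1:], seq, scores))
    return records

example = "@read1\nACGT\n+\nII5!\n"
print(parse_fastq(example))  # [('read1', 'ACGT', [40, 40, 20, 0])]
```

A FASTA record is the same idea without the last two lines, with a header that starts with `>` instead of `@`.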
What are the errors you get with illumina sequencing?
substitution errors, not deletions or insertions: exactly one base is added and imaged per cycle over the 150 bp, so it will never miss a base or add an additional one
What is the content of a sequencing read?
-a read starts from the 5’ end of a molecule; after amplification, the dsDNA can also be read from the 5’ end of its other strand to get the 3’ end of the original molecule
What is the solution to limited read lengths in sequencing?
single molecule sequencing
What is the primary benefit of single molecule sequencing?
there is no collection of molecules whose signals can get out of phase
What is the old drawback of single molecule sequencing and how has this now been fixed?
-the signal is from one molecule, not 1000, so you have lower accuracy (like 80-95% or 70-80%), and there are insertion and deletion errors, not just substitutions, which complicates analysis compared to illumina, which just has substitutions
-this is now fixed by reading the same molecule more than once via circular DNA
What is PacBio sequencing?
sequencing using single molecule fluorescence
How does PacBio sequencing work?
-also a light-based fluorophore sequencer: a thin layer of metal has little holes in it, about the size of a polymerase and smaller than the wavelength of the light, so light only penetrates a short way into the tiny holes; the laser light shone onto an analog is emitted back into a camera
What are the fluorescent labels on the nucleotide analogs attached to in PacBio sequencing?
the phosphate group that gets cleaved off
Where is the DNA polymerase tethered in PacBio sequencing?
-to the bottom of a nanoscale hole in a thin metal sheet
Where is the sheet immersed in PacBio sequencing?
a solution of analogs, all of which emit light
Which analogs are illuminated by the laser in PacBio sequencing?
at the very bottom of the cell
What takes a short amount of time to accomplish the polymerization reaction in PacBio?
the polymerase
How might you miss a pulse in PacBio sequencing?
two identical bases next to each other that are incorporated quickly can give one larger pulse of light instead of two distinct pulses; you can also get extra inserted bases and deletions
Why is single strand sequencing bad?
the error rate is greater for repeated sequences or homopolymers (runs of the same base), which the human genome has a lot of
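To see where such trouble spots sit in a sequence, runs of identical bases can be located with a short helper. This is an illustrative sketch; the name `homopolymer_runs` and the minimum run length of 3 are arbitrary choices for the example.

```python
from itertools import groupby

def homopolymer_runs(seq, min_len=3):
    """Return (start, base, length) for each run of identical bases of at least min_len."""
    runs = []
    pos = 0
    for base, group in groupby(seq):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs

print(homopolymer_runs("ACGTTTTAGGGC"))  # [(3, 'T', 4), (8, 'G', 3)]
```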
What stops the polymerase from working in PacBio sequencing?
-you shine light to start the reaction and film in real time; in theory the polymerase could go forever, but eventually you have shone enough light on it that it breaks down and stops working; the light also damages the nucleotides, which can negatively affect the polymerase
What is HiFi workflow?
a protocol in which you take a fragment of DNA and turn it into a circle by adding hairpin ends, so the polymerase can keep reading around the circle and produce multiple reads of the same molecule; you then cross-correlate the copies to find and correct the errors, and the probability of seeing the same error at the same position in the copies drops off geometrically, which gives greater accuracy
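A back-of-the-envelope sketch of why repeated passes help. It assumes errors in different passes are independent and that the consensus is a simple majority vote, which is a simplification of what real consensus callers do; the 10% per-pass error rate is only an example value.

```python
from math import comb

def consensus_error(p_error, passes):
    """Probability that more than half of the passes are wrong at one position.

    Assumes independent errors per pass and an odd number of passes.
    """
    needed = passes // 2 + 1  # wrong passes needed to outvote the correct ones
    return sum(
        comb(passes, k) * p_error ** k * (1 - p_error) ** (passes - k)
        for k in range(needed, passes + 1)
    )

for n in (1, 3, 5, 9):
    print(n, consensus_error(0.1, n))
```

With a 10% per-pass error rate, three passes already bring the majority-vote error down to about 2.8%, and it keeps shrinking quickly as passes are added.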
What is the current pacbio read length?
15-18 kb (for scale, the human genome is about 3 billion bases, so this is on the order of one gene, not one genome)
What is single molecule sequencing with nanopores?
-a membrane holds a bunch of nanopores, which are proteins that form a hole and allow single-stranded DNA to pass through; a current-measuring device centered on the pore records how the flow of ions through the pore changes as the nucleotides pass through it
Why is ONT tech better than PacBio HiFi?
ONT reads are an order of magnitude longer than PacBio reads, so you can assemble much longer stretches of genome than was possible before
-the individual accuracy of Oxford Nanopore is better but the consensus accuracy of PacBio is better
-analysis is easier with PacBio and ONT; illumina still runs more sequences of DNA