Lecture 2 - Sequencing Technologies Flashcards

1
Q

What are the three biochemical methods that DNA sequencing technologies use?

A

-DNA synthesis (DNA Polymerase)
-DNA ligation (DNA ligase)
-Protein nanopores (some non-protein nanopores are under development and not yet released)

2
Q

How are nucleotides polymerized?

A

they are joined by phosphodiester bonds

3
Q

What does DNA polymerase need?

A

a short 11-17 base primer

4
Q

What does DNA ligase do?

A

links two DNA fragments together

5
Q

What is melting DNA?

A

separating dsDNA molecules into single strands with heat

6
Q

What is annealing DNA?

A

combining single stranded DNA to make dsDNA through cooling

7
Q

What is PCR?

A

copying DNA by repeating, for many cycles, heating (melting the dsDNA), cooling (annealing the primers), and extension by DNA Polymerase from the primers

8
Q

What changed to improve sequencing?

A

the latest Sanger sequencers could read 384 sequences per 1 hour run; this was a large reason the human genome project cost 3 billion dollars and took 15 years; current technologies can sequence a human genome in 1 day for 2000 dollars

9
Q

What high-throughput sensing device do all of you have now?

A

a phone

10
Q

What enabled the sequencing of hundreds of millions of fragments of DNA at a time?

A

digital imaging

11
Q

What two things do you need for digital imaging to be used with sequencing?

A

(1) need digitally sensitive reaction
(2) small reactions on the pico scale so the camera can capture it

12
Q

What are the three major steps in high-throughput shotgun sequencing?

A
  1. break DNA from many copies of a genome into many small fragments
  2. select millions of fragments randomly
  3. read the sequence of fragments in parallel
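
The first two steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the genome is a short made-up string, every fragment has the same length, and `shotgun_fragments` is a hypothetical helper name, not part of any sequencing toolkit.

```python
import random

def shotgun_fragments(genome, n_fragments, frag_len, seed=0):
    """Sample random fixed-length fragments from a genome string."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    fragments = []
    for _ in range(n_fragments):
        # pick a random start so the whole fragment fits inside the genome
        start = rng.randrange(len(genome) - frag_len + 1)
        fragments.append(genome[start:start + frag_len])
    return fragments

genome = "ACGTTGCATGCCGATAGCTAGGCTAACGT"  # made-up toy genome
reads = shotgun_fragments(genome, n_fragments=5, frag_len=8)
print(reads)
```

A real library has fragments of varying length from many genome copies; the fixed length here just keeps the sketch short.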
13
Q

What are other terms for high-throughput shotgun sequencing?

A

massively parallel sequencing, second gen sequencing, third gen sequencing, next-gen sequencing

14
Q

How many sequences of DNA in a day and a half for millions of sequences in parallel can high throughput sequencing accomplish?

A

251 sequences of DNA

15
Q

What is the general sequencing timeline?

A
  1. Sanger
  2. Illumina
  3. Oxford Nanopore
16
Q

What is the nucleotide analog used in Illumina sequencing?

A

reversible dye terminator

17
Q

What are the three properties of the reversible dye terminator in illumina sequencing?

A
  1. there is a dye attached to the phosphate that can be detected from light
  2. the dye prevents incorporating an additional analog meaning only one analog at a time
  3. the dye may be cleaved and repaired to a native nucleotide allowing an additional polymerization to happen
18
Q

In illumina sequencing, what adds the reaction-terminating analog before digital imaging records which base was added?

A

DNA Polymerase

19
Q

What is done to remove the terminator analog in illumina sequencing?

A

a chemical cleavage reaction is run

20
Q

What is an overview of the illumina sequencing?

A

for every base added you take a picture: you try all four bases, take a picture after the addition of each of the four, and whichever one the camera captures fluorescing is your nucleotide
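
The idea that "whichever one fluoresces is your nucleotide" can be sketched as picking the brightest dye channel per cycle. This is a rough illustration only: the intensity values are invented, `call_base` is a hypothetical helper, and real base callers correct for cross-talk and phasing before deciding.

```python
def call_base(intensities):
    """Return the base whose dye channel is brightest in this cycle's image."""
    return max(intensities, key=intensities.get)

# hypothetical per-cycle fluorescence intensities for one cluster
cycles = [
    {"A": 0.91, "C": 0.05, "G": 0.03, "T": 0.04},
    {"A": 0.02, "C": 0.07, "G": 0.88, "T": 0.06},
    {"A": 0.04, "C": 0.03, "G": 0.05, "T": 0.93},
]

read = "".join(call_base(c) for c in cycles)
print(read)  # AGT
```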

21
Q

What is a nucleotide analog?

A

-it is a chimeric molecule where one side is a normal nucleotide and on the other side there is an added extra molecular structure that does some work for sequencing DNA - in the illumina case it is a dye added onto the end

22
Q

How many different dyes are used in illumina sequencing?

A

4 different dyes so that each nucleotide has a specific dye

23
Q

What absorb and emit light at different wavelengths?

A

fluorophores; dyes work by absorbing light at one frequency and emitting it at another frequency

24
Q

What are some technological challenges to enable illumina sequencing and how were they solved?

A
  1. need to be able to take a picture of the molecule multiple times (this would not be possible if the molecules were in solution and diffusing randomly)
    Solution: chemically link DNA to a glass slide and run a sequencing reaction on the slide
  2. One molecule fluoresces too little light for digital imaging
    Solution: amplify molecules using bridge PCR (making 1000 copies of a DNA and then doing the sequencing on the same position on a glass slide which magnifies the light or signal to noise ratio)
25
Q

What is the first step in bridge PCR?

A

ligate the Y adapter

26
Q

What is the Y-adapter in bridge PCR?

A

-a pair of sequences that are partially complementary and partially not; the index 1 and index 2 are used to identify reads pooled from different experiments (like barcodes)

27
Q

Once the Y adapters have annealed in bridge PCR, what happens next?

A

-the DNA is floated over a slide that carries a bunch of chips, each complementary to the Y adapter that was ligated onto the sequence; the DNA is melted so that its single strands can anneal to a chip
-the DNA is semi-fixed: if you heat it up it will melt away and no longer be bound, because the strand itself is not covalently bound (the Y adaptor is covalently bound)

28
Q

Once the DNA fragment is annealed onto the chip what happens next in Bridge PCR?

A

the reverse complement is replicated
-a new copy of the DNA is made that is polymerized onto the chip, which means the new copy is bound
-this means that if you heat up the slide, the original fragments (which were only annealed) go away
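
As a small illustration of what "the reverse complement" means, the sketch below computes it for a DNA string. The function name and the example sequence are chosen only for illustration.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the strand that would anneal to seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

print(reverse_complement("ATGCC"))  # GGCAT
```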

29
Q

Once the reverse complement is made onto the chip what happens next in bridge PCR?

A

the slide is heated to melt the double-stranded DNA

30
Q

What happens in Bridge PCR once the slide is heated to melt the dsDNA?

A

-the DNA is cooled so that the adaptor chemically ligates to the chip and anneals to another adapter

31
Q

Once the “bridge” is formed in bridge PCR what happens?

A

-the single-stranded DNA is replicated starting from the red primer, and there are now two copies of the original molecule linked to the slide; you can select which strand you want based on which adapter is at the end
-the whole process is repeated 10 times so you can get about 1000 copies of the DNA
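
The "10 times gives about 1000 copies" figure follows from doubling each cycle. A minimal check, assuming every cycle doubles every molecule perfectly (real amplification is less efficient):

```python
def copies_after(cycles, start=1):
    """Number of molecules after the given number of perfect doubling cycles."""
    return start * 2 ** cycles

print(copies_after(10))  # 1024
```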

32
Q

Once bridge PCR is finished what happens in illumina sequencing?

A

the DNA is sequenced with the reversible dye terminators and DNA polymerase

33
Q

What are some ILLUMINA sequencing characteristics?

A

higher models can run more fragments at a time; both ends of a fragment may be sequenced; 300 bases long or 0.3 terabases of information on the longest one

34
Q

What are some important things to consider when looking at the efficiency of illumina sequencing?

A

-each base is the result of an extension, wash, imaging, cleavage, terminal modification
-if the reactions are not 100% efficient, some copies of the template lag behind, which eventually puts the copies out of sync and degrades the signal-to-noise ratio
-since each illumina cycle is 99.9% efficient, the likelihood that the copies are still in sync drops below 50% as sequencing continues further into the fragment
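
The effect of 99.9% per-cycle efficiency can be worked through numerically. The sketch below assumes a deliberately simple model that is not from the lecture: each cycle independently succeeds with the same probability, so the chance of still being in sync after n cycles is efficiency to the power n. The function names are hypothetical.

```python
import math

def in_sync_fraction(per_cycle_efficiency, cycles):
    """Chance of being in sync after `cycles` independent cycles."""
    return per_cycle_efficiency ** cycles

def cycles_until_below(per_cycle_efficiency, threshold=0.5):
    """Smallest number of cycles after which the in-sync chance is below threshold."""
    n = math.log(threshold) / math.log(per_cycle_efficiency)
    return math.floor(n) + 1

print(round(in_sync_fraction(0.999, 150), 3))  # 0.861
print(cycles_until_below(0.999))               # 693
```

Under this model the in-sync chance is still about 86% at 150 cycles and only falls below 50% near cycle 693, so the exact point where reads become unusable depends on more than this one number.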

35
Q

How does sequencing by synthesis signal in illumina sequencing drop per base?

A

-signal drops for every base that you add which is why the length of a fragment cannot extend beyond 150bp
-fidelity drops off after the 100th base added since they get out of sync
-there is also a lot of downstream sequence analysis, where we compare the sequence with a reference gene to see whether an error did in fact take place

36
Q

How do quality values for DNA sequencing work?

A

-each base is assigned a quality value that represents the predicted probability of error of the base
PHRED score = -10*log10(1 - accuracy)

37
Q

What are some examples of PHRED scores and what do they mean?

A

PHRED 10: 90% accuracy
PHRED 20: 99% accuracy
PHRED 30: 99.9% accuracy
(PHRED values are rounded to an integer and capped at 60)
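
These examples can be reproduced with the standard definition, PHRED = -10*log10(error probability). The sketch below is illustrative: the function names are hypothetical, and the cap of 60 is taken from the note above.

```python
import math

def phred_from_accuracy(accuracy, cap=60):
    """Convert base-call accuracy (0..1) to a rounded, capped PHRED score."""
    error = 1.0 - accuracy
    if error <= 0:
        return cap  # a perfect call would be infinite, so report the cap
    return min(cap, round(-10 * math.log10(error)))

def accuracy_from_phred(q):
    """Convert a PHRED score back to accuracy."""
    return 1.0 - 10 ** (-q / 10)

for acc in (0.9, 0.99, 0.999):
    print(acc, phred_from_accuracy(acc))  # 10, 20, 30
```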

38
Q

How are PHRED scores represented?

A

-via a string of characters that is 60 characters long and the position of a particular character is the PHRED score
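
In practice each score is written as one character per base. The sketch below assumes the Phred+33 convention (score = ASCII code minus 33), which is what current FASTQ files commonly use; older files may use a different offset, and the helper names are hypothetical.

```python
OFFSET = 33  # assumption: Phred+33 ASCII offset

def decode_quality(qual_string):
    """Turn a quality string into a list of PHRED scores, one per base."""
    return [ord(ch) - OFFSET for ch in qual_string]

def encode_quality(scores):
    """Turn a list of PHRED scores into a quality string."""
    return "".join(chr(q + OFFSET) for q in scores)

print(decode_quality("I5+"))         # [40, 20, 10]
print(encode_quality([10, 20, 30]))  # +5?
```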

39
Q

What are the formats of the sequence files for illumina sequencing?

A

FASTA format - the most widely used format
-each sequence has a header line followed by additional lines with the sequence; no particular info is required in the header
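
A minimal reader for this layout might look like the sketch below. It relies on the FASTA convention that header lines start with ">"; `parse_fasta` is a hypothetical helper and the records are made up.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header -> full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]       # header line, without the ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}

example = ">seq1 example header\nACGT\nTTGA\n>seq2\nGGCC\n"
print(parse_fasta(example))
```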

40
Q

What sequence file format is used for storing quality values?

A

FASTQ format stores quality values
(need a plus to differentiate between the quality values and the sequence)
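
A FASTQ record can be read with a sketch like the one below. It assumes the common four-line layout (a header starting with "@", the sequence, a line starting with "+", and the quality string); `parse_fastq` is a hypothetical helper and the read is made up.

```python
def parse_fastq(text):
    """Parse FASTQ text into (name, sequence, quality) tuples.

    Assumes exactly four lines per record.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per record")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        records.append((header[1:], seq, qual))
    return records

example = "@read1\nACGT\n+\nII5+\n"
records = parse_fastq(example)
print(records)  # [('read1', 'ACGT', 'II5+')]
```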

41
Q

How does sequencing by synthesis signal drop per base?

A

every base recorded in your file corresponds to a base in your DNA sequence; illumina sequencing never misses a base or adds a base within its 150bp reads, so the only errors you get with illumina are substitution errors, not deletions or additions

42
Q

How are sequences read in illumina?

A

a read starts from the 5’ end of a molecule; however, after amplification the dsDNA can be read from the other side too

43
Q

What is the solution to limited read lengths and what is its primary benefit?

A

-sequence single molecules: with no collection of molecules, the signal cannot go out of phase
OLD DRAWBACK: signal is from one molecule not 1000
-lower accuracy 70-80%
-errors are insertions and deletions which complicates DNA analysis
-THESE ARE FIXED
NEW DRAWBACK: still a bit more expensive than illumina

44
Q

How does PacBio sequencing using single-molecule fluorescence work?

A

also a light based sequencer and it uses fluorophores with analogs
-uses a zero-mode waveguide: a thin layer of metal with tiny holes drilled in it, each about the size of the polymerase and smaller than the wavelength of the light, so light only penetrates the very bottom of the hole
-the polymerase is tethered to the bottom of the waveguide; each nucleotide analog gives off a different color of light when the laser hits it, and that light goes back into a camera
-the fluorescent dye of each nucleotide analog is attached to the phosphate group that is cleaved off during incorporation

45
Q

What are the steps in the PacBio sequencing?

A

-DNA polymerase molecule is tethered to the bottom of a nanoscale hole in a thin metal sheet
-the sheet is immersed in a solution of analogs all of which can emit light
-however only analogs at the very bottom of the cells are illuminated by a laser
-furthermore the polymerase takes a short amount of time to accomplish the polymerization of this reaction

46
Q

In PacBio sequencing how is nucleotide incorporation recorded in real time?

A

-two bases that are the same next to each other can be incorporated quickly, leading to one large pulse of light instead of two separate pulses; this can cause insertions and deletions (a missed pulse or an extra pulse)
-the error rate tends to be higher for homopolymer sequences (runs of the same base)

47
Q

How does the polymerase stop adding nucleotides in PacBio sequencing?

A

-in theory it can go on forever, but the light is damaging to the nucleotides and the polymerase; eventually enough light has been shone on the polymerase that it breaks down and stops working

48
Q

What is the HiFi workflow in PacBio sequencing?

A

-PacBio developed a protocol where you take a fragment of DNA and turn it into a circle with hairpin ends
-this allows the polymerase to keep reading the DNA around the circle, so you get multiple reads of the same fragment; you can cross-correlate the copies to correct errors, and the probability of seeing the same error in every copy of a segment drops off geometrically
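
The drop-off in error with repeated passes can be illustrated with a toy model. The sketch assumes that errors in each pass are independent, that the consensus is a simple majority vote, and that all wrong passes agree with each other (a worst case). This is not PacBio's actual consensus algorithm, and `majority_error` is a hypothetical name.

```python
from math import comb

def majority_error(p, passes):
    """Probability that more than half of `passes` independent reads are wrong at one base."""
    need = passes // 2 + 1  # wrong reads needed to outvote the correct ones
    return sum(
        comb(passes, k) * p**k * (1 - p) ** (passes - k)
        for k in range(need, passes + 1)
    )

for n in (1, 3, 5, 9):
    print(n, majority_error(0.1, n))
```

With a 10% per-pass error rate, the consensus error under this model falls from 0.1 with one pass to 0.028 with three passes, and keeps shrinking as passes are added.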

49
Q

What are the current PacBio specs?

A

read length - 15-18kb
output per run - 90Gb
(1 human genome)

50
Q

What occurs in sequencing single molecules with nanopores?

A

-a membrane holds a bunch of nanopores; they are proteins that form a hole and allow single-stranded DNA to pass through it
-a current-measuring device is centered around the pore; since ions are constantly passing from one side of the nanopore to the other, you can measure the electrical current, and over time the changes in current give a signal that can be translated into nucleotides
-can then transfer squiggle plot information into bases
-deep learning makes these reads highly accurate

51
Q

What are the Oxford Nanopore specifications?

A

-there are many different sequencers produced by Oxford Nanopore
-the fastest is a runtime of 3 days and an output run of 100Gb
-individual accuracy is better in ONT but the consensus accuracy is better for PacBio with their hairpin methodology to circularize and read the DNA multiple times

52
Q

How does analysis with longer reads compare to analysis with shorter reads?

A

-analysis is easier with longer reads than with shorter reads, but illumina still sequences more DNA per run than the longer single-molecule reads
