module 2: reading the genome Flashcards

1
Q

Who is the father of DNA sequencing?

A

Frederick Sanger

2
Q

Frederick Sanger’s first project was in 1953 where he was elucidating the structure of ___________.

He won a Nobel Prize in Chemistry in ______ because he showed that:

A

Insulin

1958; proteins have defined patterns of amino acid residues

3
Q

Who is Robert W. Holley?

A

Robert W. Holley won a Nobel Prize in 1968 for deciphering the structure of alanine transfer RNA (tRNA).

*The first attempts at RNA sequencing were made in the 1960s.

4
Q

How many years did it take researchers to determine the nucleotide sequence of alanine tRNA?

A

5.5 years!

It took them 3 years to purify 1 g of alanine tRNA from 140 kg of yeast.

Then, it took them 2.5 years to sequence.

5
Q

After Sanger joined the Medical Research Council in 1962 and worked with researchers such as FRANCIS CRICK, two new techniques transformed the field of sequencing in 1976.

What are they?

A

Chain Terminator (Sanger and Coulson) - DNA polymerase extends a radioactively labelled primer until a ddNTP is incorporated, and the fragments are separated on a polyacrylamide gel.

Chemical Cleavage (Maxam and Gilbert) - longer radio-labelled DNA is chemically cut into smaller pieces, which are separated on a polyacrylamide gel.

*Sanger sequencing dominated the field.

6
Q

Sanger sequencing relies on the use of ddNTPs, also known as chain-terminators.

How are ddNTPs different from dNTPs?

A

ddNTPs are missing OH on the 3’C.

This OH reacts with 5’ phosphate to form a PHOSPHODIESTER bond that links two NTs together.

Missing OH = can’t add NT! Synthesis can’t continue.

7
Q

How long did it take to sequence one nucleotide before the two new techniques were found?

A

1 month per nucleotide

8
Q

Briefly describe how non-automated Sanger sequencing worked.

A

Four tubes were used, each containing DNA polymerase, dNTPs, templates, and primers.

A different ddNTP was present in each of the four tubes. The ddNTPs were incorporated at random, so chains were terminated at every possible position on the template.

Then, the radioactive fragments were run on a gel, which was exposed to X-ray film for 24 hours and developed.

A couple of days of work would generate 100-500 base pairs of info.

9
Q

What is base call?

A

The identity of the bases, derived from analyzing the gel, the graph (chromatogram), etc.

10
Q

Differentiate the migration direction vs the read direction of a sequencing gel (Sanger sequencing).

A

Migration direction: largest fragment to shortest

Read direction: shortest fragment to largest

5’ of base call is the shortest fragment
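
The read direction can be illustrated with a small sketch. It assumes an idealized gel in which every possible termination product is present, and it ignores the primer length, so a fragment of length n simply ends at base n of the newly synthesized strand. The function names are hypothetical.

```python
def sanger_lanes(new_strand: str) -> dict[str, list[int]]:
    """Fragment lengths seen in each ddNTP lane for a new strand written 5'->3'."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        # A fragment of this length was terminated by the ddNTP of this base.
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Base-call by reading bands from the shortest fragment to the longest."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))


lanes = sanger_lanes("GATTACA")
print(lanes)            # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(lanes))  # GATTACA
```

The shortest band gives the 5’ base of the call, and each longer band adds the next base.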

11
Q

(T/F) The first DNA genome was sequenced in 1977 and there were improvements occurring in 1986.

A

True!

*The first genome ever sequenced was an RNA genome, in 1976.

*The improvements were made by Leroy Hood and included fluorescent ddNTPs.

12
Q

How is automated Sanger sequencing different from non-automated?

A
  • Use of fluorescent ddNTPs instead of radioactive
  • Perform all four reactions in the same tube (reduces cost, time, automated)
  • DNA fragments separated by CAPILLARY electrophoresis (more precise, automated)
  • Reads up to ~1kb/day
13
Q

(T/F) The Department of Energy was seeking data to protect the genome from the mutagenic effects of radiation in 1986. Hence, scientists at the NCHGR proposed to sequence the genome in 1988.

A

True!

The National Center for Human Genome Research was led by Dr. James Watson.

14
Q

Sequencing the genome was thought to be ________, _______, and ________.

A

Impractical, impossible, overambitious

15
Q

What were the three challenges of sequencing the human genome? Describe each briefly.

A

Challenge #1: Reliability
- Traditional gels provided about 100 bp of sequence each. At 3 billion bp, we would need to run 30 million gels for 1x coverage!

Challenge #2: Availability
- Most clones (the templates to be sequenced) were randomly derived and did not cover the entire genome. A library of clones spanning the entirety of the genome had to be generated.

Challenge #3: Assembly
- BIGGEST CHALLENGE!
- Have to fragment the entire genomic DNA into millions of pieces and must put them back in the correct order

16
Q

What is the International Human Genome Sequencing Consortium (HGP)?

What did they propose?

A

20 research centres from the UK, USA, France, Germany, China, and Japan came together to form this Consortium.

They proposed to sequence the EUCHROMATIN region of the genome in 15 years with 3 billion dollars.

17
Q

What were the 5 goals of the HGP Consortium?

A
  1. High-resolution genetic map (based on recombination frequencies)
  2. Physical maps (based on distances) of all human chromosomes and of the DNA of selected model organisms
  3. Determination of the complete sequence of human DNA and of the DNA of selected model organisms
  4. Development of capabilities for collecting, storing, distributing, and analyzing the data produced
  5. Creation of appropriate technologies necessary to achieve these objectives
18
Q

Who created The Institute for Genomic Research (TIGR)?

Why?

A

Craig Venter created TIGR.

While at NIH he developed Expressed Sequence Tags (ESTs) to identify genes and wanted to patent those genes, but he wasn’t allowed to.

19
Q

What was the faster method of sequencing that Craig Venter developed in TIGR?

A

Whole genome shotgun sequencing.

20
Q

Which genomes did HGP consortium sequence in 1996, 1997, and 1998?

A

1996: Yeast (12 Mb)
1997: E. coli (4.7 Mb)
1998: C. elegans (97 Mb)

21
Q

What is Celera Genomics?

Why was it founded?

What did they propose?

A

Celera Genomics is a for-profit genomics company that was founded by Craig Venter to patent genes.

It was founded because Craig hated the way the Human Genome Project was managed. NIH rejected funding for his influenza project, and his group was left out of the funding to work on the genome project.

Celera Genomics proposed to sequence the human genome within 3 years in 1998!

22
Q

Which genome did Celera Genomics sequence in 1999, and what effect did this have?

A

They sequenced the D. melanogaster (160Mb) in 1999.

This progress from Celera Genomics pushed the government project to re-double their efforts.

23
Q

What was the 20th century’s last great scientific contest?

A

The race to sequence the human genome!

Public vs Private

24
Q

What is the difference between sequencing DNA and sequencing genomes?

A

Sequencing DNA: obtaining a sequence of NTs of a gene or a segment but do not know where it belongs in the genome

Sequencing genomes: determining the identity of all 3 billion bps in order of p arm to q arm of all chromosomes. (where does the DNA go?).

25
Q

(T/F) Hierarchical sequencing used by public and whole-genome shotgun sequencing used by private differ the most in the assembly process.

A

True!

26
Q

What were the three main steps of hierarchical sequencing?

A
  1. Selecting (the BAC clones aka pre-sequencing)
  2. Sequencing (the chosen clones)
  3. Assembling (individually sequenced clones into an overall sequence)
27
Q

To select clones for sequencing, a library had to be created. How were these created?

How much coverage did they have?

A

DNA was received from anonymous males (X and Y chromosome).

There was partial digestion of the DNA with RESTRICTION ENZYMES to create large fragments.

These were cloned into BACs and PACs.

This generated 8 libraries with 1.5 million clones.

~65 fold coverage

28
Q

1) What is coverage?

2) How is it calculated?

3) Why did we need a high coverage for the genome project?

A

1) It is the relationship between SEQUENCES (BAC insert, sequence read) and a REFERENCE (a specific position, a locus, chromosome 1 or the entire human genome).

It describes HOW OFTEN, on average, a reference sequence is covered by bases from the reads.

2) Coverage = (avg insert size x # of BACs)/haploid genome size

3) Abundance of starting material so every sequence we want to maintain would be present in our library.
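
As a worked example of the formula in 2), here is a minimal sketch. The 1.5 million clones come from the library card; the 130 kb average insert size and the 3 billion bp haploid genome size are illustrative assumptions, chosen because together they land on roughly 65-fold coverage. The function name is hypothetical.

```python
def library_coverage(avg_insert_bp: int, n_clones: int, haploid_genome_bp: int) -> float:
    """Coverage = (avg insert size x # of BACs) / haploid genome size."""
    return avg_insert_bp * n_clones / haploid_genome_bp


# Assumed values: 130 kb inserts, 1.5 million clones, 3 Gb haploid genome.
coverage = library_coverage(130_000, 1_500_000, 3_000_000_000)
print(coverage)  # 65.0
```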

29
Q

Describe the clone fingerprinting technique.

A

Clone fingerprinting technique is used to map overlapping clones.

1) Digest clones in BACs with RESTRICTION ENZYMES

2) Separate fragments by agarose gel electrophoresis and look for bands in common

If there are clones that share a common sequence, they are overlapping.

*this was done for 300k BAC clones in the hierarchical sequencing
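
The band comparison can be sketched as a set intersection. This is a simplified illustration, not the procedure the project used: fragment sizes are treated as exact integers (a real gel needs a size tolerance), the clone names and sizes are invented, and the `min_shared` threshold is an assumption.

```python
from itertools import combinations


def overlapping_pairs(fingerprints: dict[str, set[int]], min_shared: int = 2) -> list[tuple[str, str, int]]:
    """Report clone pairs whose restriction fingerprints share enough bands."""
    pairs = []
    for name_a, name_b in combinations(sorted(fingerprints), 2):
        shared = fingerprints[name_a] & fingerprints[name_b]  # bands in common
        if len(shared) >= min_shared:
            pairs.append((name_a, name_b, len(shared)))
    return pairs


# Hypothetical fingerprints: fragment sizes in bp for three BAC clones.
fingerprints = {
    "BAC1": {800, 1200, 3400, 5100},
    "BAC2": {800, 2300, 4100, 5100},
    "BAC3": {950, 2300, 4100, 6000},
}
print(overlapping_pairs(fingerprints))
# [('BAC1', 'BAC2', 2), ('BAC2', 'BAC3', 2)]
```

Here BAC1 and BAC3 share no bands, but both overlap BAC2, which suggests the order BAC1, BAC2, BAC3 along the chromosome.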

30
Q

Fingerprint clone contig is a set of __________ clones that correspond to a ________ chromosomal sequence.

A

Overlapping; linear (contiguous)

31
Q

Once overlapping clones were identified via fingerprinting, how were their precise locations in the genome determined?

A

Physical mapping using SEQUENCE-TAGGED-SITE (STS)!

These are known sequences (~100-500bp) that are unique in the genome.

PCR amplifies specific STSs for each clone in the genome. If a clone is positive for STS X, we know that the clone comes from wherever STS X comes from.

32
Q

What does it mean when multiple clones are positive for the same STS?

A

These clones are overlapping because they all contain the same STS!

33
Q

Minimal tiling path/golden tiling path is used for …..?

What are they?

A

Used for choosing clones for sequencing.

These are a set of overlapping DNA clones that cover an entire genomic region with the minimum number of clones required.
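
One way to picture choosing a minimal tiling path is a greedy interval cover. The sketch below assumes each clone already has start and end coordinates on a physical map (the names and coordinates are invented), and it is a textbook algorithm rather than the consortium's actual selection procedure.

```python
def minimal_tiling_path(clones: dict[str, tuple[int, int]], region_start: int, region_end: int) -> list[str]:
    """Greedy cover: repeatedly take the clone that reaches furthest past the covered end."""
    path = []
    covered_to = region_start
    remaining = dict(clones)
    while covered_to < region_end:
        # Clones that touch the covered part and extend beyond it.
        candidates = {
            name: span for name, span in remaining.items()
            if span[0] <= covered_to < span[1]
        }
        if not candidates:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        best = max(candidates, key=lambda name: candidates[name][1])
        path.append(best)
        covered_to = candidates[best][1]
        del remaining[best]
    return path


# Hypothetical clones with (start, end) map positions in kb.
clones = {
    "A": (0, 150), "B": (50, 200), "C": (100, 320),
    "D": (180, 400), "E": (300, 500), "F": (390, 500),
}
print(minimal_tiling_path(clones, 0, 500))  # ['A', 'C', 'E']
```

Six clones span the region, but three are enough to cover it, so only those three would be sent for sequencing.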

34
Q

How are the chosen clones sequenced in hierarchical sequencing?

A
  1. Mechanical fragmentation of the BAC insert DNA into ~1kb pieces
  2. Cloning of the DNA fragments into plasmids (vectors)
  3. Sequencing
35
Q

Why are restriction enzymes not used for fragmenting the BAC inserts when sequencing?

A

Restriction enzymes target a specific sequence, causing all fragments to have the same ends.

We want to generate as many random ends as possible so we are able to link them together during assembly.

36
Q

What are sequenced clone contigs? How are they different from fingerprint clone contigs?

A

Sequenced clone contigs are made with SEQUENCED clones with overlapping end sequences.

Fingerprint clone contigs rely on the sizes of DNA fragments in clones to establish their order and do not involve direct DNA sequencing. Sequenced clone contigs have actual sequence info.

*after selecting and sequencing, sequenced clone contigs are made.

37
Q

(T/F) There was ~2x coverage in the sequenced clone contigs for the public effort.

Also, why is this important?

A

False!

There was ~4x coverage!

With low coverage it is impossible to know whether a difference is a true variant or a sequencing error. At ~4x, every nucleotide is represented about four times in the data! This helps with ACCURACY.

38
Q

Which one of the statements is false regarding the public effort (hierarchical sequencing):

1) Most of the sequencing data was obtained in the last ~8 months of the project.

2) By 2000, the sequencing was at a rate of 1000nt/s, 24/7

3) According to the Bermuda Principle, sequencing data was made available to the public online every week.

A

3!

According to the Bermuda Principle, sequencing data was made available to the public online every 24 HOURS.

This was done to prevent the patenting of genes and to maximize the benefit of the project! However, Celera Genomics could benefit from this too.

39
Q

Briefly answer the following questions regarding shotgun sequencing strategy used by Celera Genomics:

1) What did Craig Venter aim with this project?

2) How did it differ from hierarchical sequencing?

3) What DNA was used?

A

1) He aimed to complete his project before the public effort and for less money (300 million total).

2) Hierarchical sequencing fragmented the genome, cloned the fragments into vectors, spent time ordering the BACs, and then sequenced them in an orderly fashion.

The shotgun strategy fragmented the entire genome, cloned the fragments into plasmids, sequenced them, and then assembled the sequence using paired-end reads.

3) DNA from 3 females and 2 males (different ethnicities), including himself.

40
Q

(T/F) To fragment the ENTIRE GENOME, restriction enzymes were used by Celera Genomics.

A

False!

Sonication was used!

These fragments were cloned into vectors and then sequenced.

41
Q

When was assembly problematic in the shotgun method?

A

Assembly was problematic when dealing with repetitive genomes in the shotgun method.

Genomic DNA is broken into random fragments, and the task is to reassemble their sequences. However, there are repetitive sequences that are not positioned next to each other but share the same sequence in the genome. These can lead to incorrect overlap, making it challenging to confirm the correct order.

*Hierarchical sequencing is very orderly; it is the best method to use when sequencing a genome for the first time.

42
Q

Because of repetitive sequences, shotgun sequencing creates ____ very large contigs and _____ smaller ones that need to be joined together.

A

few; many

Short reads of DNA result in the assembly of small contigs that can’t be connected due to the presence of repetitive sequences. Instead of obtaining a single, continuous contig representing a certain chromosome, we end up with multiple small contigs because the information needed to put them together is missing.

43
Q

Why are repetitive sequences not an issue in hierarchical sequencing?

A

Hierarchical sequencing uses STS.

Each Sequence Tagged Site (STS) marks a known location, so the position of every clone contig is known. Copies of a repetitive sequence can therefore be assigned to their distinct chromosomes or regions, even though their sequences overlap with copies from a different area.

No information is missing that would prevent the formation of a long contig.

44
Q

Briefly answer the following questions regarding N50:

1) What does N50 size measure?

2) How is N50 size calculated?

3) Is N50 of 50 better than N50 of 300?

A

1) N50 size measures completeness of a genome ASSEMBLY (good size = few large contigs).

2) Calculate the total length of all contigs (total assembly length) and order the contigs from longest to shortest. Then, add up the length of the largest contig sequences until you reach 50% of the total assembly length. THE SIZE OF THE CONTIG THAT MAKES IT REACH THE 50% LENGTH IS N50.

3) No, N50 of 50 is not better than N50 of 300.

Half of the genome sequence is in contigs larger than or equal to the N50 contig size. A higher N50 value indicates longer sequences.
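
The calculation in 2) can be written as a short function. The contig lengths below are made up for illustration, and the function name is hypothetical.

```python
def n50(contig_lengths: list[int]) -> int:
    """Length of the contig at which the running total first reaches 50% of the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):  # longest to shortest
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


few_large = [80, 70, 50, 40, 30, 20, 10]  # total 300
many_small = [10] * 30                    # total 300
print(n50(few_large))   # 70  (80 + 70 = 150, which is 50% of 300)
print(n50(many_small))  # 10
```

Both toy assemblies have the same total length, but the one made of a few large contigs has the higher N50.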

45
Q

(T/F) An assembly with many short contigs will have a smaller N50 value than an assembly with many larger contigs.

A

True!

46
Q

What are paired-end reads?

A

A pair of sequences that come from the same DNA template, separated by a FIXED distance.

Sequencing both ends (part of the sense strand and anti-sense strand) of the same template.

47
Q

Differentiate SIPERS vs LIPERS.

A

SIPERS: short-insert (200-800bp) paired-end reads.

LIPERS: long-insert (>1kb) paired-end reads (aka MATE PAIRS).

48
Q

The objective of shotgun sequencing is to identify overlapping regions in sequencing reads and assemble them into contigs. However, repetitive sequences in the genome lead to the production of many short contigs.

How was this problem overcome?

A

Through the use of paired-end reads of larger inserts (mate-pairs)!

Once we have sequenced all of our shot-gun fragments:

1) We create a larger-insert library, starting from the beginning: we fragment our genomic DNA, clone the fragments into sequencing vectors, and sequence every insert from BOTH ENDS, which gives us paired-end reads.

These paired-end reads come from the same DNA fragment and are separated by a known distance.

2) We look for these paired-end reads within the sequenced clone contigs we generated at first.

Since we know the distance between the paired reads, we can “anchor” the contigs in the correct order.
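
The anchoring idea can be sketched in a few lines. This is a deliberately simplified model, not how a real assembler works: each mate pair is reduced to a link "read 1 lands in contig X, read 2 lands in the contig just downstream", the links are assumed to form one unbranched chain, and all names and numbers are hypothetical.

```python
def order_contigs(links: list[tuple[str, str]]) -> list[str]:
    """Chain contigs from (upstream, downstream) mate-pair links."""
    next_of = dict(links)
    downstream = set(next_of.values())
    starts = [contig for contig in next_of if contig not in downstream]
    if len(starts) != 1:
        raise ValueError("links do not form a single linear chain")
    ordered = [starts[0]]
    while ordered[-1] in next_of:
        following = next_of[ordered[-1]]
        if following in ordered:
            raise ValueError("links form a cycle")
        ordered.append(following)
    return ordered


def estimate_gap(insert_size: int, tail_on_upstream: int, head_on_downstream: int) -> int:
    """Gap = known insert size minus the parts of the insert that lie inside the two contigs."""
    return insert_size - tail_on_upstream - head_on_downstream


links = [("contig3", "contig1"), ("contig2", "contig3")]
print(order_contigs(links))               # ['contig2', 'contig3', 'contig1']
print(estimate_gap(10_000, 3_000, 2_500)) # 4500
```

Because the insert size is known, each link gives both the order of two contigs and a rough size for the unsequenced gap between them.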

49
Q

Paired-end reads are used to anchor the contigs together into a ________!

A

Scaffold

50
Q

Sequencing coverage is also referred to as ________.

It is the number of times a given ________ in the genome has been _____ on average.

We aim for ____ coverage these days.

A

Depth

Nucleotide; read

50

51
Q

What is the Lander-Waterman equation to calculate coverage?

A

Coverage = (read length x number of reads)/haploid genome length
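
A small numeric sketch of this equation follows. The read length, read count, and genome size are illustrative values, not figures from the cards. The second function uses the standard Poisson approximation that goes with this model: at coverage c, a given base is missed with probability e^(-c).

```python
import math


def coverage(read_length: int, n_reads: int, haploid_genome_length: int) -> float:
    """Coverage = (read length x number of reads) / haploid genome length."""
    return read_length * n_reads / haploid_genome_length


def expected_fraction_covered(c: float) -> float:
    """Poisson approximation: fraction of bases hit by at least one read."""
    return 1 - math.exp(-c)


# Assumed values: 500 bp reads, 30 million reads, 3 Gb haploid genome.
c = coverage(500, 30_000_000, 3_000_000_000)
print(c)                                        # 5.0
print(round(expected_fraction_covered(c), 4))   # 0.9933
```

Even at 5x average coverage, a small fraction of bases is expected to be missed by chance, which is one reason gaps remain after a shotgun assembly.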

52
Q

What is the difference between Depth vs Breadth of sequencing?

A

Depth refers to the number of times a specific base in the genome is sequenced or covered (region is sequenced 10x).

Sequencing breadth refers to how much of the entire genome is explored (95% of genome is covered 10x).
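
The difference is easy to see on a toy example. The per-base depths below are invented for a 10-base region, and the function names are hypothetical.

```python
def mean_depth(per_base_depth: list[int]) -> float:
    """Average number of reads covering each base."""
    return sum(per_base_depth) / len(per_base_depth)


def breadth(per_base_depth: list[int], min_depth: int = 1) -> float:
    """Fraction of bases covered by at least min_depth reads."""
    return sum(d >= min_depth for d in per_base_depth) / len(per_base_depth)


depths = [12, 10, 11, 0, 0, 15, 10, 13, 10, 9]
print(mean_depth(depths))             # 9.0
print(breadth(depths))                # 0.8  (two bases were never read)
print(breadth(depths, min_depth=10))  # 0.7
```

A good mean depth can hide positions with no reads at all, which is why breadth is reported alongside depth.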

53
Q

(T/F) Craig Venter used data from the public project to assemble his sequence. Celera has 4x coverage while govt has 8x coverage.

A

False!

Craig Venter did use data from the public project to assemble his sequence!

However, Celera had 5x coverage of its own plus 3x from the public data, giving it 8-fold coverage, while the govt project only had 4x coverage.

54
Q

The human genome sequencing project was completed in _______, with a final cost of __________.

It has ___% sequence overlap (150,000 gaps).

A

2003 (2 years ahead of the 15-year schedule); $2.7 billion (less than expected) - due to competition with Celera

90

55
Q

___% of the coding region was done in 2003.

The first truly complete sequence of the human genome was finished in ______, with ___ gaps, done by Telomere-to-Telomere (T2T) consortium.

There was a complete sequence of the Y chromosome in _______, _____.

A

99 (with accuracy of 99.9)

2022; no gaps (~200 million bps of novel sequence)

August 23, 2023 (also done T2T)

56
Q

Why was there a need for next-generation sequencing methods (NGS)?

A

Sequencing the human genome for the first time took 13 years and cost $3 billion!

Although Sanger sequencing started becoming automated towards the end of the project and had high accuracy with long read lengths (500-1000bp), it still generated a relatively small amount of data.

A faster and less expensive type of method was needed!

57
Q

(T/F) The cost per human genome decreased every year, and this can be explained by Moore’s Law.

A

False!

Moore’s law states that the computing power of a technology doubles every 24 months, and a technology is doing very well if it keeps up with this law.

However, the cost per human genome outpaced Moore’s law. It couldn’t be explained solely by improvements in computing technology! Explained by NEXT GENERATION SEQUENCING!

58
Q

What are the three main components of NGS that help it take a $3 billion genome to a $1000 (or less) genome?

A

1) Integration (cut down the number of steps and instruments)

2) Parallelization (sequence many reactions in parallel)

3) Miniaturization (use a single instrument, go from mL -> uL)

59
Q

Next-generation sequencing methods all share a common workflow. Briefly describe the workflow.

A

Library preparation (preparing product to sequence) -> Sequencing -> Imaging.

60
Q

Describe the six steps of library preparation of NGS.

A

1) Genomic DNA extraction (~100-500ng)

2) DNA fragmentation: physical or chemical preferred over enzymatic to generate RANDOM ENDS.

3) End-repair and A-tailing: forms hybridization partner for adaptors with T-overhangs.

4) Adapter ligation: flow cell binding sequence, sequencing primer binding site, sample index/barcode.

5) Size selection: usually short fragments (200-400bp), optimal length is determined by limitations of the instrument and/or specific sequencing application.

6) PCR amplification (not always): increases amount of library available for sequencing which helps to see more, enrich fragments that have an adaptor ligated to each end.

61
Q

(T/F) A flow cell is a glass slide with several microfluidic channels which allows reagents to flow.

A

True!

It allows for massively parallel sequencing.

62
Q

For NGS, what are the two methods of preparing DNA libraries?

A

1) FLOW CELL - base-pairing to oligos on a flow cell that are complementary to BOTH adapter sequences of the gDNA

2) BEAD BASED - base-pairing to tiny metallic beads coated with oligos that are complementary to only ONE of the two adapter sequences of the gDNA

63
Q

Briefly describe how base-pairing template DNA to oligos on a flow cell creates clonal clusters.

A

There is a “lawn” of oligonucleotides that are IMMOBILIZED on the flow cell surface.

Oligo sequences are COMPLEMENTARY to the adapter sequences on the templates.

Oligonucleotides act as primers to copy the template. DNA polymerase copies this template from 5’ -> 3’ direction.

This is followed by DENATURATION (with temp), removing the original template bound to oligo sequences, while the copies are immobilized on the flow cell surface.

Then, there is the isothermal bridge amplification and this process is repeated to create dense CLONAL CLUSTERS.

64
Q

Describe how isothermal bridge amplification occurs.

A

Copies (made by DNA polymerase) that are immobilized to the flow cell loop over and hybridize to adjacent lawn oligos.

Then, DNA polymerase extends the 3’ end of those lawn oligos, following the template of the looped over copy, until it reaches the first oligo where the copy came from, creating a dsDNA BRIDGE.

Denaturation leaves two immobilized ssDNAs.

This process is repeated over and over again to create DENSE CLONAL CLUSTERS of the single original template.

65
Q

(T/F) Oligonucleotides and copied templates are attached by their 3’ ends on the flow cell surface and the 5’ ends are always pointing up.

A

False!

Oligonucleotides and copied templates are attached by their 5’ ends on the flow cell surface and the 3’ ends are always pointing up.

This way, DNA polymerase is able to extend the 3’ OH.

66
Q

What happens after clonal clusters are formed on the flow cell to prepare for sequencing?

A

Originally, you have a heterogeneous population of forward and reverse strands. The reverse strands are cleaved and washed away to make a homogeneous population (so each cluster has the same sequence).

Then, the free 3’ ends of FORWARD STRANDS and OLIGOS are blocked to prevent looping over.

67
Q

Which statement is false?

1) Each cluster on a flow cell is 1 DNA fragment to be sequenced.

2) There are more than two adaptors for library preparation.

3) The immobilized strands are always bound at the 5’ end on the flow cell

A

2!

There are only TWO adaptors (P5 and P7) for library preparation!

68
Q

Briefly describe how base-pairing template DNA to metallic beads works.

A

Adaptors of the gDNA fragments contain sequences that are complementary to the sequences on the surface of the bead.

There is only 1 type of DNA fragment PER BEAD.

Library DNA, the beads, emulsion oil are mixed for an emulsion PCR which creates bubbles. Each bubble traps a bead along with either zero or one DNA strand from the library.

In the bubbles with the bead and the DNA strand, the PCR extends the primer, creating strands of DNA that are attached to the surface of the bead.

There is 1 bead with identical copies of DNA; CLONAL CLUSTERS (many beads with different fragments of gDNA).

The emulsion is broken and the beads are deposited into the wells of a flow cell for DNA sequencing!

69
Q

(T/F) Emulsion PCR is a great example of parallelization and miniaturization.

A

True!

70
Q

There can be sequencing by _________ and sequencing by ___________.

A

Synthesis (DNA polymerase); Ligation (DNA ligase).

*ligation not used often

71
Q

While the Illumina sequencing method makes use of _______, the Ion Torrent method makes use of _________.

A

Flow cells; Beads

72
Q

How are the nucleotides modified in Illumina sequencing?

A

1) 3’-blocked terminator (reversible)

2) cleavable reporter dye (attached to base)

*CYCLIC REVERSIBLE TERMINATION

73
Q

How are the nucleotides modified in Ion Torrent sequencing?

A

They are NOT!

74
Q

Match the steps of Illumina sequencing to their proper order:

1) Step 1
2) Step 2
3) Step 3
4) Step 4
5) Step 5

A) Image the fluorescent signal - each cluster emits an intensity at a unique wavelength, based on the base that was incorporated.

B) Repeat the cycle: nucleotide addition, elongation, imaging, cleavage

C) Cleavage of the dye and regeneration of a free 3’OH with a reducing agent

D) Add 4 reversible terminators, primers, and DNA polymerase - the nucleotides hybridize to complementary base and each cluster can incorporate a different nucleotide.

E) Wash away unincorporated nucleotides.

A

1) Add 4 reversible terminators, primers, and DNA polymerase - the nucleotides hybridize to complementary base and each cluster can incorporate a different nucleotide.

2) Wash away unincorporated nucleotides.

3) Image the fluorescent signal - each cluster emits an intensity at a unique wavelength, based on the base that was incorporated.

4) Cleavage of the dye and regeneration of a free 3’OH with a reducing agent

5) Repeat the cycle: nucleotide addition, elongation, imaging, cleavage

75
Q

(T/F) When using 4 channels to image the fluorescence (Illumina sequencing), each cluster will appear in all four images per cycle.

A

False!

Each cluster will appear in ONE image per cycle.

We are emitting fluorescence at four different wavelengths (four different dNTPs). Any given cluster on the surface of the flow cell will emit only one of the four wavelengths (= one dNTP added).
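
For a single cluster, base calling from the four images can be pictured as picking the brightest channel in each cycle. This is a simplified sketch with invented intensity values; real base callers also correct for effects such as channel crosstalk, which are ignored here.

```python
def call_base(intensities: dict[str, float]) -> str:
    """Pick the channel (base) with the strongest signal in one cycle."""
    return max(intensities, key=intensities.get)


def call_read(cycles: list[dict[str, float]]) -> str:
    """One base per cycle, in the order the cycles were imaged."""
    return "".join(call_base(cycle) for cycle in cycles)


# Hypothetical normalized intensities for one cluster over three cycles.
cycles = [
    {"A": 0.04, "C": 0.02, "G": 0.91, "T": 0.03},
    {"A": 0.88, "C": 0.05, "G": 0.03, "T": 0.04},
    {"A": 0.02, "C": 0.03, "G": 0.05, "T": 0.90},
]
print(call_read(cycles))  # GAT
```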

76
Q

What is a composite image?

A

In illumina sequencing, 4 pictures are taken for four distinct wavelengths. You overlay these 4 pictures to make a COMPOSITE IMAGE.

This is done for every cycle of nucleotide addition.

77
Q

How does paired-end sequencing work in Illumina?

A

Paired-end sequencing means both ends of a template are sequenced.

During library preparation for Illumina (flow cell oligos), same steps occur but instead of getting rid of the reverse strand, you get rid of the forward strand:

  • 3’ ends are unblocked of immobilized forward strands
  • single-stranded template loops over to form a bridge by hybridizing with a lawn primer
  • original forward template is cleaved
78
Q

Ion Torrent is the first platform without ______ sensing. It senses changes in _____.

The __________ nucleotides are added one at a time.

A

Optical; pH
Unmodified (no terminators, no fluorophores).

79
Q

How does Ion Torrent measure pH changes?

A

Beads are laid on a flow cell where we can measure the change in voltage.

As a base is incorporated (the dNTP is complementary to the NT of the sequence), phosphodiester bond formation releases a proton.

0.02 unit change in pH is detected by the device.

80
Q

What is the major limitation of Ion Torrent sequencing?

A

Only one dNTP species is present during each cycle but several identical dNTPs can be incorporated during a cycle in HOMOPOLYMERIC REGIONS, increasing the emitted protons.

Change in pH is proportional to number of bases added.

We have to infer how many bases were added in these regions.
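
The homopolymer problem can be made concrete with a toy simulation of nucleotide flows. It assumes a simple repeating flow order (T, A, C, G) and a perfectly proportional signal, both of which are idealizations; the function names are hypothetical.

```python
def flow_signals(new_strand: str, flow_order: str = "TACG", n_flows: int = 8) -> list[int]:
    """Number of bases incorporated in each flow while synthesizing new_strand (5'->3')."""
    signals = []
    pos = 0
    for flow in range(n_flows):
        base = flow_order[flow % len(flow_order)]  # only this dNTP is present
        incorporated = 0
        while pos < len(new_strand) and new_strand[pos] == base:
            incorporated += 1  # homopolymer runs are added within one flow
            pos += 1
        signals.append(incorporated)
    return signals


def decode(signals: list[int], flow_order: str = "TACG") -> str:
    """Rebuild the sequence from the per-flow base counts."""
    return "".join(flow_order[i % len(flow_order)] * n for i, n in enumerate(signals))


signals = flow_signals("TTAGGGC")
print(signals)          # [2, 1, 0, 3, 0, 0, 1, 0]
print(decode(signals))  # TTAGGGC
```

In this ideal model the counts 2 and 3 are exact. On a real instrument they have to be inferred from the size of the pH change, and telling 7 from 8 is much harder than telling 1 from 2. The zero-signal flows also show why templates grow at different rates.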

81
Q

Compare and contrast the Ion Torrent method and the Illumina sequencing method.

A

Similarities:
- similar library preparation
- both rely on clonal clusters (amplification of signals)
- sequencing by SYNTHESIS (both rely on DNA polymerase)

Differences:
- Ion Torrent is ASYNCHRONOUS (different templates are not growing at the same time), while Illumina is SYNCHRONOUS.
- Ion Torrent uses beads, while Illumina uses flow cells
- Ion Torrent is non-optical (pH), while Illumina is optical
- Ion Torrent is cheaper and has slightly longer reads but is LESS ACCURATE
82
Q

Why is Ion Torrent an asynchronous method?

A

Ion Torrent is a SINGLE NUCLEOTIDE ADDITION TECHNIQUE - only one type of dNTP is being added each cycle. Not all templates (each clonal cluster) will have a base that is complementary to that dNTP. Hence, they grow at different rates.

In Illumina, all four dNTPs are added at the same time.

83
Q

(T/F) Both Ion Torrent and Illumina sequencing methods are suitable for paired-end reads.

A

False!

Bead-based methods (Ion Torrent) are NOT SUITABLE for paired-end reads.

84
Q

Why was there a need for a 3rd generation sequencing (“real-time sequencing”)?

A

2nd generation sequencing (Illumina, Ion Torrent) produced SHORT READS.

Short reads are not able to resolve long repetitive elements, copy number alterations, etc.

Thus, we needed long-read sequencing to generate reads of several kb or more.

85
Q

1) What does SMRT stand for?

2) What are SMRTbell templates, and how are they used in single-molecule real-time sequencing?

3) What is a Circular Consensus Sequence (CCS) in SMRT sequencing?

A

1) SMRT: Single-Molecule Real-Time Sequencing (3rd generation)

2) SMRTbell templates are made by adding single-stranded hairpin adaptors to the ends of double-stranded genomic DNA (ds gDNA), which closes the molecule into a circle. The adaptors serve as a platform for DNA polymerase and primers to bind, allowing the polymerase to copy the circular DNA over and over.

3) After the polymerase has gone around the circular DNA many times, the raw long read (many passes over the same insert) is processed into a Circular Consensus Sequence (CCS), a single high-accuracy read of the insert.
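
The consensus step can be pictured as a majority vote at each position across the repeated copies. The sketch below assumes the copies are already aligned and of equal length, which real software has to work out first; the sequences are invented and the function name is hypothetical.

```python
from collections import Counter


def consensus(passes: list[str]) -> str:
    """Per-position majority vote across aligned, equal-length passes."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


# Five hypothetical passes over the same insert, each with at most one random error.
passes = [
    "ACGTTAGC",
    "ACGTAAGC",
    "ACCTTAGC",
    "ACGTTAGC",
    "ACGTTTGC",
]
print(consensus(passes))  # ACGTTAGC
```

Errors that occur at random positions in individual copies are outvoted, which is how many noisy copies can yield one accurate sequence.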

86
Q

Briefly describe how a circular template is sequenced in SMRT using zero-mode waveguides.

A

Zero-mode waveguides are thousands of individual picoliter wells with transparent bottoms.

The DNA Polymerase is IMMOBILIZED in the bottom of the well.

Each well gets a distinct circular template and primer.

All four fluorescently labelled dNTPs diffuse in and out of the active site of DNA polymerase. When a dNTP is complementary to the template, the polymerase holds it in the active site to catalyze a phosphodiester bond, which produces a fluorescence pulse.

The signal ends once the phosphodiester bond is formed, because this releases the pyrophosphate whose terminal phosphate carries the fluorophore.

87
Q

Mark these as True or False:

1) Second-generation sequencing sequences 200-400 base pairs of DNA and requires signal amplification.

2) Third-generation sequencing typically sequences DNA fragments in the range of 500 to 10,000 base pairs and involves the amplification of signal through the creation of multiple circular DNA copies.

3) SMRT is a great example of miniaturization.

4) SMRT measures the fluorescence from the top of the well.

A

1) True

2) False! No amplification of signal in SMRT. The multiple copies of the circular DNA are used to improve accuracy; they are processed into ONE circular CONSENSUS sequence (CCS).

3) True!

4) False. SMRT measures the fluorescence from the BOTTOM of the well.

88
Q

What is the limitation of real-time sequencing (3rd generation)?

A

Speed of processing relies on how quickly DNA polymerase can copy template, which is based on how quickly the dNTPs can diffuse in and out.

10 NTs/ second

89
Q

What is the main advantage of 4th generation sequencing?

A

It eliminates the need to “COPY” the DNA strand and instead reads the nucleotide sequence of the molecule directly.

*you are only as quick as your DNA polymerase when you have to copy the DNA strand

90
Q

How does Nanopore sequencing (4th generation) work?

A

In this method, very thin membranes contain protein pores that allow the DNA to pass through. At the surface of each pore there is a helicase (motor protein), which unwinds the dsDNA and feeds one end through the pore until the entire genomic DNA fragment (forward and reverse strands) has passed through.

A voltage is applied across the membrane, which drives an ionic current through the pore. DNA partially blocks the pore as it goes through and changes the current. Each nucleotide occludes the channel differently, thus modulating the current across the membrane.

91
Q

Briefly answer the following questions regarding nanopore sequencing:

1) What kinds of reads does it generate?

2) Does the gDNA have anything attached to it before being sequenced?

3) What is a SQUIGGLE?

A

1) Ultra-long reads (10k-200kbp)

2) Yes! It has a hairpin loop on ONE END.

3) Squiggle is the output of nanopore sequencing. The signal is processed and decoded by base calling algorithms into a NT sequence. It also provides information on modified bases.