SEQUENCING TECHNOLOGIES AND APPLICATIONS Flashcards

1
Q

SANGER SEQUENCING?

A

Sanger sequencing is a method of DNA sequencing that involves electrophoresis and is based on the random incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication. After it was developed by Frederick Sanger and colleagues in 1977, it became the most widely used sequencing method for approximately 40 years. More recently, high-volume Sanger sequencing has been replaced by next-generation sequencing methods, especially for large-scale, automated genome analyses. However, the Sanger method remains in wide use for smaller-scale projects and for validating deep sequencing results. It still has an advantage over short-read sequencing technologies (like Illumina): it can produce DNA sequence reads of >500 nucleotides while keeping a very low error rate, with accuracies around 99.99%. Sanger sequencing is still actively used in public health initiatives, such as sequencing the spike protein gene of SARS-CoV-2 and the surveillance of norovirus outbreaks by the Centers for Disease Control and Prevention (CDC).

Sanger sequencing, also known as the “chain termination method,” was developed by the English biochemist Frederick Sanger and his colleagues in 1977. This method is designed for determining the sequence of nucleotide bases in a piece of DNA (commonly less than 1,000 bp in length). Sanger sequencing with 99.99% base accuracy is considered the “gold standard” for validating DNA sequences, including those already sequenced through next-generation sequencing (NGS). Sanger sequencing was used in the Human Genome Project to determine the sequences of relatively small fragments of human DNA (900 bp or less). These fragments were used to assemble larger DNA fragments and, eventually, entire chromosomes.

2
Q

WHEN WAS PCR (TECHNIQUE FOR AMPLIFYING DNA) DEVELOPED?

A

1983

3
Q

THE HUMAN GENOME PROJECT; KEY DATES AND FIGURES

A

1984 - PLANNING STARTED
1990 - PROJECT STARTED!!!!
2001 - 1ST DRAFT PUBLISHED
2003 - DECLARED COMPLETED!!!!

  • ESSENTIALLY ONE REFERENCE GENOME (MOSTLY ONE ANONYMOUS DONOR’S DNA, PLUS DNA FROM A FEW OTHERS)
  • USED SANGER SEQUENCING
  • HUMAN GENOME DETERMINED TO HAVE CCA 3 BILLION NUCLEOTIDE PAIRS
  • $2.7 BILLION INVESTED (PRICE = CCA $1 PER BASE)
4
Q

COMPLEMENTARY NUCLEOTIDES IN DNA ARE PAIRED (HELD TOGETHER) BY?

A

HYDROGEN BONDS

5
Q

WHICH ENZYME ADDS NUCLEOTIDES THAT BASE-PAIR WITH THE TEMPLATE STRAND DURING DNA SYNTHESIS?

A

DNA POLYMERASE

6
Q

DNA CAN ONLY GROW IN WHICH DIRECTION?

A

5’ TO 3’ DIRECTION (meaning that nucleotides are added by the DNA polymerase only to the 3’ end of the growing strand)

7
Q

WHAT DO DNA NUCLEOTIDES CONTAIN?

A
  • A 5-CARBON SUGAR (DEOXYRIBOSE)
  • A NUCLEOBASE (C,G,A,T)
  • A PHOSPHATE GROUP
8
Q

WHAT TYPE OF BOND MAKES UP THE BACKBONE OF DNA?

A

PHOSPHODIESTER BOND

9
Q

HOW DOES SANGER SEQUENCING WORK?

A

The Sanger sequencing method consists of 6 steps:

(1) The double-stranded DNA (dsDNA) is denatured into two single strands (ssDNA).
(2) A primer complementary to one end of the template DNA (the DNA to be sequenced) anneals to it and serves as the starting point for DNA synthesis.
(3) The four dNTPs (deoxynucleotide triphosphates: dATP, dGTP, dCTP and dTTP), which the polymerase uses to extend the primer with the base complementary to the template strand, are added together with the four ddNTPs (dideoxynucleotide triphosphates: ddATP, ddGTP, ddCTP and ddTTP), each labelled with a distinct fluorescent dye, which terminate the synthesis reaction. The dNTP:ddNTP ratio varies from around 10:1 to 300:1, depending on the desired read length, buffer conditions, the polymerase used and the electrophoresis conditions.
(4) DNA synthesis starts and each chain extends until a terminating nucleotide is randomly incorporated. (Compared to dNTPs, ddNTPs lack the oxygen of the 3’-OH group on the deoxyribose sugar, so they cannot form a phosphodiester bond with the next nucleotide.)
(5) The resulting DNA fragments are denatured into ssDNA.
(6) The denatured fragments are separated by size by electrophoresis and the sequence is determined. (The fragments differ in size because each one stopped at a different point, where DNA polymerase randomly incorporated a ddNTP instead of a dNTP.)

In short: the DNA is denatured (separated into single strands) and a primer is annealed to the strand we want to sequence. DNA polymerase starts DNA synthesis in a solution containing the normal dNTPs (i.e. the typical bases: A, T, G, C) and a small proportion of ddNTPs, which terminate synthesis when incorporated into DNA (one type of ddNTP per reaction in the original method; all four, each with its own dye, in the modern method). This produces DNA molecules of different lengths, depending on where DNA polymerase randomly took up a ddNTP. Reading the terminal ddNTP of each fragment in order of size then gives the sequence.

Mechanism in history: Sanger sequencing was first done in 4 separate reactions using radioactivity (primers labelled with 32P); each reaction contained a different ddNTP at a 1:100 ratio with dNTPs. After completion, each reaction was resolved by denaturing polyacrylamide gel electrophoresis (PAGE) to separate the chain termination products by size. The DNA was then transferred onto a membrane and the chain termination products visualised using X-ray film. The DNA sequence was read manually by writing down the letters. It was a hazardous and slow technique (2-3 weeks to sequence a couple of DNA molecules that were just 100-150 nucleotides long).

Mechanism now: uses ddNTPs labelled with 4 different fluorescent colours, so a single reaction gives information on all four bases. Reactions are run on capillary gels (up to 384 samples at a time) and read by lasers and detectors as the sample passes a detection window. Each reaction can read up to ~1000 bp of a DNA molecule. This increases speed, accuracy and throughput.
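The chain-termination logic above can be sketched as a toy Python simulation. It is a simplified model, not an instrument algorithm: at each position of a hypothetical template a ddNTP is assumed to be incorporated with a fixed probability `p_terminate` (a stand-in for the ddNTP:dNTP ratio), only terminated chains carry a dye (as in dye-terminator chemistry), and electrophoresis is modelled as sorting fragments by length. The template is written in the order the polymerase reads it; all names are illustrative.

```python
import random
from collections import Counter, defaultdict

# Base pairing used to build the new strand from the template.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def run_reaction(template, n_molecules=5000, p_terminate=0.05, seed=0):
    """Extend many primers along `template`; return (length, dye) for each terminated chain.

    `p_terminate` is a hypothetical per-position chance that a ddNTP is
    incorporated instead of a dNTP (a stand-in for the ddNTP:dNTP ratio).
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for position, base in enumerate(template):
            if rng.random() < p_terminate:
                # ddNTP incorporated: no 3'-OH, so the chain stops here and
                # its dye reports the base complementary to this template position.
                fragments.append((position + 1, COMPLEMENT[base]))
                break
        # Chains that never incorporate a ddNTP carry no dye and are not detected.
    return fragments


def read_electropherogram(fragments):
    """Order fragments by length (as electrophoresis does) and read the dye at each length."""
    dyes_by_length = defaultdict(Counter)
    for length, dye in fragments:
        dyes_by_length[length][dye] += 1
    return "".join(
        dyes_by_length[length].most_common(1)[0][0] for length in sorted(dyes_by_length)
    )


if __name__ == "__main__":
    template = "TACGGATCCATGCAATCGGA"  # hypothetical template
    new_strand = read_electropherogram(run_reaction(template))
    print(new_strand)  # the strand complementary to the template
```

Lowering `p_terminate` (more dNTP relative to ddNTP) shifts the fragments towards longer lengths, which is why the ratio is tuned to the desired read length.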

10
Q

OTHER NAMES FOR SANGER SEQUENCING?

A

‘CHAIN-TERMINATION’ OR ‘DIDEOXY’ METHOD

11
Q

WHICH dideoxynucleotide triphosphates (ddNTPs) ARE USED IN SANGER SEQUENCING AND WHAT ARE THEY?

A

ddNTP refers to dideoxynucleotide triphosphates, which are used in the Sanger dideoxy method to produce DNA strands of different lengths for DNA sequencing. The ddNTPs are ddATP, ddTTP, ddCTP and ddGTP. ddNTPs are useful for analysing a DNA sequence because they stop the polymerisation of a DNA strand during DNA replication, producing DNA strands of different lengths copied from a template strand. These newly synthesised strands are then separated by electrophoresis to generate a pattern of bands from which the sequence of the DNA strand can be read.

A ddNTP differs from a dNTP by lacking the 3’-OH group on the deoxyribose sugar: it has a hydrogen atom at the 3’ position instead of an OH group. This terminates DNA polymerisation (DNA elongation), because the polymerase needs a 3’-OH group to attach the next nucleotide.

ddNTPs are often labelled with fluorescent dyes to ease analysis of the DNA sequence. Each dye fluoresces at a different wavelength, which modern instruments can detect, and the output is a trace of fluorescence peaks with each peak labelled with its base.

12
Q

MODERN SANGER SEQUENCING RUNS ON CAPILLARY GELS. HOW MANY SAMPLES CAN RUN SIMULTANEOUSLY?

A

384

13
Q

2021: PRICE FOR SEQUENCING HUMAN GENOME?

A
14
Q

MOORE’S LAW IN CONTEXT OF GENOME SEQUENCING?

A

Although Moore’s Law mainly applies to computing hardware, predicting a doubling of computing power every two years, the cost of DNA sequencing followed a similar pattern for many years, approximately halving every two years. However, since January 2008 that trend has broken, with sequencing costs falling much faster than Moore’s Law would predict (coinciding with the transition from Sanger to next-generation sequencing). This applies both to the cost per megabase of DNA sequence and to the total cost per genome.
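The "halving every two years" trend can be written as cost(t) = cost₀ × 0.5^(t / 2). A minimal Python sketch of that arithmetic follows; the starting cost in the example is a hypothetical round number, not a historical figure.

```python
import math


def projected_cost(start_cost, years, halving_time=2.0):
    """Cost after `years` if it halves every `halving_time` years (a Moore's-law-like trend)."""
    return start_cost * 0.5 ** (years / halving_time)


def implied_halving_time(start_cost, end_cost, years):
    """Halving time (in years) implied by a fall from start_cost to end_cost over `years`."""
    return years / math.log2(start_cost / end_cost)


if __name__ == "__main__":
    # Hypothetical: a $100 million genome and ten years of Moore's-law-like decline.
    print(projected_cost(100_000_000, 10))  # 3125000.0 (five halvings = 1/32 of the cost)
```

A faster-than-Moore decline, like the one after 2008, shows up as an `implied_halving_time` shorter than 2 years.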

15
Q

WHEN WAS THE FIRST ‘HIGH THROUGHPUT SEQUENCING’ (HTS) INSTRUMENT RELEASED AND WHAT WAS IT?

A

2005, ROCHE 454 (THE GS20, RELEASED BY 454 LIFE SCIENCES, WHICH ROCHE LATER ACQUIRED)

16
Q

WHEN WAS THE FIRST PERSONAL HTS (HIGH THROUGHPUT SEQUENCING) INSTRUMENT RELEASED? WHAT WAS IT?

A

2010, MiSeq by company Illumina

17
Q

DIFFERENCE BETWEEN SHORT READ AND LONG READ SEQUENCERS? WHAT DOES ‘READ’ REFER TO?

A

READ = THE SEQUENCE OF BASES DETERMINED FROM A SINGLE DNA FRAGMENT (READ LENGTH = HOW MANY BASES ARE READ)
SHORT: LESS THAN 1000 BASES
LONG: GREATER THAN 1000 BASES

18
Q

2022 SEQUENCING LANDSCAPE; MAJOR COMPANIES INVOLVED?

A
  • ILLUMINA
  • PACBIO
  • ION TORRENT
  • NANOPORE
19
Q

ILLUMINA SEQUENCING; SIMILARITIES AND DIFFERENCES FROM SANGER SEQUENCING?

A

SAME AS SANGER SEQUENCING:

  • SBS METHOD (SEQUENCING BY SYNTHESIS)
  • USING A PRIMER ANNEALING TO THE DNA TEMPLATE THAT IS EXTENDED BY THE POLYMERASE
  • FLUORESCENTLY LABELLED NUCLEOTIDES (LATER IDENTIFIED BY THEIR FLUORESCENT LABEL)

DIFFERENCES:

  • THE STRAND CAN ONLY BE EXTENDED BY 1 FLUORESCENT BASE AT A TIME
  • AFTER EACH EXTENSION, AN IMAGE IS TAKEN TO DETERMINE THE IDENTITY OF THE BASE (A, T, G OR C)
  • THE BASE IS THEN CHEMICALLY MODIFIED TO REMOVE THE FLUOROPHORE AND ALLOW ANOTHER FLUORESCENT BASE TO BE ADDED (UNLIKE THE ddNTPs IN SANGER SEQUENCING, THE FLUORESCENT BASES ADDED ARE REVERSIBLE TERMINATORS, SO THEY CAN BE UNBLOCKED TO ALLOW SYNTHESIS TO CONTINUE AFTER AN IMAGE HAS BEEN RECORDED)

Differences Between NGS and Sanger Sequencing
In principle, the concepts behind Sanger vs. next-generation sequencing (NGS) technologies are similar. In both NGS and Sanger sequencing (also known as dideoxy or capillary electrophoresis sequencing), DNA polymerase adds fluorescent nucleotides one by one to a growing DNA strand complementary to the template. Each incorporated nucleotide is identified by its fluorescent tag.

The critical difference between Sanger sequencing and NGS is sequencing volume. While the Sanger method only sequences a single DNA fragment at a time, NGS is massively parallel, sequencing millions of fragments simultaneously per run. This process translates into sequencing hundreds to thousands of genes at one time. NGS also offers greater discovery power to detect novel or rare variants with deep sequencing.

This technique (Illumina) offers several advantages over traditional sequencing methods such as Sanger sequencing. Sanger sequencing of both strands requires two reactions, one with the forward primer and another with the reverse primer. Unlike Illumina, Sanger sequencing uses fluorescently labelled dideoxynucleoside triphosphates (ddNTPs) to determine the sequence of the DNA fragment. ddNTPs lack the 3’-OH group and therefore terminate DNA synthesis permanently. In each reaction tube, dNTPs and ddNTPs are added, along with DNA polymerase and primers. The ratio of ddNTPs to dNTPs matters: an overabundance of ddNTPs terminates most chains close to the primer, so few long fragments are produced and the far end of the template is not read. Because ddNTPs are incorporated at random, the reaction produces a population of fragments ending at every position of the template, each differing in length from the next by one nucleotide. After the reaction, the fragments are separated by capillary electrophoresis. Near the end of the capillary, a laser excites the fluorescently labelled ddNTP at the end of each fragment and a camera captures the colour emitted.

Because Illumina dye sequencing is automated and massively parallel, it is possible to sequence many strands at once and obtain sequencing data quickly. With Sanger sequencing, only one fragment is sequenced per reaction, which is relatively slow.

20
Q

ILLUMINA SEQUENCING EXPLAINED?

A
  • ILLUMINA SEQUENCING WORKS BY ADDING FLUORESCENT NUCLEOTIDES (dNTPs) AND IMAGING AFTER EACH ONE IS ADDED, ALLOWING A COMPUTER TO IDENTIFY EACH BASE. THIS PRINCIPLE HAS SOME PROBLEMS: DNA FLOATING IN SOLUTION WON’T STAY STILL LONG ENOUGH TO BE IMAGED; THE SIGNAL-TO-NOISE RATIO FOR SINGLE MOLECULES ISN’T HIGH ENOUGH TO GIVE ROBUST DATA; AND CHROMOSOMES ARE DNA MOLECULES THAT ARE TOO LONG TO SEQUENCE WITH SUCH SHORT READS -> THE SOLUTION IS TO FRAGMENT AND IMMOBILISE THE DNA AND CREATE MANY IDENTICAL COPIES THAT CAN BE IMAGED AT THE SAME TIME

ILLUMINA STEPS AND PRINCIPLES:
- PROCESS STARTS WITH PURIFIED DNA
- BEADS ARE PRELOADED WITH TRANSPOSOMES (ENZYME COMPLEXES THAT SIMULTANEOUSLY FRAGMENT DNA AND ADD ON ANOTHER PIECE OF DNA) AND ADAPTER OLIGONUCLEOTIDES (SHORT PIECES OF DNA THAT ARE JOINED TO THE ENDS OF THE DNA FRAGMENTS OF INTEREST)
- LONG FRAGMENTS OF GENOMIC DNA BIND TO THE BEADS
- TRANSPOSOMES RANDOMLY CUT THE GENOMIC DNA INTO SHORTER FRAGMENTS AND SIMULTANEOUSLY ADD/LIGATE ADAPTER OLIGONUCLEOTIDES ONTO EACH END OF EACH FRAGMENT (STRANDS THAT FAIL TO HAVE ADAPTERS LIGATED ARE WASHED AWAY)
- THE ADAPTER OLIGONUCLEOTIDES CONTAIN SEQUENCES THAT ARE COMPLEMENTARY TO THE PRIMERS USED IN THE SEQUENCING REACTION (BINDING SITES FOR THE SEQUENCING PRIMERS) AND ARE DIFFERENT ON EACH END OF EACH DNA FRAGMENT
- ADAPTERS ALSO CONTAIN BARCODE SEQUENCES (INDICES; THE PROCESS IS CALLED INDEXING) -> Indices are usually six base pairs long and are used during DNA sequence analysis to identify samples. Indices allow up to 96 different samples to be run together, which is known as MULTIPLEXING. During analysis, the computer groups all reads with the same index together.
- Illumina uses a “SEQUENCING BY SYNTHESIS” approach. This process takes place inside an acrylamide-coated glass FLOW CELL. The bottom of the flow cell is coated with oligonucleotides (short nucleotide sequences), which serve as the solid support that holds the DNA strands in place during sequencing. As the fragmented DNA is washed over the flow cell, each fragment’s adapter attaches to the complementary oligonucleotide on the solid support.
- BRIDGE AMPLIFICATION: GENERATES CLUSTERS OF AMPLICONS IMMOBILISED ON THE FLOW CELL, EACH CLUSTER CONTAINING COPIES OF A SINGLE DNA FRAGMENT; THE HIGH COPY NUMBER MAKES THE INCORPORATED FLUORESCENT dNTPs EASIER TO VISUALISE
(Bridge amplification in detail:
Once attached, cluster generation can begin. The goal is to create hundreds of identical strands of DNA. Some will be the forward strand; the rest, the reverse. This is why right and left adapters are used. Clusters are generated through bridge amplification. DNA polymerase moves along a strand of DNA, creating its complementary strand. The original strand is washed away, leaving only the reverse strand. At the top of the reverse strand there is an adapter sequence. The DNA strand bends and attaches to the oligo that is complementary to the top adapter sequence. Polymerases attach to the reverse strand, and its complementary strand (which is identical to the original) is made. The now double stranded DNA is denatured so that each strand can separately attach to an oligonucleotide sequence anchored to the flow cell. One will be the reverse strand; the other, the forward. This process is called bridge amplification, and it happens for thousands of clusters all over the flow cell at once.)
SEQUENCING BY SYNTHESIS:
- AT THE END OF THESE PROCESSES, ALL OF THE REVERSE STRANDS ARE WASHED OFF THE FLOW CELL, LEAVING ONLY FORWARD STRANDS
- A SEQUENCING PRIMER IS ANNEALED TO THE 3’ END OF THE IMMOBILISED DNA
- A POLYMERASE ADDS A FLUORESCENTLY TAGGED dNTP TO THE DNA STRAND (Only one base can be added per round because each dNTP carries a reversible 3’ blocking group. With four-colour chemistry, each of the four bases has a unique emission, and after each round the machine records which base was added. Once the colour is recorded, the fluorophore and blocking group are removed and washed away, the next round of dNTPs is washed over the flow cell and the process is repeated.)
- THIS ALLOWS THE DNA SEQUENCE OF EACH CLUSTER TO BE DETERMINED FOR UP TO THE FIRST CCA 300 nt (AFTER THIS, THE SEQUENCE QUALITY DEGRADES)
- PAIRED-END SEQUENCING: SEQUENCING FROM BOTH ENDS OF THE DNA MOLECULE, WHICH CAN SIGNIFICANTLY INCREASE EFFECTIVE READ LENGTH AND SEQUENCING QUALITY
The cycles continue until the desired read length is reached. With this technology, millions of fragments from throughout the genome are sequenced at once via MASSIVELY PARALLEL SEQUENCING.
BASE CALL ACCURACY (THE PROBABILITY OF A CORRECT BASE CALL) >99.9%
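Calling a base from each imaging cycle can be sketched as picking the brightest of the four dye channels. This is a simplified illustration under stated assumptions: the channel order, the intensity values and the `min_purity` cut-off are hypothetical, and real instrument software does considerably more (e.g. correcting for cross-talk between dye channels).

```python
CHANNELS = "ACGT"  # hypothetical order of the four dye channels


def call_cycle(intensities):
    """Return (base, purity) for one cycle; purity = brightest channel / total signal."""
    total = sum(intensities)
    brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
    purity = intensities[brightest] / total if total > 0 else 0.0
    return CHANNELS[brightest], purity


def call_read(cycles, min_purity=0.6):
    """Call one base per cycle for a cluster; ambiguous cycles become 'N'."""
    read = []
    for intensities in cycles:
        base, purity = call_cycle(intensities)
        read.append(base if purity >= min_purity else "N")
    return "".join(read)


if __name__ == "__main__":
    # Hypothetical (A, C, G, T) intensities for one cluster over four cycles.
    cluster_images = [
        (900, 40, 30, 20),
        (35, 50, 870, 45),
        (300, 280, 20, 10),  # two channels similarly bright: ambiguous
        (10, 15, 25, 950),
    ]
    print(call_read(cluster_images))  # AGNT
```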

21
Q

BASE CALL ACCURACY?

A

THE PROBABILITY OF A CORRECT BASE CALL (E.G. 99% BASE CALL ACCURACY MEANS THAT A 100 bp SEQUENCING READ CONTAINS ON AVERAGE ONE ERROR)
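The arithmetic behind the 99% example, as a short Python sketch. It assumes errors occur independently at each position, which is a simplification; real error rates vary along a read.

```python
def expected_errors(read_length, accuracy):
    """Mean number of miscalled bases in a read."""
    return read_length * (1 - accuracy)


def prob_error_free(read_length, accuracy):
    """Chance that every base in the read is correct (independent errors assumed)."""
    return accuracy ** read_length


if __name__ == "__main__":
    for accuracy in (0.99, 0.999):
        print(
            f"{accuracy:.1%}: expected errors per 100 bp = {expected_errors(100, accuracy):.2f}, "
            f"P(at least one error) = {1 - prob_error_free(100, accuracy):.2f}"
        )
```

At 99% this gives one expected error and a ~63% chance of at least one error per 100 bp read; at 99.9% the chance falls to ~10%.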

22
Q

INDEXING (IN ILLUMINA DNA SEQUENCING)?

A

AFTER TRANSPOSOMES CUT GENOMIC DNA INTO FRAGMENTS AND LIGATE AN ADAPTER ONTO EACH END OF EACH FRAGMENT, ADDITIONAL SEQUENCES ARE ADDED TO THE DNA FRAGMENTS BY PCR. INDEX SEQUENCES (BARCODES) ARE USED FOR MULTIPLEXING OF DNA SEQUENCES, ALLOWING NUMEROUS DIFFERENT SAMPLES (E.G. UP TO 96) TO BE RUN TOGETHER
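How reads are grouped by index during analysis can be sketched in Python. The index sequences, sample names and the optional mismatch tolerance are hypothetical; a read whose index does not match exactly one sample goes to an "undetermined" bin.

```python
from collections import defaultdict

# Hypothetical 6-bp indices and the samples they were assigned to.
SAMPLE_INDICES = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def match_index(index_read, sample_indices, max_mismatches=0):
    """Return the one sample whose index is within max_mismatches, else 'undetermined'."""
    hits = [
        sample
        for index, sample in sample_indices.items()
        if len(index) == len(index_read) and hamming(index, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else "undetermined"


def demultiplex(reads, sample_indices, max_mismatches=0):
    """Group (index_read, insert_read) pairs into per-sample lists of insert reads."""
    bins = defaultdict(list)
    for index_read, insert_read in reads:
        bins[match_index(index_read, sample_indices, max_mismatches)].append(insert_read)
    return dict(bins)


if __name__ == "__main__":
    run = [("ACGTAC", "GATTACA"), ("TGCATG", "CCGGTTA"), ("NNNNNN", "TTAGGCA")]
    print(demultiplex(run, SAMPLE_INDICES))
```

Allowing one mismatch is only safe when the indices differ at enough positions that a single sequencing error cannot make one index look like another.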

23
Q

BRIDGE AMPLIFICATION (IN ILLUMINA SEQUENCING, AND NEXT GENERATION SEQUENCING, NGS, IN GENERAL)?

A

Bridge amplification takes place in a flow cell and aims to generate clusters of DNA strands for further sequencing and analysis. The flow cell is coated with two types of oligos, complementary to the two adapters on the fragment strand, respectively. Once the fragment strand is added to the flow cell, it hybridizes to one of the oligos on the cell surface. A polymerase then moves along the strand, creating its complementary DNA strand, i.e. the reverse strand. The double-stranded DNA is denatured and the original strand (forward strand) is washed away. The remaining reverse strand then folds over and its adapter region hybridizes to the second type of oligo on the flow cell. Polymerase attaches to the reverse strand and generates the complementary strand that is identical to the forward strand, forming a double-stranded bridge. This bridge is then denatured, resulting in two single-stranded copies of the DNA, forward and reverse strand, anchored to the flow cell. By repeating this denaturation and extension process, millions of fragments are amplified, forming localized clusters on the flow cell.
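The copy-and-bridge cycle described above can be modelled in Python as repeatedly adding the reverse complement of every anchored strand. This is an idealised sketch: it assumes every strand is copied in every cycle and ignores the physical geometry of the flow cell; the function names and the example fragment are illustrative.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the complementary strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def bridge_amplify(forward, cycles):
    """Return all strands anchored in one cluster after `cycles` rounds of bridging."""
    # Seeding: the hybridised forward strand is copied, then washed away,
    # leaving only its reverse copy attached to the flow cell.
    strands = [reverse_complement(forward)]
    for _ in range(cycles):
        # Each strand bends over, is copied, and the bridge is denatured, so every
        # strand stays anchored and contributes one new complementary strand.
        strands = strands + [reverse_complement(s) for s in strands]
    return strands


if __name__ == "__main__":
    fragment = "ACGTTG"  # hypothetical insert
    cluster = bridge_amplify(fragment, cycles=3)
    counts = Counter("forward" if s == fragment else "reverse" for s in cluster)
    print(len(cluster), counts)  # 8 strands, half forward and half reverse
```

In this idealised model a cluster holds 2^cycles strands; removing the reverse strands before sequencing corresponds to keeping only the strands equal to `forward`.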

24
Q

PROS AND CONS OF SANGER SEQUENCING TECHNIQUES VS NGS (NEXT-GENERATION SEQUENCING) TECHNIQUES

A

SANGER - BENEFITS:

  • Fast, cost-effective sequencing for low numbers of targets (1–20 targets)
  • Familiar workflow

SANGER - CHALLENGES:

  • Low sensitivity (limit of detection ~15–20%)
  • Low discovery power
  • Not as cost-effective for high numbers of targets (> 20 targets)
  • Low scalability due to increasing sample input requirements

NGS - BENEFITS:

  • Higher sequencing depth enables higher sensitivity (down to 1%)
  • Higher discovery power
  • Higher mutation resolution
  • More data produced with the same amount of input DNA (DNA amplified, thousands of molecules sequenced simultaneously)
  • Higher sample throughput

NGS - CHALLENGES:

  • Less cost-effective for sequencing low numbers of targets (1–20 targets)
  • Time-consuming for sequencing low numbers of targets (1–20 targets)
25
Q

ION TORRENT (NEXT GENERATION SEQUENCING) PRINCIPLE?

A

Ion Torrent technology works on the principle of detecting the hydrogen ions released when new nucleotides are incorporated into a growing DNA strand. In nature, when a nucleotide is incorporated into a strand of DNA by a polymerase, a hydrogen ion is released as a by-product. Ion Torrent uses a high-density array of micro-machined wells to perform nucleotide incorporation in a massively parallel manner. Each well holds a different DNA template. Beneath the wells is an ion-sensitive layer followed by a proprietary ion sensor. The chip is flooded with one type of nucleotide at a time; if it is incorporated, the released ion (H+) changes the pH of the solution in that well, which is detected by the ion sensor. If the template contains two identical bases in a row, the output voltage is doubled and the chip records two identical bases, without scanning, a camera or light (NO IMAGING REQUIRED, THEREFORE FAST CYCLE TIME AND REAL-TIME DETECTION). Instead of detecting light, Ion Torrent technology creates a direct connection between the chemical and digital events: hydrogen ions are detected on ion-semiconductor sequencing chips.

‘Ion Torrent™ technology directly translates chemically encoded information (A, C, G, T) into digital information (0, 1) on a semiconductor chip. This approach marries simple chemistry to proprietary semiconductor technology; it’s Watson meets Moore.’
‘Here’s how the technology is used to call a base: If a nucleotide, for example a C, is added to a DNA template and is then incorporated into a strand of DNA, a hydrogen ion will be released. The charge from that ion will change the pH of the solution, which can be detected by our proprietary ion sensor. Our sequencer—essentially the world’s smallest solid-state pH meter—will call the base, going directly from chemical information to digital information.’
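Because nucleotides are flowed over the chip one type at a time and the signal scales with the number of bases incorporated, base calling can be sketched as reading the signal of each flow. The flow order `TACG` and the signal values are hypothetical, and rounding the signal is a simplification of real signal processing.

```python
FLOW_ORDER = "TACG"  # hypothetical repeating order in which nucleotides are flowed


def call_from_flows(signals, flow_order=FLOW_ORDER):
    """Convert per-flow signals (~ number of H+ releases) into a base sequence."""
    bases = []
    for flow_number, signal in enumerate(signals):
        nucleotide = flow_order[flow_number % len(flow_order)]
        incorporated = max(0, round(signal))  # 0 = no incorporation, 2 = two identical bases
        bases.append(nucleotide * incorporated)
    return "".join(bases)


if __name__ == "__main__":
    # Hypothetical normalised signals for eight flows (T, A, C, G, T, A, C, G).
    signals = [1.02, 0.05, 1.97, 0.10, 0.03, 0.98, 0.04, 1.01]
    print(call_from_flows(signals))  # TCCAG
```

The doubled signal in the third flow is read as two C bases, which is the "output voltage is doubled" case described above.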

26
Q

WHICH NEXT GENERATION SEQUENCING TECHNIQUE DOESN’T USE IMAGING, BUT USES EMULSION PCR, CHIPS, SEMICONDUCTOR SEQUENCING, pH MEASURING?

A

ION TORRENT

27
Q

SHORT READ SEQUENCING - EXPLANATION, EXAMPLES, DISADVANTAGES

A

Short-read technologies carry out sequencing by synthesis or ligation. Each strategy uses DNA polymerase or ligase enzymes, respectively, to extend numerous DNA strands in parallel. Nucleotides can either be provided one at a time, or they can be modified with identifying tags.

Short-read sequencing technologies can be further categorized as either single molecule-based, involving the sequencing of a single molecule, or ensemble-based, which is the sequencing of multiple identical copies of a DNA molecule that have usually been amplified together on isolated beads.

Here are some examples of short-read sequencing technologies:

  • Illumina
  • 454 pyrosequencing
  • Ion Torrent
  • SOLiD
  • cPAL

All of these technologies have a common limitation: the inability to sequence long stretches of DNA. To sequence a large stretch of DNA using NGS, such as a human genome, the strands have to be fragmented and amplified. Computer programs are then used to assemble these random fragments into a continuous sequence (i.e. sequencing produces millions of reads 150-300 nucleotides long, which are then assembled computationally into sequences millions of nucleotides long). Unfortunately, these amplification steps can introduce biases into the samples. Also, short-read sequencing can fail to generate sufficient overlap between the DNA fragments. Overall, this means that sequencing a highly complex and repetitive genome, like that of a human, can be challenging using these technologies.

The multistep library preparation process is also a burden. For ensemble-based short-read sequencing, sample preparation usually involves:

Step #1: Extraction and purification of the DNA from the samples
Step #2: Fragmentation of the DNA
Step #3: Repair of frayed ends of the DNA
Step #4: Addition of adapters with ligases or transferases for solid-phase attachment
Step #5: The amplification of a single DNA molecule to generate millions of identical copies
Emulsion PCR and magnetic bead strategies help reduce this laborious process, but they are not fully exploited by labs currently due to the high costs.

28
Q

WHAT IS GENOME ASSEMBLY?

A

Genome assembly refers to the process of taking a large number of short DNA sequences and putting them back together to create a representation of the original chromosomes from which the DNA originated. De novo genome assemblies assume no prior knowledge of the source DNA sequence length, layout or composition. In a genome sequencing project, the DNA of the target organism is broken up into millions of small pieces and read on a sequencing machine. These “reads” vary from 20 to 1000 nucleotide base pairs (bp) in length depending on the sequencing method used. Typically for Illumina type short read sequencing, reads of length 36 - 150 bp are produced.
The goal of a sequence assembler is to produce long contiguous pieces of sequence (contigs) from these reads. The contigs are sometimes then ordered and oriented in relation to one another to form scaffolds.

SEARCH FOR SEQUENCE HOMOLOGIES AMONG SHORT READS —> PUT TOGETHER OVERLAPPING READS —> BUILD LONGER SEQUENCES KNOWN AS CONTIGS —> COMPARE CONTIGS TO CLOSELY RELATED GENOME, DETERMINE ORDER AND PRODUCE SCAFFOLD
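The overlap-and-merge idea can be sketched with a greedy assembler in Python. This is a teaching simplification, not how production assemblers work (they typically use overlap graphs or de Bruijn graphs and handle sequencing errors); it assumes error-free reads from one strand and a hypothetical minimum overlap of 3 bases. Greedy merging is also easily misled by repeats, which is one reason repeat regions are hard to reconstruct.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of `a` that matches a prefix of `b` (0 if < min_length)."""
    start = 0
    while True:
        start = a.find(b[:min_length], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of reads with the largest overlap; return the contigs."""
    contigs = list(reads)
    while True:
        best_length, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_length)
                    if length > best_length:
                        best_length, best_pair = length, (i, j)
        if best_pair is None:
            return contigs  # no overlaps left: each remaining string is a contig
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_length:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


if __name__ == "__main__":
    # Hypothetical error-free reads from the sequence ATTAGACCTGCCGGAATAC.
    reads = ["ATTAGACC", "GACCTGCC", "TGCCGGAA", "GGAATAC"]
    print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```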

29
Q

LONGER DNA SEQUENCES MADE VIA GENOME ASSEMBLY ARE CALLED?

A

CONTIGS

30
Q

WHICH PART OF CHROMOSOMES IS GENOME ASSEMBLY NOT ALWAYS ABLE TO RECONSTRUCT?

A

REPEAT REGIONS

31
Q

EXAMPLES OF LONG READ SEQUENCERS (SINGLE MOLECULE SEQUENCERS)?

A
  • PACBIO
  • NANOPORE

32
Q

PACBIO/SMRT SEQUENCING?

A
  • Single-molecule real-time (SMRT) sequencing, developed by Pacific BioSciences (PacBio)
  • Unlike second-generation sequencing (SGS; e.g. Illumina, Ion Torrent), PacBio sequencing is a real-time method and does not require a pause between read steps. These features distinguish PacBio sequencing from SGS, so it is classified as third-generation sequencing (TGS).
  • PacBio sequencing offers much longer read lengths and faster runs than SGS methods but is hindered by a lower throughput, higher error rate, and higher cost per base
    Mechanism and performance
  • PacBio sequencing captures sequence information during the replication process of the target DNA molecule
  • The template, called SMRTbell DNA, is a closed, single-stranded circular DNA created by ligating ‘hairpin’ adaptors to both ends of a target double-stranded DNA (dsDNA) molecule
  • A sequencing primer and a DNA polymerase bind to the adaptors
  • The SMRTbell sample is loaded onto a chip called a SMRT Cell, which contains cca 1 million individual wells called ZERO-MODE WAVEGUIDES (ZMWs)
  • In each ZMW, a single POLYMERASE-BOUND TEMPLATE becomes IMMOBILISED
  • 4 fluorescently labelled nucleotides are added
  • As each base is held by the polymerase, its label emits a flash of light that is read by detectors in the ZMW and identifies the base
  • The circular DNA is read many times to increase coverage and accuracy
  • most reads are longer than 10 kb (compared with 150-300 nt in Illumina)
  • accuracy is cca 85% for an individual base, but as each base is read many times the aggregate accuracy is very high (99.9%)
  • longer reads are less accurate (the polymerase makes fewer passes around a longer SMRTbell loop, so each base is read fewer times)
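Why repeated passes raise accuracy can be sketched with a simple majority-vote model in Python: each pass is assumed to call a base correctly with probability 0.85, independently of the other passes, and the consensus counts as correct when more than half of the passes agree on the right base. This is a simplified, conservative model for illustration, not PacBio's actual consensus algorithm.

```python
from math import comb


def majority_consensus_accuracy(per_pass_accuracy, passes):
    """P(more than half of `passes` independent reads call the correct base)."""
    needed = passes // 2 + 1
    return sum(
        comb(passes, k) * per_pass_accuracy**k * (1 - per_pass_accuracy) ** (passes - k)
        for k in range(needed, passes + 1)
    )


if __name__ == "__main__":
    for passes in (1, 3, 5, 9, 15):
        print(passes, round(majority_consensus_accuracy(0.85, passes), 4))
```

Fewer passes, as with longer inserts, give a lower consensus accuracy in this model, in line with the last bullet above.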

An important advantage of PacBio sequencing is the read length. While the original PacBio RS system with the first generation of chemistry (C1 chemistry) generated mean read lengths of around 1500 bp [7], the PacBio RS II system with the current C4 chemistry boasts average read lengths over 10 kb [5], with an N50 of more than 20 kb (that is, over half of all data are in reads longer than 20 kb) and maximum read lengths over 60 kb (Figure 4) [8]. In contrast, the maximum read length of the Illumina HiSeq 2500 is only paired-end 250 bp (using Rapid Run Mode) [9]. The short reads of SGS are often unable to span a repetitive region together with at least one unique flanking sequence. In these cases, the origin of a read cannot be precisely determined. The resulting multiple alignments and misalignments lead to problems in downstream analysis, including abundance estimation and structural variation (SV) calling. Because PacBio reads are much longer, the precise location and sequence of a repetitive region can often be resolved by unique regions within a single read. Although a few extremely large repetitive regions are longer than PacBio reads, they may be resolvable if they contain enough heterogeneity [10].

33
Q

WHICH SEQUENCING TECHNIQUE INVOLVES USING ‘HAIRPIN’ ADAPTORS? IS IT LONG OR SHORT READ?

A

PACBIO (SMRT) SEQUENCING, LONG READ

34
Q

WHICH SEQUENCING TECHNIQUE TAKES PLACE ON A ZERO-MODE WAVEGUIDE CHIP?

A

PACBIO (SMRT)

35
Q

HOW DOES PACBIO SEQUENCING INCREASE READ ACCURACY TO LEVELS SIMILAR TO THOSE OF NGS TECHNIQUES (LIKE ILLUMINA ETC.)?

A

EVEN THOUGH THE ACCURACY IS CCA 85% FOR EACH BASE, READING THE CIRCULAR SMRTBELL DNA MANY TIMES RAISES THE AGGREGATE ACCURACY TO UP TO 99.9%

36
Q

OXFORD NANOPORE TECHNOLOGIES (ONT) SEQUENCING?

A

Nanopore sequencing is a unique, scalable technology that enables direct, real-time analysis of long DNA or RNA fragments. It works by monitoring changes to an electrical current as nucleic acids are passed through a protein nanopore. The resulting signal is decoded to provide the specific DNA or RNA sequence.

  • DNA (WITH AN ADAPTER SEQUENCE AND A MOTOR PROTEIN ATTACHED) IS ADDED ONTO A FLOW CELL
  • THE MOTOR ATTACHED TO THE DNA IS A HELICASE THAT UNWINDS THE DNA AND THREADS A SINGLE STRAND THROUGH THE NANOPORE (the DNA molecule is threaded through a bioengineered channel in a biological membrane)
  • THE SINGLE STRAND OF DNA PASSING THROUGH THE PORE DISRUPTS THE IONIC CURRENT
  • THE SENSOR DETECTS CHANGES IN IONIC CURRENT CAUSED BY THE DIFFERENT NUCLEOTIDES MOVING THROUGH THE PORE
  • THE SENSOR SEGMENTS THESE AS DISTINCT EVENTS, EACH WITH AN ASSOCIATED DURATION, MEAN AMPLITUDE AND VARIANCE
  • THERE ISN’T A 1:1 RELATIONSHIP BETWEEN CURRENT CHANGES AND NUCLEOTIDES
  • THIS INFO IS THEN COMPUTATIONALLY PROCESSED AS A SEQUENCE OF 3-6 NUCLEOTIDE ‘K-MERS’
  • VERY FAST

-> ALLOWS VERY LONG READS; UP TO 4 Mb
-> FACILITATES DE NOVO ASSEMBLY OF COMPLEX GENOMES (USED FOR CHARACTERISING NEW CANCER CELL LINES WITH LARGE DUPLICATIONS OR CHROMOSOMAL REARRANGEMENTS + NEW SPECIES WITH NO EXISTING SCAFFOLDS)
-> ALLOWS SEQUENCING OF REPETITIVE DNA SEQUENCES (E.G. DUPLICATED GENE ARRAYS OR CENTROMERES)
-> GOOD FOR FIELD SEQUENCING (SMALL DEVICE, ‘LIVE’ DATA ACQUISITION, GOOD FOR CHALLENGING ENVIRONMENTS OR RAPID-RESPONSE EPIDEMIOLOGY)
-> LOW ACCURACY (CCA 95%, MEANING 5% OF BASES ARE MISCALLED); the high error rate of nanopore technology is largely due to the inability to control the speed of the DNA molecules through the pore, and these errors are systematic
NANOPORE IS OFTEN USED IN COMBINATION WITH ILLUMINA READS, WHICH ‘POLISH’ (ERROR-CORRECT) THE NANOPORE SEQUENCE
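Because several nucleotides sit in the pore at once, the current reflects overlapping k-mers rather than single bases. A minimal Python sketch of that relationship follows, assuming a hypothetical k of 5 (within the 3-6 range above) and that the strand advances one base per step; real base callers have to infer the k-mers from noisy current measurements, which this sketch does not attempt.

```python
def pore_kmers(strand, k=5):
    """k-mers occupying the pore, in order, as the strand advances one base at a time."""
    return [strand[i:i + k] for i in range(len(strand) - k + 1)]


def stitch_kmers(kmers):
    """Rebuild the strand: consecutive k-mers overlap by k-1 bases, so each adds one base."""
    if not kmers:
        return ""
    return kmers[0] + "".join(kmer[-1] for kmer in kmers[1:])


if __name__ == "__main__":
    kmers = pore_kmers("GATTACA")
    print(kmers)                # ['GATTA', 'ATTAC', 'TTACA']
    print(stitch_kmers(kmers))  # GATTACA
```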

37
Q

HOW LONG ARE THE READS IN OXFORD NANOPORE TECHNOLOGIES (ONT) SEQUENCING?

A

VERY LONG, UP TO 4 Mb

38
Q

ACCURACY RATE OF OXFORD NANOPORE TECHNOLOGIES (ONT) SEQUENCING?

A

CCA 95% (LOW)

39
Q

ARE THE SECOND GEN SEQUENCING TECHNIQUES SHORT OR LONG READ?

A

SHORT

40
Q

WHICH TECHNIQUE OFTEN ACCOMPANIES NANOPORE SEQUENCING TO ‘POLISH’ IT?

A

ILLUMINA

41
Q

WHICH SEQUENCING TECHNIQUE IS GOOD FOR ‘ON FIELD’ SEQUENCING AND IN CHALLENGING ENVIRONMENTS?

A

NANOPORE

42
Q

SANGER, ILLUMINA, IONTORRENT, NANOPORE AND PACBIO SEQUENCING - WHICH TECHNIQUE DOES EACH ONE USE?

A
SANGER - CHAIN TERMINATION
ILLUMINA - SEQUENCING BY SYNTHESIS
IONTORRENT - SEQUENCING BY SYNTHESIS
NANOPORE - CHANGES IN IONIC CURRENT THROUGH MEMBRANE PORES
PACBIO - SEQUENCING BY SYNTHESIS
43
Q

SANGER, ILLUMINA, IONTORRENT, NANOPORE AND PACBIO SEQUENCING; WHICH ONES DON’T REQUIRE DNA AMPLIFICATION?

A

PACBIO - NEVER (NO AMPLIFICATION REQUIRED)

NANOPORE - YES AND NO (AMPLIFICATION IS OPTIONAL)

44
Q

SANGER, ILLUMINA, IONTORRENT, NANOPORE AND PACBIO SEQUENCING - WHICH ONE PRODUCES THE LONGEST READS, AND WHICH ONE IS THE MOST EXPENSIVE (AND THE CHEAPEST) PER RUN AND PER GB?

A

BIGGEST READ LENGTH - NANOPORE
MOST EXPENSIVE PER RUN - IONTORRENT (CHEAPEST: SANGER)
MOST EXPENSIVE PER GB - SANGER (CHEAPEST: NANOPORE)

45
Q

SANGER, ILLUMINA, IONTORRENT, NANOPORE AND PACBIO SEQUENCING; COMPARE SPEED?

A
SANGER: SLOW
ILLUMINA: SLOW
IONTORRENT: MEDIUM
NANOPORE: FAST
PACBIO: SLOW
46
Q

SANGER, ILLUMINA, IONTORRENT, NANOPORE AND PACBIO SEQUENCING; COMPARE ACCURACY?

A

SANGER: HIGH (BUT TIME-CONSUMING AND ONLY A SMALL NUMBER OF BASES PER RUN)
ILLUMINA: HIGH DUE TO MASSIVELY PARALLEL SEQUENCING
IONTORRENT: MEDIUM
NANOPORE: LOW WITH SYSTEMATIC ERRORS (DUE TO LACK OF CONTROL OVER SPEED OF DNA TRAVEL THROUGH THE PORE)
PACBIO: HIGH IN AGGREGATE (I.E. IF THE CIRCULAR DNA IS READ MULTIPLE TIMES)