Lecture 07: DNA sequencing Flashcards

Question 1

Q

Human genome facts

Answer

A

Human genome length (nucleotides)?
- ~3.2 Gb

Human genome length (metres)?
- ~2 m

Human genome mass?
- ~0.000000000003 g (3 pg)

Question 2

Q

What can be sequenced?

Answer

A

Whole genome: de novo sequencing or re-sequencing
Targeted (SNP, RAD, exome)
Single individual vs. multiple individuals (pool-seq)
Multiple taxa (metabarcoding, metagenomics)
RNA → DNA (transcriptome)

Exome: all the exons, i.e. the parts of genes that remain after splicing -> all sections that potentially code for proteins

Question 3

Q

De-novo sequencing - vocab

Answer

A

Reads = the original sequenced fragments; the computer compares them and orders them by their overlaps -> the longer the reads, the better for the analysis
Contigs = contiguous sequences built by stringing overlapping reads together
Scaffold = contigs placed in order, with the estimated distances (gaps) between them
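
To make the reads -> contigs step concrete, here is a minimal sketch of greedy overlap merging. It is a toy under stated assumptions, not how a real assembler works: reads are error-free, all come from the same strand, and the function names (overlap, greedy_contigs) and the example reads are invented for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_contigs(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return seqs  # nothing overlaps any more: these are the contigs
        merged = seqs[best_i] + seqs[best_j][best_len:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (best_i, best_j)]
        seqs.append(merged)


# Hypothetical example reads (not from the lecture)
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_contigs(reads)
print(contigs)
```

With these three reads everything overlaps, so a single contig comes out. If a region were not covered by any read, the result would be several contigs, and ordering those relative to each other is the scaffolding step.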

Question 4

Q

Notable achievements

Answer

A

2008: first human genome sequenced with massively parallel DNA sequencing
2010: 185 low-coverage human genomes, 697 exomes
2010: first Pleistocene human genome
2014: 48 bird genome assemblies

-> Conclusion: learn bioinformatics and programming, because the data sets keep getting bigger

Question 5

Q

Technology timeline

Answer

A

1975: Sanger sequencing
Automated Sanger sequencing
2005: “Next generation” Sequencing (NGS)

Question 6

Q

Sanger Sequencing

Answer

A

First DNA sequencing method, used for about 30 years
Like most sequencing methods, it is template-based
Start with a single strand of DNA, which is produced by using a single primer
You set up four “sequencing reactions”, each of which contains:
- DNA template
- Primer
- Nucleotides
- A small proportion of one 32P-labelled dideoxynucleotide (A, T, G or C)
The dideoxynucleotides stop extension of the DNA chain
Different chains end up with different lengths -> they can be separated by gel electrophoresis
Separate your four reactions on four lanes of a large gel
(Later, four different fluorophores were used, meaning you could run one sample per lane)
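
The logic of reading a Sanger gel can be sketched in a few lines. This is a simplified illustration, not a simulation of the chemistry: it assumes every possible termination product is present, ignores the primer length, and the names sanger_lanes and read_gel are made up for this card.

```python
def sanger_lanes(new_strand: str) -> dict[str, list[int]]:
    """Fragment lengths expected in each of the four dideoxy reactions.

    In the ddA reaction, chains stop wherever an A is incorporated,
    so the fragment lengths are the positions of A in the new strand.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the bands from shortest to longest fragment across all four lanes."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


synthesised = "GATTACA"  # hypothetical newly synthesised strand
lanes = sanger_lanes(synthesised)
called = read_gel(lanes)
print(lanes)
print(called)
```

Each lane only tells you where one base occurs; the sequence appears once the bands of all four lanes are sorted by fragment length.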

Question 7

Q

Pre- versus post-NGS

Answer

A

Sanger sequencing:
384 reads per run, up to ~300,000 bp in total
Roche 454 sequencing (2005):
300,000 reads per run, up to 20,000,000 bp in total
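
A quick worked calculation helps to see what changed with NGS. It assumes the figures above are totals for one run, which the card does not state explicitly; the variable names are only for illustration.

```python
# Figures from the card, assumed to be per-run totals
sanger_reads, sanger_bases = 384, 300_000
roche_reads, roche_bases = 300_000, 20_000_000

sanger_mean_read = sanger_bases / sanger_reads   # average bp per Sanger read
roche_mean_read = roche_bases / roche_reads      # average bp per 454 read
fold_more_reads = roche_reads / sanger_reads     # how many times more reads
fold_more_bases = roche_bases / sanger_bases     # how many times more bases

print(f"Sanger: ~{sanger_mean_read:.0f} bp per read")
print(f"454:    ~{roche_mean_read:.0f} bp per read")
print(f"454 gives ~{fold_more_reads:.0f}x the reads and ~{fold_more_bases:.0f}x the bases")
```

Under that assumption the early NGS run produces far more reads and bases, but each read is much shorter than a Sanger read, which is why assembly becomes harder.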

Question 8

Q

Current sequencing technologies: Illumina

Answer

A

Sample prep
Bind DNA to flowcell, generate clusters
Sequencing by synthesis
Data analysis

Question 9

Q

Illumina sequencing more detailed

Answer

A

Adapters are ligated to each end of the DNA molecule
Single strands are coupled to a glass slide (the flowcell) via the adapters
Bridge amplification produces “PCR colonies” (“polonies”), i.e. clusters
For the subsequent sequencing, nucleotides are blocked, so no more than one can be incorporated per cycle
Four fluorescent dyes, one per base, allow detection via pictures
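
The "one base per cycle, one dye per base" idea can be sketched as a toy base caller. This is an assumption-laden illustration: the intensity values are invented, the channel order A/C/G/T is arbitrary, and real base calling also corrects for things like signal crosstalk, which is ignored here.

```python
CHANNELS = "ACGT"  # assumed channel order, one fluorescent dye per base


def call_bases(intensities: list[tuple[float, float, float, float]]) -> str:
    """Call one base per cycle: the channel with the brightest signal wins."""
    calls = []
    for cycle in intensities:
        brightest = max(range(4), key=lambda ch: cycle[ch])
        calls.append(CHANNELS[brightest])
    return "".join(calls)


# Hypothetical signal of one cluster over four cycles (one picture per cycle)
cluster_signal = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.0, 0.1, 0.8),
    (0.0, 0.1, 0.9, 0.2),
    (0.1, 0.7, 0.1, 0.0),
]
read = call_bases(cluster_signal)
print(read)
```

Because the nucleotides are blocked, each cycle adds exactly one base to every strand in the cluster, so the n-th picture corresponds to the n-th base of the read.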

Question 10

Q

Illumina- Considerations

Answer

A

Cluster density: under-clustered, optimally clustered, over-clustered
Read length and quality:
* High throughput
* High sequencing quality
* Limited read length (to some extent; up to 2 x 300 bp now possible)
Assembly is a problem

-> Lowest cost per base, but a full run costs $10,000

Advantages: high throughput and high sequencing quality, relatively cheap

Disadvantages: limited read length, quality declines with higher read lengths

Question 11

Q

Hi-C sequencing

Answer

A

Based on Illumina sequencing
Uses chromatin conformation information
Allows better scaffolding
Example: Chinese mitten crab

Question 12

Q

Newer technologies

Answer

A

Pac Bio
Oxford Nanopore
(Bionano)

Primary focus: increase read length → improved genome assemblies

Question 13

Q

Pacific Biosciences (PacBio)

Answer

A

Library design: create a circular template by ligating adapters onto dsDNA
Add primer and polymerase to the sample
SMRT Cell with zero-mode waveguides (ZMWs)
Each template molecule sits in its own ZMW
Every labelled nucleotide incorporated by the polymerase -> light is emitted

Real-time sequencing

Question 14

Q

Pacific Biosciences

Answer

A

Strictly, single molecule reaction monitoring
No washing: cheap on reagents
No stop-and-go synthesis as in other systems
Recent upgrade to 8x more reads per run
Read length is up to ~40,000 bases
Initially high error rate
Highly competitive for long reads

Question 15

Q

Pacific Biosciences HiFi

Answer

A

Strictly, single molecule reaction monitoring
No washing: cheap on reagents
No stop-and-go synthesis as in other systems
Repeated sequencing of circularized molecule
Read length is “only” ~15,000 bases on average
Very high accuracy
Excellent for de-novo assembly
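
Why repeated sequencing of the same circular molecule gives high accuracy can be shown with a simple majority vote. This is only a sketch of the idea, assuming the passes are already aligned, have equal length and contain substitution errors only; the function name consensus and the example passes are hypothetical, and a real consensus step also has to deal with insertions and deletions.

```python
from collections import Counter


def consensus(passes: list[str]) -> str:
    """Majority vote per position over several passes of the same molecule."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be pre-aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Five hypothetical noisy passes over the same molecule; three contain one error each
passes = ["ACGTTAGC", "ACGTAAGC", "ACCTTAGC", "ACGTTAGC", "ACGTTTGC"]
hifi_read = consensus(passes)
print(hifi_read)
```

Random errors rarely hit the same position in most passes, so they are outvoted. That is the trade-off on this card: shorter reads than standard PacBio, but very high accuracy.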

Question 16

Q

Nanopore technology

Answer

A

Proposed and started in the early 1990s in Santa Cruz and Harvard
* Based on threading a single strand of DNA through a microscopic hole in a membrane
* Creating an electric field across the membrane causes the DNA to pass through
* Measuring the electrical properties of the hole (the ionic current through it) should tell you which base is passing through
* Resolution proved to be a bit of a problem at first, but accuracy is now also excellent: up to 99.9%

Question 17

Q

Oxford Nanopore principle

Answer

A

Array of microscaffolds
Each microscaffold supports a membrane and embedded nanopore.

Sensor chip
Each microscaffold corresponds to its own electrode that is connected to a channel in the sensor array chip.

ASIC
Each nanopore channel is controlled and measured individually by the bespoke ASIC. This allows for multiple nanopore experiments to be performed in parallel.