Week 2.3 Genomic Technologies Flashcards

Question 1

Q

Genomic technology timeline

Answer

A

1869: DNA extraction. No idea yet what DNA actually did.
1880s: Microscopy used to study chromosomes. Still not sure what they did.
1913: Genetic map of a Drosophila chromosome, the first genetic map. It was made by looking at how different phenotypes in flies segregated: phenotypes that were inherited together were linked. There was still no idea what physically controlled this linkage.
1953: Double-helix structure of DNA described by Watson and Crick.
1970: Site-specific restriction enzymes. The start of DNA manipulation: DNA could be chopped up at specific sites to give different fragments.
1977: Sanger sequencing method. DNA could be sequenced for the first time.
1983: Polymerase chain reaction (PCR).
1986: Automated Sanger sequencer.
1996: Pyrosequencing, commercialised in 2006 in the form of 454 sequencing.
2006: Solexa (Illumina) and 454 sequencing.

Over the last 10 to 15 years there has been huge progress in the work we can do on DNA.

Question 2

Q

Who led the public project?

How was the public human genome project conducted?

Answer

A

The public project was led by Eric Lander et al.
It started with genetic maps of humans, built by looking at the way different phenotypes were linked together on the chromosomes.

A hierarchical shotgun method was used (the BAC-by-BAC approach).
They broke the human genome into big pieces, put each piece into bacteria as a bacterial artificial chromosome (BAC), and grew colonies of the bacteria to amplify it.

Each BAC was then sequenced by the shotgun approach: it was broken into random fragments, the fragments were Sanger sequenced, and the reads were put back together.

It is called shotgun because you hit a random set of lots of small bits of DNA. Working BAC by BAC made it easier to deal with repeats and heterozygosity. The result was 7.5X coverage of the whole genome, with a Sanger-based N50 of 82 kbp.
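The coverage figure can be checked with simple arithmetic. This is a minimal sketch, assuming a genome of about 3.2 billion bp and reads of about 900 bp; the function names and the read length are illustrative assumptions, not values taken from the project itself.

```python
def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of reads covering each base: (reads x length) / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> float:
    """Number of reads required to reach a target average coverage."""
    return target_coverage * genome_size / read_length


GENOME_SIZE = 3_200_000_000  # haploid human genome, about 3.2 billion bp
READ_LENGTH = 900            # assumed Sanger read length, for illustration only

n = reads_needed(7.5, READ_LENGTH, GENOME_SIZE)
print(f"{n / 1e6:.1f} million reads for 7.5X coverage")
```

Under these assumptions, 7.5X coverage needs roughly 27 million reads, which gives a feel for why the work was so labour intensive.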

Question 3

Q

Public project approach;

Why did the public project take this approach?

Was only one genome sequenced?

Answer

A

Genetic maps -> BACs -> shotgun sequencing
They thought this route made it much easier to deal with the whole genome and with errors caused by heterozygosity. DNA was collected from a number of individuals, so they were not sequencing just one genome but a mixture of several, and this route helped them deal with the resulting polymorphism. They had 7.5-fold coverage, i.e. on average 7.5 reads covered each base in the genome. The N50 was 82 kbp, meaning that half of the assembled sequence was in pieces of at least 82 kbp. What they published was therefore still very fragmented.
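N50 is easy to misread as "half of the fragments are longer than this", so a small calculation helps. The sketch below uses made-up contig lengths (not real assembly data) and a hypothetical helper name `n50`.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequence lengths given")


# Hypothetical contig lengths in kbp (illustrative numbers only)
contigs = [200, 150, 82, 60, 40, 30, 20, 10]
print(n50(contigs))  # 150
```

Here the two longest contigs already hold more than half of the 592 kbp total, so the N50 is 150 kbp even though six of the eight contigs are shorter than that.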

Question 4

Q

Who led the private project?

What method did they use?

How many individuals did they use?

With hindsight was this a good idea?

Answer

A

The private project was led by Venter et al.

Venter thought the public approach was too labour intensive.
Instead of going BAC by BAC, they shotgun sequenced the whole genome, then took all their reads of about 900 base pairs and tried to fit them together.
They did 5.1X shotgun coverage of their own, plus 2.9X from the public project (shredded into 550 bp segments).
They used the genomes of 5 individuals, two males and three females: one African-American, one Asian-Chinese, one Hispanic-Mexican, and two Caucasians.

With hindsight, this was a bad idea because it introduces a lot of variability.

Ideally, you would avoid heterozygosity altogether and sequence a single haploid genome: 3.2 billion base pairs, rather than the 6.4 billion (two copies) found in every cell.
They didn’t anticipate how far the cost would drop as the scale of investment in genome sequencing pushed it down.

Mate pairs: instead of only random single reads, they had pairs of reads a known distance apart, with insert sizes of 2, 10, and 50 kbp. This gave a slightly higher N50 of 86 kbp. There is some debate over how this was achieved.

Question 5

Q

Public project method

Answer

A

Genomic DNA is broken up into BAC libraries
Then those are each shotgun sequenced
Placed on a map

Question 6

Q

Private Project

Answer

A

Shotgun sequenced the whole genome,
then tried to assemble it back together, helped by mate pairs of known insert lengths
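Fitting overlapping reads back together can be sketched as a toy greedy overlap assembler. This is only an illustration of the idea, assuming error-free reads from one strand; it is not the assembler either project used, and the function names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no overlaps left: remaining pieces stay as separate contigs
        merged = reads[i] + reads[j][size:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


print(greedy_assemble(["ATGGCGTA", "CGTACGTT", "CGTTAGC"]))  # ['ATGGCGTACGTTAGC']
```

Reads that share no overlap are left as separate pieces, which is how gaps between contigs arise.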

Question 7

Q

Island analogy: private project approach

Answer

A

Imagine taking an aerial photo of an island using an aeroplane whose camera can only capture a very small area of the island at a time. This is what DNA sequencing is like, because it can only sequence a small fragment each time.

The ‘photos’ are taken at random; we just have photos with no idea where they come from.

By overlapping them we can attempt to jigsaw them together. The problem: identical-looking regions.

The solution is a pairing technique (Venter project). Instead of one camera you have two cameras a known distance apart, so photos can be placed relative to each other.

The analogue in the human genome is regions that are very repetitive. It is very hard to place a read of about 900 bases that comes from a repetitive region, but if it is paired with a read from a unique region, we can then work out where the repetitive read belongs. This was the technique used by the private project within their whole genome shotgun approach.
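The pairing idea can be shown on a toy sequence. This is a sketch under simplifying assumptions: the "genome" is a short made-up string, reads match exactly, distance is measured between read start positions, and strand orientation is ignored. The names `find_all` and `place_with_mate` are hypothetical.

```python
def find_all(genome: str, read: str):
    """All start positions where read occurs in genome."""
    positions, start = [], genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


def place_with_mate(genome, repeat_read, unique_mate, insert_size, tolerance=2):
    """Keep only the positions of repeat_read that lie the expected
    distance from its uniquely placed mate."""
    mate_hits = find_all(genome, unique_mate)
    if len(mate_hits) != 1:
        raise ValueError("mate must map to exactly one position")
    mate_pos = mate_hits[0]
    return [p for p in find_all(genome, repeat_read)
            if abs(abs(p - mate_pos) - insert_size) <= tolerance]


# Made-up sequence with the repeat ACACAC in two places
GENOME = "TTG" + "ACACAC" + "GGATCCTAGGCTT" + "ACACAC" + "AAGT"

print(find_all(GENOME, "ACACAC"))                                      # [3, 22]
print(place_with_mate(GENOME, "ACACAC", "GGATCC", insert_size=13))     # [22]
```

On its own the repeat read fits two places; knowing it sits 13 bases from its unique mate leaves only one.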

Question 8

Q

Island analogy: public project approach (Lander project)

Answer

A

Public project approach (Lander project): the BAC approach. They broke the island up into smaller areas, much bigger than the individual reads they would take but still smaller than the whole island. Each area was turned into a bacterial artificial chromosome, and by growing colonies of bacteria they were able to amplify specific segments of the human genome. They could then do their shotgun sequencing on just these smaller regions of the human genome.

Venter project (private): the red line depicts the human genome. Some of it is covered by individual Sanger reads that could be assembled together, and there are also mate-pair reads at different known distances apart, some bigger than others. By assembling all of them together they were able to produce a human genome sequence. On average, each base pair was covered by about 7 Sanger reads.

Lander project (public): the full assembly was built on the earlier assemblies of the individual bacterial artificial chromosomes, compiled together from these longer sequences. Placement of the BACs was assisted by genetic markers that had been identified from linkage maps.

Question 9

Q

What are contigs?

What letter is used to denote unknown base?

Answer

A

Contigs
>contig001
CTTCACCTTTTAAGGGTA
GGACGTCAGCAATCATGA
ATACTTTTTGAGGAAGTC
AATATATGCGGATTTCTGTC

Contigs are sequenced fragments of the genome in which we know what the sequence is.
Particularly in the public project, contigs could be placed into scaffolds: longer sequences where we know the order of the contigs and how far apart they are relative to each other, but not the sequence in the gaps between them. The letter ‘N’ is used to denote an unknown base.
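A scaffold can be pictured as contigs joined by runs of N. The sketch below uses short made-up contigs and gap sizes; `build_scaffold` is a hypothetical helper, not part of any assembler.

```python
def build_scaffold(contigs, gap_sizes):
    """Join contigs in order, filling each estimated gap with 'N' (unknown base)."""
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append("N" * gap)   # known distance, unknown sequence
        parts.append(contig)
    return "".join(parts)


scaffold = build_scaffold(["ACGT", "GGCC", "TTAA"], [5, 3])
print(scaffold)  # ACGTNNNNNGGCCNNNTTAA
```

The number of N's records how far apart the contigs are, even though the bases in between are not known.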

Question 10

Q

Since 2001, how has sequencing improved?

What is the latest version of the human genome?

Answer

A

The human genome assembly has been improved since 2001. The latest version (GRCh38) has an N50 of 67,794,873 bp, so half the genome is in sequences at least about 67 million bp long, in contrast with only 82,000 bp in 2001. This has been enabled by new DNA sequencing technologies.

Question 11

Q

DNA sequencing technologies Moore’s Law chart

What is Moore’s law?

Answer

A

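Moore’s law is the observation that the number of transistors on a chip, and so the computing power you get for a given cost, doubles roughly every two years. It is used as a benchmark for how fast a technology gets cheaper.

It would have cost about $100 million to sequence one genome in 2001. By 2012 it was about $1,000. The cost followed Moore’s law up until 2007, but then next generation sequencing technologies came along that took a completely new approach to DNA sequencing, and the cost fell much faster than Moore’s law.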
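To see how far the real cost departed from the trend, the Moore’s law prediction can be worked out directly. This sketch assumes cost halves every two years starting from $100 million in 2001; the halving period and the function name are assumptions made for illustration.

```python
def moores_law_cost(start_cost: float, start_year: int, year: int,
                    halving_years: float = 2.0) -> float:
    """Cost in `year` if the cost halved every `halving_years` years."""
    return start_cost * 0.5 ** ((year - start_year) / halving_years)


predicted_2012 = moores_law_cost(100_000_000, 2001, 2012)
print(f"Moore's-law prediction for 2012: ${predicted_2012:,.0f}")
```

Under this assumption the 2012 prediction is about $2.2 million per genome, far above the figure quoted on this card, which is why the chart shows the cost curve dropping away from the Moore’s law line.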

Question 12

Q

Sanger Sequencing

What is the Sanger sequencing method?

Answer

A

Sanger sequencing is the chain termination method.

You replicate DNA as it would be replicated in a human nucleus that was about to divide.

As well as the normal A, G, T, C nucleotides (dNTPs), you add a low concentration of dideoxynucleotides (ddNTPs).

These are slightly modified nucleotides: ddNTPs lack the 3’-OH group necessary to form the next phosphodiester bond in a DNA chain.

As soon as one of these ddNTPs is incorporated, replication of that DNA chain stops, because it is now impossible to add another base (dNTP). This is chain termination.

If you replicate a fragment of DNA many times, the chains produced will all stop at different points, because the ddNTPs are incorporated at random.

Thus, a mix of chains of different lengths is produced.
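Random chain termination can be simulated in a few lines. This is a simplified sketch: it assumes each added base has a fixed chance (`dd_fraction`, an illustrative parameter) of being a ddNTP, and it ignores strand direction and enzyme behaviour.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_once(template: str, dd_fraction: float, rng: random.Random) -> str:
    """Copy the template base by base; each added base is a terminating ddNTP
    with probability dd_fraction, which stops the chain at that point."""
    chain = []
    for base in template:
        chain.append(COMPLEMENT[base])
        if rng.random() < dd_fraction:
            break  # ddNTP incorporated: no 3'-OH, so no further extension
    return "".join(chain)


def chain_lengths(template, dd_fraction=0.2, copies=10_000, seed=0):
    """Set of distinct chain lengths seen after copying the template many times."""
    rng = random.Random(seed)
    return {len(synthesize_once(template, dd_fraction, rng)) for _ in range(copies)}


print(sorted(chain_lengths("ATGCGTAC")))
```

With many copies, chains of every possible length turn up, which is exactly the mix the method relies on.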

Question 13

Q

4 Tube Sanger sequencing

How does this work?
How do you copy DNA?

What is used to identify the base?

What was this the first for?

Answer

A

You add a primer for the DNA polymerase that will be copying the DNA.
Then you start copying the DNA, in 4 different tubes:
Tube 1: you have ddATP, a modified A base, added as well as the normal bases
Tube 2: modified C
Tube 3: modified G
Tube 4: modified T
(In each tube you have a mix of all the normal bases.)
In tube 1 the fragments of DNA will all stop at A’s.
In tube 2 they will stop at C’s, in tube 3 at G’s, and in tube 4 at T’s.
It is random which gets incorporated, because you still have all the normal bases, so sometimes the chain will stop at the first occurrence of that base, sometimes at the third, fourth and so on. This can go on for about 900 base pairs.
You take the products of the different tubes and run them in separate lanes of a gel with an electric current across it. As DNA has a charge, it moves through the gel, and short fragments move faster than long fragments.
Each band marks a fragment length at which replication terminated in one of the tubes, because the final base was the base that was a dideoxynucleotide (ddNTP) in that tube. Reading the bands in order of length, and noting which tube each one is in, tells you the final base of each fragment.

This was a very clever idea: for the first time you could read off the gel and know the sequence of the DNA.
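Reading the gel amounts to sorting bands by fragment length and noting which tube each band came from. The sketch below is an idealised illustration with a made-up sequence; it assumes a clean band at every position, and the function names are hypothetical.

```python
def tube_fragment_lengths(new_strand: str):
    """For each tube (A, C, G, T), the lengths at which chains can terminate:
    every position in the newly made strand that holds that tube's base."""
    tubes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        tubes[base].append(position)
    return tubes


def read_gel(tubes) -> str:
    """Read bands from shortest to longest fragment; the tube (lane) a band
    is in gives the base at that position."""
    bands = [(length, base) for base, lengths in tubes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))


tubes = tube_fragment_lengths("GATTACA")
print(tubes)            # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(tubes))  # GATTACA
```

The shortest fragment is in the G tube, the next in the A tube, and so on, so the sequence is read from the bottom of the gel upwards.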

Question 14

Q

Capillary method Sanger sequencing

What is different about this?
How are the bases read off?

Answer

A

In capillary Sanger sequencing, a fluorescent molecule is bound to the terminating nucleotides, so that A, T, G and C each have a different colour. This meant you did not have to have 4 tubes: you could have one tube, and the different ddNTPs would show up as different colours. You could read the bases off from the colours.

Question 15

Q

Template DNA: one of the major costs of the Sanger sequencing method

What do you require for the Sanger chain termination method?
What is the alternative, and what is the cost associated with this?
What did the post-Human Genome Project technologies try to tackle?
What came out in 2004, and at the end of 2005?

Answer

A

One of the major costs of the Sanger sequencing method is that you require thousands of identical copies of the template DNA, produced by cloning. The alternative is PCR with highly specific primers, but this adds to the cost and time of Sanger sequencing. This was something that new technologies tried to overcome.

After the Human Genome Project, new technologies began to come in. A new capillary sequencer came out in 2002, but it was not a step change in technology. In 2004 the 454 pyrosequencer came out, which led to an increase in the output of DNA sequence. At the end of 2005 Solexa/Illumina led to a huge increase in the rate at which we sequence DNA.

Question 16

Q

Newer Methods – Next generation sequencing

Answer

A

Most of these methods are massively parallel, in that they sequence many fragments at the same time. They rely on random fragmentation of the genome (shotgun methods) plus ligation with custom linkers (specially adapted DNA) to make a library. Each unique sequence within that library gets amplified. You then replicate it, each nucleotide incorporated during the sequencing reaction is detected, and you take a video of the light flashes to read the sequence as it is generated.

Question 17

Q

454 Method

How does it work?

Answer

A

454 moved the cloning step into a tiny micro-reactor, like an oil droplet, so you could have a tube with thousands of micro-reactors all amplifying different pieces of DNA.

You start out with fragments of DNA from the organism you want to sequence, with an adapter at either end (the 454 adapters), and a solution of tiny beads. Sticking out of these beads is DNA that is the reverse complement of the adapter in the 454 library. A fragment pairs with the reverse complement on a bead, and you fix the concentration of the beads so that only one fragment of DNA attaches to each bead. Each bead has a single bit of DNA.

You then add emulsion oil and PCR reagents. Each of the little beads becomes surrounded by oil and PCR reagents, and becomes a micro-reactor in which the DNA can replicate by PCR. Without leaving the micro-reactor, the copies attach to the other pieces of DNA sticking out of the bead, so that each bead ends up covered like a fur ball in which every bit of DNA is identical.

The beads are then washed over a plate that has tiny wells in it, sized so that one bead with its unique DNA fits into one well. Each well then contains a unique DNA sequence that has been amplified thousands of times.

The sequencer washes dNTPs over the wells, one type of base at a time, allowing each DNA sequence to be replicated. Every time a dNTP is incorporated in a well, extending the DNA by one base, you get a flash of light. In each well you have thousands of copies of the DNA, so if the next base is A, it is added to all of them at once and you get a spot of light. You literally video the plate containing the different wells.
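What one well reports as bases are washed over it can be sketched as a list of light signals, one per wash. This is an idealised sketch: the wash order `TACG`, the number of cycles and the function name are assumptions for illustration, and the signal is taken to be exactly the number of bases added.

```python
def flowgram(new_strand: str, flow_order: str = "TACG", cycles: int = 2):
    """Wash one base type at a time over a well; the signal for each wash is
    the number of bases incorporated (0 if the next base is a different one)."""
    signals, pos = [], 0
    for flow in range(cycles * len(flow_order)):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(new_strand) and new_strand[pos] == base:
            run += 1   # identical bases in a row are all added in the same wash
            pos += 1
        signals.append((base, run))
    return signals


print(flowgram("TTCAGGG"))
# [('T', 2), ('A', 0), ('C', 1), ('G', 0), ('T', 0), ('A', 1), ('C', 0), ('G', 3)]
```

A wash that does not match the next base gives no light, and a run of identical bases gives one larger signal rather than several separate ones.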

Question 18

Q

Sequencing Workflow Overview

Answer

A

One DNA fragment binds to one bead giving you one read

Question 19

Q

You have 400,000 reads per run, sequencing all of them together and the output is;

Answer

A

You have 400,000 reads per run, all sequenced together, and the output is a trace of light signals in which red corresponds to T, green to A, blue to C and black to G. Reading that off, you know what is there.

There is a problem: if you come to a run of three A’s, all of them incorporate at once and you get one big flash for A. As the run gets longer, it gets harder and harder to tell how many bases you actually have. It is a struggle to tell the difference between, for example, 6 and 8 identical bases in a row (homopolymers): you get a big flash of light but cannot tell exactly how many bases are there.

In the black box is the sequence TCAG, used for signal calibration.

You have to do a fair amount of bioinformatics to process the reads.
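The homopolymer problem can be illustrated by calling bases from noisy signal values. This is a rough sketch, not how real base-calling software works: the signal values are made up, and rounding to the nearest whole number is an assumed, deliberately naive rule.

```python
def call_bases(flows):
    """Turn (base, signal) pairs into sequence by rounding each signal to the
    nearest whole number of incorporated bases."""
    return "".join(base * round(signal) for base, signal in flows)


def relative_step(n: int) -> float:
    """Fractional increase in light when going from n to n + 1 identical bases."""
    return 1 / n


# Hypothetical signals: the last wash really added 6 G's but read slightly high
flows = [("T", 1.05), ("C", 0.97), ("A", 2.9), ("G", 6.6)]
print(call_bases(flows))
print(relative_step(1), round(relative_step(6), 2))
```

Going from 1 to 2 bases doubles the light, but going from 6 to 7 adds only about 17%, so a small amount of noise is enough to miscount a long run, as happens with the G's here.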