Organizing a large-scale sequencing project Flashcards

1
Q

What are the two approaches to genome sequencing projects?

A

> The hierarchical method, in which the whole genome is first fragmented and cloned into bacterial artificial chromosomes (BACs), and the order of the fragments is established before sequencing them.
> The whole-genome shotgun method, which works directly with larger numbers of smaller fragments, with a concomitantly (simultaneously) more challenging assembly.

2
Q

Explain BAC-to-BAC genome sequencing

A

This approach divides the sample into pieces whose relative positions are known:
> First, cut the DNA into fragments of about 150 kb and clone them into BACs. For example, Arabidopsis thaliana has a haploid genome of about 10^8 bp. A 3948-clone BAC library for A. thaliana contained inserts of around 100 kb per clone, giving roughly four-fold coverage (W = NL/G, where N is the number of clones, L the insert length, and G the genome length: 3948 × 100 kb / 10^8 bp ≈ 3.9).
> Identify a series of clones in the library that contain overlapping fragments. Although referred to as ‘fingerprinting’, this process depends on shared (not unique) features of overlapping clones, including:
- overlap of restriction fragment size patterns (RE digestion)
- amplification of single-copy DNA between interspersed repeat elements, checking for similar fragment size patterns
- mapping sequence-tagged sites (STSs) and looking for fragments that share STSs
> Using the overlaps, order the clones according to their position along the original large target DNA molecule.
> Subfragment each clone, sequence the fragments and assemble them:
- make the clones small enough that the ∼1,500 bp sequenced subfragments can be assembled to give the complete sequence of each ∼150 kb BAC clone
- then assemble the clones using their known order in the original sequence.
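
The coverage arithmetic and the clone-ordering step above can be sketched in Python. This is a minimal illustration, not the procedure used in any real mapping project: the function names, clone names, and STS labels are hypothetical, and the greedy ordering assumes error-free single-copy markers and clones that form one unbroken chain.

```python
def library_coverage(n_clones, insert_len_bp, genome_len_bp):
    """Coverage W = N * L / G for a clone library."""
    return n_clones * insert_len_bp / genome_len_bp


def order_clones(clone_markers):
    """Greedy ordering of clones by shared STS markers.

    clone_markers maps a clone name to the set of STSs found on it
    (hypothetical toy data). Assumes all clones belong to one chain.
    """
    names = sorted(clone_markers)

    def n_neighbours(clone):
        # Number of other clones sharing at least one STS with this clone.
        return sum(
            1
            for other in names
            if other != clone and clone_markers[clone] & clone_markers[other]
        )

    # A clone at the end of the chain tends to have the fewest neighbours.
    start = min(names, key=n_neighbours)
    order = [start]
    remaining = [c for c in names if c != start]

    while remaining:
        last = order[-1]
        # Extend the chain with the clone sharing the most STSs with the last one.
        best = max(remaining, key=lambda c: len(clone_markers[last] & clone_markers[c]))
        if not clone_markers[last] & clone_markers[best]:
            raise ValueError("clones do not form a single overlapping chain")
        order.append(best)
        remaining.remove(best)
    return order


# A. thaliana library from the card: roughly 3.9, i.e. about four-fold coverage.
print(library_coverage(3948, 100_000, 10**8))

# Toy library (assumed data): each clone shares an STS with its neighbour.
toy = {
    "C": {"sts4", "sts5"},
    "A": {"sts1", "sts2"},
    "D": {"sts5", "sts6"},
    "B": {"sts2", "sts3", "sts4"},
}
print(order_clones(toy))
```

Real fingerprint maps have to cope with missing or spurious markers and with deeper, redundant overlaps, so a greedy chain like this is only a way to see the idea.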

3
Q

Explain whole-genome shotgun sequencing

A

The idea of the whole-genome shotgun approach is to sequence random pieces of DNA and then put them in the right order. If this can be done, the stage of creating a map as the basis for assembling partial sequences can be skipped.
In the whole-genome shotgun sequencing of the D. melanogaster genome, the DNA was sheared into random pieces of 2 kb, 10 kb, and 150 kb. For each piece, approximately 500 bp from each end were sequenced; these sequences are called reads. A computer program then assembled the reads into a set of contiguous sequences, or contigs. The fully assembled genome sequence, built by a coalescence of contigs, is known as the ‘Golden Path’.
The coverage is the average number of times a base appears in the sequenced fragments. If G = genome length, N = number of reads, and L = length of a read, the coverage is c = NL/G. Researchers derived formulas for the number of gaps expected as a function of coverage and genome size: total number of gaps = N × e^(-c) and total size of gaps = G × e^(-c).
Completion of the process is called finishing. It involves the synthesis and sequencing of specific fragments to close the gaps.
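
The gap formulas above are easy to evaluate directly. The sketch below is only a calculator for those two expressions; the function names and the example numbers (a 1 Mb genome, 10,000 reads of 500 bp) are made up for illustration and do not come from any real project.

```python
import math


def coverage(n_reads, read_len, genome_len):
    """Coverage c = N * L / G."""
    return n_reads * read_len / genome_len


def expected_gap_count(n_reads, read_len, genome_len):
    """Expected number of gaps: N * e^(-c)."""
    c = coverage(n_reads, read_len, genome_len)
    return n_reads * math.exp(-c)


def expected_gap_bases(n_reads, read_len, genome_len):
    """Expected total size of the gaps: G * e^(-c)."""
    c = coverage(n_reads, read_len, genome_len)
    return genome_len * math.exp(-c)


# Hypothetical project: G = 1,000,000 bp, N = 10,000 reads, L = 500 bp.
G, N, L = 1_000_000, 10_000, 500
print(coverage(N, L, G))             # five-fold coverage
print(expected_gap_count(N, L, G))   # about 67 gaps
print(expected_gap_bases(N, L, G))   # about 6,700 unsequenced bases
```

Because both quantities fall off as e^(-c), each extra unit of coverage cuts the expected gaps by a factor of about 2.7, which is why the last few gaps are closed by targeted finishing rather than by more random reads.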

4
Q

What are some of the concerns regarding whole-genome shotgun sequencing?

A

WGS worked smoothly for prokaryotes, which contain relatively few internal repetitive sequences. Repeats create problems in assembly, and this led to scepticism about the feasibility of the shotgun approach for a complex eukaryotic genome. The Drosophila genome has fewer repeats than a mammalian genome, and this contributed to its successful sequencing by shotgun methods. In the event, the Drosophila genome published in 2000 contained 120 Mb of finished sequence with about 1600 gaps. In the latest release of the Drosophila genome, the gaps have been reduced to <1% of the total sequence. Another concern is a highly skewed base composition, which complicates the application of WGS, as in Plasmodium falciparum, which is 80 mol% AT.

5
Q

What are some of the positives of WGS?

A

> It may be possible to identify genes in a partially assembled genome with many gaps, provided that the genes are contained within contigs.
> Celera took the success of WGS of the fruit fly as ‘proof of principle’, justifying its use in their human genome sequencing project, and completed the ‘commercial’ human genome project using the academic sequence as reference.

6
Q

Compare BAC-to-BAC and whole-genome shotgun approaches

A

Two main differences:
> BAC-to-BAC methods are more robust than WGS methods. In diploid organisms, fragments arising from homologous regions of the two chromosomes of a pair may have sequence differences. A correct assembly must place them at the same location, noting the discrepancies, and must not split these reads into different contigs because of imperfect matches.
> An unambiguous success of WGS methods, the Drosophila genome, was based on a highly inbred laboratory strain. Sequencing the DNA from a natural, outbred population or DNA pooled from several individuals from such a population would present more severe challenges.
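
The point about imperfect matches between homologous fragments can be made concrete with a toy overlap test. The sketch below is an illustrative assumption, not how any particular assembler works: the function name, the thresholds, and the two reads are invented. It looks for the longest suffix of one read that matches a prefix of another while tolerating a limited number of substitutions, and it reports where the discrepancies fall.

```python
def tolerant_overlap(left, right, min_len=5, max_mismatches=1):
    """Longest suffix of `left` matching a prefix of `right`.

    Allows up to `max_mismatches` substitutions (no insertions or deletions).
    Returns (overlap_length, mismatch_positions_within_overlap), or None
    if no overlap of at least `min_len` bases qualifies.
    """
    longest = min(len(left), len(right))
    for olen in range(longest, min_len - 1, -1):
        suffix = left[-olen:]
        prefix = right[:olen]
        mismatches = [i for i, (x, y) in enumerate(zip(suffix, prefix)) if x != y]
        if len(mismatches) <= max_mismatches:
            return olen, mismatches
    return None


# Two hypothetical reads from homologous chromosomes, differing at one base.
read_a = "GGATCCATGCA"
read_b = "CATGTAGGTT"

# Tolerating one mismatch keeps the reads together and records the discrepancy.
print(tolerant_overlap(read_a, read_b))                    # (6, [4])

# Demanding a perfect match would leave the reads in separate contigs.
print(tolerant_overlap(read_a, read_b, max_mismatches=0))  # None
```

The same tolerance that keeps allelic reads together also makes it easier to merge near-identical repeats by mistake, which is part of why an outbred or pooled sample is harder to assemble than an inbred strain.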
