Chapter 3 Lec4 +5 Flashcards
Mapping, sequencing, annotation, and databases
What are the two general approaches to genome sequencing, and what happens in each approach? (7)
- Hierarchical method (aka ‘BAC-to-BAC’ method)
- the whole genome is first fragmented
- and cloned into bacterial artificial chromosomes (BACs)
- and the order of the fragments is established before sequencing them
- The whole-genome shotgun (WGS) method
- works directly with large numbers of smaller fragments
- entails a concomitantly more challenging assembly problem.
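Define bacterial artificial chromosome (BAC).
Is a plasmid vector (derived from the E. coli F plasmid) carrying a large insert of foreign DNA, about 150 kb in the BAC-to-BAC method.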
Discuss the approach to organizing the sequencing of a large DNA molecule into pieces of known relative position. Provide all necessary examples or features.
- First, cut the DNA into fragments of about 150 kb and clone them into BACs.
E.g. Arabidopsis thaliana has a haploid genome size of about 10^8 bp. A 3948-clone BAC library for A. thaliana, with ~100 kb inserts per clone, gives approximately four-fold coverage (3948 × 100 kb ≈ 3.9 × 10^8 bp).
- Identify a series of clones with overlapping fragments. This is referred to as DNA fingerprinting, and depends on shared (rather than unique) features of overlapping clones, including:
1. Overlap of restriction fragment size patterns (restriction enzyme digestion).
2. Amplification of single-copy DNA between interspersed repeat elements and checking for similar size patterns of fragments.
3. Mapping sequence-tagged sites (STSs) and looking for fragments sharing STSs.
- Using the overlaps, order the clones according to their position along the original large target DNA molecule.
- Subfragment each BAC clone, sequence the fragments and assemble them:
- make the clones small enough so that the ~1500 bp sequenced subfragments can be assembled to give the complete sequence of each ~150 kb BAC clone,
- then the clones can be assembled using their known order in the original sequence.
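The ordering step can be illustrated with a small sketch. It assumes each clone has already been scored for a set of STS markers and chains clones greedily by the number of markers they share. The clone names, marker names and the function order_clones are hypothetical, and real map construction has to cope with nested clones, false positives and ambiguous overlaps that this toy version ignores.

```python
def order_clones(clone_sts):
    """Greedy tiling path: chain clones by the number of STS markers they share.

    clone_sts maps a (hypothetical) clone name to the set of STSs found on it.
    """
    def shared(a, b):
        return len(clone_sts[a] & clone_sts[b])

    def n_neighbours(c):
        # Number of other clones that overlap clone c at all.
        return sum(1 for o in clone_sts if o != c and shared(c, o) > 0)

    # A clone at the end of the map overlaps the fewest neighbours; start there.
    start = min(sorted(clone_sts), key=n_neighbours)
    path = [start]
    remaining = set(clone_sts) - {start}
    while remaining:
        last = path[-1]
        best = max(sorted(remaining), key=lambda c: shared(last, c))
        if shared(last, best) == 0:
            break  # no overlapping clone left: a gap in the map
        path.append(best)
        remaining.remove(best)
    return path


# Illustrative data only, not from a real library.
clones = {
    "bacC": {"sts4", "sts5"},
    "bacA": {"sts1", "sts2"},
    "bacD": {"sts5", "sts6"},
    "bacB": {"sts2", "sts3", "sts4"},
}
print(order_clones(clones))  # ['bacA', 'bacB', 'bacC', 'bacD']
```

Restriction fragment size patterns could be handled the same way by replacing the STS sets with sets of fragment sizes, matched within some tolerance.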
What is the principle of the whole-genome shotgun approach?
Is to sequence random pieces of the (genomic) DNA and put them together in the right order.
- If this can be done, one can skip the laborious stage of creating a map as the basis for assembling partial sequences.
Explain what happens in the whole-genome shotgun sequencing of the Drosophila melanogaster (fruit fly) genome.
- DNA was sheared into random pieces of approximately 2 kb, 10 kb, and 150 kb.
- For each piece, sequences of ~500 bp were determined from each end and were called “reads”.
- A computer (in silico) assembled the reads into a maximal set of contiguous sequences, or contigs.
- The contigs were then coalesced into the ‘Golden Path’.
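The in silico assembly of reads into contigs can be sketched as a greedy overlap merge: repeatedly join the two sequences with the longest suffix-to-prefix overlap until no overlaps remain. This is a toy illustration with error-free reads from one strand. The function names and the min_len threshold are assumptions, and it is not the assembler used for the Drosophila genome.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads into contigs by always joining the best-overlapping pair."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs  # nothing left to merge: these are the contigs
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]


# Three made-up overlapping reads from the sequence ATGGCGTACGTTAGC.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

Repeated sequence is exactly what breaks this approach: if the same stretch occurs at two places in the genome, the greedy merge can join reads that do not belong together.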
Define the ‘Golden Path’.
Is the fully assembled genome sequence, built by coalescence of contigs.
Define coverage and give the appropriate formula for it.
Is the average number of times each base appears in the (sequenced) fragments.
Coverage = (N × L) / G
• N, number of reads
• L, length of a read
• G, genome length
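As a worked example of the formula, the sketch below recomputes the coverage of the A. thaliana BAC library from the earlier card (3948 clones, ~100 kb inserts, ~10^8 bp genome) and inverts the formula to estimate how many reads a target coverage requires. The function names are hypothetical, and the 10-fold target in the second call is an arbitrary illustrative value, not a figure from the lecture.

```python
import math


def coverage(n_reads, read_length, genome_length):
    """Coverage = N * L / G."""
    return n_reads * read_length / genome_length


def reads_needed(target_coverage, read_length, genome_length):
    """Invert the formula: N = coverage * G / L, rounded up to a whole read."""
    return math.ceil(target_coverage * genome_length / read_length)


# A. thaliana BAC library: N = 3948 clones, L = 100 kb, G = 10^8 bp.
print(coverage(3948, 100_000, 10**8))  # 3.948

# Assumed target of 10-fold coverage of a 120 Mb genome with 500 bp reads.
print(reads_needed(10, 500, 120_000_000))  # 2400000
```

Coverage is an average: with random fragments some bases are sequenced many times and others not at all, which is why gaps remain even at several-fold coverage.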
Explain what can be done to close the gaps, once you’ve assembled the contigs and identified the gaps?
Finishing, which involves synthesis and sequencing of specific fragments to close the gaps.
Provide the two concerns of whole-genome shotgun (WGS) sequencing.
(1) WGS worked smoothly for prokaryotes, which contain relatively little internal repetitive sequence.
- repeats create problems in assembly, and led to scepticism about the feasibility of the shotgun approach for a complex eukaryotic genome.
- D. melanogaster has fewer repeats than mammalian genomes, and this contributed to its successful sequencing by shotgun methods
> the Drosophila genome published in the year 2000 contained 120 Mb of finished sequence, with about 1600 gaps.
> later, the number of gaps had been reduced to less than 1% of total sequence
(2) Genomes with highly skewed base composition also complicate the application of WGS
E.g. Plasmodium falciparum contains ~80 mol% AT
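Base composition is simple to measure. The sketch below computes the AT fraction of a sequence, ignoring ambiguous symbols such as N. The function name and the example sequence are made up for illustration, and the sequence is not taken from P. falciparum.

```python
def at_fraction(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are A or T."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T")
    return sum(b in "AT" for b in counted) / len(counted)


# Made-up 10-base example: 8 of the 10 bases are A or T.
print(at_fraction("ATATTAGCAT"))  # 0.8
```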
Provide the two positives about the WGS approach.
(1) It may be possible to identify genes in a partly assembled genome with many gaps, provided that the genes are contained within contigs.
(2) Fruit fly WGS sequencing by Celera was a ‘proof of principle’; Celera then completed the ‘commercial’ human genome project using academic sequence as reference.
Name the two main differences between BAC-to-BAC and WGS approaches and explain them.
(1) BAC-to-BAC methods are more robust than WGS methods.
• in a diploid, fragments arising from homologous regions of the two chromosomes of a pair may have sequence differences.
• correct assembly must place them at the same location while noting the discrepancies; assembly must not split these reads into different contigs because of the imperfect matches (BAC ordering)
(2) Highly inbred laboratory strain vs outbred population or pooled DNA
• an outbred population or pooled DNA would present a more severe assembly challenge (in light of the point above)
State the common and different steps in ‘BAC-to-BAC’ and WGS methods.
- Make random cuts to produce fragments (~150 kb in ‘BAC-to-BAC’; 2000 bp and 10 000 bp in WGS).
- Clone the fragments to create a BAC library (BAC-to-BAC) or a plasmid library (WGS).
- Fingerprint, overlap and order the BAC clones; WGS skips this step.
- BAC-to-BAC: subclone into plasmids and partially sequence 1.5 kb subfragments of individual subclones.
- WGS: partially sequence 1.5 kb subfragments of individual plasmid clones.
- Both: assemble overlaps by computer (in silico).
Define single-end read.
A technique in which sequence is reported from only one end of a fragment.
Define paired-end read.
A technique in which sequence is reported from both ends of a fragment/template (with a number of undetermined bases between the reads that is known only approximately).
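To make the definition concrete, the sketch below takes a fragment and returns a read from each end plus the number of undetermined bases between them. It assumes the common convention that the second read comes from the opposite strand and is therefore reported as a reverse complement. The function names and the example fragment are hypothetical. In a real experiment the fragment length, and so the gap, is known only approximately, whereas this toy version knows it exactly.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment, read_len):
    """Return (read 1, read 2, number of unsequenced bases between them)."""
    read1 = fragment[:read_len]                       # from one end
    read2 = reverse_complement(fragment[-read_len:])  # from the other end, opposite strand
    unsequenced = max(len(fragment) - 2 * read_len, 0)
    return read1, read2, unsequenced


# Made-up 16-base fragment with 4-base reads from each end.
print(paired_end_reads("ACGTTTGGGCCCAAAT", 4))  # ('ACGT', 'ATTT', 8)
```

A single-end read corresponds to keeping only read1. The approximate distance between the two reads of a pair is what lets an assembler order and orient contigs across a gap.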