Chapter 3 Lec4 +5 Flashcards

Mapping, sequencing, annotation, and databases

1
Q

What are the two general approaches to genome sequencing, and what happens in each approach? (7)

A
  1. Hierarchical method (aka ‘BAC-to-BAC’ method)
    - the whole genome is first fragmented
    - the fragments are cloned into bacterial artificial chromosomes (BACs)
    - the order of the fragments is established before they are sequenced
  2. The whole-genome shotgun (WGS) method
    - works directly with large numbers of smaller fragments
    - entails a correspondingly more challenging assembly problem
2
Q

Define bacterial artificial chromosome (BAC).

A

A plasmid vector, propagated in bacteria, that carries a large insert of foreign DNA (on the order of 100-150 kb).

3
Q

Discuss the approach to organizing the sequencing of a large DNA molecule into pieces of known relative position. Include all necessary examples and features.

A
  • First, cut the DNA into fragments of about 150 kb and clone them into BACs.

E.g. Arabidopsis thaliana has a haploid genome size of about 10^8 bp. A 3948-clone BAC library for A. thaliana contains ~100 kb inserts per clone, giving approximately four-fold coverage.

  • Identify a series of clones with overlapping fragments. This is referred to as DNA fingerprinting and depends on shared (rather than unique) features of overlapping clones, including:
    1. Overlap of restriction fragment size patterns after restriction enzyme (RE) digestion.
    2. Amplification of single-copy DNA between interspersed repeat elements, then checking for similar fragment size patterns.
    3. Mapping sequence-tagged sites (STSs) and looking for fragments that share STSs.
  • Using the overlaps, order the clones according to their position along the original large target DNA molecule.
  • Subfragment each BAC clone, sequence the subfragments, and assemble them:
    - make the subclones small enough that the ~1500 bp sequenced subfragments can be assembled into the complete sequence of each ~150 kb BAC clone;
    - the BAC clone sequences can then be assembled using their known order in the original sequence.
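
The clone-ordering step can be illustrated with a toy sketch. It assumes each clone is summarized by the set of STSs it contains and that the clones form a simple linear chain in which only immediate neighbours share an STS; a real BAC library with several-fold coverage overlaps far more densely. The function name `order_clones` and the clone and STS labels are hypothetical.

```python
def order_clones(sts_by_clone):
    """Order clones along a linear target using shared STS content.

    Assumption: clones form a simple chain, so each clone shares STSs
    only with its immediate neighbours.
    """
    clones = list(sts_by_clone)
    if len(clones) < 2:
        return clones
    # Two clones are neighbours if they share at least one STS.
    neighbours = {
        c: {d for d in clones if d != c and sts_by_clone[c] & sts_by_clone[d]}
        for c in clones
    }
    # A chain end has exactly one neighbour; start from the first one alphabetically.
    ends = sorted(c for c in clones if len(neighbours[c]) == 1)
    if not ends:
        raise ValueError("clones do not form a simple linear chain")
    path = [ends[0]]
    while True:
        candidates = sorted(neighbours[path[-1]] - set(path))
        if not candidates:
            break
        path.append(candidates[0])
    if len(path) != len(clones):
        raise ValueError("some clones could not be placed in the chain")
    return path


# Hypothetical library: four clones, listed out of order.
library = {
    "C": {"sts3", "sts4"},
    "A": {"sts1", "sts2"},
    "D": {"sts4", "sts5"},
    "B": {"sts2", "sts3"},
}
print(order_clones(library))
```

The same idea applies to the other fingerprinting features: any shared feature (restriction fragment sizes, repeat-spanning amplicons) can stand in for the STS sets.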
4
Q

What is the principle of the whole-genome shotgun approach?

A

To sequence random pieces of the (genomic) DNA and put them together in the right order.

- If this can be done, one can skip the laborious stage of creating a map as the basis for assembling partial sequences.

5
Q

Explain what happened in the whole-genome shotgun sequencing of the Drosophila melanogaster (fruit fly) genome.

A
  1. DNA was sheared into random pieces of approximately 2 kb, 10 kb, and 150 kb.
  2. For each piece, sequences of ~500 bp were determined from each end; these were called “reads”.
  3. A computer (in silico) assembled the reads into a maximal set of contiguous sequences, or contigs.
    • the contigs coalesce into the ‘Golden Path’
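
The in silico assembly step can be sketched with a toy greedy overlap merger. This is only an illustration under simplifying assumptions (error-free reads, a single strand, no repeats), not the assembler used for the Drosophila project; the names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap; return the contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge: remaining gaps separate the contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Reads that overlap nothing else stay as separate contigs, which is how gaps show up in this toy model.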
6
Q

Define a ‘Golden path’.

A

The fully assembled genome sequence, built by coalescence of contigs.

7
Q

Define coverage and give the appropriate formula for it.

A

The average number of times each base appears in the sequenced fragments.

Coverage = NL/G
• N: number of reads
• L: length of a read
• G: genome length
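
As a quick check of the formula, here is a minimal sketch that plugs in the Arabidopsis thaliana BAC library figures quoted in card 3 (3948 clones, ~100 kb inserts, ~10^8 bp genome), treating each clone insert as one "read". The function name `coverage` is an illustrative choice.

```python
def coverage(num_reads, read_length, genome_length):
    """Average number of times each base is sequenced: N * L / G."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


# A. thaliana BAC library: N = 3948 clones, L ~ 100 kb, G ~ 10^8 bp
print(coverage(3948, 100_000, 1e8))  # close to four-fold coverage
```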

8
Q

Explain what can be done to close the gaps once you have assembled the contigs and identified the gaps.

A

Finishing, which involves synthesis and sequencing of specific fragments to close the gaps.

9
Q

Provide the two concerns about whole-genome shotgun (WGS) sequencing.

A

(1) WGS worked smoothly for prokaryotes, which contain relatively little internal repetitive sequence.

  • Repeats create problems in assembly, which led to scepticism about the feasibility of the shotgun approach for a complex eukaryotic genome.
  • D. melanogaster has fewer repeats than mammalian genomes, and this contributed to its successful sequencing by shotgun methods.

> The publication of the Drosophila genome in 2000 contained 120 Mb of finished sequence, with about 1600 gaps.

> Later, the gaps were reduced to less than 1% of the total sequence.

(2) Genomes with highly skewed base composition also complicate the application of WGS.
E.g. Plasmodium falciparum contains ~80 mol% AT.

10
Q

Provide the two positives of the WGS approach.

A

(1) It may be possible to identify genes in a partly assembled genome with many gaps, provided that the genes are contained within contigs.
(2) Fruit fly WGS sequencing by Celera served as ‘proof of principle’; Celera then completed the ‘commercial’ human genome project using the academic sequence as a reference.

11
Q

Name the two main differences between BAC-to-BAC and WGS approaches and explain them.

A

(1) BAC-to-BAC methods are more robust than WGS methods.
• In a diploid, fragments arising from homologous regions of the two chromosomes of a pair may have sequence differences.
• Correct assembly must place them at the same location while noting the discrepancies; assembly must not split these reads into different contigs because of the imperfect matches (BAC ordering helps with this).

(2) Highly inbred laboratory strain vs outbred population or pooled DNA
• DNA from an outbred population, or pooled DNA, would present a more severe assembly challenge than DNA from a highly inbred laboratory strain (in light of the point above).

12
Q

State the common and different steps in ‘BAC-to-BAC’ and WGS methods.

A
  1. Make random cuts to produce fragments (150 kb in ‘BAC-to-BAC’; 2 kb and 10 kb in WGS).
  2. Clone the fragments to create a BAC library (‘BAC-to-BAC’) or a plasmid library (WGS).
  3. Fingerprint, overlap, and order the BAC clones; WGS skips this step.
  4. ‘BAC-to-BAC’: subclone each BAC into plasmids and partially sequence 1.5 kb subfragments of individual subclones. WGS: partially sequence 1.5 kb subfragments of individual plasmid clones.
  5. Both: assemble overlaps by computer (in silico).
13
Q

Define single-end read

A

A technique in which sequence is reported from only one end of a fragment.

14
Q

Define paired-end read.

A

A technique in which sequence is reported from both ends of a fragment/template (with a number of undetermined bases between the reads that is known only approximately).
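
A minimal sketch of the idea, assuming the second read is reported as the reverse complement of the fragment's far end (a common convention, but an assumption here). In this toy the fragment length is known exactly, whereas in practice the gap between the two reads is known only approximately. The names `reverse_complement` and `paired_end_reads` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment, read_length):
    """Return (read 1, read 2, number of unsequenced bases between them)."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    read1 = fragment[:read_length]                        # from one end
    read2 = reverse_complement(fragment[-read_length:])   # from the other end, opposite strand
    unsequenced = max(len(fragment) - 2 * read_length, 0)
    return read1, read2, unsequenced


print(paired_end_reads("ATGGCGTACCTGA", 4))
```

A single-end read corresponds to keeping only `read1`.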

15
Q

Define read length

A

The number of bases reported/sequenced from a single experiment on a single fragment/template.

16
Q

Define sequence assembly

A

The inference of the complete sequence of a region from the data on individual fragments from the region, by piecing together overlaps.

17
Q

Define contig

A

A partial assembly of data from overlapping fragments into a contiguous region of sequence.

18
Q

Define de novo sequencing

A

Determination of a full-genome sequence without a known reference sequence from an individual of the species, so the assembly step cannot be avoided.

19
Q

Define resequencing

A

Determination of the sequence of an individual of a species for which a reference genome sequence is known. The assembly process is replaced by mapping the fragments onto the reference genome.
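
The mapping step can be sketched with exact string matching. This is a deliberately naive illustration: real read mappers use indexed references and tolerate mismatches, both of which are omitted here. The function name `map_reads` and the example sequences are hypothetical.

```python
def map_reads(reference, reads):
    """Return, for each read, every 0-based position where it matches the reference exactly."""
    positions = {}
    for read in reads:
        hits = []
        start = reference.find(read)
        while start != -1:
            hits.append(start)
            start = reference.find(read, start + 1)  # keep searching for further (repeat) hits
        positions[read] = hits
    return positions


reference = "ATGGCGTACCTGA"
print(map_reads(reference, ["GCGT", "CCTG", "AAAA"]))
```

A read with several hits cannot be placed uniquely, which is the same repeat problem that complicates assembly.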

20
Q

Define exome sequencing

A

Targeted sequencing of regions in DNA that code for parts of expressed proteins (exons). This method targets only the approximately 180,000 exons in the human genome, for example.

21
Q

Define RNAseq

A

Sequencing the contents and composition of the RNAs in the cell (called the ‘transcriptome’) by converting RNA to complementary DNA and sequencing the result.

22
Q

What are the general approaches to improving the throughput/cost ratio?

A

Miniaturization and parallelization or multiplexing

23
Q

Provide the common preparation steps in NGS (high-throughput DNA sequencing).

A

(1) Target DNA is fragmented.
(2) Common adaptors are attached to one or both ends.
(3) Amplification (via PCR) generates a library of short regions.
(4) The library is spatially distributed, either in an array of wells or fixed to a solid medium, and sequenced in parallel.
(5) ‘De novo’ sequence assembly or mapping to a reference genome.

24
Q

Name the different sequencing platforms.

A

(1) Roche 454 Life Sciences
(2) Ion Torrent / Personal Genome Machine (PGM)
(3) Oxford Nanopore
(4) Bionano Irys system
(5) Illumina (Solexa)
(6) PacBio
(7) 10x Genomics