Module 3.1 The Human Genome Project Flashcards

1
Q

Human Genome Project

history

A
  • A large, international scientific effort that generated the first sequence of the human genome and that of selected model organisms
  • 20 centers from six countries: US, France, Germany, Japan, China, UK
  • Started in 1990; projected to last fifteen years at an estimated cost of $3B
  • initial plan was to finish sequencing the selected model organisms (yeast, C. elegans, Drosophila, and mouse) as well as human, using automated sequencers
  • Celera Genomics, a private company, joined the race in 1998
2
Q

Hierarchical Shotgun Sequencing
(BAC-to-BAC sequencing)

A
  1. human genome fragmented into large pieces
  2. large fragments sorted and organized into a physical map based on their relative positions in the genome
  3. a subset of the individual genomic fragments that together represent the genome with overlapping sequences is selected and sequenced using a random shotgun sequencing strategy
  4. shotgun sequencing data is stitched back together to get the sequence of each large fragment
  5. sequences of the large fragments are assembled to reconstruct the sequence of the entire genome
3
Q

BAC libraries of Human Genome

Calculation

A

BAC libraries: ~150,000 bp/fragment
Coverage = [Insert length (L) x Number of clones (N)] / Genome size (G)
Number of clones (N) needed for 1x coverage: N = G / L

Using BAC libraries:
N = (3 x 10^9) / (1.5 x 10^5) = 2 x 10^4
Using small plasmid libraries:
N = (3 x 10^9) / (1.5 x 10^3) = 2 x 10^6
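
The arithmetic on this card can be checked with a minimal Python sketch. It assumes coverage = N x L / G, so N = coverage x G / L. The function name `clones_needed` and the rounding up to a whole clone are illustrative choices, not part of the card.

```python
import math

def clones_needed(genome_size_bp, insert_length_bp, coverage=1.0):
    """Clones N needed so that N * L / G reaches the target coverage."""
    return math.ceil(coverage * genome_size_bp / insert_length_bp)

GENOME = 3_000_000_000                   # ~3 x 10^9 bp human genome

bac = clones_needed(GENOME, 150_000)     # BAC inserts, ~150,000 bp
plasmid = clones_needed(GENOME, 1_500)   # small plasmid inserts, ~1,500 bp
print(bac, plasmid)                      # 20000 2000000
```

Raising `coverage` scales N linearly, e.g. 10x coverage with BACs needs ten times as many clones.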

4
Q

Phred Software Package

A

assigns a base quality score

5
Q

base quality score

A

estimates the probability that a base call is an error
- makes it possible to monitor raw data quality and helps determine whether two similar sequences truly overlap.
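
Phred-style quality scores are conventionally defined as Q = -10 x log10(P), where P is the estimated probability that the base call is wrong. The sketch below converts between the two; the function names are illustrative and are not taken from the Phred package itself.

```python
import math

def phred_to_error_prob(q):
    """Error probability P for a quality score Q, using Q = -10 * log10(P)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Quality score Q for an error probability P."""
    return -10 * math.log10(p)

def expected_errors(quals):
    """Expected number of wrong bases in a read, given per-base scores."""
    return sum(phred_to_error_prob(q) for q in quals)

print(phred_to_error_prob(20))   # Q20 is about a 1 in 100 chance of error
print(error_prob_to_phred(0.001))  # a 1 in 1,000 chance of error is about Q30
```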

6
Q

Phrap computer package

A

systematically assembles the sequencing data, using the base quality scores

7
Q

Sanger sequencing high-quality read length

A

600-900 bases

8
Q

shotgun sequencing

history

A
  • first proposed in 1979 for sequencing genomes 4,000-7,000 bp long
  • first genome sequenced this way was the ~8,000 bp Cauliflower Mosaic Virus (1981), assembled by sequencing 175 individual fragments
9
Q

EcoRI recognition site

A

G / AATTC

10
Q

HindIII recognition site

A

A / AGCTT
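
To make the cut positions concrete, here is a small Python sketch that digests a toy linear sequence at the EcoRI (G/AATTC) and HindIII (A/AGCTT) sites. It only models the top strand, the toy sequence is made up, and the `SITES` table and `digest` function are hypothetical helpers for illustration.

```python
# Recognition site and cut offset (number of bases before the "/")
SITES = {
    "EcoRI": ("GAATTC", 1),    # G/AATTC
    "HindIII": ("AAGCTT", 1),  # A/AGCTT
}

def digest(seq, enzyme):
    """Return the fragments of a linear sequence cut by one enzyme."""
    site, offset = SITES[enzyme]
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

toy = "TTGAATTCCCAAGCTTGG"
print(digest(toy, "EcoRI"))    # ['TTG', 'AATTCCCAAGCTTGG']
print(digest(toy, "HindIII"))  # ['TTGAATTCCCA', 'AGCTTGG']
```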

11
Q

Cauliflower Mosaic Virus Shotgun Sequencing

process

A
  1. make a large quantity of the virus
  2. isolate the DNA
  3. split the DNA into multiple reactions
  4. in each reaction, treat the DNA with a single restriction enzyme to get a specific set of fragments
  5. purify the restriction-digested DNA fragments and sequence them
  6. identify overlapping regions by looking for the same sequence in two fragments
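
The overlap search in the last step can be sketched in Python as a suffix-prefix comparison between two fragment sequences. This is a simplified illustration with exact matching and made-up toy sequences, not the procedure used in the original study; `min_len` is an assumed cutoff that avoids trusting very short chance matches.

```python
def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def merge(a, b, min_len=3):
    """Join a and b through their overlap, or return None if there is none."""
    k = suffix_prefix_overlap(a, b, min_len)
    return a + b[k:] if k else None

frag1 = "ATGGCCTTA"
frag2 = "CTTAGGAC"
print(suffix_prefix_overlap(frag1, frag2))  # 4 (shared "CTTA")
print(merge(frag1, frag2))                  # ATGGCCTTAGGAC
```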
12
Q

Whole genome sequencing

main challenge

A
  • if a genome has a lot of repeat sequences, it is hard to identify overlapping regions with high accuracy
  • the human genome is 3 billion bp long, and more than 50% of it is repeated sequence
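
A toy Python example shows why repeats are a problem: when the same short sequence occurs at more than one place, a read ending in it could overlap either copy. The sequence and the k-mer length below are made up for illustration.

```python
from collections import Counter

def repeated_kmers(seq, k):
    """Return every k-mer that occurs more than once, with its count."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}

toy = "ACGTTTGACGTTCA"
print(repeated_kmers(toy, 4))  # {'ACGT': 2, 'CGTT': 2}
```

Any overlap that rests only on "ACGT" or "CGTT" is ambiguous here, because each occurs at two positions.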
13
Q

Plasmid cloning fragment limit

base pairs

A

1,000 - 30,000 base pairs

14
Q

hierarchical genome sequencing

Bacterial Artificial Chromosome
(BAC)

A
  • originally derived from the E. coli F (fertility) plasmid
  • able to hold up to 350 kb of DNA
  • origin of replication site (ori)
  • antibiotic resistance gene
  • restriction sites for DNA insertion
  • lacZ gene for blue/white colony selection
  • present in only one or two copies per cell, which keeps large fragments stable
  • each colony contains a particular piece of the genome
15
Q

coverage

A

number of times a given nucleotide in a DNA molecule is represented in the library
- quantifies depth or redundancy of representation for a particular genomic region in the library
- common QC metric in genomic sequencing
- the genome must be covered more than once, with redundancy, to ensure that every region of the genome is properly sequenced
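
A rough way to see why 1x coverage is not enough: if clones are assumed to sample the genome uniformly at random (a Poisson approximation that is not stated on the card), the expected fraction of bases missed at coverage c is about e^(-c). The sketch below applies that assumption; the function names are illustrative.

```python
import math

def coverage(n_clones, insert_len, genome_size):
    """Average library coverage c = N * L / G."""
    return n_clones * insert_len / genome_size

def expected_fraction_missed(c):
    """Expected uncovered fraction under the random-sampling assumption."""
    return math.exp(-c)

c = coverage(20_000, 150_000, 3_000_000_000)
print(c)  # 1.0
for depth in (1, 5, 10):
    print(depth, expected_fraction_missed(depth))
```

Under this model roughly a third of the genome is absent from a 1x library, while at 10x the missing fraction drops below 0.01%.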

16
Q

hierarchical genome sequencing

BAC libraries for human genome

properties / preparation

A
  • genome fragmented into large pieces, each about 150,000 base pairs long
  • clone the fragments into BAC vectors and generate many different colonies from the transformed cells
  • BAC colony = genomic DNA clone
  • pick colonies, grow the cells, and preserve cells of each clone in a freezer
  • give each BAC clone a unique ID for tracking
  • the fragments contained in different clones have different ends
17
Q

hierarchical genome sequencing

genomic DNA library

A

entire collection of BAC clones

18
Q

sequencing coverage

A

number of times a given position in the DNA is read or sequenced
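
As an illustration, the depth at each position can be counted from where reads align. The read coordinates below are made up, and intervals are assumed to be half-open (start included, end excluded).

```python
def depth_per_position(region_len, reads):
    """Count how many reads cover each position of a region."""
    depth = [0] * region_len
    for start, end in reads:  # half-open interval [start, end)
        for pos in range(max(start, 0), min(end, region_len)):
            depth[pos] += 1
    return depth

reads = [(0, 5), (3, 8), (4, 10)]
depth = depth_per_position(10, reads)
print(depth)                    # [1, 1, 1, 2, 3, 2, 2, 2, 1, 1]
print(sum(depth) / len(depth))  # 1.6 (average sequencing coverage)
```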

19
Q

hierarchical genome sequencing

restriction fingerprinting

A
  • digest each clone's DNA insert with restriction enzymes and analyze the fragment sizes by gel electrophoresis
  • clones can be grouped into subsets, each member of which is related to at least one other member by a significant overlap, suggesting that the clones within a group are likely to originate from a contiguous region of the DNA
20
Q

DNA fingerprinting

pairwise comparison

A
  • the fragment sizes of each clone are measured by comparison to size markers
  • by comparing the fragment lengths from two clones, you can identify same-size fragments, which indicate overlapping fragments between the two clones
  • two clones are considered similar when they have many matching fragment sizes (i.e. overlapping fragments)
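
The pairwise comparison can be sketched as counting fragment sizes that match within a measurement tolerance. The fragment sizes, the 1% tolerance, and the greedy one-to-one matching below are all assumptions made for illustration, not values from the card.

```python
def shared_bands(sizes_a, sizes_b, tolerance=0.01):
    """Count fragments of clone A that match an unused fragment of clone B.

    Two sizes match when they differ by at most `tolerance` (relative).
    """
    remaining = sorted(sizes_b)
    shared = 0
    for size in sorted(sizes_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance * max(size, other):
                shared += 1
                del remaining[i]  # each band in B is matched at most once
                break
    return shared

clone_a = [1200, 3400, 5600, 800, 9100]
clone_b = [3410, 5600, 800, 2500, 7000]
clone_c = [450, 1900, 6500]
print(shared_bands(clone_a, clone_b))  # 3
print(shared_bands(clone_a, clone_c))  # 0
```

Clones A and B share three bands and would be candidates for overlap; A and C share none.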
21
Q

hierarchical genome sequencing

restriction fingerprint

A

pattern of gel bands of various sizes created when a clone's DNA insert is digested by restriction enzymes

22
Q

contig

A

a set of DNA segments or sequences that overlap in a way that provides a connecting representation of a genomic region

  • clone version provides a physical map of a set of cloned segments of DNA across a genomic region
  • sequence version provides actual DNA sequence of a genomic region.
  • defined by the criteria that each member of a particular subset is related to at least one other member by a significant pairwise overlap within the group
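
The membership rule (each clone related to at least one other member by a significant pairwise overlap) amounts to grouping clones into connected sets. A minimal Python sketch, assuming the significant overlaps have already been determined; the clone IDs and the `build_contigs` helper are hypothetical.

```python
def build_contigs(clones, overlaps):
    """Group clones so each group is linked by chains of pairwise overlaps."""
    neighbors = {c: set() for c in clones}
    for a, b in overlaps:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, contigs = set(), []
    for clone in clones:
        if clone in seen:
            continue
        stack, group = [clone], []
        seen.add(clone)
        while stack:  # depth-first walk over overlapping clones
            current = stack.pop()
            group.append(current)
            for nxt in neighbors[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        contigs.append(sorted(group))
    return contigs

clones = ["B1", "B2", "B3", "B4", "B5"]
overlaps = [("B1", "B2"), ("B2", "B3"), ("B4", "B5")]
print(build_contigs(clones, overlaps))  # [['B1', 'B2', 'B3'], ['B4', 'B5']]
```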
23
Q

hierarchical genome sequencing

clone selection for shotgun sequencing

A
  • physical map provides information about the order and the relative positions of BACs along the chromosomes
  • clones are selected for sequencing to minimize overlap between adjacent clones.
  • each selected clone must share restriction enzyme fragments with at least one of its neighbors on each side in the contig.
  • the goal is to minimize redundancy between clones within the contig, so that the minimum number of BAC clones covers the entire contig.
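
Picking the fewest clones that still overlap their neighbors can be sketched as a greedy interval cover. The sketch assumes each clone already has start and end coordinates on the physical map (a real fingerprint map gives relative order and shared bands rather than exact coordinates); the clone names and positions are made up.

```python
def minimal_tiling_path(clones, min_overlap=1):
    """Pick a small set of overlapping clones that spans the mapped region.

    clones: dict mapping clone name to (start, end), end exclusive.
    """
    region_end = max(end for _, end in clones.values())
    covered_to = min(start for start, _ in clones.values())
    path = []
    while covered_to < region_end:
        # every clone after the first must overlap the previous one
        need = min_overlap if path else 0
        candidates = [
            (end, name)
            for name, (start, end) in clones.items()
            if start <= covered_to - need and end > covered_to
        ]
        if not candidates:
            raise ValueError(f"gap in the map at position {covered_to}")
        covered_to, best = max(candidates)  # extend as far as possible
        path.append(best)
    return path

# hypothetical map positions in kb
bacs = {"A": (0, 150), "B": (40, 190), "C": (120, 270),
        "D": (140, 280), "E": (250, 400)}
print(minimal_tiling_path(bacs))  # ['A', 'D', 'E']
```

Clones B and C are skipped because A, D, and E already cover the region with overlaps between neighbors.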
24
Q

shotgun sequencing of BAC clone

process

A
  1. DNA insert in the BAC clone is released by restriction enzyme digestion.
  2. 150 kb-long insert is randomly fragmented into 1,500 bp pieces
  3. each fragment is cloned into a separate M13 vector (M13 library)
  4. the M13 vector produces many copies of the insert sequence.
  5. DNA (the M13 vector with its DNA insert) is extracted from bacterial cells and subjected to Sanger sequencing
  6. anneal a primer to the circular M13 vector and read toward the DNA insert to get a read of about 500 to 600 bases from one end of the insert
  7. after sequencing many unique fragments from the M13 library, reads can be aligned by the sequence overlap between them
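
A small simulation gives a feel for how many reads one BAC takes. The 150 kb insert and 500-base reads come from this card; the 8x target coverage and the uniform random read starts are assumptions for illustration only.

```python
import math
import random

def reads_for_coverage(insert_len, read_len, coverage):
    """Reads needed so that reads * read_len / insert_len reaches coverage."""
    return math.ceil(coverage * insert_len / read_len)

def simulate_shotgun(insert_len, read_len, n_reads, seed=0):
    """Fraction of the insert touched by at least one randomly placed read."""
    rng = random.Random(seed)
    covered = [False] * insert_len
    for _ in range(n_reads):
        start = rng.randrange(insert_len - read_len + 1)
        for pos in range(start, start + read_len):
            covered[pos] = True
    return sum(covered) / insert_len

n = reads_for_coverage(150_000, 500, 8)
print(n)  # 2400
print(simulate_shotgun(150_000, 500, n))
```

Because read starts are random, some positions stay uncovered even at several-fold coverage, which is why redundancy is needed.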
25
Q

whole genome shotgun sequencing

A

method used by Celera Genomics
- bypasses the step of building a physical map first and goes straight to sequencing the genome
- faster and simpler process, but the genome is more challenging to assemble
- multiple copies of the genome are randomly sheared into 2,000 or 10,000 bp pieces and inserted into plasmids for growing in bacteria
- purified plasmids are then subjected to Sanger sequencing from both ends (pair sequencing)
- the two reads of a pair are oriented in opposite directions and lie about one fragment length apart, which was valuable in reconstructing the sequence of the original target fragments

26
Q

whole genome shotgun sequencing

pair sequencing

process

A
  • anneal primers to the flanking regions on the plasmid vector and read toward the insert to create paired-end reads, each 500 bp long (for both 2,000 and 10,000 bp inserts)
  • mate pair for a 2,000 bp insert: Read 1 (500 bp) + unknown (1,000 bp) + Read 2 (500 bp)
  • align all the read pairs by the sequence overlap between pairs (both ends)
  • use computer alignment to compare reads and piece together the sequence information, filling in the missing sequence
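
The layout of a mate pair can be sketched in Python. The sketch assumes Read 2 is reported as the reverse complement of the far end of the insert, since it is read from the opposite strand; the toy insert and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def gap_length(insert_len, read_len):
    """Unsequenced bases between the two reads of a mate pair."""
    return max(insert_len - 2 * read_len, 0)

def mate_pair(insert, read_len):
    """Read 1 from one end, Read 2 from the other end on the opposite strand."""
    read1 = insert[:read_len]
    read2 = reverse_complement(insert[-read_len:])
    return read1, read2, gap_length(len(insert), read_len)

print(gap_length(2_000, 500))        # 1000
print(gap_length(10_000, 500))       # 9000
print(mate_pair("AACCGGTTAAAG", 4))  # ('AACC', 'CTTT', 4)
```

Knowing the approximate gap and the opposite orientations lets an assembler place the two reads at a fixed distance, which helps bridge repeats.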
27
Q

whole genome shotgun sequencing

mate pairs

A

the two sequence reads obtained from opposite ends of the same cloned fragment (Read 1 and Read 2)

28
Q

hierarchical shotgun sequencing

benefits and drawbacks

A

Benefits
- relies less heavily on computing power and computer algorithms.
- fingerprinted BAC map made it possible to select clones for sequencing that would ensure comprehensive coverage of the genome and reduce sequencing redundancy.
- challenge of sequence assembly minimized by restricting random shotgun sequencing to individual clones.
- clone based map also enabled the identification of large repeated segments of the genome and simplified the assembly

Drawbacks
- slower than whole genome shotgun sequencing
- labor intensive