Module 3.1 The Human Genome Project Flashcards
Human Genome Project
history
- A large, international scientific effort that generated the first sequence of the human genome and that of selected model organisms
- 20 centers from six countries: US, France, Germany, Japan, China, UK
- Started in 1990, the project was predicted to last fifteen years, with an estimated cost of $3 billion
- the initial plan was to sequence the genomes of selected model organisms (yeast, C. elegans, Drosophila and mouse) as well as the human genome, using automated sequencers
- Celera Genomics, a private company, joined the race in 1998
Hierarchical Shotgun Sequencing
(BAC-to-BAC sequencing)
- the human genome is fragmented into large pieces (~150,000 bp) that are cloned into BACs
- the large fragments are ordered into a physical map based on their relative positions in the genome
- a subset of overlapping fragments that together represent the whole genome is selected, and each selected fragment is sequenced by a random shotgun strategy
- the shotgun reads are stitched back together to obtain the sequence of each large fragment
- the large-fragment sequences are then assembled to reconstruct the sequence of the entire genome
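The selection step can be illustrated with a toy greedy interval cover. This is a minimal sketch, not the Human Genome Project's actual software: it assumes each clone has already been placed on the physical map as a hypothetical (name, start, end) interval, and it picks a small set of clones in which each one overlaps the previous one until the whole genome is covered.

```python
from typing import List, Tuple

# Hypothetical representation: (clone name, start, end) on a physical map,
# using half-open coordinates [start, end) in base pairs.
Clone = Tuple[str, int, int]


def tiling_path(clones: List[Clone], genome_length: int) -> List[Clone]:
    """Greedily pick a small set of overlapping clones that covers [0, genome_length)."""
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[Clone] = []
    covered_to = 0
    i = 0
    while covered_to < genome_length:
        # The first clone must start at 0; every later clone must start
        # strictly inside the region already covered, so neighbours overlap.
        limit = covered_to if path else 1
        best = None
        while i < len(ordered) and ordered[i][1] < limit:
            # Among eligible clones, keep the one reaching furthest right.
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"no overlapping clone extends past position {covered_to}")
        path.append(best)
        covered_to = best[2]
    return path


clones = [("A", 0, 150), ("B", 100, 260), ("C", 120, 200), ("D", 240, 400), ("E", 390, 500)]
print([name for name, _, _ in tiling_path(clones, 500)])  # ['A', 'B', 'D', 'E']
```

Clone C is skipped because B already covers its whole span and reaches further, so sequencing C would add redundancy without extending the map.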
BAC libraries of the human genome
Calculation
BAC libraries: ~150,000 bp per fragment
Coverage (C) =
$C = \dfrac{L \times N}{G}$, where G = genome size, L = insert length, N = number of clones
Number of clones (N) needed for 1x coverage (set C = 1, so $N = G / L$):
Using BAC libraries (~150,000 bp inserts):
$N = \dfrac{3 \times 10^{9}}{1.5 \times 10^{5}} = 2 \times 10^{4}$
Using small plasmid libraries (~1,500 bp inserts):
$N = \dfrac{3 \times 10^{9}}{1.5 \times 10^{3}} = 2 \times 10^{6}$
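The coverage formula and its rearrangement N = C × G / L can be checked numerically. A minimal sketch; the function names are illustrative, and the clone count is rounded up because clones come in whole numbers.

```python
import math


def coverage(genome_size: float, insert_length: float, n_clones: float) -> float:
    """Coverage C = (L x N) / G."""
    return insert_length * n_clones / genome_size


def clones_needed(genome_size: float, insert_length: float, target_coverage: float = 1.0) -> int:
    """Rearranged formula N = C x G / L, rounded up to a whole number of clones."""
    return math.ceil(target_coverage * genome_size / insert_length)


G = 3e9  # approximate human genome size (bp)
print(clones_needed(G, 1.5e5))     # BAC library, 1x coverage: 20000
print(clones_needed(G, 1.5e3))     # small plasmid library, 1x coverage: 2000000
print(clones_needed(G, 1.5e5, 5))  # BAC library, 5x coverage: 100000
print(coverage(G, 1.5e5, 20_000))  # 1.0
```

The 100-fold larger BAC insert is what cuts the clone count from millions to tens of thousands at the same coverage.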
Phred Software Package
assigns a quality score to each base call
base quality score
estimates the probability that a base call is wrong
- makes it possible to monitor raw data quality and helps determine whether two similar sequences truly overlap
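Phred scores are defined on a logarithmic scale, Q = -10 log10(P), where P is the estimated probability that the base call is wrong; Q20 therefore means a 1 in 100 chance of error and Q30 a 1 in 1,000 chance. A minimal sketch of the conversion; the expected_errors helper is a hypothetical illustration of how per-base scores can summarize a whole read.

```python
import math
from typing import List


def phred_to_error_prob(q: float) -> float:
    """Phred definition: Q = -10 * log10(P), so P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse conversion from error probability back to a Phred score."""
    return -10 * math.log10(p)


def expected_errors(qualities: List[int]) -> float:
    """Sum of per-base error probabilities = expected number of wrong calls in a read."""
    return sum(phred_to_error_prob(q) for q in qualities)


for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_prob(q):g}")
```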
Phrap computer package
systematically assembles the sequencing reads using their Phred base quality scores
Sanger sequencing high-quality read length
600-900 bases
shotgun sequencing
history
- first proposed in 1979 for sequencing genomes 4,000-7,000 bp long
- the first genome sequenced by this approach was the ~8,000 bp Cauliflower Mosaic Virus genome (1981), assembled from 175 individually sequenced fragments
EcoRI recognition site
G / AATTC
HindIII recognition site
A / AGCTT
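The slash in each site marks where the enzyme cuts the top strand. A minimal sketch of an in-silico digest, assuming a linear sequence and reporting only top-strand fragments (sticky-end overhangs are ignored); both sites are palindromic, so scanning one strand finds every site. The ENZYMES table and digest function are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# Recognition site and cut offset (bases from the start of the site), top strand.
ENZYMES: Dict[str, Tuple[str, int]] = {
    "EcoRI": ("GAATTC", 1),    # G / AATTC
    "HindIII": ("AAGCTT", 1),  # A / AGCTT
}


def digest(seq: str, enzyme: str) -> List[str]:
    """Cut a linear sequence at every site of one enzyme and return the fragments."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # The zero-width lookahead also finds sites that overlap each other.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


s = "TTGAATTCAAAAGCTTGG"
print(digest(s, "EcoRI"))    # ['TTG', 'AATTCAAAAGCTTGG']
print(digest(s, "HindIII"))  # ['TTGAATTCAAA', 'AGCTTGG']
```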
Cauliflower Mosaic Virus Shotgun Sequencing
process
- grow a large quantity of the virus
- isolate the DNA
- split the DNA into multiple reactions
- in each reaction, digest the DNA with a single restriction enzyme (a different enzyme per reaction) to get a specific set of fragments
- purify the restriction-digested DNA fragments and sequence them
- identify overlapping regions by looking for the same sequences in two fragments
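Finding overlaps and joining fragments can be sketched as a toy greedy assembler: find the longest suffix of one fragment that matches the prefix of another, merge the best-scoring pair, and repeat. This illustrates the general technique (greedy suffix-prefix merging), not the procedure used for the 1981 Cauliflower Mosaic Virus sequence; the fragment list is invented and corresponds to EcoRI and HindIII digests of one short made-up sequence.

```python
from itertools import permutations
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is also a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(fragments: List[str], min_len: int = 3) -> List[str]:
    """Toy greedy assembly: repeatedly merge the pair with the longest overlap."""
    # Drop fragments fully contained in a longer one; they add no new sequence.
    frags = [f for f in fragments if not any(f != g and f in g for g in fragments)]
    frags = list(dict.fromkeys(frags))  # remove exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_pair = 0, None
        for a, b in permutations(frags, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_pair = k, (a, b)
        if best_pair is None:
            break  # no overlaps left: remaining contigs cannot be joined
        a, b = best_pair
        frags.remove(a)
        frags.remove(b)
        frags.append(a + b[best_k:])
    return frags


# EcoRI fragments + HindIII fragments of the same invented sequence.
pieces = ["TTG", "AATTCAAAAGCTTGG", "TTGAATTCAAA", "AGCTTGG"]
print(greedy_assemble(pieces))  # ['TTGAATTCAAAAGCTTGG']
```

The two digests overlap because each HindIII fragment spans the EcoRI cut site and vice versa, which is exactly why the protocol uses a different enzyme in each reaction.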
Whole genome sequencing
main challenge
- if the genome has many repeat sequences, overlapping regions are hard to identify with high accuracy, because a repeated sequence can match several different fragments
- the human genome is ~3 billion bp long, and more than 50% of it consists of repeated sequences
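A small example shows why repeats break overlap detection. In this sketch (toy genome and fragments invented for illustration), the element GATTACA occurs twice; a fragment ending in it overlaps fragments starting at either copy by the same length, so overlap alone cannot decide which neighbour is correct.

```python
from collections import defaultdict
from typing import Dict, List


def repeated_kmers(genome: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer that occurs more than once to its start positions."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def overlap_len(a: str, b: str, min_len: int) -> int:
    """Longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


# Toy genome with the same 7 bp element inserted twice.
genome = "CCCC" + "GATTACA" + "TTTT" + "GATTACA" + "AAAA"
print(repeated_kmers(genome, 7))  # {'GATTACA': [4, 15]}

# A fragment ending in the repeat overlaps fragments that start at EITHER copy.
left = "CCCCGATTACA"
for candidate in ("GATTACATTTT", "GATTACAAAAA"):
    print(candidate, overlap_len(left, candidate, 5))  # both overlap by 7 bp
```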
Plasmid cloning fragment limit (base pairs)
1,000-30,000 base pairs
hierarchical genome sequencing
Bacterial Artificial Chromosome
(BAC)
- originally derived from the F plasmid (F factor) of E. coli
- able to hold up to 350 kb of DNA
- origin of replication site (ori)
- antibiotic resistance gene
- restriction sites for DNA insertion
- lacZ gene for blue/white colony selection
- present at only one or two copies per cell, which keeps large inserts stable
- each colony contains a particular piece of the genome
coverage
number of times a given nucleotide in a DNA molecule is represented in the library
- quantifies depth or redundancy of representation for a particular genomic region in library
- common QC metric in genomic sequencing
- the genome must be covered more than once (with redundancy) to ensure that every region of it is sequenced
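Why more than 1x is needed follows from random sampling. Under the standard Poisson (Clarke-Carbon) approximation, which assumes clones land uniformly at random along the genome, the fraction of bases missing from a library at coverage C is about e^(-C): roughly 37% at 1x but under 1% at 5x. A minimal sketch; the 99% target and the BAC numbers below are a hypothetical example.

```python
import math


def fraction_uncovered(c: float) -> float:
    """Poisson approximation: probability that a given base is in none of the clones."""
    return math.exp(-c)


def coverage_for_probability(p: float) -> float:
    """Coverage needed so a given base is represented with probability p: C = -ln(1 - p)."""
    return -math.log(1 - p)


for c in (1, 3, 5, 10):
    print(f"{c}x coverage: ~{fraction_uncovered(c):.2%} of the genome missing")

# Hypothetical target: 99% chance that any given base is in a BAC clone
# (G = 3e9 bp, L = 1.5e5 bp, so N = C x G / L).
c99 = coverage_for_probability(0.99)
print(f"{c99:.2f}x coverage, about {math.ceil(c99 * 3e9 / 1.5e5)} BAC clones")
```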