Week 4 - Bacterial Genomics Flashcards
Genome
- entire complement of genetic information
* includes genes, regulatory sequences, and noncoding DNA
Genomics
discipline of mapping, sequencing, analyzing, and comparing genomes
Number of prokaryotic genomes sequenced
over 12,000
RNA virus MS2
- first genome sequenced in 1976
* 3,569 nucleotides (single-stranded RNA)
Haemophilus influenzae
- first cellular genome sequenced in 1995
* 1,830,137 bp
Large-scale sequencing projects have led to automated DNA sequencing systems
- based on Sanger method
* radioactivity replaced by fluorescent dye
Sequencing
determines the order of nucleotides in a DNA or RNA molecule
Sanger dideoxy method
- invented by Fred Sanger (Nobel Prize winner)
- one of 2 sequencing techniques developed independently in the 1970s
- uses chemically altered “dideoxy” bases to terminate newly synthesized DNA fragments at specific bases (A, C, T, or G)
- the fragments are then size-separated, and the DNA sequence is read from the order of fragment lengths
Purines
adenine
guanine
• two rings
Pyrimidines
cytosine
uracil
thymine
• one ring
Determining the sequence of DNA
1. chain termination or dideoxy method (F. Sanger)
2. shotgun sequencing method
3. second-generation sequencing methods (pyrosequencing)
Dideoxy (Sanger) method - steps
- denaturation
- primer attachment and extension of bases
- termination
- gel electrophoresis
produces a chromatogram - laser detection of fluorochromes and computational sequence analysis
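The steps on this card can be sketched as a toy simulation. This is a minimal illustration, not a model of real chemistry: it assumes the template is written 3'→5' so the new strand reads 5'→3' left to right, and that termination happens once at every position. The function names `sanger_fragments` and `read_sequence` are made up for this example.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sanger_fragments(template):
    """Return (length, terminal_base) for each chain-terminated fragment.

    Assumes `template` is written 3'->5', so the newly synthesized strand
    is simply the base-by-base complement read left to right.
    """
    new_strand = "".join(COMPLEMENT[b] for b in template)
    # A ddNTP can end the chain at any position, giving one fragment
    # per length, each labeled by its final (fluorescent) base.
    fragments = [(i + 1, base) for i, base in enumerate(new_strand)]
    random.shuffle(fragments)  # fragments are mixed together in the tube
    return fragments

def read_sequence(fragments):
    """Electrophoresis sorts fragments by size; read the labels in order."""
    return "".join(base for _, base in sorted(fragments))

print(read_sequence(sanger_fragments("TACGGA")))
```

Sorting by fragment length plays the role of gel electrophoresis: the shortest fragment reports the first base, the next shortest the second, and so on.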
Sanger reaction mixture
- primer and DNA template
- ddNTPs with fluorochromes
- DNA polymerase
- dNTPs (dATP, dCTP, dGTP, dTTP)
What’s wrong with the Sanger/dideoxy method?
- each reaction only reads about 500-750 bp
- expensive
- takes time
- the human genome is about 3 billion bp
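A quick calculation shows why read length is the bottleneck. This sketch assumes a human genome of roughly 3 billion bp and ignores overlap between reads, so the real number of reactions would be higher. The helper `reads_needed` is a hypothetical name for this example.

```python
import math

HUMAN_GENOME_BP = 3_000_000_000  # approximate size, assumed for illustration

def reads_needed(genome_bp, read_bp, coverage=1):
    """Minimum number of reads to span the genome `coverage` times."""
    return math.ceil(genome_bp * coverage / read_bp)

for read_bp in (500, 750):
    n = reads_needed(HUMAN_GENOME_BP, read_bp)
    print(f"{read_bp} bp reads: at least {n:,} reactions")
```

Even at the upper end of the read length, a single pass over the genome takes millions of separate reactions.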
Shotgun sequencing
used to sequence whole genomes
Steps of shotgun sequencing
- DNA is randomly broken up into smaller fragments
- dideoxy method produces reads
- look for overlap of reads
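The "look for overlap" step can be sketched with a simple greedy merge. This is only an illustration under strong assumptions (error-free reads, all from the same strand, no repeats). Real assemblers are far more sophisticated. The names `overlap` and `greedy_assemble` are invented for this example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that matches a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

print(greedy_assemble(["GCGTACC", "ATGGCGT", "TACCTGA"]))
```

If no pair of reads overlaps by at least `min_len` bases, the function returns more than one contig, which corresponds to a gap in the assembly.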
Whole genome shotgun sequencing
• in whole genome shotgun sequencing the entire genome is sheared randomly into small fragments (appropriately sized for sequencing) and then reassembled
Hierarchical shotgun sequencing
- the genome is first broken into larger segments
- after the order of these segments is deduced, they are further sheared into fragments appropriately sized for sequencing
Pyrosequencing
- each nucleotide is added in turn
- only 1 of 4 will generate a light signal
- the remaining nucleotides are removed enzymatically
- the light signal is recorded on a pyrogram
- sequencing by synthesis
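The add-one-nucleotide-at-a-time idea can be sketched as follows. This is a simplified model that assumes an ideal signal, where light is exactly proportional to the number of bases incorporated, and a fixed A, C, G, T dispensation order. The names `pyrogram` and `call_bases` are hypothetical.

```python
def pyrogram(new_strand, dispensation="ACGT", max_cycles=50):
    """Simulate dispensing one nucleotide at a time.

    `new_strand` is the strand being synthesized (5'->3').
    Returns a list of (nucleotide, signal) pairs, where signal is the
    number of bases incorporated for that dispensation (0 = no light).
    """
    pos = 0
    signals = []
    for _ in range(max_cycles):
        for nt in dispensation:
            run = 0
            # A homopolymer run is incorporated in a single step,
            # giving a proportionally larger light signal.
            while pos < len(new_strand) and new_strand[pos] == nt:
                run += 1
                pos += 1
            signals.append((nt, run))
            if pos == len(new_strand):
                return signals
    return signals

def call_bases(signals):
    """Convert pyrogram peaks back into a sequence."""
    return "".join(nt * run for nt, run in signals)

signals = pyrogram("GGATC")
print(signals)
print(call_bases(signals))
```

Most dispensations give a signal of 0, because only the nucleotide matching the next template position is incorporated.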
Advantages of pyrosequencing
- accurate
- parallel processing
- easily automated
- eliminates the need for labeled primers and nucleotides
- no need for gel electrophoresis
Basic idea of pyrosequencing
- visible light is generated and is proportional to the number of incorporated nucleotides
- 1 pmol DNA = 6e11 ATP = 6e9 photons at 560 nm
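The 6e11 figure follows from Avogadro's number, as this short check shows. The photons-per-ATP ratio below is not a measured constant; it is simply derived from the two numbers on this card (6e9 / 6e11), and the function names are made up for the example.

```python
AVOGADRO = 6.02214076e23          # molecules per mole
PHOTONS_PER_ATP = 6e9 / 6e11      # ratio implied by the card, about 0.01

def molecules_from_pmol(picomoles):
    """Number of molecules in the given number of picomoles."""
    return picomoles * 1e-12 * AVOGADRO

def photons_from_pmol(picomoles):
    """Photons expected if each incorporation yields one ATP."""
    return molecules_from_pmol(picomoles) * PHOTONS_PER_ATP

print(f"{molecules_from_pmol(1):.2e} ATP molecules")
print(f"{photons_from_pmol(1):.2e} photons")
```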
Pyrosequencing - 1st method
solid phase
• immobilized DNA
• 3 enzymes
• wash step to remove nucleotides after each addition
Pyrosequencing - 2nd method
liquid phase
• 3 enzymes + apyrase (nucleotide degradation enzyme)
(eliminates need for washing step)
• in the well of a microtiter plate: primed DNA template and 4 enzymes
• nucleotides are added stepwise
• nucleotide-degrading enzymes degrade previous nucleotides
Pyrosequencing disadvantages
- smaller sequences
* nonlinear light response after more than 5-6 identical nucleotides
454 sequencing system
• recent technological advance
• generates data 100x faster than Sanger method
• 454 relies on 2 major advances
- massively parallel liquid handling and pyrosequencing
– light is released each time a base is added to DNA strand
– instrument actually measures release of light
– can only handle short stretches of DNA
Virtually all genomic sequencing projects use
shotgun sequencing
• entire genome is cloned and resultant clones are sequenced
• much of the sequencing is redundant
• generally 7- to 10-fold coverage
- computer algorithms used to look for overlapping sequences and assemble them
- occasionally assembly isn’t possible
- closure can be pursued using PCR to target areas of the genome
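Fold coverage is total sequenced bases divided by genome size. The sketch below uses the Haemophilus influenzae genome size from earlier in this deck and an assumed 500 bp read length. The gap estimate uses an idealized random-sampling (Poisson) model that is not part of these notes, so treat it as a rough guide to why some gaps remain even at high coverage.

```python
import math

def fold_coverage(n_reads, read_bp, genome_bp):
    """Average number of times each base is sequenced."""
    return n_reads * read_bp / genome_bp

def reads_for_coverage(genome_bp, read_bp, coverage):
    """Reads needed to reach a target fold coverage."""
    return math.ceil(coverage * genome_bp / read_bp)

def expected_uncovered_fraction(coverage):
    """Assumption: reads land uniformly at random (Poisson model)."""
    return math.exp(-coverage)

GENOME_BP = 1_830_137  # Haemophilus influenzae
READ_BP = 500          # assumed read length

for c in (7, 10):
    n = reads_for_coverage(GENOME_BP, READ_BP, c)
    missed = expected_uncovered_fraction(c) * GENOME_BP
    print(f"{c}x: {n:,} reads, roughly {missed:.0f} bp expected uncovered")
```

Under this model a small part of the genome is still missed at 7-fold coverage, which is consistent with needing targeted PCR to close the remaining gaps.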
Closed vs Draft genome
- closing a genome relies on manpower (manual finishing of gaps)
- more expensive
- more information
Annotation
converting raw sequence data into a list of genes present in the genome
Majority of genes encode
proteins
Functional ORF
an open reading frame that encodes a protein
• computer algorithms used to search for ORFs
- look for start/stop codons and Shine-Dalgarno sequences
• ORFs can be compared to ORFs in other genomes
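A minimal ORF search can be sketched like this. It makes several simplifying assumptions: only the forward strand is scanned, only ATG counts as a start codon, and Shine-Dalgarno sequences are not checked. Real gene finders handle all of these. The function name `find_orfs` and the `min_codons` parameter are invented for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for each ATG...stop ORF.

    Scans the three forward reading frames only; `end` is exclusive
    and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs

print(find_orfs("CCATGAAATTTTAGGG"))
```

The TGA at positions 3-5 of the example is in a different reading frame from the ATG, so it does not end the ORF. This is why each frame is scanned separately.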
Inaccuracies in some annotations are problematic
as many as 10% of annotated genes are incorrectly annotated
Dideoxy method summary
- chain termination method
* best for small DNA segments
Whole genome shotgun sequencing summary
- used to sequence the human genome
* fragments larger DNA strand to make manageable chunks