Week 4 - Bacterial Genomics Flashcards

1
Q

Genome

A
  • entire complement of genetic information
  • includes genes, regulatory sequences, and noncoding DNA

2
Q

Genomics

A

discipline of mapping, sequencing, analyzing, and comparing genomes

3
Q

Number of prokaryotic genomes sequenced

A

over 12,000

4
Q

RNA virus MS2

A
  • first genome sequenced in 1976
  • 3,569 nucleotides

5
Q

Haemophilus influenzae

A
  • first cellular genome sequenced in 1995
  • 1,830,137 bp

6
Q

Large-scale sequencing projects have led to automated DNA sequencing systems

A
  • based on Sanger method
  • radioactivity replaced by fluorescent dye

7
Q

Sequencing

A

determines the order of nucleotides in a DNA or RNA molecule

8
Q

Sanger dideoxy method

A
  • invented by Fred Sanger (Nobel Prize winner)
  • 2 sequencing techniques were developed independently in the 1970s. The method developed by Fred Sanger used chemically altered “dideoxy” bases to terminate newly synthesized DNA fragments at specific bases (either A, C, T, or G)
  • these fragments can then be separated by size, and the DNA sequence can be read
9
Q

Purines

A

adenine
guanine
• two rings

10
Q

Pyrimidines

A

cytosine
uracil
thymine
• one ring

11
Q

Determining the sequence of DNA

A
1. chain termination or dideoxy method (F. Sanger)
2. shotgun sequencing method
3. second-generation sequencing methods (pyrosequencing)
12
Q

Dideoxy (Sanger) method - steps

A
  1. denaturation
  2. primer attachment and extension of bases
  3. termination
  4. gel electrophoresis
    produces chromatogram - laser detection of fluorochromes and computational sequence analysis
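
The logic of these steps can be sketched in a few lines of Python. This is only a toy model, not how a real sequencer works: it assumes every possible terminated fragment is produced exactly once, and the function names (`synthesize`, `terminated_fragments`, `read_sequence`) are made up for illustration.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def synthesize(template):
    """Strand the polymerase would build on the template, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template))

def terminated_fragments(new_strand):
    """One fragment per position: (fragment length, dye of the terminating ddNTP)."""
    fragments = [(i + 1, base) for i, base in enumerate(new_strand)]
    random.shuffle(fragments)  # in the tube the fragments are all mixed together
    return fragments

def read_sequence(fragments):
    """'Electrophoresis': sort by size, then read the dye colours in order."""
    return "".join(dye for _, dye in sorted(fragments))

new_strand = synthesize("GATTACA")
print(read_sequence(terminated_fragments(new_strand)))
```

Sorting by fragment length plays the role of the gel, and the dye on each fragment's last base plays the role of the laser-detected fluorochrome.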
13
Q

Sanger reaction mixture

A
  • primer and DNA template
  • ddNTPs with fluorochromes
  • DNA polymerase
  • dNTPs (dATP, dCTP, dGTP, dTTP)
14
Q

What’s wrong with the Sanger/dideoxy method?

A
  • only good for 500-750bp reactions
  • expensive
  • takes time
  • the human genome is about 3 billion bp
15
Q

Shotgun sequencing

A

used to sequence whole genomes

16
Q

Steps of shotgun sequencing

A
  1. DNA is randomly broken up into smaller fragments
  2. dideoxy method produces reads
  3. look for overlap of reads
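
Step 3 can be illustrated with a small greedy overlap assembler. This is a minimal sketch under simplifying assumptions (error-free reads, all from the same strand, no repeats); real assemblers are far more elaborate, and the names `overlap` and `greedy_assemble` are invented for this example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: the remaining contigs are separated by gaps
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

print(greedy_assemble(["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]))
```

If the loop ends with more than one sequence left, the reads did not overlap enough to join everything, which corresponds to an unfinished (draft) assembly.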
17
Q

Whole genome shotgun sequencing

A

• in whole genome shotgun sequencing the entire genome is sheared randomly into small fragments (appropriately sized for sequencing) and then reassembled

18
Q

Hierarchical shotgun sequencing

A
  • the genome is first broken into larger segments
  • after the order of these segments is deduced, they are further sheared into fragments appropriately sized for sequencing
19
Q

Pyrosequencing

A
  • each nucleotide is added in turn
  • only 1 of 4 will generate a light signal
  • the remaining nucleotides are removed enzymatically
  • the light signal is recorded on a pyrogram
  • sequencing by synthesis
20
Q

Advantages of pyrosequencing

A
  • accurate
  • parallel processing
  • easily automated
  • eliminates the need for labeled primers and nucleotides
  • no need for gel electrophoresis
21
Q

Basic idea of pyrosequencing

A
  • visible light is generated and is proportional to the number of incorporated nucleotides
  • 1 pmol DNA = 6e11 ATP = 6e9 photons at 560nm
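
The "light proportional to incorporated nucleotides" idea can be sketched as an idealized pyrogram. The code below assumes a perfectly linear light response and a fixed A, C, G, T dispensation order; both are simplifications, and `pyrogram` and `decode` are illustrative names rather than part of any real instrument software.

```python
def pyrogram(new_strand, dispensation="ACGT"):
    """Return (nucleotide, signal) pairs; signal = number of bases incorporated."""
    if set(new_strand) - set(dispensation):
        raise ValueError("strand contains a base that is never dispensed")
    signals = []
    pos = 0
    while pos < len(new_strand):
        for nt in dispensation:  # each nucleotide is added in turn
            if pos >= len(new_strand):
                break
            run = 0
            # A run of identical bases is incorporated in a single step,
            # so the light signal scales with the length of the run.
            while pos < len(new_strand) and new_strand[pos] == nt:
                run += 1
                pos += 1
            signals.append((nt, run))
    return signals

def decode(signals):
    """Rebuild the synthesized strand from the pyrogram peaks."""
    return "".join(nt * height for nt, height in signals)

peaks = pyrogram("GGATTTC")
print(peaks)
print(decode(peaks))
```

A peak of height 0 means that nucleotide was not incorporated at that step; a peak of height 3 means a run of three identical bases.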
22
Q

Pyrosequencing - 1st method

A

solid phase
• immobilized DNA
• 3 enzymes
• wash step to remove nucleotides after each addition

23
Q

Pyrosequencing - 2nd method

A

liquid phase
• 3 enzymes + apyrase (nucleotide degradation enzyme)
(eliminates need for washing step)
• in the well of a microtiter plate: primed DNA template and 4 enzymes
• nucleotides are added stepwise
• nucleotide-degrading enzymes degrade previous nucleotides

24
Q

Pyrosequencing disadvantages

A
  • shorter sequences
  • nonlinear light response after more than 5-6 identical nucleotides

25
Q

454 sequencing system

A

• recent technological advance
• generates data 100x faster than Sanger method
• 454 relies on 2 major advances
- massively parallel liquid handling and pyrosequencing
– light is released each time a base is added to DNA strand
– instrument actually measures release of light
– can only handle short stretches of DNA

26
Q

Virtually all genomic sequencing projects use

A

shotgun sequencing
• entire genome is cloned and resultant clones are sequenced
• much of the sequencing is redundant
• generally 7- to 10-fold coverage

  • computer algorithms used to look for replicate sequences and assemble them
  • occasionally assembly isn’t possible
  • closure can be pursued using PCR to target areas of the genome
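
To get a feel for what 7- to 10-fold coverage means, the sketch below computes how many reads are needed for a given coverage (coverage = reads x read length / genome size). The genome size is the Haemophilus influenzae figure from an earlier card and the 500 bp read length is the lower end of the Sanger range given earlier. The gap estimate uses the standard Lander-Waterman approximation (fraction uncovered ≈ e^-coverage), which assumes reads start at uniformly random positions; that formula is an addition here, not something stated on the cards.

```python
import math

def reads_needed(genome_bp, read_bp, coverage):
    """Reads required so that total sequenced bases = coverage x genome size."""
    return math.ceil(coverage * genome_bp / read_bp)

def expected_uncovered_fraction(coverage):
    """Lander-Waterman estimate of the fraction of bases hit by no read."""
    return math.exp(-coverage)

genome_bp = 1_830_137  # Haemophilus influenzae
read_bp = 500          # assumed read length

for c in (7, 10):
    n = reads_needed(genome_bp, read_bp, c)
    gap_bp = expected_uncovered_fraction(c) * genome_bp
    print(f"{c}x coverage: {n} reads, roughly {gap_bp:.0f} bp expected uncovered")
```

Even at high redundancy a small number of bases is expected to remain uncovered, which is one reason closure sometimes needs targeted PCR.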
27
Q

Closed vs Draft genome

A
  • closed genome relies on manpower
  • more expensive
  • more information
28
Q

Annotation

A

converting raw sequence data into a list of genes present in the genome

29
Q

Majority of genes encode

A

proteins

30
Q

Functional ORF

A

an open reading frame that encodes a protein
• computer algorithms used to search for ORFs
- look for start/stop codons and Shine-Dalgarno sequences
• ORFs can be compared to ORFs in other genomes
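
A very small ORF scanner shows the kind of search such algorithms perform. This is a sketch only: it scans the three reading frames of one strand, uses ATG and GTG as start codons and TAA, TAG, TGA as stop codons, and ignores Shine-Dalgarno sequences and the reverse strand. The function name `find_orfs` and the `min_codons` threshold are hypothetical choices for illustration.

```python
START_CODONS = {"ATG", "GTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for each ORF; end is exclusive and includes the stop codon."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                # Count the codons from the start codon up to (not including) the stop.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

seq = "CCATGAAACCCGGGTAGTT"
for start, end, frame in find_orfs(seq):
    print(frame, seq[start:end])
```

In practice `min_codons` would be set much higher (for example 100), since short ORFs arise by chance all the time.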

31
Q

Inaccuracies in some annotations are problematic

A

as many as 10% of annotated genes are incorrectly annotated

32
Q

Dideoxy method summary

A
  • chain termination method
  • best for small DNA segments

33
Q

Whole genome shotgun sequencing summary

A
  • used to sequence the human genome
  • fragments a larger DNA strand into manageable chunks

34
Q

Pyrosequencing summary

A
  • sequence by synthesis
  • accurate and fast

35
Q

Bioinformatics

A
  • science that applies powerful computational tools to DNA and protein sequences
  • for the purpose of analyzing, storing, and accessing the sequences for comparative purposes
36
Q

Correlation between genome size and ORFs

A

• on average a prokaryotic gene is 1,000 bp long
- ~ 1,000 genes per megabase
(1Mbp = 1,000,000 bp)
- as genome size increases, gene content proportionally increases
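
The rule of thumb of about 1,000 genes per megabase can be checked against the genome sizes and gene counts quoted on other cards in this deck. The snippet below is just that arithmetic; `estimate_genes` is an illustrative helper, and the figures are the ones given in the deck, not independently verified values.

```python
GENES_PER_MBP = 1000  # rule of thumb: one ~1,000 bp gene per 1,000 bp of genome

def estimate_genes(size_mbp):
    """Predicted gene count from genome size in Mbp."""
    return round(size_mbp * GENES_PER_MBP)

# (genome size in Mbp, reported gene count), as quoted in this deck
GENOMES = {
    "Mycoplasma genitalium": (0.58, 482),
    "Methanococcus jannaschii": (1.66, 1738),
    "Prochlorococcus marinus": (1.67, 1696),
    "Escherichia coli": (4.6, 4405),
    "Anabaena cylindrica": (6.36, 6132),
    "Streptomyces coelicolor": (8.7, 7825),
}

for name, (size_mbp, genes) in GENOMES.items():
    predicted = estimate_genes(size_mbp)
    density = genes / size_mbp
    print(f"{name}: predicted {predicted}, reported {genes}, {density:.0f} genes/Mbp")
```

All six genomes fall between roughly 800 and 1,100 genes per Mbp, which is what "gene content increases proportionally with genome size" looks like in numbers.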

37
Q

First complete bacterial genome sequenced in

A

1995

• now routine and many hundreds of bacterial genomes have been sequenced

38
Q

“Traditional” sequencing methods are now supplemented by

A
  • “environmental genome sequencing” - sequence DNA from an environmental sample, without isolating and culturing strains first
  • “RNA sequencing” - “deep sequencing” of RNA to reveal the frequency of different RNA molecules
39
Q

Smallest cellular genomes belong to

A

parasitic or endosymbiotic prokaryotes
• obligate parasites range from 490 kbp (Nanoarchaeum equitans) to 4,400 kbp (Mycobacterium tuberculosis)
• endosymbionts can be smaller (eg 160 kbp genome of Carsonella ruddii)
• estimates suggest that the minimum number of genes for a viable cell is 250-300 genes

40
Q

Obligate parasites (genome)

A

from 490 kbp (Nanoarchaeum equitans)

to 4,400 kbp (Mycobacterium tuberculosis)

41
Q

Endosymbionts (genome)

A

can be smaller

eg 160 kbp genome of Carsonella ruddii

42
Q

Estimates suggest the minimum number of genes for a viable cell is

A

250-300 genes

43
Q

Largest prokaryotic genomes are comparable to those of some eukaryotes

A
Sorangium cellulosum (bacteria)
• largest prokaryotic genome to date is 12.3 Mbp

largest archaeal genomes tend to be smaller (~5 Mbp)

44
Q

Complement of genes in a particular organism defines its biology, but genomes are also molded by

A

an organism's lifestyle

45
Q

Many genes can be identified by

A

sequence similarity to genes found in other organisms (comparative analysis)

46
Q

Comparative analyses allow for

A

predictions of metabolic pathways and transport systems

• eg Thermotoga maritima

47
Q

Escherichia coli

A
  • 4.6 Mbp
  • 4405 genes

48
Q

Streptomyces coelicolor

A
  • 8.7 Mbp
  • 7825 genes

49
Q

Mycoplasma genitalium

A
  • 0.58 Mbp
  • 482 genes

50
Q

Methanococcus jannaschii

A
  • 1.66 Mbp
  • 1738 genes

51
Q

Prochlorococcus marinus

A
  • 1.67 Mbp
  • 1696 genes

52
Q

Anabaena cylindrica

A
  • 6.36 Mbp
  • 6132 genes

53
Q

In addition to the main chromosome, many bacteria also have

A

stable plasmids - much smaller circular DNA molecules, usually with a few genes

54
Q

Range of genome sizes

A
  • Mycoplasma genitalium - 0.58 Mbp
  • Streptomyces coelicolor - 8.7 Mbp
  • Escherichia coli is fairly average - 4.60 Mbp, with a circular chromosome about 1.44 mm in circumference (as an open circle, about 0.45 mm in diameter); for comparison, an E. coli cell is about 4 micrometers long
55
Q

E. coli normally has a single copy

A

of its chromosome per cell - or 2 copies when the cell is about to divide

56
Q

Some bacteria have

A

multiple copies of the chromosome
• eg cyanobacteria typically have about 10 copies of the chromosome in every cell
• eg a Synechocystis cell is about 3 micrometers in diameter and each cell contains DNA with a total length of about 11 mm

57
Q

Bacterial DNA is

A

tightly folded and packed into an irregular structure in the cytoplasm - the nucleoid

58
Q

The nucleoid

A
  • by weight about 60% DNA, 30% RNA, 10% protein
  • RNA and proteins probably help to fold DNA into a compact structure
  • with very rare exceptions, no surrounding membrane - in bacteria DNA is freely exposed to the cytoplasm
  • BUT the nucleoid is usually attached to the plasma membrane at one point
59
Q

DNA replication

A

• starts from a single, defined origin
• is bidirectional
(origin of replication, replication forks (2, theta), 2 new double-stranded circular DNA molecules)

60
Q

In eukaryotes, replication is initiated at

A

multiple loci along the chromosome

61
Q

DNA replication in bacteria can

A

only start at one point

62
Q

DNA replication in bacteria takes a minimum of about

A

30 minutes for replication to be complete (depending on the genome size)
• BUT the mean doubling time for some bacteria is less than this, under optimal conditions - how?

63
Q

Sequences beginning with a START codon followed by a long run of codons before the first STOP codon

A

are very unlikely to occur by chance

• such a sequence is known as an ORF and is potentially a sequence coding for a protein (a gene)
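
Why a long stop-free run is unlikely by chance can be made concrete with a quick calculation. The sketch assumes a random sequence in which all 64 codons are equally likely, so each codon has a 3/64 chance of being a stop; real genomes have biased base composition, so the numbers are only indicative. `prob_no_stop` is an illustrative name.

```python
def prob_no_stop(n_codons, stop_codons=3, total_codons=64):
    """Probability that n consecutive random codons contain no stop codon."""
    return ((total_codons - stop_codons) / total_codons) ** n_codons

for n in (10, 100, 333):
    print(f"{n} codons without a stop: p = {prob_no_stop(n):.2e}")
```

A run of 333 codons corresponds to a typical 1,000 bp prokaryotic gene. Under these assumptions the chance of such a run arising at random is on the order of one in ten million, which is why long ORFs are treated as candidate genes.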

64
Q

Start codons

A
  • ATG
  • GTG

65
Q

Stop codons

A
  • TAA
  • TAG
  • TGA
66
Q

The cell recognizes genes in a different way to just Start and Stop codons

A
  • control sequences upstream of the ORF promote binding of RNA polymerase
  • hence transcription to RNA followed by translation of the RNA to make protein
  • but those control sequences are very hard for us to recognize
67
Q

Structure of a typical bacterial gene

A
5'
• regulatory sequences
• RNA polymerase binding
• leader sequence (RNA - ribosome binding)
• Coding region ORF (RNA - coding region ORF)
• trailer (RNA - trailer)
terminator
3'
68
Q

Total predicted ORFs in Synechocystis

A

3186 ORFs predicted in total

• genes can be on either strand of the DNA

69
Q

Some lessons learned from bacterial genome sequencing

A
  1. numbers of genes, relationship to complexity of the organism
  2. a possible minimum set of genes - identify the common minimal set of genes needed for viability?
  3. dense packing of genes in bacterial chromosomes
  4. organization of genes in operons
  5. evolutionary diversity
  6. evolutionary relationships
  7. large number of unknown genes (40-60%)
70
Q

Some lessons learned from bacterial genome sequencing

1. number of genes, relationship to complexity of the organism

A

rough correspondence between genome size and complexity of lifestyle
• Mycoplasma genitalium (0.58 Mbp, 482 genes) - parasite with very small cells and simple metabolism
• Streptomyces coelicolor (8.7 Mbp, 7825 genes) - soil bacterium with very versatile metabolism, complex structure (branched network of filaments), sporulation
• Prochlorococcus marinus and Anabaena cylindrica are both cyanobacteria
- Prochlorococcus (1.67 Mbp, 1696 genes) - has small, simple cells
- Anabaena (6.37 Mbp, 6132 genes) - filamentous, multiple cell types

71
Q

Mycoplasma genitalium (0.58 Mbp, 482 genes)

A

parasite with very small cells and simple metabolism

72
Q

Streptomyces coelicolor (8.7 Mbp, 7825 genes)

A

soil bacterium with very versatile metabolism, complex structure (branched network of filaments), sporulation

73
Q

Prochlorococcus marinus and Anabaena cylindrica are both

A

cyanobacteria

  • Prochlorococcus (1.67 Mbp, 1696 genes) - has small, simple cells
  • Anabaena (6.37 Mbp, 6132 genes) - filamentous, multiple cell types
74
Q

Some lessons from bacterial genome sequencing

2. a possible minimum set of genes - identify the common minimal set of genes needed for viability?

A

• Craig Venter’s plan to further strip down the genome of Mycoplasma genitalium to create a minimum living organism of about 300 genes

75
Q

Some lessons from bacterial genome sequencing

3. Dense packing of genes in the bacterial chromosome

A

bacteria typically have about 1 gene per 1,100 bases; H. sapiens has about 1 gene per 30,000 bases
• bacteria have dense clustering of genes - very different from eukaryotes

76
Q

Bacteria typically have about 1 gene per

A

1,100 bases

77
Q

Homo sapiens have about 1 gene per

A

30,000 bases

78
Q

Some lessons from bacterial genome sequencing

4. organization of genes in operons

A
  • clusters of genes on the same DNA strand with related functions likely to be operons
  • genes are co-transcribed (ie 1 mRNA molecule for the whole operon)
79
Q

Some lessons learned from bacterial genome sequencing
5. evolutionary diversity of prokaryotes
why are bacterial genes and genomes so diverse?

A

probably 2 reasons

a. bacterial metabolic diversity - different bacterial species may have fundamentally different metabolism, hence the need for quite different sets of genes
b. deep evolutionary roots - bacteria have been on the planet much longer than other life forms - hence greater time for evolutionary divergence

80
Q

Some lessons learned from bacterial genome sequencing

6. evolutionary relationships

A

comparing related species of pathogenic bacteria - can we track pathogen evolution, and can we identify specific genes that are important for specific pathogenicities?
• Mycobacterium bovis - bovine tuberculosis - 3952 genes
• Mycobacterium tuberculosis - human tuberculosis - 4238 genes
• Mycobacterium leprae - leprosy - 2768 genes

• classical microbiology shows different host range, virulence, and physiology - but what is the genetic basis of the differences? and how are the 2 species related? did M. bovis jump the species barrier from cattle to humans when cattle were domesticated 10,000 - 15,000 years ago?

81
Q
6. Evolutionary relationships
both organisms (M. bovis and M. tuberculosis) now completely sequenced - what does comparison of the genomes tell us?
A
  • very closely related >99.95% sequence identity
  • nearly all ORFs are conserved, and are in the same order on the chromosome - no rearrangements
  • therefore recent divergence
  • but M. bovis has a slightly smaller genome, and a series of deletions resulting in about 300 fewer genes. It looks as though M. tuberculosis is closer to the common ancestor - did cows catch TB from us?
82
Q

Some lessons from bacterial genome sequencing

7. large number of unknown genes (typically 40-60%)

A

so one of the main lessons from genome sequencing is how much we don’t know about bacterial biology

83
Q

Summary

A
  • genetic information is stored in the order or sequence of nucleotides in DNA
  • chain termination sequencing is the standard method for the determination of nucleotide sequence
  • dideoxy-chain termination sequencing has been facilitated by the development of cycle sequencing and the use of fluorescent dye detection
  • alternative methods are used for special applications, such as pyrosequencing (for resequencing and polymorphism detection) or bisulfite sequencing (to analyze methylated DNA)