The minimal genome Flashcards
What is the concept of the minimal genome?
A hypothetical minimal auto-replicative system
- Aims to strip down a present day bacterium to its minimum essential components pertaining to replication, transcription and translation machinery.
- Understand the basic components of the cell that make it living.
- Provides a template genome that can be used to recreate life
- A less complex cell that can be reliably modeled and engineered to meet our requirements (aka Synthetic Biology, Synthetic Genomics).
- until recently, the search for the minimal gene set was the domain of computer-aided comparative genomics
- the creation of an artificial cell with a minimal genome is far from straightforward (despite G. Venter’s “synthetic cell” approach)
- gene set sizes of completely sequenced organisms:
- ~ 500 to 10’000 genes (Prokaryotes)
- ~ 2’000 to 30’000 genes (Eukaryotes)
- the smallest genome of a free-living organism is that of the bacterium Pelagibacter ubique, with 1’354 protein-coding genes & 35 ncRNA-coding genes (1.3 Mb)
→ minimal gene set that is needed for autonomous cellular life is surprisingly small
(parasitic & symbiotic bacteria have even smaller genomes!)
What is the classical view on genome evolution in prokaryotes, and what is the more realistic picture?
Classical view on genome evolution:
point mutations accumulate slowly
Bacterial genomes are much more dynamic and can change rapidly:
- rapid loss of large DNA fragments (via homologous recombination)
- integration of horizontally acquired DNA (lateral gene transfer)
- genome rearrangements (transposons, insertion sequences, inversions):
- e.g. Inversion via homologous recombination
- Gene duplication:
- e.g. due to “illegitimate crossing over” during replication
- etc
Why is it not enough to sequence only a single strain of a species in prokaryotes?
- Bacterial strains belonging to the same species vary considerably in gene content (e.g. different E. coli strains).
- thus sequencing a single strain of a species reveals only an incomplete picture of the species’ gene content
- The genetic repertoire of a given species (its “pan-genome”) is much larger than the gene content of individual strains
- Example: E. coli
- Pan-genome: 10’131 - 20’000 genes
- Average E. coli genome: 4’721 genes
- Core genome: 2’167 genes
What is a pan-genome?
The full complement of genes found across all strains of a species (the union of their gene sets)
- the pan-genome grows as more sequenced genomes are considered
- its variable (non-core) part usually codes for functions of the cell surface, signal transduction, or pathogenicity
→ thus for functions crucial to conquer rapidly changing environments or niches
What is the core-genome?
The gene set found in all genomes of a species
- the more divisions/species/strains are compared, the smaller the “core genome” gets
- these genes are more “stable” and code for functions such as translation and amino acid synthesis
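The relation between pan-genome and core genome can be sketched with plain set operations: the pan-genome is the union of all strain gene sets, the core genome their intersection. This is only a toy illustration; the strain names and gene assignments are hypothetical and not taken from real E. coli data.

```python
# Hypothetical gene content of three strains (illustrative only, not real data).
strains = {
    "strain_A": {"rpoB", "rpsL", "trpA", "fimH"},
    "strain_B": {"rpoB", "rpsL", "trpA", "stx1"},
    "strain_C": {"rpoB", "rpsL", "hlyA"},
}

def pan_genome(gene_sets):
    """Union: every gene found in at least one strain."""
    return set().union(*gene_sets)

def core_genome(gene_sets):
    """Intersection: genes found in every strain."""
    gene_sets = list(gene_sets)
    return set.intersection(*gene_sets) if gene_sets else set()

pan = pan_genome(strains.values())
core = core_genome(strains.values())
accessory = pan - core  # variable part of the pan-genome

print(len(pan), len(core), sorted(accessory))
```

Adding a further strain can only enlarge the union and can only shrink (or keep) the intersection, which is why the pan-genome grows and the core genome gets smaller as more genomes are compared.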
What are genomic islands?
- relatively large DNA regions (10 – >100 kb) can integrate into genomes (e.g.: pathogenicity islands)
- by that several genes get integrated at once (via HGT)
- often associated with integrase genes & they have an unusual nucleotide composition (such as GC content or codon usage → hallmarks for foreign origin)
→ primary strategy for gene acquisition in prokaryotes
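The "unusual nucleotide composition" hallmark can be illustrated with a simple GC-content scan. The sketch below is a minimal example, assuming non-overlapping windows and an arbitrary deviation threshold; the function names, the toy sequence and the threshold are all hypothetical, and real island prediction combines several signals (codon usage, integrase genes, flanking tRNA genes).

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def atypical_windows(genome, window, threshold):
    """Return (start, gc) for non-overlapping windows whose GC content
    differs from the genome-wide GC content by more than `threshold`."""
    overall = gc_content(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, window):
        gc = gc_content(genome[start:start + window])
        if abs(gc - overall) > threshold:
            hits.append((start, gc))
    return hits

# Toy sequence: AT-rich backbone with a GC-rich insert in the middle.
genome = "ATATATTAAT" * 4 + "GCGCGGCCGC" * 2 + "ATATATTAAT" * 4
hits = atypical_windows(genome, window=10, threshold=0.3)
print(hits)
```

In this toy case only the two windows covering the GC-rich insert are flagged.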
What are Borgs?
- new type of archaeal extrachromosomal element discovered in a metagenomics study
- extraordinarily large (up to 1 Mbp) DNA
- linear extrachromosomal DNA that carries repetitive sequences at the ends and throughout
- their genes were assimilated from methane-oxidizing Methanoperedens archaea
What is a bacterial IS element?
Insertion Sequence elements
- Central region encodes 1 or 2 enzymes required for transposition (transposase)
- It is flanked by inverted repeats of characteristic sequence
- The 5’ and 3’ short direct repeats are generated at the target-site DNA during the insertion
- Transposition is catalyzed by one or both of the flanking IS-encoded transposases
- Example: Sulfolobus sp.
- has an enormous single-gene translocation rate
- → Sulfolobus is known for its abundant transposons (Insertion Sequence elements)
- ~12% of the 3 Mb genome consists of IS elements
- a correlation exists between the number of repetitive sequences (e.g. IS-elements) and the rate of single gene translocation
- thus transposons and other repetitive elements are hot-spots for genomic rearrangements
→ play a key role in genome evolution
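The structural features of an IS element described above (terminal inverted repeats, short direct repeats at the target site) can be checked on a sequence with a few string operations. This is a minimal sketch on made-up sequences; the helper names and repeat lengths are assumptions chosen for illustration only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def has_inverted_repeats(element, ir_len):
    """True if the first ir_len bases equal the reverse complement
    of the last ir_len bases (terminal inverted repeats)."""
    if ir_len <= 0 or 2 * ir_len > len(element):
        raise ValueError("ir_len must be positive and fit twice into the element")
    left = element[:ir_len].upper()
    right = element[-ir_len:].upper()
    return left == reverse_complement(right)

def has_target_site_duplication(upstream, downstream, tsd_len):
    """True if the last tsd_len bases before the element equal the
    first tsd_len bases after it (short direct repeats)."""
    if tsd_len <= 0:
        raise ValueError("tsd_len must be positive")
    return upstream[-tsd_len:].upper() == downstream[:tsd_len].upper()

# Made-up element: 7 bp inverted repeats around a stand-in for the transposase region.
element = "GGTACTG" + "AAATTTCCC" + reverse_complement("GGTACTG")
print(has_inverted_repeats(element, 7))
print(has_target_site_duplication("TTGACGTCA", "GTCATTGGA", 4))
```

Note that inverted repeats are compared via the reverse complement, whereas the target-site duplications are direct repeats and are compared as plain strings.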
What is Pelagibacter ubique?
One of the most abundant microbes in the ocean
- 0.37 – 0.89 μm long & 0.12 – 0.20 μm in diameter
- 30% of the cell’s volume is taken up by its genome
- the smallest genome (1.3 Mb) of any free living organism
- no duplicated gene copies, no viral genes, and little ncDNA
- genome has been streamlined
- low GC content of only 30%
What is Mycoplasma genitalium?
- 0.58 Mb genome
- 470 protein-coding genes
- intracellular parasite in humans
- smallest known genome of an organism capable of independent growth
- lacks a cell wall
- Mycoplasma sp. originate from a Gram positive ancestor via reductive evolution
- evolved by massive genome reduction
- obligate parasites
- lack genomic redundancy
→ Appears to be ideally suited to look for essential genes
What is Nanoarchaeum equitans?
- 0.49 Mb genome; 522 protein-coding genes
- exists only in co-culture with Ignicoccus sp. (parasite or symbiont?)
- only 400 nm diameter (smaller than some viruses)
- so far the only representative of the phylum Nanoarchaeota
- lacks genes for the synthesis of amino acids, nucleotides and lipids
- carries ‘split genes‘: → ancestral gene structure of multi domain proteins/RNAs?
- living fossil
How can comparative genomics help with finding the minimal genome?
- Example:
- comparative genomics: E. coli uses 243 genes for energy metabolism, while H. influenzae uses only 112, and M. genitalium only 31
- the parasitic M. genitalium can even eliminate essentially all genes of amino acid synthesis pathways
- genes for translation, however, cannot be reduced
- also protein folding genes are resistant to elimination
- comparative genomics: under optimal growth conditions (all essential nutrients present, no stress, no competition) the “minimal genome” consists of ~ 256 genes
→ In fact it is more appropriate to speak of a minimal set of essential functional niches rather than of minimal sets of genes.
- comparison of ~100 genomes revealed that only ~63 genes (!) were not replaced by non-orthologous gene displacement (NOGD) and were thus present in all of the investigated genomes
- more than 50 of these are components of the translation machinery
What is Non-orthologous gene displacement?
Displacement of a gene responsible for a particular biological function in a certain set of species by a non-orthologous (unrelated) gene in a different set of species.
What experimental evidence is there regarding the minimal genome?
- Genome-wide analyses of gene knock-outs via transposon insertion, plasmid insertion or antisense RNA
- in these studies > 50% of the genes of the respective organism have been studied
- the experimental approaches revealed a surprisingly similar number for the minimal gene set as comparative genomics did, namely ~300-350 genes
- the reason why H. influenzae and E. coli require more genes is unclear; maybe Gram-negative bacteria need more genes to build their cell wall and for transport through it
- Used species: M. genitalium/M. pneumoniae, B. subtilis, H. influenzae, E. coli, S. cerevisiae, C. elegans
- not only the number but also the functions of these “minimal gene set” genes were similar between the experimental and computational studies
- minimal gene set: many genes for information-processing (especially translation), only very few genes with unknown functions
What are problems when defining the minimal gene set?
Essential Genes – A conclusive list??
- Different studies came up with a different number of essential genes.
- Computation: Underestimates the minimal gene set, since it mainly accounts for genes that have been conserved in evolution.
- Transposon mutagenesis: Overestimates the gene set, since it classifies genes that merely slow down growth as essential; it can also misclassify essential genes that tolerate the insertion as non-essential.
- Most mutants produced are single mutants, so synthetic lethality is largely not considered.
- Constructing a single cell that systematically combines all the mutations in one strain is beyond the scope of present-day technology.
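Why single-mutant screens miss synthetic lethality can be shown with a toy model in which each essential function is provided by one or more redundant genes. The gene names, functions and the viability rule below are hypothetical simplifications, not experimental data.

```python
from itertools import combinations

# Hypothetical mapping: essential function -> genes able to perform it.
functions = {
    "translation": {"geneA"},
    "glycolysis": {"geneB", "geneC"},  # two redundant genes
}

def viable(deleted):
    """Toy rule: the cell is viable if every essential function still
    has at least one provider gene that has not been deleted."""
    return all(providers - set(deleted) for providers in functions.values())

genes = sorted(set().union(*functions.values()))

# Genes that a single-knockout screen would call essential.
single_essential = [g for g in genes if not viable({g})]

# Pairs that are lethal only in combination (each single knockout is viable).
synthetic_lethal = [
    pair for pair in combinations(genes, 2)
    if not viable(pair) and all(viable({g}) for g in pair)
]

print(single_essential, synthetic_lethal)
```

In this model a single-knockout screen reports only geneA as essential, although the function carried by geneB and geneC is just as indispensable; this matches the idea of counting essential functions rather than essential genes.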
What is an essential gene?