MODULE 1 - Microbial Evolution and Ecology Flashcards
what were the first organisms on earth and how did they create the environment which allowed for new forms of life?
anaerobic bacteria because there was no oxygen on earth. When these anaerobes developed the ability to use light to produce oxygen earth began to change from an anaerobic to an aerobic environment. Because of this aerobic organisms began to evolve and proliferate
What is LUCA and what can we assume because of it?
LUCA is the last universal common ancestor, and we can assume that if all life is descended from this, then there will be common biochemistry, architecture and mechanisms between all life e.g. storage of info in DNA, most cells work the same way, most life is made up of proteins etc.
what are the major components of all cells and what are these components made of?
the major components of all cells are membranes, nucleic acids (DNA/RNA), proteins and all these are composed of the same basic materials (CHONSP)
what is CHONSP and why are they the most common elements?
Carbon, Hydrogen, Oxygen, Nitrogen, Sulfur, Phosphorus
Because they form covalent bonds which are strong enough to hold long-term structure but can be broken down so that we can recycle our building blocks and they can also attach to multiple accessories
what is the universal solvent which is required for a chemical reaction to take place?
water hence its importance to all organisms
what are the two essential components for life to occur?
CHONSP and water
what was the Miller-Urey experiment?
simulated the conditions of early earth by combining the key elements (as simple gases and water) with an energy source inside a sealed apparatus to make more complex structures, showing that it’s possible for the building blocks of life to form from simple compounds. The experiment succeeded in creating many organic molecules, including amino acids, and related experiments have produced nucleic acid bases. They concluded that the organic building blocks of life could have been generated in the probable atmosphere of early earth
why were membranes essential for the development of life?
they enclose and compartmentalise things in environments, allowing certain experiments to occur a huge number of times, so that a one in a million thing will likely occur in one of a million closed environments where the conditions are perfect
they also allow for gradients to let certain things in and out across the membrane
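The "one in a million across a million compartments" idea can be checked with a quick calculation. This is only a sketch, and it assumes every compartment is an independent trial with the same chance of success (the function name is made up for illustration):

```python
def chance_of_at_least_one(p_single: float, n_trials: int) -> float:
    """Probability that an event with per-trial probability p_single
    happens at least once in n_trials independent trials."""
    return 1.0 - (1.0 - p_single) ** n_trials


# A one-in-a-million event tried in a million separate compartments.
p = chance_of_at_least_one(1e-6, 1_000_000)
print(round(p, 3))  # roughly 0.632, i.e. more likely than not
```

So under these assumptions a rare event becomes more likely than not once there are enough enclosed compartments, which is the point of the card above.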
how were the first membranes formed?
they were likely self-assembled and would have been things like coacervates, micelles and liposomes. These would have been semi-permeable membranes and would persist indefinitely without oxygen or decomposers. These became what is called a protocell (closed environment formed by a membrane)
why might RNA have been the first biological molecule?
central dogma of molecular biology in the modern world suggests info flows from DNA to RNA to protein, however this might not have been the case for the first life forms. DNA is missing an oxygen atom making it more stable and less reactive in an aerobic world. But when the world was anaerobic (when life started) stability wasn’t an issue as there was no oxygen so perhaps RNA was the first molecule as we wouldn’t have had to worry about protecting it from oxygen
what are self-catalytic RNA enzymes?
ribozymes work by folding naturally onto themselves and recognising sequences then catalysing a reaction to cleave themselves
why are self-catalytic RNA enzymes/ribozymes the first step in producing life?
because it is the first time a reaction can be catalysed by the molecule itself rather than being ruled by chance, i.e. we have reproducibility where the same things can happen over and over and we can expect the same outcome
what is the RNA world hypothesis for the origin of life
heating and cooling of the prebiotic soup leads to formation of more complex structure and eventually generates RNA and liposomes
liposomes and RNA combine with other components of the soup to form probionts (predecessor to early life)
so far these are all random reactions but once we have a ribozyme these reactions are reproducible allowing for production of RNA and proteins which allows for the first cells
so the RNA world hypothesis is generally centred around the idea that RNA was the first nucleic acid and was the building block which allowed life to occur
outline how ribozymes are formed
random precursors turn into more complex things like nucleotides which can randomly turn into RNAs which can even more rarely turn into ribozymes
once the world transitioned from a chemical world to a biological one, why did the biological one take over?
once you have life you have reproducibility meaning life can take over as it has a blueprint to allow enzymes to produce nucleotides in the same way
what is the selfish gene hypothesis?
all of life exists because our genes want to survive and so make copies of themselves and everything just came about as a side-effect of this
outline the RNA world hypothesis step by step
RNA forms from simple precursors (Miller-Urey showed that organic building blocks can form from inorganic substances)
RNA self-replicates via ribozymes
RNA catalyses protein synthesis
Membrane formation changes internal chemistry allowing new functionality
RNA codes both DNA and protein (DNA becomes master template and proteins catalyse cellular activity)
what was the first organism and where did it live?
it must have been prokaryotic, anaerobic and chemolithotrophic
it possibly consumed iron sulfide and hydrogen sulfide and could have used the resulting hydrogen to drive ATPase by splitting it
why is there so much diversity among microorganisms?
gradients, niches and speciation create lots of different habitats
how do microbes create gradients as they grow and what do these gradients allow for?
aerobes consuming oxygen create areas with less oxygen where anaerobic or fermenting microorganisms may live (oxygen gradient)
microbes create nutrient gradients by consuming certain nutrients resulting in areas with more or less nutrients
microbes create pH gradients through creating waste products or chemical production through quorum sensing
all these gradients allow for diversity
what experimental evidence is there for evolution?
a single E. coli strain was inoculated and grown on a glucose-limited medium so as to create an environment which forces competition, so that organisms change to gain an advantage. Through this they managed to produce three different strains from wildtype, each with differences in maximum specific growth rates and in glucose uptake kinetics
wild type could convert glucose to acetate to glycerol
in the new system the three strains of the same organism specialised in consuming either glucose, acetate or glycerol in order to reduce competition
how do we measure diversity in microbial communities in order to classify organisms?
taxonomy
function
metabolism
what classification systems do we have for microbial diversity?
biological
phenetic
cladistic (phylogenetic)
explain the biological classification system
organisms grouped based on ability to breed so this system doesn’t work for microbes since they don’t need a partner
explain the phenetic classification system
organisms grouped based on overall physical similarity (analogous) with no account of evolutionary history (i.e. only measures the end product)
this system is liable to errors due to convergent evolution
it is also ineffective at classifying microbes as they are all very similar physically
what is convergent evolution?
leads to the same phenotypes with no shared recent ancestry e.g. bats, birds and insects all have wings
explain the cladistic (phylogenetic) classification system
most commonly used system
grouping organisms based on evolution from a shared ancestor (clade) as determined from a shared trait. Involves making trees from genetic sequences where more similarity between genomes means two organisms are more closely related. The more mutations accumulated indicates two organisms are more distantly related
the issue with this method is that it’s liable to ignore useful descriptive traits by being too focused on one evolutionary trait or gene
what is a molecular clock and what is the most commonly used molecular clock?
a gene whose DNA sequence can be used as a comparative temporal measure of evolutionary divergence. We are essentially trying to look at mutations in that gene to see how long the species have been separated from each other. This works because there is usually a linear relationship between time and the number of mutations accumulated
most commonly this is the 16S rRNA gene, which encodes the RNA of the small subunit of the ribosome, because it is universally conserved in every living organism
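A rough sketch of how a molecular clock turns sequence differences into time. It assumes the two sequences are already aligned and the same length, that mutations accumulate linearly, and it uses a made-up substitution rate purely for illustration (real 16S rates are not given in these notes):

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def divergence_time_years(seq_a: str, seq_b: str, subs_per_site_per_year: float) -> float:
    """Estimate time since two lineages split, assuming a linear clock.

    Both lineages accumulate changes independently after the split,
    so the observed difference per site is divided by 2 * rate.
    """
    per_site = count_differences(seq_a, seq_b) / len(seq_a)
    return per_site / (2 * subs_per_site_per_year)


# Toy sequences and a hypothetical rate, for illustration only.
gene_a = "ACGTACGTAC"
gene_b = "ACGTTCGTAA"
print(count_differences(gene_a, gene_b))  # 2
print(divergence_time_years(gene_a, gene_b, 1e-8))
```

This ignores repeated mutations at the same site, which is one reason the linear relationship only holds "usually".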
what are the four key properties of a molecular clock?
found in all living organisms, maintains its function amongst all living organisms (because you want it to be under the same selection pressure), highly conserved with multiple hyper variable regions (conserved regions used as anchors for alignment and hyper variable regions mutate faster which indicate divergence of a species), sufficient length
why is the 16S rRNA gene called the 18S rRNA gene in eukaryotes?
because it is slightly larger and sediments slightly faster (a higher S value means faster sedimentation)
what did Carl Woese figure out about the domains of life and how?
by looking at different sequences as molecular clocks, he kept finding that all life would be grouped into three groups rather than the previously believed five groups
these groups were eukaryotes, eubacteria (‘true’ bacteria) and archaebacteria (‘ancient’ bacteria, thought at the time to be the base of the tree which everything evolved from)
how have the 16s observations been validated and what has changed?
observations based on 16s have been validated by other genes, by amino acid sequence and by enzyme structure
RNA polymerase structure however has now shown that archaea are more closely linked with eukarya than with bacteria, suggesting they are not the old base of the tree but are instead completely different organisms
what is the eocyte hypothesis (two domain hypothesis)?
suggests that eukaryotes branch within the archaea meaning that they are just a sub-group of archaea and there are only two domains of life. it implies that the closest relative to eukaryotes is one or all of the TACK archaea
what evidence do we have for the two domain hypothesis?
TACK archaea and eukaryotes share genes not found in other archaea, which suggests our ancestors were TACK archaea
what is a limitation to phylogenetics?
horizontal gene transfer
what are saprophytic fungi?
decomposers which get nutrition through absorptive nutrition (releasing enzymes which break down something and then they suck up all the nutrients). they convert organic material to fungal biomass, carbon dioxide and small molecules such as organic acids
what are mycorrhizal fungi?
mutualistic fungi which colonise plant roots and help solubilise phosphorus and bring soil nutrients to the plant in exchange for carbon from the plant
what are the two types of mycorrhizal fungi?
ectomycorrhizae - grow on the surface of roots and are commonly associated with trees
endomycorrhizae e.g. arbuscular mycorrhizal fungi - grow within the root cells and commonly associated with grasses, crops, vegetables and shrubs
what is the third group of fungi?
pathogens or parasites which cause reduced production or death when they colonise other organisms
what is the problem with species classification?
there is a long-standing failure of biologists to agree on how we should identify species and how we should define the word ‘species’
differing definitions include: a taxonomic rank, a group of organisms capable of interbreeding and producing fertile offspring, and a separately evolving lineage that forms a single gene pool
what do we focus on currently to define a species?
a phenotypic assignment (the organism’s phenotype) and genome similarities (so kind of a hybrid of what it looks like/what it can do and the evolutionary history when comparing genomes)
what is the definition of a prokaryotic species?
a category that circumscribes a genetically coherent group of individual isolates/strains sharing a high degree of similarity in independent features, comparatively tested under highly standardised conditions
what is a bacterial species defined as?
a genomically coherent group of organisms
what is required for a strain to belong to a new species?
OrthoANI (average nucleotide identity) to the closest known species should be below about 95% (meaning if they are divergent by 5% or more they are classified as a new species)
if the above classifies it as a new species and you want to describe it as such, you must carry out additional taxonomic research such as phenotypic characterisations and biochemical tests to show new phenotypes. This is because a gene being there doesn’t mean it’s being used
it must also have a type strain that is live which anyone can obtain for taxonomic study. It must be a pure culture so it can be used as a benchmark when comparing new species
what is the issue with classification of new bacteria?
it is difficult to quantify differences and define a boundary for a species. This is because it’s usually not clusters but a gradient of species, making it difficult to decide where to draw the line for a new species
what is the practical definition of a prokaryotic species?
a group of strains that are characterised by a certain degree of phenotypic consistency, showing at least 70% DNA-DNA binding and over 97% 16S ribosomal RNA gene-sequence identity
what three things do all classification systems share?
they are arbitrary (no reason why we draw the line at that certain point, just cause it works with what we’ve done previously)
they are anthropocentric (based off classic microbiology which focused on classifying pathogens but doesn’t fit all microbial diversity)
they are rooted in practical necessity (we just make the best choices we can when it comes to classification cause we don’t know any better systems)
what have bacterial species historically been defined by?
growth characteristics (morphology, gram stain, growth medium) (classic species were generally fast growing pathogens)
disease caused
what is DNA-DNA hybridisation?
involves comparing the genome of two organisms to see how well they hybridise to each other. It is considered phenotypic as well as genotypic because if you compare the genome you should be able to predict what the phenotypes are actually going to be
essentially measures the degree of genetic similarity between complete genomes by measuring the amount of heat required to melt the hydrogen bonds between the base pairs that form the links between the two strands of the double helix of duplex DNA
outline the DNA-DNA hybridisation process?
put genomic DNA in tube and add dye, apply heat and eventually DNA melts and releases dye (so we are looking for when the dye disappears)
you apply a second genome to the same mix and melt it again so that it anneals to the other genome strand. You then get a hybrid genome with one strand from one organism and one from the other
any difference in temperature to melt this hybrid suggests there is a difference in similarity of those genomes
the DNA melting temperatures from both organisms can be graphed to show two peaks (one for one organism and one for the other)
what does DNA-DNA hybridisation provide a standardised means for?
identifying and classifying prokaryotes that lack well defined morphological or phenotypic characteristics
what are the pros of DNA-DNA hybridisation?
greater than 70% similarity means it’s the same species
good correspondences with phenotypically coherent clusters of strains in Enterobacteriaceae
what are the cons/issues of DNA-DNA hybridisation?
unclear how it relates to whole-genome relatedness (i.e. you are looking at melting not actually comparing DNA sequences)
time consuming
carried out properly by few laboratories
ill-suited for rapid identification
only suited for pair-wise comparison
previous classification must be present
unavailable for non-culturable organisms (and only 1-2% of prokaryotic cells are cultivable)
how do you sequence the 16S rRNA gene?
isolate DNA
heat to separate strands and add specific primers
primer extension with DNA polymerase
repeat above steps to obtain many copies of 16S rRNA gene
run agarose gel and check for correct sized product
purify and sequence PCR product
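The "repeat above steps to obtain many copies" step is exponential: each cycle can at best double the number of copies. A minimal sketch, where the `efficiency` parameter is an assumption added for illustration (1.0 means perfect doubling every cycle, which real reactions only approximate):

```python
def pcr_copies(start_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised number of gene copies after a given number of PCR cycles.

    efficiency = 1.0 means every template is copied each cycle (perfect doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


# One starting copy of the 16S rRNA gene after 30 ideal cycles.
print(pcr_copies(1, 30))  # 1073741824.0, about a billion copies
```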
what were the major impacts of DNA-DNA hybridisation?
established a base line for species delineation
allows us to affirm whether or not a species is the same after high-level classification with 16S (>70% binding value and >97% 16S rRNA gene sequence similarity = same species)
what is the issue with 16S rRNA sequencing?
<97% 16S rRNA sequence similarity always = different species
BUT >97% sequence similarity doesn’t always mean the same species and as a result may not be >70% binding value (think gorilla and humans sharing >97%)
so basically 16S good to confirm when things are different but not when they are the same
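The asymmetry above (16S can rule a shared species out, but not in) can be written as a small decision rule. This is a sketch using the 97% cut-off from these notes; the function name and return strings are made up for illustration:

```python
def interpret_16s(identity_percent: float, threshold: float = 97.0) -> str:
    """Interpret 16S rRNA gene identity between two strains.

    Below the threshold the strains are treated as different species.
    At or above it the result is inconclusive and needs DDH or ANI.
    """
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_percent < threshold:
        return "different species"
    return "inconclusive: confirm with DDH or ANI"


print(interpret_16s(94.2))  # different species
print(interpret_16s(98.9))  # inconclusive: confirm with DDH or ANI
```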
what are some limitations of using the 16S rRNA gene for gene comparison?
lacks resolution compared to DDH (16S gene small compared to whole genome)
can’t discriminate between highly related species (>97%)
doesn’t relate to metabolic capabilities or provide fine details as it only relies on a single gene (the 16S gene is highly conserved so doesn’t change except over long periods of time, but the rest of the genome is under lots of different selection pressures and can gain/lose functions without changing the 16S gene, so 16S might not capture more recent evolutionary events)
what is ANI (average nucleotide identity) and how does it work?
used to figure out how different two organisms are
works by running a BLAST or alignment which compares base pair to base pair over the entire genome for two genomes. But it’s done through informatics on a computer so can compare the whole genome very quickly
it has replaced DNA-DNA hybridisation with a new threshold of 95% (as opposed to 70% with DDH) so there is a much tighter boundary for what a species is supposed to be
ANI won’t give you a new organism’s taxonomy at phylum, family or genus level so for this you still have to rely on 16S sequencing
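The core of the base-to-base comparison can be sketched as percent identity over an alignment. This is a simplification and not how real ANI tools work in detail: it assumes the two sequences are already aligned (gaps written as `-`) and simply skips gapped positions:

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of matching bases over aligned positions that have no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip positions where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


def same_species_by_ani(ani_percent: float, threshold: float = 95.0) -> bool:
    """Apply the 95% ANI species threshold from these notes."""
    return ani_percent >= threshold


identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
print(round(identity, 2))             # 88.89 (8 matches over 9 compared positions)
print(same_species_by_ani(identity))  # False
```

A real genome comparison averages identity over many aligned fragments, but the threshold logic is the same.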
even though it has replaced DNA-DNA hybridisation, what are some limitations of ANI?
we are simply updating the way we measure/compare differences
no biological definition/explanation for why we use 95% ANI or 70% DDH
DDH was a biased system and ANI is just the same biased classification system only faster
what is AAI (average amino acid identity)?
AAI is the same principle as ANI but instead of comparing the whole nucleotide sequence you are comparing the amino acid sequence across all proteins encoded in that genome
what is the correlation between DDH, ANI and 16S rRNA?
there is good correlation between all three
more than 70% DDH is equivalent to more than 95% ANI is equivalent to more than 98.5% 16S rRNA (but all these are still based off old classification systems so even though there’s good correlation there is no theory to support them)
what is MLST (multilocus sequence typing)?
a method for the genotypic characterisation of prokaryotes at the infraspecific level, using the allelic mismatches of a small number (usually 7) of housekeeping genes. Housekeeping genes are essential for the organism and found in all the strains. Designed as a tool in molecular epidemiology (so usually used during epidemics) and used for recognising distinct strains within named species
precursor to MLSA
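MLST compares strains by their allele numbers at each housekeeping gene. A minimal sketch of counting allelic mismatches between two profiles; the locus names (`gene1` to `gene7`) and allele numbers are placeholders, not a real typing scheme:

```python
def allelic_mismatches(profile_a: dict, profile_b: dict) -> int:
    """Count loci at which two MLST allele profiles carry different alleles."""
    if profile_a.keys() != profile_b.keys():
        raise ValueError("both profiles must cover the same loci")
    return sum(1 for locus in profile_a if profile_a[locus] != profile_b[locus])


# Hypothetical profiles: locus name -> allele number.
strain_x = {"gene1": 1, "gene2": 4, "gene3": 2, "gene4": 7, "gene5": 1, "gene6": 3, "gene7": 5}
strain_y = {"gene1": 1, "gene2": 4, "gene3": 9, "gene4": 7, "gene5": 1, "gene6": 3, "gene7": 6}

print(allelic_mismatches(strain_x, strain_y))  # 2
```

Strains with zero mismatches share a sequence type; more mismatches suggest more distantly related strains within the species.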
what is MLSA (multilocus sequence analysis)?
a method for the genotypic characterisation of a more diverse group of prokaryotes (including entire genera) using the sequences of multiple protein-coding genes
so basically the same as MLST but instead of using 7 housekeeping genes can use any genes which are in all the genomes that you are comparing
what does MLSA allow us to do?
different species can be clearly separated
ecotypes can be identified (ecotypes are populations that are genetically cohesive and ecologically distinct)
what are the pros and cons of MLSA?
pros: higher resolution, uses multiple genes, provides a species or lower classification as opposed to 16S rRNA gene analysis which provides a genus classification
cons: genes must all be a single copy, must all be present in all the organisms being analysed, still the problem of what really constitutes a species
what is metabolism?
the sum total of all chemical reactions that occur in a cell
composed of two opposing reactions:
- catabolic reactions (energy releasing metabolic reactions)
- anabolic reactions (energy requiring metabolic reactions)
what is the issue with current knowledge of microbial metabolism?
most of it is based on study of laboratory cultures
what nutrients do cells need and why?
cells need nutrients for a supply of monomers (or precursors of) which are required for growth. This includes macronutrients (CHONSP) which are nutrients required in large amounts and micronutrients which are required in trace amounts and many of which are cofactors or part of catalytic site of enzymes
despite needing larger volumes of macronutrients than micronutrients both are equally important for a cell
why are many micronutrients transition metals?
they have multiple oxidation states which allows them to mediate redox reactions
what is Gibbs free energy?
energy released that is available to do work
in a biological context this would be the energy released by a reaction (e.g. when ATP is hydrolysed) which is actually available for use by the cell
what is the electron donor in a redox reaction?
the substance being oxidised
what is the electron acceptor in a redox reaction?
the substance being reduced
if something gets reduced, does it have more or less electronegativity and does it have more electrons?
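it becomes more negative (its oxidation state drops); strictly, electronegativity is a fixed property of an atom, and more electronegative species tend to be the ones that get reduced
more electrons (reduction is a gain of electrons)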
redox reactions occur in pairs by pairing opposite reactions, what is the energy from these oxidation-reduction reactions used for?
synthesis of energy-rich compounds e.g. ATP, which then power energy-requiring reactions (i.e. anabolism)
what are the two ways in which we can use the energy from redox reactions?
stored in chemical compounds/bonds as ‘batteries’ for use at a later time e.g. ATP
used immediately as an energy source e.g. NADH goes straight to the membrane, is converted to NAD+ and the electrons lost are used to physically drive a flagellum or motor
where on the redox tower would you find electron rich (donors) compounds?
top
where on the redox tower would you find electron poor (acceptors) compounds?
bottom
what does a bigger drop between donor and acceptor on the redox tower mean?
more energy harvested
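The "bigger drop = more energy" rule can be put into numbers with the standard relationship ΔG°' = -n·F·ΔE°'. This relationship is not stated in these notes, and the reduction potentials below are approximate textbook values, so treat the output as a rough illustration only:

```python
FARADAY_KJ_PER_V_MOL = 96.485  # Faraday constant in kJ per volt per mole of electrons


def delta_g_kj_per_mol(e_donor_v: float, e_acceptor_v: float, n_electrons: int = 2) -> float:
    """Standard free energy change for passing electrons from donor to acceptor.

    A more positive acceptor potential (lower on the tower) gives a bigger
    drop and therefore a more negative (more favourable) delta G.
    """
    delta_e = e_acceptor_v - e_donor_v
    return -n_electrons * FARADAY_KJ_PER_V_MOL * delta_e


# Approximate textbook potentials in volts (assumed values for illustration).
NADH = -0.32      # NAD+/NADH couple, electron donor near the top of the tower
OXYGEN = 0.82     # O2/H2O couple, acceptor at the bottom of the tower
NITRATE = 0.42    # NO3-/NO2- couple, acceptor part-way down

aerobic = delta_g_kj_per_mol(NADH, OXYGEN)
anaerobic = delta_g_kj_per_mol(NADH, NITRATE)
print(round(aerobic, 1), round(anaerobic, 1))
```

With these values the drop to oxygen releases more energy than the drop to nitrate, which matches the card above.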
redox reactions usually involve reactions between intermediates. What are these intermediates called and what are the two classes they can be divided into?
electron carriers
divided into prosthetic groups (attached to enzymes) and coenzymes (diffusible) e.g. NAD+ and NADH (so these coenzymes facilitate redox reactions)
electron carriers don’t get broken down, what happens to them instead?
they get recycled/reused but they must be recharged in order to be used again
despite the different ways in which a cell can store energy, what do they all hinge on?
bonds
what are the three fueling reactions which are crucial to an organism?
energy source, carbon source, electron source
what does energy source in an organism look at?
how does it drive metabolism
what is carbon source looking at in an organism?
what things can it actually degrade
what is electron source looking at in organisms?
the physical carrier that is carrying that energy from the energy source
what does the breakdown of carbon/glucose lead to?
energy harvesting (used to generate energy)
carbon harvesting (intermediate compounds) (used to build things)
what is the energy source for chemotrophs?
chemicals
what is the energy source for chemoorganotrophs?
energy from C molecules
what is the energy source for chemolithotrophs?
energy from inorganic molecules
what is the energy source for phototrophs?
sunlight
what is the energy source for photoheterotrophs?
energy from sun and C from organic molecules
what is the energy source for photoautotrophs?
energy from sun used to fix C
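The naming scheme in the cards above is systematic: one word part each for the energy source, the electron source and the carbon source. A small sketch that builds the full name from those three answers (the function name and the input labels are made up for illustration):

```python
ENERGY_PREFIX = {"light": "photo", "chemical": "chemo"}
ELECTRON_PART = {"organic": "organo", "inorganic": "litho"}
CARBON_PART = {"organic": "hetero", "CO2": "auto"}


def trophic_label(energy: str, electron_donor: str, carbon: str) -> str:
    """Build a nutritional-type name from energy, electron and carbon sources."""
    try:
        return (
            ENERGY_PREFIX[energy]
            + ELECTRON_PART[electron_donor]
            + CARBON_PART[carbon]
            + "troph"
        )
    except KeyError as missing:
        raise ValueError(f"unrecognised source: {missing}") from None


print(trophic_label("chemical", "organic", "organic"))  # chemoorganoheterotroph
print(trophic_label("light", "inorganic", "CO2"))       # photolithoautotroph
```

The shorter names on the cards (e.g. chemolithotroph, photoautotroph) just leave out one of the three parts.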
why does aerobic respiration lead to the highest amount of energy?
because the transfer of electrons is very easy
what do good electron acceptors have either of?
an abundance of oxygen or a lack of hydrogen (oxidised)
what do good electron donors have?
an absence of oxygen and an abundance of hydrogen (reduced state)
what do organisms that can’t use organic compounds use as electron donors/acceptors?
inorganic compounds
do you need oxygen present to perform respiration?
no
it’s just that aerobic respiration is the preferred method for organisms because more energy is produced
what is anaerobic respiration dependent on?
electron transport, generation of a proton motive force and ATPase activity
what is used as electron acceptors in anaerobic respiration?
anything other than oxygen e.g. nitrate, ferric iron, sulfate, carbonate and certain organic compounds
why is more energy produced under aerobic conditions than anaerobic (other than oxygen being at bottom of electron/redox tower)?
because when an organism senses that oxygen is present it will turn on certain genes that code for additional proteins and fill the membrane in the electron transport chain with them
the process is the same but the organism is now passing electrons through three or more intermediate carriers to get them to the terminal electron acceptor
so it gets a much more efficient electron transport chain under aerobic conditions
combine this with oxygen being a better electron acceptor and you see why aerobic conditions are preferred
what have been the main technological advancements in microbiology?
direct cell counts
(great plate count anomaly)
culturing
(the rare biosphere)
16S surveys
(biological “dark matter”)
metagenomics
what is the great plate count anomaly?
the finding that microscopic and culture enumerations differ by orders of magnitude for three possible reasons
- different nutritional requirements
- cells may be in non-dividing state
- organisms may rely on other organisms (can’t grow alone)
basically a lot of micro-organisms are unculturable
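The size of the anomaly is easy to express once you have both counts for the same sample. A sketch with made-up numbers (the counts below are hypothetical, chosen only to show a two orders of magnitude gap):

```python
import math


def plate_count_gap(microscope_count: float, plate_count: float) -> tuple:
    """Return (fraction of cells that were cultured, gap in orders of magnitude)."""
    if microscope_count <= 0 or plate_count <= 0:
        raise ValueError("both counts must be positive")
    fraction_cultured = plate_count / microscope_count
    orders_of_magnitude = math.log10(microscope_count / plate_count)
    return fraction_cultured, orders_of_magnitude


# Hypothetical sample: 1e8 cells/mL under the microscope, 1e6 colonies/mL on plates.
fraction, orders = plate_count_gap(1e8, 1e6)
print(fraction, orders)  # about 0.01 of cells cultured, a gap of 2 orders of magnitude
```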
what is enrichment bias?
each culture media selects for only a few organisms
microorganisms cultured in the lab are frequently only minor components of the microbial ecosystem
what are the reasons for why we have a tendency towards common fast growing microbes when culturing?
nutrients available in the lab culture are typically much higher than in nature
narrow set of conditions in lab culture
selects for organisms that can grow alone (even though many organisms are auxotrophic)
why is a dilution of inoculum performed?
to eliminate rapidly growing but quantitatively insignificant weed species
what are two culture-independent analyses of microbial communities?
PCR methods of microbial community analysis
Environmental genomics and related methods
what is the rare biosphere?
a concept describing the observation that a very large proportion of the taxa in microbial communities are extremely uncommon, i.e. there is massive diversity found at low abundance. A few organisms make up the vast majority of sequences you get from a sample because a few species make up the majority of organisms in any ecosystem
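A sketch of how the rare biosphere shows up in sequence counts. The taxon names and read counts are invented, and the 0.1% relative-abundance cut-off for "rare" is an arbitrary assumption for the example:

```python
def rare_taxa(read_counts: dict, cutoff: float = 0.001) -> list:
    """Return taxa whose relative abundance falls below the cutoff."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads in sample")
    return [taxon for taxon, reads in read_counts.items() if reads / total < cutoff]


# Hypothetical 16S survey: taxon -> number of reads (10000 reads in total).
sample = {
    "taxon_A": 6000, "taxon_B": 3500, "taxon_C": 490,
    "taxon_D": 4, "taxon_E": 3, "taxon_F": 2, "taxon_G": 1,
}

rare = rare_taxa(sample)
rare_read_share = sum(sample[t] for t in rare) / sum(sample.values())
print(rare)             # ['taxon_D', 'taxon_E', 'taxon_F', 'taxon_G']
print(rare_read_share)  # about 0.001, i.e. 0.1% of all reads
```

In this toy sample more than half of the taxa are rare, yet together they contribute only about 0.1% of the reads.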
what led to the discovery of the rare biosphere?
new technologies such as 16S sequencing
this enabled the detection of these rare populations as prior techniques lacked the resolution to detect the rare biosphere
what is a genome?
entire complement of genetic information including genes, regulatory sequences and noncoding DNA
genomes are also somewhat moulded by an organism’s lifestyle
what is genomics?
discipline of mapping, sequencing, analysing and comparing genomes
what is bioinformatics?
multidisciplinary field that combines biology, computer science, information engineering, mathematics and statistics to analyse and interpret biological data
explores gene functions, who carries what genes and where these genes are found
what is comparative genomics?
involves making alignments to compare genes and genomes across many different organisms so that we can identify trends such as highly conserved genes
many genes can be identified by sequence similarity to genes found in other organisms (comparative analysis)
comparative analyses allow for predictions of metabolic pathways and transport systems
what is the number of genes with a role that can be clearly identified in a given genome?
70% or less of total open reading frames (ORFs) detected
what are hypothetical proteins?
uncharacterised open reading frames (ORFs) so basically proteins that likely exist but whose function is not known
hypothetical proteins are likely encoded by nonessential genes (e.g. in E. coli there are lots and they are thought to be regulatory or redundant proteins)
considered to be biological ‘dark matter’ which is parts of the genome that are consistently being detected but we have no idea what their purpose is
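To make "open reading frame" concrete, here is a very simplified ORF scan. It is only a sketch under stated assumptions: it reads the forward strand only, treats ATG as the only start codon, and uses an arbitrary minimum length, whereas real gene-finding tools do much more:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list:
    """Find simple ORFs (ATG to the next in-frame stop) on the forward strand.

    Returns (start, end) index pairs, where dna[start:end] includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


toy_dna = "CCATGAAATTTTAGGG"
print(find_orfs(toy_dna))  # [(2, 14)]
print(toy_dna[2:14])       # ATGAAATTTTAG
```

An ORF found this way is only a candidate gene; if nothing is known about what it does, its product is labelled a hypothetical protein.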
what is metagenomics (environmental genomics)?
DNA from whole microbial community (in a sample) is extracted and directly sequenced
no primers needed so metagenomics avoids primer bias
why can the primers and polymerases used in PCR methods of microbial analysis create a bias?
they may preferentially amplify one target over another
to make primers you also need to know the sequence of the part of DNA you want to flag creating further bias against unknown organisms
what is the metagenome?
the total genetic content of all organisms present in an environment
what are the pros of metagenomics?
detect as many genes as possible
yields picture of gene pool in environment
can detect genes that are not amplified by current PCR primers
powerful tool for assessing the phylogenetic and metabolic diversity of an environment
how does total DNA extraction occur in metagenomics?
environmental single-gene surveys
shotgun studies of all environmental genes
what does the DNA sequencing in metagenomics allow us to identify?
identify common genes within a community
identify genome contents favoured by current environmental conditions
what is the difference between 16S and metagenomics?
16s targets a single gene while metagenomics sequences all the information in a sample
what is transcriptomics?
targets RNA (transcripts from genome)
the transcriptome is the entire complement of RNA produced under a given set of conditions and this is what transcriptomics is looking at
what is proteomics?
genome-wide study of the structure, function and regulation of an organism’s proteins
equivalent of sequencing all the proteins in a sample to see what is actually made
what is metabolomics?
the study of the metabolome, which is the complete set of metabolic intermediates and other small molecules produced in an organism
can be broken down into glycomics (sugars), lipidomics (lipids), and fluxomics (changes in concentration of things)
why might one use all of the omics together?
just because a gene is there doesn’t mean it is being used
if a gene found by genomics is shown to be turned on by transcriptomics, we still don’t know whether the actual phenotype shows, as there may be post-translational modifications. So we then use proteomics
what can be learned from RNA experiments?
expression of specific groups of genes under different conditions
expression of genes with unknown function; can yield clues to possible roles
comparison of gene content in closely related organisms
identification of specific organisms