Lecture 12 - Microbial identification and genomics Flashcards
Who were the first people to categorize life?
Aristotle categorized life into two fundamental groups: animals and plants
In 1868, Ernst Haeckel proposed a third group, Protista, to classify all microscopic life-forms
* later, Protista was subdivided into eukaryotic microorganisms and bacteria
Up to this point, all identification was based on physiological differences
Classic taxonomy refers to what type of differences?
Classification based on physiological
differences
Classic Taxonomy: What are physical differences?
- cell shape
- structure of cell envelope (Gram stain, etc.)
- flagella / motility
- endospore formation
Classic Taxonomy: What are metabolic differences?
- Metabolic differences
- ability to metabolize various compounds, such as carbohydrates, amino acids and lipids
Name 3 examples of classic taxonomy tests
- Glucose catabolism
- Blood agar and hemolysis
- Phenotype microarray
What is classification based on in Modern molecular taxonomy?
Classification based on direct comparison of gene sequences
* not all genes are suitable for taxonomy, though
2 characteristics of an ideal gene for molecular taxonomy
- gene is present in all organisms
- gene’s DNA sequence is very well conserved across all organisms
What happens if a gene is missing in certain organisms?
- If a gene is missing in certain organisms, that gene cannot be used to construct a full phylogenetic tree for all species
- e.g., a heterocyst-specific gene cannot be used to elucidate the evolutionary relationship between a cyanobacterium and a respiratory pathogen
What happens if a gene contains too many mutations?
If a gene contains too many mutations, it may not be useful for constructing an accurate phylogeny, because the information contains too much noise
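The "too much noise" problem can be made concrete with the Jukes-Cantor model, which is swapped in here for illustration and was not part of the lecture. It corrects the observed fraction of differing aligned sites p for multiple substitutions at the same site: d = -3/4 ln(1 - 4p/3). As p approaches 0.75 (roughly what two unrelated random sequences with equal base frequencies would show), the corrected distance shoots up, so small errors in p become huge errors in distance. A minimal sketch, with hypothetical function names:

```python
import math

def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites that differ between two sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance (estimated substitutions per site)."""
    if p >= 0.75:
        # Saturated: differences no longer carry usable distance information
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)

# The correction grows much faster than p itself as p nears 0.75
for observed in (0.05, 0.25, 0.50, 0.70):
    print(f"p = {observed:.2f} -> corrected distance = {jukes_cantor(observed):.2f}")
```

A heavily mutated gene sits in the right-hand part of this curve, which is why its phylogenetic signal is unreliable.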
What did Carl Woese first use for molecular taxonomy?
Carl Woese first used the small subunit ribosomal RNAs (SSU rRNA) for molecular taxonomy
What is SSU rRNA?
SSU rRNA = small subunit ribosomal RNA
- major component of small ribosomal subunit
- critical function for all forms of life
- 16S rRNA for bacteria
- 18S rRNA for eukaryotes
- coded by the ‘rDNA’ genes
Because of its crucial function, SSU rRNA is:
- universally present in all cellular organisms
- very well conserved between organisms
SSU rRNA has two types of regions:
- conserved regions are extremely well conserved between different organisms
- variable regions show more differences
Which region in SSU rRNA is used for taxonomic analysis and why?
Variable regions are used for taxonomic analysis
* they differ enough to classify organisms at the genus and species level
Conserved regions are too similar for taxonomy
* they are used to design universal primers, which can anneal to and amplify SSU rDNA genes from many organisms within the same domain of life
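A minimal Python sketch of how conserved and variable columns might be told apart in an alignment of SSU rDNA fragments. It assumes the sequences are already aligned to equal length, scores each column by the fraction of sequences sharing its most common base, and calls a column conserved only when every sequence agrees (threshold=1.0). The toy alignment, function names and threshold are hypothetical; real analyses work on full-length 16S/18S alignments.

```python
from collections import Counter

def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of sequences sharing the most common base at each column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    scores = []
    for col in range(length):
        bases = [seq[col] for seq in alignment]
        top_count = Counter(bases).most_common(1)[0][1]
        scores.append(top_count / len(bases))
    return scores

def label_columns(alignment: list[str], threshold: float = 1.0) -> str:
    """'C' for conserved columns (score >= threshold), 'v' for variable ones."""
    return "".join("C" if s >= threshold else "v" for s in column_conservation(alignment))

toy_alignment = ["ACGTAC", "ACGAAC", "ACGCTC"]  # hypothetical aligned fragments
print(label_columns(toy_alignment))
```

Runs of "C" are the kind of stretch a universal primer would be designed against, while "v" columns carry the genus- and species-level signal.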
What can be used to create a Tree of Life, and for which organisms?
SSU rRNA can be used to create a comprehensive ‘Tree of Life’ for all organisms with ribosomes
Do viruses have any universal genes which can be used for comparison?
Viruses do not have any universal genes which can be used for comparison
How is viral classification done?
Viral classification is frequently done using genes commonly found within the same Baltimore Class
* reverse transcriptase for retroviruses
* RdRp (RNA-dependent RNA polymerase) for RNA viruses
* capsid proteins within the same class, etc.
Where do CpV BQ1 and CpV BQ2 fall in the phylogeny?
Phylogeny of CpV BQ1 and BQ2 using the DNA polymerase (polB) gene of mimiviruses and relatives
- BQ2 is a closer relative to Mimiviridae whereas BQ1 belongs to a related family, Phycodnaviridae
What is Genomics?
determination and study of
complete genome sequences
Multiple uses of genomics? (Hint: link, compare, generate)
- link genetic characteristics of individual microbes with their physiological properties and ecological roles
- compare genome sequences between related species and strains of organisms to uncover basis of pathogenicity etc.
- generate hypotheses from the genome sequence and then confirm them experimentally
Which bacterium was mistaken as the causative agent of influenza?
Haemophilus influenzae
- first bacterium to have its genome completely sequenced
- Gram-negative coccobacillus
- opportunistic pathogen in the human respiratory tract
- was mistaken as the causative agent of ‘influenza’ during the early days of microbiology
- still carries the disease's name as a historical artefact
What does TIGR stand for?
The Institute for Genomic Research
What methods were used to sequence the Haemophilus influenzae genome?
Sanger and Shotgun sequencing
Explain Sanger sequencing
- Sanger sequencing: a DNA sequencing method that uses dideoxynucleotides (ddNTPs) to prematurely terminate a DNA polymerization reaction
Explain Shotgun sequencing
- Shotgun sequencing: a genome-sequencing approach which reads the genome in small, fragmented pieces and then re-assembles the pieces afterwards
What are the 3 steps of shotgun sequencing using Sanger?
- Generate a DNA library from a genome
* fragment the genome by sonication etc. and clone the fragments into plasmids
* the entire genome is represented in multiple fragments cloned in individual plasmids
- Use Sanger sequencing to sequence the individual fragments in the library
* numerous short sequences (~600 bp long) are generated, each representing a tiny portion of the genome
- Assemble the short sequences into one piece based on overlapping regions
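The assembly step (joining reads by their overlapping regions) can be sketched with a simple greedy algorithm that repeatedly merges the two reads with the longest suffix/prefix overlap. This is an illustrative stand-in, not how production assemblers work: they use overlap or de Bruijn graphs and must cope with sequencing errors and repeats, where a greedy merge can go wrong. The reads, min_len and function names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if none >= min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of a suffix/prefix match
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap; return the contig(s)."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads
    reads = [r for r in reads if not any(r != other and r in other for other in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads

reads = ["GGAATAC", "ATTAGACC", "CCGGAATA", "ACCTGCCG"]  # hypothetical shuffled reads
print(greedy_assemble(reads))
```

With these four toy reads the sketch returns a single contig, ATTAGACCTGCCGGAATAC; if no overlaps remain, it returns several contigs instead, which corresponds to gaps in a real assembly.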
Explain high throughput sequencing
High throughput sequencing allows a mixture of multiple different fragments to be read in a single reaction, simultaneously
* Illumina
* Ion-torrent, etc.
The methods above typically produce a huge number of accurate, short reads (200 – 700 bases)
* requires huge computing power to assemble these small fragments
These methods have made genomics extremely affordable
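How many short reads are needed can be estimated with the standard Lander-Waterman relation: mean coverage C = N × L / G (N reads of length L over a genome of size G), with a Poisson estimate of e^(-C) for the fraction of bases never read. The run parameters below are hypothetical, not figures from the lecture.

```python
import math

def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is read (C = N * L / G)."""
    return num_reads * read_length / genome_size

def expected_fraction_uncovered(coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases never read."""
    return math.exp(-coverage)

# Hypothetical run: 100,000 reads of 150 bases on a 1.5 Mb genome
c = mean_coverage(100_000, 150, 1_500_000)
print(c, expected_fraction_uncovered(c))
```

Even at 10x coverage, a small fraction of bases is expected to be missed, which is why high-throughput runs aim for many overlapping reads and heavy computation during assembly.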
What is a limitation of Sanger?
Sanger sequencing is limited to reading one sample of DNA (one plasmid, etc.) in a single reaction
Explain Nanopore sequencing
Sequencing by ‘detecting shape’ of the
nucleic acid bases
A tiny channel (nanopore) is set up in a lipid bilayer with an electrical circuit connected
* ssDNA is passed through the channel
* each DNA base causes different electrical
fluctuations while passing through the channel
* these different electrical fluctuations are used to determine the sequence of ssDNA
Can read very long pieces of DNA at once, but is less accurate
* 10000 – 100000+ bases
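A toy base caller illustrates the idea of turning current fluctuations into sequence. It assumes, purely for illustration, that each base produces one characteristic mean current (the LEVELS values are invented). In reality several bases sit in the pore at once, and real basecallers are far more sophisticated. The function name is hypothetical.

```python
# Hypothetical, idealized mean current levels (pA); real signals differ
LEVELS = {"A": 80.0, "C": 60.0, "G": 70.0, "T": 50.0}

def call_bases(event_means: list[float]) -> str:
    """Assign each current event to the base whose reference level is closest."""
    return "".join(
        min(LEVELS, key=lambda base: abs(LEVELS[base] - current))
        for current in event_means
    )

print(call_bases([79.0, 52.0, 71.5, 61.0]))
# A noisy reading of what was actually a C lands closer to the G level
print(call_bases([66.0]))
```

The second call shows how measurement noise can push a reading closer to the wrong base, one reason per-base accuracy is lower than with short-read methods.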
What was originally used as the Nanopore?
A Staphylococcus aureus toxin was originally used as the ‘nanopore’
What is alpha-hemolysin? Give its diameter too.
- ɑ-hemolysin (ɑ-HL)
- pore-forming protein which inserts itself into host’s lipid membrane to disrupt the permeability barrier
- minimum diameter is ~1.4 nm
- allows ssDNA or ssRNA to pass through, but not dsDNA (dsDNA is about 2 nm in diameter)
What are the 3 steps to make the nanopore?
- express and purify ɑ-HL
- insert ɑ-HL into a lipid bilayer
- connect each side of the lipid bilayer to an electrode (anode and cathode) to produce an electric field
What do bioinformatics and gene annotation refer to?
Genomic sequencing produces a large amount of sequence data that must be analyzed
Explain the identification of potential protein-coding sequences (CDS) in the genome
- translate the entire genomic sequence in all six reading frames
- look for instances where a sequence gets translated into a long polypeptide (50 - 100+ amino acids) without getting interrupted by a stop codon
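The stop-codon scan described above can be sketched in Python. This is a minimal illustration, assuming the standard genetic code's stop codons (TAA, TAG, TGA), scanning all six reading frames, and counting only stretches that end in a stop codon. It does not require a start codon, and the function names and the min_codons threshold are hypothetical choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_open_stretches(genome: str, min_codons: int = 100):
    """Return (strand, frame, start, end, codons) for stop-free stretches ending in a stop.

    Coordinates for the '-' strand refer to the reverse-complemented sequence.
    """
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            run_start = frame
            for i in range(frame, len(seq) - 2, 3):
                if seq[i:i + 3] in STOP_CODONS:
                    codons = (i - run_start) // 3  # codons translated before this stop
                    if codons >= min_codons:
                        hits.append((strand, frame, run_start, i + 3, codons))
                    run_start = i + 3  # next stretch starts after the stop
    return hits

toy_genome = "ATG" + "GCT" * 60 + "TAA"  # hypothetical 61-codon stretch ending in a stop
print(find_open_stretches(toy_genome, min_codons=50))
```

Lowering min_codons finds more candidates but also more short stretches that arise by chance, which is why a length cut-off of roughly 50 to 100+ codons is used.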
How do you assign putative function to potential protein CDS?
- BLAST search the sequence against existing proteins in the database
- many putative CDS will have no assignable function
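BLAST itself is a heuristic tool run against large curated databases. As a rough illustration of only its first idea (finding short shared "words" between a query and database entries), the sketch below ranks hypothetical database proteins by shared k-mers and returns None when nothing scores high enough, mirroring CDS with no assignable function. It is not BLAST: there is no extension step, scoring matrix or E-value, and the database entries, k and min_shared are made up.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All overlapping words of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def best_hit(query: str, database: dict[str, str], k: int = 3, min_shared: int = 2):
    """Name of the database entry sharing the most k-mers with the query, or None."""
    query_words = kmers(query, k)
    scored = [(len(query_words & kmers(seq, k)), name) for name, seq in database.items()]
    score, name = max(scored)
    return name if score >= min_shared else None

# Hypothetical annotated proteins
database = {
    "hypothetical_kinase": "MKTAYIAKQRQISFVK",
    "hypothetical_transporter": "MSLLTEVETPIRNEW",
}
print(best_hit("AYIAKQRQ", database))  # shares several words with the kinase entry
print(best_hit("WWWWWW", database))    # no shared words: no assignable function
```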
What is comparative genomics?
studies relationships between different species
* insight into phylogeny of all life and diversity
* also used to investigate differences at a ‘smaller’ scale, comparing pathogens to their non-pathogenic
relatives, etc.
What is a pan-genome?
The full set of genes found across all related strains (variants) of a specific organism; the subset shared by every strain is the core genome
Is there a difference in genome size even within the same species of an organism?
Yes, and it can be large
* For example, genome size can differ by over 30 percent between different strains of E. coli
* All of these strains still share the core E. coli genome
* The remaining, strain-specific genes account for the different physiology of each E. coli strain
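Comparing gene content between strains is essentially set arithmetic. In this minimal Python sketch with made-up gene identifiers, the intersection gives genes present in every strain, the union gives every gene seen in any strain, and subtracting the other strains' genes gives each strain's unique genes (analogous to O-islands and K-islands). Real comparisons first have to decide which genes count as "the same", for example by sequence similarity; the function name is hypothetical.

```python
def compare_gene_content(strains: dict[str, set[str]]):
    """Return (shared_by_all, all_genes, unique_per_strain) for a set of strains."""
    gene_sets = list(strains.values())
    shared = set.intersection(*gene_sets)  # present in every strain
    all_genes = set.union(*gene_sets)      # present in at least one strain
    unique = {
        name: genes - set().union(*(other for o_name, other in strains.items() if o_name != name))
        for name, genes in strains.items()
    }
    return shared, all_genes, unique

# Hypothetical gene content of two strains
strains = {
    "strain_1": {"housekeeping_A", "housekeeping_B", "island_gene_X", "island_gene_Y"},
    "strain_2": {"housekeeping_A", "housekeeping_B", "island_gene_Z"},
}
shared, all_genes, unique = compare_gene_content(strains)
print(sorted(shared), len(all_genes), unique)
```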
What is a very virulent strain of E. coli that is a major cause of food poisoning? How does its genome size compare?
E. coli O157:H7
The O157 genome is about 15% larger than the K12 genome
* large segments of the genome exist in one strain but not the other
* the two strains shared a common ancestor about 4.5 million years ago
* 4.1 Mb of DNA contains genes that are similar between these strains
What is a non-pathogenic lab strain of E. coli?
E. coli K12
What are O-islands?
- unique segments of DNA only found in O157
- 177 O-islands in total, 1.34 Mb DNA
What are K-islands?
- unique segments of DNA only found in K12
- 234 K-islands, 0.53 Mb DNA
Name 5 types of virulence-associated genes found in O-islands
- intimin (adhesion to intestine)
- type III secretion system (injects effector proteins into host cells)
- iron uptake
- toxins, including the Shiga toxin
- antibiotic resistance
What can comparative genomics predict, and what can it be used to generate?
- Comparative genomics can predict potential virulence-associated factors such as these before the genes are investigated in a wet lab
- Comparative genomics can be used to generate hypotheses that drive experiments
What is responsible for new genetic capability arising in a genome?
Homologs
What are homologs?
Genes that share a common ancestor
Explain how some homologs arose from gene duplication events
Some homologs have arisen from gene
duplication events within the same genome
* two copies of the gene exist in the genome after duplication
* one of these copies is free to evolve a new function
What are paralogs?
Homologs which arise from duplication
events
- paralogs within the same genome have often evolved to perform related but different functions
- for example, an organism may have various different ABC-transporters to uptake different nutrients
- these ABC-transporters are evolutionary related
variants of each other, sharing a common ancestor
What are orthologs?
homologs found in different organisms that perform the same function (they diverged through speciation rather than gene duplication)
Why is metagenomics needed?
Cultivation of the organism is mandatory in classic molecular biology
* however, many organisms are not cultivatable
What is metagenomics?
DNA is extracted directly from microbial communities and analyzed as a mixture
* Use of next-generation sequencing allows scientists to analyze a mixed-sample of microbes without
isolation
Produces data on uncultivatable organisms in various environments
* marine
* soil
* human gut
Investigation of diversity using various genes such as SSU rRNA
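Once SSU rRNA reads from a metagenome have been assigned to taxa, diversity is often summarized with an index such as Shannon's H = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i. This metric is swapped in as one common example rather than taken from the lecture; the read counts and function names are hypothetical.

```python
import math

def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert read counts per taxon into proportions."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

def shannon_index(counts: dict[str, int]) -> float:
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero abundance."""
    return -sum(p * math.log(p) for p in relative_abundance(counts).values() if p > 0)

counts = {"taxon_1": 50, "taxon_2": 30, "taxon_3": 20}  # hypothetical 16S read counts
print(relative_abundance(counts), shannon_index(counts))
```

H is 0 for a community dominated by a single taxon and increases as more taxa are present in more even proportions.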