The metagenome Flashcards
What is Metagenomics?
the study of genetic material recovered directly from environmental or biological systems/compartments
- gives unbiased view of taxonomic diversity in a sample
What is Microbiota?
the combination of organisms in the compartment
ecological community of commensal and pathogenic microorganisms including bacteria, archaea, protists, fungi and viruses
What is Microbiome?
microbiota + “theatre of activity” (the other biological components, such as peptides, lipids and toxins)
the collective genomes of microorganisms in communities
Taxonomic Diversity
The number and relative abundance of species in a community.
Taxonomic diversity varies by body site
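A minimal sketch of how "number and relative abundance of species" can be turned into numbers. It computes richness (how many taxa are present) and the Shannon index, which is one common diversity measure but is an assumption here, since the card does not name a specific index. The genus names and counts are made up for illustration.

```python
import math

def richness(counts):
    """Number of taxa with at least one observation."""
    return sum(1 for c in counts.values() if c > 0)

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over observed taxa."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total  # relative abundance of this taxon
            h -= p * math.log(p)
    return h

# Hypothetical read counts per genus for two body sites (illustrative only)
gut = {"Bacteroides": 50, "Faecalibacterium": 30, "Escherichia": 20}
skin = {"Staphylococcus": 95, "Cutibacterium": 5}

for site, counts in (("gut", gut), ("skin", skin)):
    print(site, richness(counts), round(shannon_index(counts), 3))
```

A community dominated by one taxon scores lower than an evenly spread one, even if both contain the same number of taxa.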
What have changes in human microbiome been associated with?
multiple human illnesses e.g.:
- IBS
- depression
- cancer
Which microbiome can be used to classify individuals as lean or obese?
Gut microbiome (>90% accuracy)
Which infection affects stool microbiome?
Clostridioides difficile (formerly Clostridium difficile) infection (CDI)
- stool microbiome infected with this is quite different from a healthy stool microbiome
- CDI has greater effect on stool microbiome than host genetic factors
Restoration of healthy stool microbiome after CDI
Faecal microbiota transplant
- rapid restoration to healthy state
What are the technological approaches carried out in Metagenomics?
1) Targeted PCR Amplification
> relies on the 16S ribosomal RNA gene in bacteria
2) Whole Genome Shotgun Sequencing
16S Targeted PCR Amplification
16S ribosomal RNA is a component of the 30S small subunit of the prokaryotic ribosome
The 16S rRNA gene contains variable regions interspersed with conserved regions (the conserved regions are where PCR primers can bind)
- Variable regions are the parts that can be used to determine the different species that are present in a sample - contain the phylogenetic signals
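One way to picture conserved versus variable regions is to score each column of a sequence alignment by how much it differs between organisms. The sketch below uses per-column Shannon entropy (in bits) for that; the metric choice is an assumption, and the four short sequences are toy data, not real 16S sequences.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the bases seen in one alignment column."""
    counts = Counter(column)
    total = len(column)
    h = 0.0
    for n in counts.values():
        p = n / total
        h -= p * math.log2(p)
    return h

def variability_profile(aligned):
    """Entropy per position for equal-length, pre-aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy(col) for col in zip(*aligned)]

# Toy alignment: the first three positions are conserved, the last one varies
toy_alignment = ["ACGT", "ACGA", "ACGC", "ACGG"]
profile = variability_profile(toy_alignment)
print(profile)
```

Positions with entropy 0 are identical in every sequence (conserved); higher values mark positions that can help tell organisms apart.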
Steps of 16S Targeted PCR Amplification
1) Collect bacterial sample
2) Extract DNA from that sample
3) Perform a 16S PCR amplification
4) Load the amplified product onto a sequencing machine
5) The sequences generated correspond to the amplified products, which should reflect the actual bacterial content of the original sample
Data Analysis of 16S Targeted PCR Amplification
Sequences generated are compared to databases of 16S genes from different bacterial species to identify the species and abundance of species in the sample
These abundance measurements are plotted as a graph, with colours representing different bacterial species
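As a rough illustration of the abundance step: once each read has been assigned to a taxon (the database comparison itself is not shown), relative abundance is just each taxon's share of the assigned reads. The function name and the example labels are hypothetical and do not reflect how any particular tool does this.

```python
from collections import Counter

def relative_abundance(assignments):
    """Map each taxon label to its fraction of all assigned reads."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}

# Hypothetical per-read taxon assignments (illustrative only)
reads = ["Bacteroides", "Bacteroides", "Prevotella", "Escherichia"]
abundance = relative_abundance(reads)

# Crude text version of the coloured bar: one '#' per 5% of reads
for taxon, fraction in sorted(abundance.items(), key=lambda kv: -kv[1]):
    print(f"{taxon:<12} {'#' * int(fraction * 20)} {fraction:.0%}")
```

The fractions for one sample sum to 1, which is what lets several samples be compared side by side in a stacked plot.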
What is used to generate data from 16S Targeted PCR Amplification?
Software tools e.g.:
- QIIME2
- Mothur
- DADA2
Problem with sequencing 16S gene
The gene is about 1500 bases long, so the whole gene cannot be sequenced on a short-read sequencing machine.
Therefore, only particular regions are sequenced, e.g. V1-V2, V1-V3, V3-V5, V4
However, newer long-read sequencing platforms can sequence the whole 16S gene (V1-V9)
How do we choose which variable regions to sequence in 16S Targeted PCR Amplification?
The choice of variable regions depends on your experiment and is based on:
- phylogenetic signal
- amplicon length
Controls in 16S Targeted PCR Amplification
The 16S rRNA gene is found in all bacteria, which makes the method very sensitive to contamination from:
- operator
- environment
- reagents
*especially important for low biomass samples
Kitome - how do we avoid contamination?
Contamination is controlled for by:
- randomise samples to avoid bias
- note batch numbers of reagents
- sequence negative controls (e.g. sequencing water alone can still produce bacterial reads)
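A simple sketch of how a sequenced negative control might be used: any taxon that also shows up in the control is flagged as a possible contaminant and dropped from the sample. This presence-based rule and the `min_control_reads` threshold are assumptions made for illustration; real analyses typically use more careful statistical approaches. Taxon names and counts are made up.

```python
def flag_contaminants(sample_counts, control_counts, min_control_reads=1):
    """Taxa in the sample that also reach the read threshold in the control."""
    return {
        taxon
        for taxon in sample_counts
        if control_counts.get(taxon, 0) >= min_control_reads
    }

def remove_contaminants(sample_counts, control_counts, min_control_reads=1):
    """Copy of the sample counts without the flagged taxa."""
    flagged = flag_contaminants(sample_counts, control_counts, min_control_reads)
    return {t: n for t, n in sample_counts.items() if t not in flagged}

# Hypothetical counts: a low biomass sample and a water-only negative control
sample = {"Bacteroides": 900, "Ralstonia": 40}
water_control = {"Ralstonia": 35}

print(flag_contaminants(sample, water_control))
print(remove_contaminants(sample, water_control))
```

Raising `min_control_reads` makes the filter less aggressive, which matters when a genuine sample taxon leaks into the control at very low levels.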
What determines the resolution at which you can identify the bacteria?
the choice of the variable region
*resolution below the genus level is generally poor, so species-level identification is less reliable
What enables full length 16S sequencing?
New long-read technologies enable full-length 16S sequencing, e.g.:
- PacBio
- Nanopore
Disadvantage of long read technology
higher error rates of long read technology can introduce noise
Whole Genome Shotgun Sequencing
Same concept as 16S PCR amplification: you collect a sample containing a mixture of bacteria and extract DNA.
However, the whole genome is sequenced rather than just the 16S gene, and the reads are used to build an assembly.
Data Analysis of Whole Genome Shotgun Sequencing
Because the entire genome is sequenced, many sequence reads are generated from the many genes sequenced, and these can be assembled to see how they all join up:
- phylogenetic tree constructed, taxonomic diversity analysed and relative bacterial abundance measured
- can carry out gene predictions and identify bacterial metabolic pathways which may be present in the context of conditions and diseases (not possible in 16S)
Disadvantage of whole genome shotgun sequencing
there is no amplification step like in 16S PCR amplification, meaning:
- patient host cells are often in excess, because nothing enriches for bacterial DNA
= much of the sequencing output ends up being patient (host) DNA
Sample dependent, typical yields of contaminating human reads:
> Faecal: <10% human reads
> Saliva, nasal, skin: >90% human reads
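To make the cost of host contamination concrete, the sketch below estimates how many reads are left for microbes at a given human read fraction. The run size of 10 million reads is a hypothetical figure, and 0.10 and 0.90 are simply the boundary values from the card above, not measured results.

```python
def microbial_reads(total_reads, human_fraction):
    """Estimated non-human reads for a run with the given human read fraction."""
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must be between 0 and 1")
    return round(total_reads * (1.0 - human_fraction))

TOTAL_READS = 10_000_000  # hypothetical sequencing depth

faecal = microbial_reads(TOTAL_READS, 0.10)  # boundary case for faecal samples
saliva = microbial_reads(TOTAL_READS, 0.90)  # boundary case for saliva, nasal, skin

print(f"faecal: {faecal:,} microbial reads")
print(f"saliva: {saliva:,} microbial reads")
```

At the same depth, the high-host sample yields roughly a ninth of the microbial reads, which is the motivation for enriching bacterial DNA before sequencing.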
How to enrich bacterial sample so that there is less patient DNA without amplification?
Through pre-extraction or post-extraction methods:
PRE-EXTRACTION
- differential lysis (break down) of mammalian cells
- enriches for intact microbial cells
- however, introduces potential bias towards gram-positive bacteria
POST-EXTRACTION
- enzymatic degradation of methylated nucleotides targets mammalian DNA
- bias against AT rich bacterial genomes
16S PCR amplification vs WGSS
Both:
- assess taxonomic diversity in a sample
However, 16S Targeted PCR amplification is biased (only bacteria), whereas WGSS is unbiased (all microorganisms)
Differences between targeted 16S PCR amplification and whole genome shotgun sequencing
16S PCR = assesses taxonomic diversity in a sample BUT is biased because it only works on bacteria
Whole genome = also assesses taxonomic diversity AND is unbiased: it can work on all microorganisms, and it can assess composite gene functions in the sample
They both follow largely the same steps