The Metagenome Flashcards
What is metagenomics?
The study of genetic material recovered directly from environmental or biological systems or compartments
What is the significance of metagenomics?
- Unbiased view of taxonomic diversity in a sample
- Not limited by ability to culture
- Overall view of gene content in a sample
How many base pairs are sequenced using metagenomics?
1.045 billion base pairs sequenced
elucidate the gene content, diversity, and relative abundance of the organisms
What are some of the significant findings of metagenomics?
estimated to derive from at least 1800 genomic species
identified 148 previously unknown bacterial phylotypes
identified over 1.2 million previously unknown genes
What is microbiota?
ecological community of commensal and pathogenic microorganisms
Includes bacteria, archaea, protists, fungi and viruses
What is the microbiome?
The collective genomes of microorganisms in these communities
Name some environmental microbiota
Deep sea microbiome
Soil microbiome
Hospital microbiome
Subway microbiome
What are some examples of human microbiota?
Gut microbiome
Skin microbiome
Oral microbiome
Vaginal microbiome
How diverse are the human microbiota?
Taxonomic diversity varies by body site
How does microbiome differ from person to person?
The microbiome is unique to each individual, even between twins
What does a change in microbiome suggest?
Changes in the microbiome have been associated with multiple human illnesses, e.g. Irritable Bowel Syndrome, depression, cancer
What is the significance of gut microbiome?
Gut microbiome can classify individuals as lean or obese with >90% accuracy
Early-life gut microbiomes linked to development of allergic conditions e.g. asthma
What is CDI?
Clostridium difficile infection (CDI) is a disease of the large intestine caused by toxins produced by the spore-forming bacterium Clostridium difficile
How does stool microbiota differ in CDI?
The stool microbiome during Clostridium difficile infection (CDI) is quite different from the healthy stool microbiome
CDI has greater effect on stool microbiome than host genetic factors
How is CDI cured?
Faecal microbiota transplant is able to cure CDI
Restoration of the stool microbiome to a healthy state is rapid following transplantation
What is the relevant factor of microbiota in causing disease?
The relative abundance of organisms may not be as important as gene content
What are the technological approaches to metagenomics?
- Targeted PCR amplification
- Whole Genome Shotgun Sequencing
Describe the PCR technique used in metagenomics
16S rRNA, bacteria
Internal transcribed spacer (ITS), 18S rRNA, eukaryotes
Outline the benefits of Whole Genome shotgun sequencing in metagenomics
Assess taxonomic diversity in sample
Assess composite gene functions in sample
Unbiased, all micro-organisms
What is the 16S ribosomal RNA?
16S ribosomal RNA is a component of the 30S small subunit of the prokaryotic ribosome
Outline the 16S targeted PCR amplification workflow
- Sample selection
- DNA extraction
- 16S PCR amplification
- Sequencing
- Analysis
Name some of the 16S databases
Greengenes
SILVA
RDP
Give examples of open source analysis pipelines available
QIIME
mothur
DADA2
What are the 4 main metagenomic technologies available?
Roche 454
Illumina HiSeq
Ion Torrent PGM
Illumina MiSeq
Which regions are commonly used in 16S targeted PCR amplification?
Studies differ in the region they use
V1-V2 and V1-V3 are very common
but there is no consensus on which region to use
How does gut microbiota change with age?
Children under 4 years old have lots of Actinobacteria (generally Bifidobacterium, a well-known baby gut bacterium)
As you get older, Bifidobacterium goes down and Bacteroides goes up
What are the 2 factors taken into consideration when choosing which variable region to sequence?
- Phylogenetic signal
- Amplicon length
Describe how phylogenetic signal in bacteria determines which variable region to choose
In a phylogenetic tree, similar sequences cluster together and dissimilar sequences sit further apart
e.g.
V1-V2 tree: reds (Staphylococcus epidermidis) and blues (Staphylococcus aureus) are separated quite clearly
V4 tree: reds and blues are very mixed
∴ to study the skin microbiome use V1-V2
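A minimal sketch of what "separated" versus "mixed" means numerically, assuming two aligned sequences of equal length from the same variable region. It computes the proportion of differing positions (p-distance). The function name and any sequences passed to it are illustrative assumptions, not data from the trees described above.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)
```

A region in which two species give a distance near 0 carries little signal for telling them apart; a region with a larger distance separates them more clearly in a tree.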
How does amplicon length affect the variable region chosen?
Sequencing errors occur at random positions in the reads. Errors in the middle of the amplicon, where the reads overlap, can be corrected
Errors outside the overlap remain in the data ∴ the data won’t be as good
Where there is 100% overlap the whole sequence can be corrected ∴ a much more accurate estimate of the genes present
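The overlap arithmetic can be sketched as follows, assuming paired reads of equal length that start at opposite ends of the amplicon (an assumption; the cards only say the reads overlap). The function names and the read and amplicon lengths used with them are hypothetical.

```python
def overlap_length(read_length: int, amplicon_length: int) -> int:
    """Number of amplicon bases covered by both reads of a pair."""
    if read_length <= 0 or amplicon_length <= 0:
        raise ValueError("lengths must be positive")
    # Two reads cover 2 * read_length bases in total; anything beyond the
    # amplicon length is covered twice. Cap at the amplicon length and
    # floor at zero for reads that never meet.
    return max(0, min(amplicon_length, 2 * read_length - amplicon_length))


def overlap_fraction(read_length: int, amplicon_length: int) -> float:
    """Fraction of the amplicon where errors can be checked against a second read."""
    return overlap_length(read_length, amplicon_length) / amplicon_length
```

With 250-base reads, a 250-base amplicon gives a fraction of 1.0 (fully overlapped), while a 450-base amplicon leaves only 50 overlapping bases in the middle.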
Where is the 16S rRNA gene found?
The 16S rRNA gene is found in all bacteria
How susceptible to contamination is 16S rRNA gene sequencing?
The method is very sensitive to contamination from
- Environment
- Operator
- Reagents
Especially important for low biomass samples
Why are low biomass samples more susceptible to contamination?
E.g. a faecal sample contains lots of bacterial material ∴ contamination makes up a smaller proportion and has less effect
Skin swabs contain little bacterial material ∴ the same contamination has a greater effect on the results
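The proportion argument can be made concrete with a short sketch. The function name is hypothetical, and any template counts passed to it are invented for illustration only, not measurements.

```python
def contaminant_fraction(sample_templates: float, contaminant_templates: float) -> float:
    """Share of all DNA templates in the tube that come from contamination."""
    total = sample_templates + contaminant_templates
    if total <= 0:
        raise ValueError("there must be at least some template DNA")
    return contaminant_templates / total
```

For the same invented contamination of 1,000 templates, a high biomass sample of 1,000,000,000 templates is about 0.0001% contaminant, whereas a low biomass sample of 10,000 templates is about 9% contaminant.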
How can we mitigate potential contamination?
- Randomise samples
- Note batch numbers of reagents
- Sequence negative controls
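One simple way to use sequenced negative controls is to list the taxa that appear in both a sample and a control, so they can be reviewed as possible contaminants. This is a hedged sketch only: the function name, the `min_control_reads` threshold, and the taxon labels are assumptions, and it is not the method of any particular pipeline.

```python
def taxa_shared_with_controls(sample_counts, control_counts, min_control_reads=1):
    """Return taxa found in the sample that were also detected in the negative control."""
    shared = [
        taxon
        for taxon, reads in control_counts.items()
        if reads >= min_control_reads and sample_counts.get(taxon, 0) > 0
    ]
    return sorted(shared)
```

A taxon on this list is not automatically a contaminant; it is a prompt to check reagent batch numbers and run order.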
Why is choosing the correct variable region important?
Choice of variable region determines resolution
Less reliable below genus level
What are the new full-length 16S technologies available?
New long-read technologies enable full-length 16S sequencing
PacBio
Nanopore
What is the drawback of full-length 16S sequencing?
The higher error rates of long-read technologies introduce noise
Development ongoing
How does WGS differ from 16S sequencing?
16S rRNA amplification sequences only one gene
WGS sequences across all genomes in the sample at the same time; the reads don’t all align to one another because they come from different bacterial species
How does WGS sequencing allow us to see genes?
WGS shotgun reads are assembled
After assembly we can build a phylogenetic tree and find the relative abundances of the bacteria and genes present.
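Relative abundance in its simplest form is each taxon's share of the assigned reads. The sketch below assumes read counts per taxon are already available and ignores corrections such as genome length; the function name and input layout are illustrative assumptions.

```python
def relative_abundance(read_counts):
    """Convert per-taxon read counts into fractions that sum to 1."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads were assigned to any taxon")
    return {taxon: reads / total for taxon, reads in read_counts.items()}
```

The same calculation applies to gene counts, which matters because gene content may be more informative than which organisms are present.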
What other analysis does WGS allow?
Gene prediction is also possible because all genomes have been sequenced. We can look at pathways and at which genes contribute to which pathway; in certain individuals, genes in a pathway may be accelerated, etc.
Why is WGS still not 100% reliable?
Host cells are often in excess in the sample
There is no amplification step to enrich for bacterial DNA
Need to ensure we’re not just sequencing host DNA
Which samples are more likely to be dominated by host DNA when using WGS?
Faecal: <10% human reads
Saliva, nasal, skin samples: >90% human reads
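The effect of host reads on usable sequencing depth can be sketched as below. The function name and the total read count in the usage note are hypothetical; the host fractions of 10% and 90% are the boundary values from the card above.

```python
def microbial_reads(total_reads: int, host_fraction: float) -> int:
    """Reads left for microbial analysis after host reads are discarded."""
    if not 0.0 <= host_fraction <= 1.0:
        raise ValueError("host_fraction must be between 0 and 1")
    return round(total_reads * (1.0 - host_fraction))
```

For a hypothetical run of 10,000,000 reads, 10% host reads leaves 9,000,000 microbial reads, while 90% host reads leaves only 1,000,000, which is why enrichment is considered for saliva, nasal and skin samples.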
What steps are taken to enrich the sample without amplification in WGS?
Pre-extraction
Post-extraction
Describe pre-extraction?
Differential lysis of mammalian cells
Enriches for intact microbial cells
Potential bias towards Gram-positive bacteria
What is post-extraction?
Enzymatic degradation of methylated nucleotides targets mammalian DNA
Bias against AT-rich bacterial genomes
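Whether a genome is AT-rich can be checked by computing its GC fraction; a low value means AT-rich. This is a minimal sketch with a hypothetical function name, and it counts only unambiguous A, C, G and T bases.

```python
def gc_content(sequence: str) -> float:
    """Fraction of unambiguous bases in the sequence that are G or C."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / counted
```

Comparing GC fractions before and after host depletion is one way to look for the bias against AT-rich genomes noted above.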
What open source analysis pipelines are available for WGS?
MG-RAST
MetaPhlAn
MEGAN