Metagenomics Flashcards
What environmental processes are microbes responsible for?
o Most of the biogeochemical cycles on Earth [pathways by which a substance moves through the biotic and abiotic compartments of Earth]
o Waste processing
o Growth & reproduction of plants & animals
o Production of antibiotics, food fermentation & maintenance of human health
What is Metagenomics?
- The study of genetic material recovered directly from environmental samples
- It involves pooling and studying the genomes of all the organisms in a community -> all the functions encoded in the community’s DNA (metagenome) can be studied
What does Metagenomics let us find?
o Genetic info on potentially novel biocatalysts / enzymes
o Genomic linkages between function & phylogeny for uncultured organisms
o Evolutionary profiles of community function & structure
What are the steps in a Typical Sequence-Based Metagenome Project?
- Experimental Design
- Sampling
- Sample fractionation
- DNA extraction
- DNA sequencing
- Assembly
- Annotation
- Statistical analysis
- Data storage
- Data sharing
What is the foundation of a good Metagenomics study?
Experimental Design
What criteria should extracted DNA satisfy?
o High quality
o Representative of all cells present in sample
o In sufficient amounts for library production & sequencing
What are 4 processing methods used in Metagenomics studies?
- Physical fractionation:
o Applicable only when certain parts of the community are the target of analysis (like viruses in seawater)
- Physical separation & isolation of cells from samples:
o Might be necessary to maximise DNA yield or avoid co-extraction of enzymatic inhibitors (like humic acids in soil, which stick to exposed DNA)
- Lysis of cells:
o Direct lysis in soil has quantifiable bias vs indirect lysis in terms of: microbial diversity, DNA yield, resulting sequence fragment length
- Multiple Displacement Amplification (MDA):
What is the process of Multiple Displacement Amplification ?
- Non-PCR based DNA amp technique.
- Anneals random hexamer primers to template
o No denaturation required; increase in [hexamers] is sufficient to allow slow initial priming step
- Once reaction starts, strand-displacing mechanism of MDA releases ssTemplate for ongoing priming & amp
- phi29 polymerase extends primers till they reach the next primer (start of a dsDNA section)
- phi29 displaces the dsDNA strand it just hit and continues polymerization (‘under’ the displaced strand)
- New primers bind to displaced strand -> polymerization again -> hyperbranched structure
- MDA generates larger sized product with lower error frequency than conventional PCR amplification
What are 3 sequencing methods used in Metagenomics?
- Classical Sanger Sequencing
- 454/Roche System / Pyrosequencing
- Illumina
Why is Sanger still considered the gold standard sequencing technology?
o Low error rate
o Large insert sizes
o Long read length (>700bp)
When is Sanger sequencing applicable?
- Applicable if objective is generating close-to-complete genomes in low-diversity environs
What are the disadvantages of Sanger sequencing?
o Labor-intensive
o Bias against genes toxic to host
- [because of large insert size, full length genes could be included which would be expressed and kill host]
o Overall cost per Gb (±400 000 USD)
Roche system summary info
- Based on the ‘sequencing by synthesis’ principle
- Relies on detection of pyrophosphate release on nucleotide incorporation
o Sanger relies on chain termination with dideoxynucleotides (ddNTPs)
- Uses emulsion polymerase chain reaction (ePCR) to clonally amplify random DNA fragments attached to microscopic beads
- Much cheaper than Sanger (± 20 000 USD per Gbp)
- Avg read length = 600-800bp
- Offers multiplexing (up to 12 samples in a single run of ±500 Mbp)
What are the steps of Pyrosequencing via the 454/Roche System?
- DNA Library constructed -> DNA Fragments ligated with adaptors
- Strand amplification by ePCR on surfaces of 100 000’s of agarose beads
- Surfaces of beads have millions of oligomers -> each is complementary to adaptors on fragments
- ePCR uses vigorously mixed oil & aqueous mixture -> isolates individual agarose beads (each bead with an individual, unique DNA fragment hybridized to its surface)
a. Isolated in aqueous micelles that also contain the PCR reactants
- Micelles pipetted into wells of microtiter plate -> temp cycling produces > 1mil sequence-ready beads
- Each bead has up to 1mil copies of original annealed fragment
- Beads added to surface of 454 pico titer plate (PTP)
a. PTP: Single wells in tips of fused fiber optic strands (1 bead in each well)
- Smaller magnetic & latex beads (attached to active enzymes needed for pyrosequencing) added to surround DNA-containing agarose beads in PTP
- PTP placed in sequencer, nucleotide & reagent solutions delivered into it in sequential fashion
- Incorporation of a nucleotide releases PPi -> ATP sulfurylase + APS converts PPi to ATP -> ATP + luciferase -> oxidation of luciferin -> light
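The sequential nucleotide delivery and light readout described in these steps can be sketched as a toy simulation. This is an idealized illustration, not the instrument's software: the names `simulate_flowgram` and `call_bases`, the flow order `TACG`, and the assumption that the light signal equals the exact number of nucleotides incorporated in a flow are all assumptions made for the example.

```python
def simulate_flowgram(new_strand, flow_order="TACG", max_cycles=50):
    """Return (nucleotide, signal) pairs for an idealized pyrosequencing run.

    new_strand: the sequence being synthesized, in the order it is extended.
    Assumption: signal == number of nucleotides incorporated in that flow.
    """
    flows = []
    pos = 0
    for _ in range(max_cycles):
        for nt in flow_order:
            run = 0
            # Every consecutive matching base is incorporated in the same
            # flow, so a homopolymer gives one proportionally stronger signal.
            while pos < len(new_strand) and new_strand[pos] == nt:
                run += 1
                pos += 1
            flows.append((nt, run))
            if pos == len(new_strand):
                return flows
    return flows


def call_bases(flows):
    """Rebuild the sequence from the flowgram (signal = homopolymer length)."""
    return "".join(nt * signal for nt, signal in flows)


flows = simulate_flowgram("TTAG")
print(flows)              # [('T', 2), ('A', 1), ('C', 0), ('G', 1)]
print(call_bases(flows))  # TTAG
```

A flow with signal 0 means the delivered nucleotide was not the next base, so no PPi was released and no light was produced.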
What is the Illumina sequencing average read length?
±150bp
What is the cost of illumina?
±50 USD per Gbp
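To put the three quoted per-Gbp costs side by side, here is a small arithmetic sketch. The cost figures are the approximate values from these flashcards; the function name `project_cost` and the 10 Gbp project size are hypothetical, chosen only for illustration.

```python
# Approximate cost per Gbp in USD, as quoted in these flashcards.
COST_PER_GBP_USD = {
    "Sanger": 400_000,
    "454/Roche": 20_000,
    "Illumina": 50,
}


def project_cost(platform, gigabases):
    """Estimated sequencing cost in USD for a given amount of data."""
    return COST_PER_GBP_USD[platform] * gigabases


# Hypothetical 10 Gbp project on each platform.
for platform in COST_PER_GBP_USD:
    print(f"{platform}: {project_cost(platform, 10):,} USD")
```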
What are the drawbacks of Illumina?
- Limited read length -> increased proportion of unassembled reads which may be too short for functional annotation
- Systematic errors are limited, but some datasets have high error rates at tail ends of reads
o Can clip reads to eliminate the errors in ‘bad’ datasets
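Clipping the error-prone tail of a read can be sketched as follows. This is a minimal illustration that simply removes trailing bases below a quality cutoff; it is not the algorithm of any particular trimming tool. The function names, the Phred+33 quality encoding, and the cutoff of 20 are assumptions made for the example.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def clip_tail(sequence, quality_string, min_quality=20, offset=33):
    """Remove trailing bases whose quality is below min_quality."""
    scores = phred_scores(quality_string, offset)
    end = len(sequence)
    # Walk back from the 3' end until a base meets the quality cutoff.
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


# 'I' decodes to Q40, '#' to Q2, '!' to Q0 under Phred+33.
print(clip_tail("ACGTACGT", "IIIII##!"))  # ('ACGTA', 'IIIII')
```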
Why is Assembly necessary?
- Assembly of short read fragments is necessary to obtain longer genomic contigs to:
o Determine genome sequence of uncultured organisms
o Obtain full-length CDS (coding DNA sequence) for subsequent characterization
What is a Pangenome?
o Entire gene set of all strains of a species. Includes:
o Core genome (genes present in all strains)
o Variable genome (genes present in only some strains)
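The core/variable split maps directly onto set operations. A minimal sketch, assuming each strain is represented by a list of gene names; the strain names, gene lists, and the function name `pangenome` are hypothetical and chosen only for illustration.

```python
def pangenome(strain_genes):
    """Split the genes of several strains into pan, core and variable sets.

    strain_genes: dict mapping strain name -> iterable of gene names.
    """
    gene_sets = [set(genes) for genes in strain_genes.values()]
    pan = set().union(*gene_sets)                                # all genes seen
    core = set.intersection(*gene_sets) if gene_sets else set()  # in every strain
    variable = pan - core                                        # in only some strains
    return pan, core, variable


# Hypothetical example data.
strains = {
    "strain_A": ["dnaA", "gyrB", "blaTEM"],
    "strain_B": ["dnaA", "gyrB"],
    "strain_C": ["dnaA", "gyrB", "nifH"],
}
pan, core, variable = pangenome(strains)
print(sorted(core))      # ['dnaA', 'gyrB']
print(sorted(variable))  # ['blaTEM', 'nifH']
```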
Why are assembly algorithms that assume clonal genomes less suitable for Metagenomics?
- Microbial communities have significant variation at strain & species level
o The ‘clonal’ assumptions built into many assemblers might suppress contig formation for some heterogeneous taxa at specific parameter settings
o De Bruijn-type assemblers deal explicitly with non-clonality of natural populations
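To show how strain-level variation appears in a de Bruijn graph, here is a toy construction. It is a sketch of the general graph idea only, not the implementation of any specific assembler; the function names, the two example reads, and k = 4 are assumptions made for the example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix -> suffix
    return graph


def branching_nodes(graph):
    """Nodes with more than one successor, i.e. points where sequences diverge."""
    return sorted(node for node, nxt in graph.items() if len(nxt) > 1)


# Two hypothetical strain variants that differ at a single base.
reads = ["ACGTTGCA", "ACGTAGCA"]
graph = build_de_bruijn(reads, k=4)
print(branching_nodes(graph))  # ['CGT']
```

The single-base difference shows up as one branch point in the graph. A graph-based assembler can then decide how to treat such branches instead of assuming all reads come from one clonal genome.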
What are the 2 Assembly strategies for Metagenomics samples?
o Reference-based assembly (co-assembly):
- Works well if closely related reference genomes are available
- BUT: differences between sample genome & reference (large insertions, deletions etc.) can lead to a fragmented assembly or to divergent regions not being covered
o De novo assembly:
- Typically requires larger computation resources
What is Binning?
The process of sorting DNA sequences into groups that might represent an individual genome or genomes from closely related organisms
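As a rough illustration of sorting sequences into groups, the sketch below bins contigs by GC content alone. This is a deliberate simplification: the single-feature approach, the function names, the ten equal-width bins, and the example contigs are all assumptions made for the example, not a description of how any specific binning tool works.

```python
from collections import defaultdict


def gc_bin(sequence, n_bins=10):
    """Return the index of the GC-content bin (0 .. n_bins-1) for a sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    # Integer arithmetic avoids floating-point edge cases at bin boundaries.
    index = (gc * n_bins) // len(sequence)
    return min(index, n_bins - 1)  # 100% GC falls into the last bin


def bin_contigs(contigs, n_bins=10):
    """Group contig names by GC-content bin."""
    bins = defaultdict(list)
    for name, seq in contigs.items():
        bins[gc_bin(seq, n_bins)].append(name)
    return dict(bins)


# Hypothetical contigs: two AT-rich, two GC-rich.
contigs = {
    "contig_1": "ATATATATAT",
    "contig_2": "ATTTAAATAT",
    "contig_3": "GCGCGGCCGA",
    "contig_4": "GCGCGCGCGC",
}
print(bin_contigs(contigs))
# {0: ['contig_1', 'contig_2'], 9: ['contig_3', 'contig_4']}
```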