Metagenomics Flashcards
Why are internal standards important when comparing metagenomic samples across spatial and temporal scales?
Because they allow for absolute quantification of organisms/transcripts in a sample (genes/L or avg transcripts/cell)
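A rough sketch of the idea behind absolute quantification, not the published equations: if a known number of standard copies is added and a known number of standard reads is recovered, their ratio gives a copies-per-read factor that can scale any gene's read count. The function name, parameters, and numbers below are hypothetical, and corrections such as gene length are deliberately left out.

```python
def genes_per_litre(gene_reads, standard_reads, standard_copies_added, litres_filtered):
    """Scale a gene's read count to copies per litre via internal-standard recovery.

    Assumption: every read represents the same number of original copies,
    estimated as (standard copies added) / (standard reads recovered).
    """
    if standard_reads <= 0 or litres_filtered <= 0:
        raise ValueError("standard_reads and litres_filtered must be positive")
    copies_per_read = standard_copies_added / standard_reads
    return gene_reads * copies_per_read / litres_filtered


# Hypothetical sample: 1e8 standard copies added, 1,000 standard reads recovered,
# 200 reads assigned to the gene of interest, 2 L of water filtered.
print(genes_per_litre(200, 1000, 1e8, 2.0))
```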
If an internal standard isn’t used, how do most studies collect meta-omics data?
In a relative framework, in which abundance of genes is calculated as percent of the sequence library
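For comparison, the relative framework reduces to a single division. This is a minimal sketch with hypothetical names and counts.

```python
def percent_of_library(gene_reads, total_reads):
    """Relative abundance: a gene's reads as a percentage of all reads in the library."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * gene_reads / total_reads


# Hypothetical library of 10,000 reads with 50 assigned to one gene.
print(percent_of_library(50, 10_000))
```

Two samples can give the same percentage even when the gene's absolute abundance differs between them, which is the gap an internal standard is meant to close.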
How are internal standard reads recovered and quantified after sequencing? (2 steps)
1. Run a BLASTn homology search of the reads against the reference genome sequence of the internal standard. 2. Take the initial BLASTn hits and run a BLASTx search against the RefSeq database to identify all the protein-encoding reads.
Why is a second BLAST step needed to recover and identify reads from an internal standard after sequencing?
To account for false positives in the BLASTn homology search
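The two-step recovery can be sketched as a filter over BLAST tabular output (`-outfmt 6`, where column 1 is the query ID, column 2 the subject ID, and column 12 the bit score). The acceptance rule used here, keeping a read only if its best BLASTx hit is one of the standard's own proteins, is an assumption made for illustration, and all read and protein IDs are hypothetical.

```python
def best_hits(tabular_lines):
    """Return {query_id: (subject_id, bitscore)} keeping the top-scoring hit per query."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def confirmed_standard_reads(blastn_lines, blastx_lines, standard_protein_ids):
    """Keep step-1 candidates whose best step-2 hit is a protein of the standard."""
    candidates = set(best_hits(blastn_lines))  # reads hitting the standard genome
    refseq_best = best_hits(blastx_lines)      # best RefSeq protein hit per read
    return sorted(
        q for q in candidates
        if q in refseq_best and refseq_best[q][0] in standard_protein_ids
    )


# Hypothetical hits: read2 matches the standard genome in step 1, but its best
# protein hit in step 2 belongs to another organism, so it is dropped.
blastn = [
    "read1\tstd_genome\t99.0\t100\t1\t0\t1\t100\t500\t599\t1e-50\t180",
    "read2\tstd_genome\t90.0\t100\t10\t0\t1\t100\t900\t999\t1e-20\t90",
]
blastx = [
    "read1\tSTD_PROT_1\t100.0\t33\t0\t0\t1\t99\t10\t42\t1e-15\t70",
    "read2\tOTHER_PROT\t98.0\t33\t1\t0\t1\t99\t5\t37\t1e-16\t75",
    "read2\tSTD_PROT_2\t85.0\t33\t5\t0\t1\t99\t8\t40\t1e-10\t60",
]
print(confirmed_standard_reads(blastn, blastx, {"STD_PROT_1", "STD_PROT_2"}))
```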
What amount of internal standard DNA should be added to a metagenomic sample?
Enough to quantify, but not so much that it dominates the reads: ~0.5% of expected DNA yield (should account for 0.1-5% of total reads)
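As a worked example of that rule of thumb, the sketch below computes the mass of standard to add and checks afterwards whether the recovered share of reads landed in the 0.1-5% window. The function names and sample numbers are hypothetical.

```python
def standard_mass_ng(expected_yield_ng, fraction=0.005):
    """Mass of internal standard DNA to add: ~0.5% of the expected DNA yield."""
    return expected_yield_ng * fraction


def standard_read_share_ok(standard_reads, total_reads, low=0.001, high=0.05):
    """True if the standard makes up between 0.1% and 5% of all reads."""
    share = standard_reads / total_reads
    return low <= share <= high


# Hypothetical sample expected to yield 2,000 ng of DNA.
print(standard_mass_ng(2000))
# Hypothetical run: 5,000 standard reads out of 1,000,000 total.
print(standard_read_share_ok(5_000, 1_000_000))
```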
What type of DNA should be used as an internal standard?
DNA from a sequenced and cultured microbe that is not present in the environment. Example: a hydrothermal vent organism such as Thermus thermophilus
What paper are the equations for metagenome normalization using internal standards found in?
Satinsky et al. 2013, Chapter 12 in Methods in Enzymology Vol. 531, “Use of Internal Standards for Quantitative Metatranscriptome and Metagenome Analysis”
How many hypervariable regions are in the SSU (16S) rRNA gene?
9
Which hypervariable regions are best for distinguishing pathogenic bacteria?
V2, V3, and V6.
V1 is better for Gram-positive bacteria, and V4-V9 (other than V6) are less discriminatory (Chakravorty et al. 2008, which examined 110 blood-borne pathogens)
When creating a 16S library, what is the next step after PCR amplification?
Adding Illumina adapters
How many reads are generated by a MiSeq Illumina run?
~25 million reads/flowcell
What is the max read size using MiSeq?
300 bp (can be paired or single end)
Assuming 96 indexed samples, how many reads can you get per sample on a MiSeq?
>100,000
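The per-sample figure follows from dividing the run output by the number of indexed samples. This sketch assumes perfectly even pooling and that every read demultiplexes, so it gives an upper bound.

```python
def reads_per_sample(total_reads, n_samples):
    """Upper-bound reads per sample, assuming even pooling and no read loss."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return total_reads // n_samples


# ~25 million MiSeq reads split across 96 indexed samples.
print(reads_per_sample(25_000_000, 96))  # 260416
```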
How many reads can be generated by a HiSeq 4000?
2.5 billion per flow cell (312.5 million reads per lane, with 8 lanes per flow cell)
What is the max read size for a HiSeq 4000?
150 bp (can be paired or single end)
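Read count and read length together give a rough maximum yield in bases. The sketch assumes the read counts on these cards refer to clusters, each producing two reads in paired-end mode; that interpretation is an assumption, not something the cards state.

```python
def yield_gigabases(clusters, read_length_bp, paired=True):
    """Approximate maximum yield in gigabases for one flow cell."""
    reads_per_cluster = 2 if paired else 1
    return clusters * read_length_bp * reads_per_cluster / 1e9


print(yield_gigabases(25_000_000, 300))     # MiSeq, 2 x 300 bp: 15.0
print(yield_gigabases(2_500_000_000, 150))  # HiSeq 4000, 2 x 150 bp: 750.0
```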