P4: ASV Inference Flashcards
generally explain the 16S rRNA workflow
sample collection -> DNA extraction -> library preparation -> sequencing
16S rRNA workflow - DNA extraction
- extract nucleic acid
- can be done with RNA as well
16S rRNA workflow: DNA extraction - why 16S
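- 16S rRNA is the RNA component of the small (30S) ribosomal subunit, where it combines with ribosomal proteins; the gene is present in all bacteria and archaea, and related genes are found in mitochondria and chloroplasts
- it has conserved regions (good primer sites) and hypervariable regions (differ between taxa), which makes it a widely used marker for identifying bacteria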
16S rRNA workflow - library prep
- PCR 1 and 2 (each followed by its own cleanup stage: cleanup 1 and cleanup 2)
- amplifying targets via primers
- result is the final library
16S rRNA workflow: library prep - PCR1
- region specific
- amplifies specific hypervariable regions
- has required primer overhangs
- goes on to clean up 1
16S rRNA workflow: library prep - PCR2
- indexing
- 2nd amplification
- adds barcodes/indexes (to identify which sample each read came from)
- adds sequencing adaptors
- needs a different primer than PCR1 and will go on to clean up 2
16S rRNA workflow: library prep - final library
- has the adaptor sequences (short DNA oligonucleotides, not proteins) necessary for sequencing
- from left to right: priming site for the sequencing reaction, library index, and flowcell handle
- will go on to sequencing
16S rRNA workflow - how are clean up 1 and 2 done
based on magnetic beads
how are sequencing results shown
- Fastq files
- they are a text-based format that contains the nucleotide sequence and its corresponding quality scores
- every 4 lines represent 1 read (one sequenced fragment)
Fastq files - how to read them
- each record contains 4 lines of information
1. header
2. sequence results
3. base and Q (the "+" separator line)
4. Q scores
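The four-line layout can be sketched in Python as below. This is a minimal illustration, not a replacement for a real parser: the names `FastqRecord` and `parse_fastq` and the two example reads are made up for this card, and it assumes each record occupies exactly four lines with no line wrapping.

```python
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    header: str     # line 1, stored without the leading "@"
    sequence: str   # line 2, the bases
    separator: str  # line 3, starts with "+"
    quality: str    # line 4, one ASCII character per base


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield one record per block of four non-empty lines."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    for i in range(0, len(lines), 4):
        header, seq, sep, qual = lines[i:i + 4]
        if not header.startswith("@") or not sep.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], seq, sep, qual)


# Two made-up reads, four lines each.
example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nII5#\n"
records = list(parse_fastq(example))
for rec in records:
    print(rec.header, rec.sequence, rec.quality)
```

Reading in fixed blocks of four matters because a quality line can itself begin with "@", so searching for "@" alone is not a safe way to find record starts.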
Fastq files - header
- starts with “@” symbol
- has the read identifier assigned by the sequencer (instrument, run, and flowcell position); it can also carry the sample index/barcode
Fastq files - Base and Q
this is line 3, a "+" that separates the bases (line 2) from the Q scores (line 4); it may repeat the header text, and it does not tell you which strand was sequenced
Fastq files - Q scores
- shown as ASCII characters
- shows how reliable every sequenced nucleotide is
- Phred scale, typically numbered 0-40
- 40: reliable (about 1 error in 10,000 bases)
- </= 20: unreliable (1 error in 100 bases or worse)
- quality usually drops toward the end of a read, so expect more errors there
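As a worked example of how the ASCII characters map to Q scores: the sketch below assumes the common Phred+33 encoding (Q = ASCII code minus 33) and the standard relation P(error) = 10^(-Q/10). The function names and the example quality string are made up for illustration.

```python
from typing import List


def phred_scores(quality_line: str, offset: int = 33) -> List[int]:
    """Convert each ASCII quality character to its Phred Q score."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q: int) -> float:
    """Probability that a base with score q was called incorrectly."""
    return 10 ** (-q / 10)


scores = phred_scores("II5#")  # "I" -> 40, "5" -> 20, "#" -> 2
for q in scores:
    print(q, error_probability(q))
```

So a Q of 40 means roughly a 0.0001 chance the base is wrong, and a Q of 20 means roughly 0.01.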
what are other 16S rRNA pipelines
- amplicon sequence variants (ASVs), sequencing-based
- operational taxonomic units (OTUs), sequencing-based
- PhyloChips
other 16S rRNA pipelines - ASV
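- separates true biological sequences from erroneous amplicons by removing the noise (denoising) introduced by sequencing errors, keeping only the reliable sequences
- more resolution than OTUs
- can resolve intraspecific (within-species) differences, down to a single nucleotide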
other 16S rRNA pipelines - OTU
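- clustering reads based on sequence similarity (commonly a 97% identity threshold)
- gives a coarser, more general picture of taxonomic variation
- the OTU table is built from a representative sequence for each cluster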
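Similarity clustering can be sketched as a greedy pass over the reads. This is a toy illustration only: it assumes equal-length, already aligned sequences, the names `identity` and `greedy_otus` are made up, and the threshold is lowered to 0.9 because the example sequences are only 10 bases long (a commonly used threshold on real data is 97% identity).

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy example needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: List[str], threshold: float) -> Dict[str, List[str]]:
    """Assign each sequence to the first representative it matches closely
    enough; otherwise it becomes the representative of a new OTU."""
    otus: Dict[str, List[str]] = {}
    for seq in seqs:
        for rep in otus:
            if identity(rep, seq) >= threshold:
                otus[rep].append(seq)
                break
        else:
            otus[seq] = [seq]
    return otus


seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTACGTAC", "ACGTACGTAC"]
otus = greedy_otus(seqs, threshold=0.9)
for rep, members in otus.items():
    print(rep, len(members))
```

Each key of `otus` plays the role of the representative sequence, and the member counts are what would go into the OTU table.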
other 16S rRNA pipelines - PhyloChips
- non-sequencing (microarray-based)
- after extraction, the DNA is hybridized to probes on a chip
- the chip then reports which known groups (taxa) are present in the sample, based on which probes give a signal
- each chip is specific to 1 type of microbiome, so it is best for environments that are already well known
- novel (new) organisms cannot be detected
ASV inference using DADA2
- filter and trim
- dereplicate
- learn error rates
- infer sample composition: denoising
- merge F/R reads
- construct sequence table
- remove chimeras
- assign taxonomy
- export table
ASV inference using DADA2 - filter and trim
- gets the environment ready
- reads the Fastq files, trims off low-quality ends, and keeps only reads with quality > 20
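A minimal sketch of the filter-and-trim idea in Python. It is not DADA2's implementation (DADA2 is an R package): the parameter names `trunc_len` and `max_ee` are borrowed from the `truncLen` and `maxEE` arguments of DADA2's `filterAndTrim`, the reads are made up, and quality is summarized here as expected errors (the sum of per-base error probabilities) rather than a single Q > 20 cutoff.

```python
from typing import List, Tuple

Read = Tuple[str, str]  # (sequence, quality string)


def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities for a Phred+33 quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def filter_and_trim(reads: List[Read], trunc_len: int, max_ee: float) -> List[Read]:
    """Truncate reads to trunc_len, then drop reads that are too short
    or that still have more than max_ee expected errors."""
    kept = []
    for seq, qual in reads:
        if len(seq) < trunc_len:
            continue  # too short to truncate, discard
        seq, qual = seq[:trunc_len], qual[:trunc_len]
        if expected_errors(qual) <= max_ee:
            kept.append((seq, qual))
    return kept


reads = [
    ("ACGTACGT", "IIIIII##"),  # good start, poor last two bases
    ("ACGTACGT", "########"),  # poor quality throughout
    ("ACG", "III"),            # shorter than trunc_len
]
kept = filter_and_trim(reads, trunc_len=6, max_ee=1.0)
print(kept)
```

Only the first read survives: truncation removes its two low-quality bases, while the second read has too many expected errors and the third is too short.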
ASV inference using DADA2 - dereplicate
- collapses identical reads into one unique sequence with an abundance count
- creates a table of unique sequences and their counts
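Dereplication can be illustrated in a few lines of Python. This is only a sketch of the idea using made-up reads and a hypothetical `dereplicate` helper.

```python
from collections import Counter
from typing import List, Tuple


def dereplicate(reads: List[str]) -> List[Tuple[str, int]]:
    """Collapse identical reads into (unique sequence, count) pairs,
    most abundant first."""
    return Counter(reads).most_common()


reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "ACGA"]
table = dereplicate(reads)
for seq, count in table:
    print(seq, count)
```

Six reads collapse to three unique sequences, so later steps work on far fewer sequences while the counts preserve the abundance information.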
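ASV inference using DADA2 - learn error rates
- builds an error model from the data: how often each base is misread as another at each quality score
- the most abundant sequences are assumed to be real, and they also produce the most erroneous copies
- rare sequences that differ from an abundant sequence by only a few bases are likely errors of it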
ASV inference using DADA2 - infer sample composition (denoising)
- computational method for removing sequence errors from amplicon reads
- or identifying the correct biological sequences in the reads
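The idea of denoising can be illustrated with a deliberately simplified toy. This is not DADA2's algorithm, which uses a statistical error model with quality scores: here a rare sequence is folded into a far more abundant one if they differ by at most one base. The names `toy_denoise`, `max_mismatches`, and `min_fold`, and all counts, are made up for illustration.

```python
from typing import Dict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def toy_denoise(counts: Dict[str, int], max_mismatches: int = 1,
                min_fold: int = 10) -> Dict[str, int]:
    """Walk sequences from most to least abundant; treat a sequence as an
    error of an existing center if it is close to it and much rarer."""
    centers: Dict[str, int] = {}
    for seq, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        parent = None
        for center, center_count in centers.items():
            close = len(center) == len(seq) and hamming(center, seq) <= max_mismatches
            if close and center_count >= min_fold * n:
                parent = center
                break
        if parent is None:
            centers[seq] = n       # kept as its own variant
        else:
            centers[parent] += n   # reads reassigned to the likely true sequence
    return centers


counts = {"ACGTAC": 100, "ACGTAA": 3, "TTGTAC": 40, "TTGTAA": 35}
denoised = toy_denoise(counts)
print(denoised)
```

"ACGTAA" (3 reads) is absorbed into "ACGTAC" (100 reads), while "TTGTAA" stays separate from "TTGTAC" because it is nearly as abundant, which is how single-nucleotide variants can survive denoising.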
ASV inference using DADA2 - merge F/R reads
- might lose some reads in this step
- possible reason: one of the reads may not have passed the quality/error score
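Merging can be sketched as follows. This toy version requires an exact overlap between the forward read and the reverse complement of the reverse read, which is stricter than real tools; `merge_pair`, `min_overlap`, and the 16-base amplicon are made up for illustration.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward: str, reverse: str, min_overlap: int = 4) -> Optional[str]:
    """Join a read pair at the longest exact overlap, or return None
    if the reads do not overlap by at least min_overlap bases."""
    rc = reverse_complement(reverse)
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-overlap:] == rc[:overlap]:
            return forward + rc[overlap:]
    return None


amplicon = "ACGTTGCAAGGCTTAC"
forward = amplicon[:10]                       # read from the left end
reverse = reverse_complement(amplicon[6:])    # read from the right end, other strand
merged = merge_pair(forward, reverse)
unmerged = merge_pair("AAAAAAAA", "CCCCCCCC")  # no overlap, so the pair is lost
print(merged, unmerged)
```

The second pair returns `None`, which is one way reads get lost at this step: without enough overlap the pair cannot be joined.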
ASV inference using DADA2 - remove chimeras
- chimeras are sequences that come from 2 different organisms/species
- sometimes the polymerase stops before finishing a copy, leaving an incomplete sequence; in a later cycle that fragment can anneal to a template from a different organism and get extended, so the two sequences merge together
- later PCR cycles then amplify the merged (chimeric) sequence
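A two-parent chimera (bimera) check can be sketched like this. It is a simplified illustration, not the method used by any particular tool: it assumes equal-length, aligned sequences and exact matches, and `is_bimera` and the example sequences are made up.

```python
from typing import List


def is_bimera(seq: str, parents: List[str]) -> bool:
    """Return True if seq equals the left part of one parent joined to the
    right part of a different parent at some breakpoint."""
    if seq in parents:
        return False  # an exact parent is not a chimera
    for left in parents:
        for right in parents:
            if left == right or not (len(left) == len(right) == len(seq)):
                continue
            for breakpoint in range(1, len(seq)):
                if seq[:breakpoint] == left[:breakpoint] and seq[breakpoint:] == right[breakpoint:]:
                    return True
    return False


parents = ["AAAAAAAAAA", "CCCCCCCCCC"]  # two abundant, trusted sequences
print(is_bimera("AAAAACCCCC", parents))  # left half of one, right half of the other
print(is_bimera("AAAAAGGGGG", parents))  # right half matches neither parent
```

In practice the candidate parents are the more abundant sequences in the sample, since a chimera forms from templates that were already present in larger numbers.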