metabarcoding Flashcards
what is DNA barcoding
DNA barcoding = molecular taxonomic identification using a short, standardised DNA section (the barcode) from part of the genome - allows species identification
- sampling -> DNA extraction -> PCR -> Sanger sequencing -> database alignment / bioinformatics
why do we extract DNA and what are the challenges
- To obtain DNA for PCR and sequencing; optimise the method for the sample type to maximise DNA concentration & quality
Challenges:
>Physically and/or chemically difficult
>Extraction biases between different organisms
>DNA degradation
>Low DNA concentrations
>Contamination
how can we apply DNA barcoding
- detect rare and invasive species
- monitor responses to pollution, warming, acidification
- biodiversity monitoring
- ecosystem change over time
- diet analysis
difference between barcoding and metabarcoding
- barcoding = targeted sample identification (one species at a time) - tends to use Sanger sequencing
- metabarcoding = profiling complex communities (multiple different species at the same time) - tends to use high-throughput sequencing e.g. NGS, 3rd generation sequencing
what’s important when selecting good genomic regions (PCR targets) for metabarcoding
- good databases for identification
- sufficient variability between species/strains
- short length (~150-300 bp); good for NGS & DNA quality
- conserved flanking sites for universal primers
- Primer selection depends on the question
- Common barcode regions include: COI in the mitochondrial genome (animals, including vertebrates), 16S rRNA gene (prokaryotes), ITS (fungi), rbcL, the RuBisCO large-subunit gene (plants)
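A minimal sketch of the "conserved flanking sites + short length" idea: an in silico PCR that looks for exact primer sites in a template and checks the amplicon length. The template and primer sequences are made up for illustration, and real primers usually tolerate mismatches, which this sketch ignores.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(seq))


def in_silico_amplicon(template, forward_primer, reverse_primer):
    """Return the amplicon (primers included), or None if a primer site is missing.

    Assumption: exact primer matches only, on the forward strand of the template.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the forward
    # strand its site appears as the reverse complement.
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical template: conserved flanks around a 150 bp variable region.
template = "TT" + "ACGTAC" + "G" * 150 + "TTGACC" + "AA"
forward_primer = "ACGTAC"
reverse_primer = reverse_complement("TTGACC")

amplicon = in_silico_amplicon(template, forward_primer, reverse_primer)
suitable_length = amplicon is not None and 150 <= len(amplicon) <= 300
```

Here `suitable_length` only checks the ~150-300 bp guideline; variability between species and database coverage still need to be assessed separately.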
what are the common barcode regions for vertebrates, prokaryotes, fungi and plants
- COI in mitochondrial genome (vertebrates)
- 16S (prokaryotes)
- ITS (fungi)
- rbcL, the RuBisCO large-subunit gene (plants)
High-throughput sequencing methods
- 1st gen sequencing (Sanger sequencing) = 1 thing at a time
- NGS = multiple at same time
>Millions of short reads
- 3rd gen sequencing (Oxford Nanopore, PacBio SMRT)
>Ultra-long reads
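>Accuracy has improved (e.g. PacBio HiFi reads), though raw long reads have historically had higher error rates than short-read NGS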
what happens during PCR before sequencing
PCR amplicons from each individual sample are tagged with a unique index adaptor - allows 100s of samples to be multiplexed (pooled) and sequenced simultaneously
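A rough sketch of how the index tags are used after sequencing (demultiplexing): each pooled read is assigned back to its sample by its index. The index sequences, the fixed 4-base index length and the read layout (index at the start of the read) are all assumptions for illustration; real pipelines handle adaptors, dual indexes and sequencing errors.

```python
from collections import defaultdict

# Hypothetical index (tag) sequences assigned to each sample during PCR.
SAMPLE_INDEXES = {
    "ACGT": "sample_A",
    "TGCA": "sample_B",
}
INDEX_LENGTH = 4  # assumed fixed index length at the start of every read


def demultiplex(reads, indexes=SAMPLE_INDEXES, index_length=INDEX_LENGTH):
    """Group reads by sample using the index at the start of each read.

    Returns a dict mapping sample name -> list of reads with the index removed.
    Reads with an unrecognised index are collected under "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:index_length], read[index_length:]
        sample = indexes.get(tag, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Pooled reads from one sequencing run (made-up sequences).
pooled_reads = ["ACGTGGGTTT", "TGCAGGGAAA", "ACGTGGGTTA", "NNNNCCCCCC"]
demultiplexed = demultiplex(pooled_reads)
```

In this toy run, two reads go back to `sample_A`, one to `sample_B`, and the read with an unreadable index is set aside as `unassigned`.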
what’s included in the bioinformatics part of analysis
- Filtering of sequence reads - remove low-quality reads
- Cluster ‘good’ reads into taxonomic units i.e. OTUs (clustered at 97% similarity) or ASVs (exact sequences, 100%) to represent distinct taxa
- Align reads to reference databases & annotate e.g. BLAST - helps identify the taxa represented in each cluster
- Quantify reads for each taxonomic unit - estimate the abundance of each identified taxonomic unit in the sample
- Can then look at community diversity / structure
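The clustering, quantification and diversity steps above can be sketched as follows. This is a simplified illustration, not how dedicated clustering or denoising tools work: it assumes already-filtered, equal-length reads, uses a greedy first-match rule with a position-by-position identity, and skips the database alignment step. The function names and sequences are hypothetical.

```python
import math


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        return 0.0  # simplification: unequal or empty sequences never match
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(reads, threshold=0.97):
    """Greedy OTU clustering.

    Each read joins the first centroid it matches at >= threshold identity;
    otherwise it founds a new OTU. Returns a dict centroid -> read count.
    """
    counts = {}
    for read in reads:
        for centroid in counts:
            if identity(read, centroid) >= threshold:
                counts[centroid] += 1
                break
        else:
            counts[read] = 1
    return counts


def shannon(counts):
    """Shannon diversity index H' (natural log) from per-OTU read counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Made-up 100 bp reads: a variant 98% identical to seq_a, plus a distinct seq_b.
seq_a = "A" * 100
seq_a_variant = "A" * 98 + "CC"
seq_b = "G" * 100

otu_counts = cluster_otus([seq_a, seq_a_variant, seq_b, seq_b])
diversity = shannon(otu_counts)
```

With these reads the variant falls into the same OTU as `seq_a`, giving two OTUs with two reads each. Read counts are only a rough proxy for abundance, since PCR and extraction biases affect them.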
positives for mtDNA as a marker
- haploid, with effectively a single haplotype per individual - easy to sequence with Sanger sequencing
- recovery from very small or degraded biological samples is higher than for nDNA - mtDNA molecules exist in thousands of copies per cell
- COI used for barcoding; other useful genes: Cyt b (also barcoding), 16S, Control Region (or D-loop)
- Conserved arrangement of genes - easier to design universal primers
- mtDNA has higher rates of mutation - more likely to exhibit sequence variation
- mtDNA is maternally inherited - useful for studying sex-biased population processes and hybridisation between differing populations
- no recombination - easier to trace genetic lineages and so reconstruct historical processes
- mtDNA has a small effective population size (¼ of nuclear), so it is more susceptible to genetic drift and readily shows demographic changes in populations (such as past bottlenecks)
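Where the ¼ comes from, as a short worked equation. This is the standard textbook argument and assumes a diploid species with an equal sex ratio and strictly maternal inheritance; the symbols $N_e$, $N_f$ and $n$ are introduced here for illustration.

```latex
% N_e = number of breeding individuals, N_f = number of breeding females
% Nuclear autosomal locus: diploid, inherited from both parents
n_{\text{nuclear}} = 2 N_e
% mtDNA: haploid, transmitted by females only, with N_f = N_e / 2
n_{\text{mtDNA}} = N_f = \tfrac{1}{2} N_e
% Ratio of gene copies passed to the next generation
\frac{n_{\text{mtDNA}}}{n_{\text{nuclear}}} = \frac{\tfrac{1}{2} N_e}{2 N_e} = \frac{1}{4}
```

Unequal sex ratios or differences in reproductive success between the sexes shift this ratio away from ¼.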
negatives for mtDNA as a marker
- mtDNA acts effectively as a single locus, so genealogies cannot be compared across multiple independent loci
- small Ne may exaggerate historical events and underestimate genetic diversity
- Only provides a genetic picture of female-linked population processes in the evolution of the species
- the selective neutrality of mitochondrial genes has been called into question; they could also be involved in speciation
- mtDNA sequences have been found inserted in nuclear DNA, known as nuclear mitochondrial pseudogenes (numts); amplifying these by mistake can undermine most of the positives