14-16: Bioinformatics Flashcards
List all the steps of nucleic acid extraction
- Collect high-quality samples
- Cell lysis
- DNA/RNA extraction
- DNA/RNA purification
- DNA/RNA quality assessment
Collect High Quality Samples
Good DNA/RNA comes from good samples, free of contamination.
Samples should be appropriately preserved prior to extraction.
a. Best practice is to snap-freeze samples in liquid nitrogen.
b. For DNA analysis, samples can be frozen at -20 °C.
c. For RNA analysis, samples must be arrested instantaneously upon collection, either with
liquid nitrogen or with commercially developed “RNA preservation” chemicals.
Cell Lysis
Cells need to be lysed to release their DNA.
a. Mechanical lysis involves using physical techniques to break open the cells (smashing
them with beads, sonicating them, etc.)
b. Chemical lysis uses chemicals or enzymes to break down cell walls.
c. Some techniques can be performed directly on the sample (e.g., bead-beat a soil
sample). However, some techniques require that the cells be separated from the sample
material first (e.g., remove cells from the soil particles).
DNA/RNA Extraction
Various techniques are available. This is now mostly done with commercially developed
kits.
DNA/RNA Purification
Remove any contaminants from the DNA. Contaminants might include
carry-over proteins, organic materials, humic acids from soil, etc. Various techniques are
available.
a. Chromatography
b. Silica gel columns (common part of extraction kits)
c. PVPP spin-filter columns
d. Magnetic beads
DNA/RNA quality assessment
a. Nucleic acid quality can be assessed via:
i. Agarose gel electrophoresis
ii. Nanodrop UV spectrophotometry (A260/280)
b. Nucleic acid concentration can be assessed via:
i. Nanodrop (but highly inaccurate)
ii. Fluorescence (e.g., Qubit) – very accurate
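A minimal sketch of the arithmetic behind a spectrophotometer reading. The function name and the example absorbance values are made up for illustration. The conversion factors (about 50 ng/µL per A260 unit for double-stranded DNA, about 40 for RNA) are the commonly quoted conventions, and a ratio near 1.8 (DNA) or 2.0 (RNA) is usually read as "clean".

```python
def assess_nucleic_acid(a260, a280, kind="dsDNA", dilution=1.0):
    """Return (concentration in ng/uL, A260/A280 ratio) from absorbance readings."""
    # Conventional conversion factors: ng/uL per A260 unit at a 1 cm path length.
    factors = {"dsDNA": 50.0, "RNA": 40.0}
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    concentration = a260 * factors[kind] * dilution
    ratio = a260 / a280
    return concentration, ratio


# Hypothetical readings for one DNA sample.
conc, ratio = assess_nucleic_acid(a260=0.5, a280=0.27, kind="dsDNA")
print(f"{conc:.1f} ng/uL, A260/280 = {ratio:.2f}")
```

Contaminants that also absorb near 260 nm inflate the estimate, which is one reason a fluorescence-based reading is preferred for concentration.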
What is the workflow for DNA sequencing
- Start with clean, pure DNA that you would like to sequence. This could be DNA from a single
organism or a whole community; it might be PCR amplicons.
- Fragment the DNA into pieces that are all approximately the same length.
a. Short-read sequencers (e.g., Illumina) require, as their name implies, short reads. DNA
is fragmented into segments that are < 1,000 bp long. PCR amplicons don’t need to be
fragmented because they are already a fixed length.
b. Long-read sequencers (e.g., PacBio or Oxford Nanopore) can sequence long pieces of
DNA. Little, if any, fragmentation is required.
- Attach commercially available “adapters” to the ends of the DNA fragments. These adapters are
what the sequencing machines use to find or attach to the target DNA being sequenced.
- Each fragment of DNA, with the adapters attached, is called a read. The pool of all the reads
from a sample is called the library.
- Load the DNA library onto the sequencer. We do not expect you to know how DNA sequencers
actually recognize different bases (A-T-C-G), and different companies use slightly different
technologies, but you should recognize the different technologies and their applications.
What is the aim of 16S rRNA gene amplicon sequencing
To evaluate the taxonomic composition of a microbial community. The technique
relies on the fact that the 16S rRNA gene is a valuable taxonomic marker: this gene is highly conserved
across all bacterial taxa but contains some regions that mutate faster than the background mutation
rate. Only bacteria and archaea carry a 16S rRNA gene; however, this process has been adapted using
marker genes for other taxa (e.g., 18S rRNA gene in eukaryotes; internal transcribed spacer [ITS] region
in fungi and plants).
What is the procedure of 16S rRNA gene amplicon sequencing
- Start with a sample of whole-community DNA (e.g., DNA extracted from soil, water, animal gut
material, etc.).
- Use PCR to amplify a fragment of the 16S rRNA gene from your DNA sample. The assumption
here is that PCR amplifies genes proportionally to their abundance. For example, if Firmicutes
comprise 60% of your initial sample before amplification and Bacteroidetes comprise the other
40%, you assume that these proportions remain constant after amplification.
- Sequence the amplified gene copies (typically using an Illumina MiSeq).
- Align the resulting sequences with a taxonomy reference database to determine the taxonomy
of all the sequences.
- Compare species richness, diversity, and co
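Once every read has a taxonomic label, the community summaries are simple counting. The sketch below assumes the labels are already assigned (the list of labels and the function name are hypothetical) and uses the Shannon index as one common diversity measure; it is an illustration, not the output of any particular pipeline.

```python
import math
from collections import Counter


def community_summary(taxon_per_read):
    """Return (relative abundances, richness, Shannon index) from per-read labels."""
    counts = Counter(taxon_per_read)
    total = sum(counts.values())
    relative = {taxon: n / total for taxon, n in counts.items()}
    richness = len(counts)  # number of distinct taxa observed
    shannon = -sum(p * math.log(p) for p in relative.values())
    return relative, richness, shannon


# Toy data matching the 60% / 40% example above.
labels = ["Firmicutes"] * 6 + ["Bacteroidetes"] * 4
relative, richness, shannon = community_summary(labels)
print(relative, richness, round(shannon, 3))
```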
What are the challenges/limitations of 16S rRNA amplicon sequencing
- Primer bias: Despite best efforts to develop “universal” PCR primers that amplify all sequences
equally, primers are still biased towards some sequences over others. For example, they might
preferentially amplify AT-rich sequences, causing species with AT-rich 16S rRNA genes to
“appear” more abundant than they are.
- Copy number: Some species carry multiple copies of the 16S rRNA gene (e.g., E. coli has seven
copies of the 16S rRNA gene, whereas most bacteria have only one or two). As a result, these
species will “appear” more abundant than they are.
- No quantitative power: This technique does not actually quantify how many cells of a particular
species are present. We are only able to calculate relative abundances (30% of this, 21% of that).
Other tools, like qPCR or microscopy, are needed to say “there are 10^8
cells in this gram of soil.”
- Limited taxonomic information: Because we are working with a gene fragment that is only ~300
bp long, only so much taxonomic information is available. It is difficult to identify anything to the
species level, and we are limited by how complete the reference databases are.
- No functional information: Taxonomic data gives no information on what metabolic or
biogeochemical processes the community is performing or is capable of performing.
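The copy-number effect can be shown with a little arithmetic. This sketch assumes the copy number per genome is known for each taxon, which is often not true in practice; "Taxon B" and the read counts are invented for illustration.

```python
def correct_for_copy_number(read_counts, copies_per_genome):
    """Divide each taxon's reads by its 16S copy number, then renormalize to 1."""
    adjusted = {t: read_counts[t] / copies_per_genome[t] for t in read_counts}
    total = sum(adjusted.values())
    return {t: value / total for t, value in adjusted.items()}


# Hypothetical sample: equal cell numbers, but E. coli carries seven 16S copies.
read_counts = {"E. coli": 700, "Taxon B": 100}
copies = {"E. coli": 7, "Taxon B": 1}

raw = {t: n / sum(read_counts.values()) for t, n in read_counts.items()}
corrected = correct_for_copy_number(read_counts, copies)
print(raw)        # E. coli looks dominant in the raw read proportions
print(corrected)  # after correction the two taxa are equal
```

Even after this correction the result is still a relative abundance, not a cell count.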
List other marker-based techniques
* Denaturing gradient gel electrophoresis (DGGE): Run 16S rRNA gene amplicons in a specialized
gel that separates them based on very slight sequence differences. The banding pattern on a gel
therefore gives some measure of community diversity.
* Restriction fragment length polymorphisms (RFLP): Evaluates the diversity of 16S rRNA genes
based on fragmentation pattern after being exposed to different restriction enzymes.
* Amplified ribosomal DNA restriction analysis (ARDRA): Similar to RFLP.
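The idea behind RFLP can be sketched as an in silico digest: cut each sequence wherever the enzyme's recognition site occurs and compare the fragment lengths. The two sequences below are invented toy strings, the function is hypothetical, and only one strand of a linear molecule is considered. EcoRI (site GAATTC, cutting after the first G) is used as the example enzyme.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every site."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)  # position of the cut within the sequence
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Toy amplicons: the second has one site destroyed by a single-base change.
seq_a = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
seq_b = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TT"

print(digest(seq_a, "GAATTC", 1))  # three fragments
print(digest(seq_b, "GAATTC", 1))  # two fragments
```

Different fragment-length patterns on a gel are what distinguish the two sequences.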
Non-omics-based approaches to view the concentration of phospholipid fatty acids
Fatty-acid methyl ester analysis (FAME)
Phospholipid-linked fatty acid analysis (PLFA)
Fluorescence in situ hybridization (FISH): fluorescent probes are attached to specific target
DNA sequences (e.g., a specific 16S rRNA gene). In this way, we can cause all cells of one species (or
group of species) to light up under a microscope. The remaining cells can be counter-stained using a
general nucleic acid stain like DAPI.
Define Genomics, Genome, Chromosome and Plasmid
Genomics is the study of the genome (i.e., all the DNA) of a single organism.
- Genome: all the genetic information in an organism, including chromosomes and plasmids
- Chromosome: the largest piece of DNA in a cell; in bacteria/archaea, chromosomes are circular
- Plasmid: accessory pieces of DNA, often containing “bonus” genes (like antibiotic resistance)
What is the procedure for sequencing a genome; define read depth and coverage in the process
Sequencing procedure:
1. Obtain a pure culture of a single species. Extract pure, high-quality DNA.
2. Sequence the DNA using an appropriate sequencing technology. Sequencing will produce
millions of fragments of the genome. (For example, you might get millions of 500 bp reads that
make up a 5,000,000 bp genome).
a. For sequencing, you have divided the genome into lots of smaller fragments, called
reads.
b. To ensure an accurate genome sequence, you want to get multiple reads that all cover
the same piece of DNA. This is called read depth.
c. To ensure an accurate genome sequence, you also want to make sure the entire
genome gets sequenced. However, no matter how hard you try, there might always be a
small section of the genome for which you do not obtain any reads. The percentage of
the genome that you obtain sequence for is called coverage.
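Read depth and coverage, as defined above, can be computed by counting how many reads land on each position. This is a toy sketch: the read positions are invented, the function is hypothetical, and real tools work from alignment files rather than (start, length) pairs. The last lines assume one million reads for the 500 bp / 5,000,000 bp example.

```python
def depth_and_coverage(genome_length, reads):
    """Return (mean read depth, percent coverage) for reads given as (start, length)."""
    depth = [0] * genome_length
    for start, length in reads:
        end = min(start + length, genome_length)
        for pos in range(start, end):
            depth[pos] += 1  # one more read covers this base
    covered = sum(1 for d in depth if d > 0)
    return sum(depth) / genome_length, 100.0 * covered / genome_length


# Toy genome of 20 bp with four 5 bp reads; positions 11 and 17-19 get no reads.
toy_reads = [(0, 5), (3, 5), (6, 5), (12, 5)]
mean_depth, coverage = depth_and_coverage(20, toy_reads)
print(mean_depth, coverage)

# Expected average depth = number of reads x read length / genome length.
# Assumed: 1,000,000 reads of 500 bp over a 5,000,000 bp genome.
expected_depth = 1_000_000 * 500 / 5_000_000
print(expected_depth)
```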
What is the procedure for bioinformatic analysis
1. Assemble the reads into contigs. (Because reads will overlap each other, the computer simply
looks for overlapping sequences and sticks them together).
2. Assemble the contigs into scaffolds. (Based on the orientation of contigs and their relationship
with each other, the computer lines up contigs that it thinks should be together and estimates
any gaps).
3. If enough data is available, scaffolds can be connected to reassemble the genome.
4. Annotation.
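Step 1 can be illustrated with a toy greedy assembler that repeatedly merges the pair of sequences with the longest suffix-to-prefix overlap. This is a sketch of the idea only: the reads are invented, both function names are hypothetical, and real assemblers also deal with sequencing errors, reverse complements, and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if too short)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain; return the contigs."""
    contigs = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs
        merged = contigs[i] + contigs[j][n:]  # keep the overlap only once
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Three overlapping toy reads drawn from the sequence ATGGCGTACGTTAGC.
toy_reads = ["ATGGCGTA", "CGTACGTT", "CGTTAGC"]
print(greedy_assemble(toy_reads))
```

If some reads share no overlap with the rest, the function returns more than one contig, which is the situation that scaffolding (step 2) tries to resolve.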