14-16: Bioinformatics Flashcards

1
Q

List all the steps of nucleic acid extraction

A
  1. Collect high-quality samples
  2. Cell lysis
  3. DNA/RNA extraction
  4. DNA/RNA purification
  5. DNA/RNA quality assessment
2
Q

Collect High Quality Samples

A

Good DNA/RNA comes from good samples, free of contamination.
Samples should be appropriately preserved prior to extraction.

a. Best practice is to snap-freeze samples in liquid nitrogen.
b. For DNA analysis, samples can be frozen at -20 °C.
c. For RNA analysis, cellular activity in the sample must be arrested immediately upon collection, either with
liquid nitrogen or with commercially developed “RNA preservation” chemicals.

3
Q

Cell Lysis

A

Cells need to be lysed to release their DNA.
a. Mechanical lysis involves using physical techniques to break open the cells (smashing
them with beads, sonicating them, etc.)
b. Chemical lysis uses chemicals or enzymes to break down cell walls.
c. Some techniques can be performed directly on the sample (e.g., bead-beat a soil
sample). However, some techniques require that the cells be separated from the sample
material first (e.g., remove cells from the soil particles).

4
Q

DNA/RNA Extraction

A

Various techniques are available. This is now mostly done with commercially developed
kits.

5
Q

DNA/RNA Purification

A

Remove any contaminants from the DNA. Contaminants might include
carry-over proteins, organic materials, humic acids from soil, etc. Various techniques are
available.
a. Chromatography
b. Silica gel columns (common part of extraction kits)
c. PVPP spin-filter columns
d. Magnetic beads

6
Q

DNA/RNA quality assessment

A

a. Nucleic acid quality can be assessed via:
i. Agarose gel electrophoresis
ii. Nanodrop UV spectrophotometry (A260/280)
b. Nucleic acid concentration can be assessed via:
i. Nanodrop (but highly inaccurate)
ii. Fluorescence (e.g., Qubit) – very accurate
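
The A260/280 check above can be sketched in a few lines. This is a minimal illustration, not a lab protocol: the conversion factor (an A260 of 1.0 corresponding to roughly 50 ng/µL of double-stranded DNA at a 1 cm path length) and the "acceptable" ratio window are common rule-of-thumb values that are assumptions here, not figures from the card. The function names are hypothetical.

```python
# Rule-of-thumb constant (assumption, not from the flashcard):
# A260 of 1.0 ~ 50 ng/uL double-stranded DNA at a 1 cm path length.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def purity_ratio(a260, a280):
    """Return the A260/A280 ratio used as a rough protein-contamination check."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def dsdna_concentration(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration (ng/uL) from absorbance at 260 nm."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def looks_pure_dna(ratio, low=1.7, high=2.0):
    """Flag ratios inside an assumed acceptable window around ~1.8."""
    return low <= ratio <= high


ratio = purity_ratio(0.9, 0.5)
print(round(ratio, 2), looks_pure_dna(ratio), dsdna_concentration(0.5))
```

Because any UV-absorbing contaminant also contributes to A260, an estimate like this is best treated as approximate, which is consistent with the card's note that fluorescence-based measurements are more accurate.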

7
Q

What is the workflow for DNA sequencing?

A
  1. Start with clean, pure DNA that you would like to sequence. This could be DNA from a single
    organism or a whole community; it might be PCR amplicons.
  2. Fragment the DNA into pieces that are all approximately the same length.
    a. Short-read sequencers (e.g., Illumina) require, as their name implies, short reads. DNA
    is fragmented into segments that are < 1,000 bp long. PCR amplicons don’t need to be
    fragmented because they are already a fixed length.
    b. Long-read sequencers (e.g., PacBio or Oxford Nanopore) can sequence long pieces of
    DNA. Little, if any, fragmentation is required.
  3. Attach commercially available “adapters” to the ends of the DNA fragments. These adapters are
    what the sequencing machines use to find or attach to the target DNA being sequenced.
  4. Each fragment of DNA, with the adapters attached, is called a read. The pool of all the reads
    from a sample is called the library.
  5. Load the DNA library onto the sequencer. We do not expect you to know how DNA sequencers
    actually recognize different bases (A-T-C-G), and different companies use slightly different
    technologies, but you should recognize the different technologies and their applications.
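
Steps 2 to 4 of the workflow above can be pictured with a toy sketch. It is deliberately simplified: real fragmentation is random (mechanical or enzymatic) rather than a fixed-size cut, and the adapter sequences below are made-up placeholders, not real commercial adapters. The function names are hypothetical.

```python
def fragment(dna, size):
    """Cut a sequence into consecutive pieces of at most `size` bases."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [dna[i:i + size] for i in range(0, len(dna), size)]


def build_library(dna, size, adapter_5, adapter_3):
    """Attach an adapter to both ends of every fragment; the pool is the library."""
    return [adapter_5 + piece + adapter_3 for piece in fragment(dna, size)]


# Placeholder adapters (hypothetical sequences, for illustration only).
library = build_library("ATGCATGCAT", 4, "GGGG", "CCCC")
print(library)  # ['GGGGATGCCCCC', 'GGGGATGCCCCC', 'GGGGATCCCC']
```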
8
Q

What is the aim of 16S rRNA gene amplicon sequencing?

A

16S rRNA gene amplicon sequencing is used to evaluate the taxonomic composition of a microbial community. It
relies on the fact that the 16S rRNA gene is a valuable taxonomic marker: this gene is highly conserved
across all bacterial taxa but contains some regions that mutate faster than the background mutation
rate. Only bacteria and archaea carry a 16S rRNA gene; however, this process has been adapted using
marker genes for other taxa (e.g., 18S rRNA gene in eukaryotes; internal transcribed spacer [ITS] regions
in fungi and plants).

9
Q

What is the procedure of 16S rRNA gene amplicon sequencing?

A
  1. Start with a sample of whole-community DNA (e.g., DNA extracted from soil, water, animal gut
    material, etc.).
  2. Use PCR to amplify a fragment of the 16S rRNA gene from your DNA sample. The assumption
    here is that PCR amplifies genes proportionally to their abundance. For example, if Firmicutes
    comprise 60% of your initial sample before amplification and Bacteroidetes comprise the other
    40%, you assume that these proportions remain constant after amplification.
  3. Sequence the amplified gene copies (typically using an Illumina MiSeq).
  4. Align the resulting sequences with a taxonomy reference database to determine the taxonomy
    of all the sequences.
  5. Compare species richness, diversity, and community composition.
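
Once every sequence has a taxonomic label, the final comparison step reduces to simple counting. The sketch below assumes the input is a list of taxon labels, one per read, and uses the Shannon index as the diversity measure. The card only says "diversity", so the choice of index is an assumption, as are the function names.

```python
import math
from collections import Counter


def relative_abundance(taxa):
    """Fraction of reads assigned to each taxon."""
    counts = Counter(taxa)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def richness(taxa):
    """Number of distinct taxa observed."""
    return len(set(taxa))


def shannon_diversity(taxa):
    """Shannon index H = -sum(p * ln p) over the relative abundances."""
    return -sum(p * math.log(p) for p in relative_abundance(taxa).values())


# The 60% / 40% example from step 2, as ten classified reads.
reads = ["Firmicutes"] * 6 + ["Bacteroidetes"] * 4
print(relative_abundance(reads))  # {'Firmicutes': 0.6, 'Bacteroidetes': 0.4}
print(richness(reads))            # 2
```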
10
Q

What are the challenges/limitations of 16S rRNA amplicon sequencing?

A

  • Primer bias: Despite best efforts to develop “universal” PCR primers that amplify all sequences
    equally, primers are still biased towards some sequences over others. For example, they might
    preferentially amplify AT-rich sequences, causing species with AT-rich 16S rRNA genes to
    “appear” more abundant than they are.

  • Copy number: Some species carry multiple copies of the 16S rRNA gene (e.g., E. coli has seven
    copies of the 16S rRNA gene, whereas most bacteria have only one or two). As a result, these
    species will “appear” more abundant than they are.
  • No quantitative power: This technique does not actually quantify how many cells of a particular
    species are present. We are only able to calculate relative abundances (30% of this, 21% of that).
    Other tools, like qPCR or microscopy, are needed to say “there are 10^8
    cells in this gram of soil.”
  • Limited taxonomic information: Because we are working with a gene fragment that is only ~300
    bp long, only so much taxonomic information is available. It is difficult to identify anything to the
    species level, and we are limited by how complete the reference databases are.
  • No functional information: Taxonomic data gives no information on what metabolic or
    biogeochemical processes the community is performing or is capable of performing.
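
The copy-number problem above lends itself to a small worked example. The sketch below divides each taxon's read count by its 16S copy number before computing relative abundance. The read counts and the second taxon are invented for illustration; only the seven copies for E. coli comes from the card. It assumes copy numbers are known, which is often not the case for environmental taxa.

```python
def correct_for_copy_number(read_counts, copies_per_genome):
    """Convert 16S read counts to copy-number-corrected relative abundances."""
    # Reads divided by copies per genome approximates "genome equivalents".
    genome_equivalents = {
        taxon: read_counts[taxon] / copies_per_genome[taxon]
        for taxon in read_counts
    }
    total = sum(genome_equivalents.values())
    return {taxon: value / total for taxon, value in genome_equivalents.items()}


# Hypothetical sample: raw reads suggest E. coli is 87.5% of the community.
reads = {"E. coli": 700, "Taxon B": 100}
copies = {"E. coli": 7, "Taxon B": 1}
print(correct_for_copy_number(reads, copies))  # {'E. coli': 0.5, 'Taxon B': 0.5}
```

Even after this correction the result is still a relative abundance; it does not recover absolute cell counts.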
11
Q

List other marker-based techniques

A

* Denaturing gradient gel electrophoresis (DGGE): Run 16S rRNA gene amplicons in a specialized
gel that separates them based on very slight sequence differences. The banding pattern on a gel
therefore gives some measure of community diversity.
* Restriction fragment length polymorphisms (RFLP): Evaluates the diversity of 16S rRNA genes
based on fragmentation pattern after being exposed to different restriction enzymes.
* Amplified ribosomal DNA restriction analysis (ARDRA): Similar to RFLP.

12
Q

Name non-omics-based approaches (e.g., for measuring phospholipid fatty acid concentrations)

A

Fatty-acid methyl ester analysis (FAME)

Phospholipid-linked fatty acid analysis (PLFA)

Fluorescence in situ hybridization (FISH): Fluorescent probes are attached to specific target
DNA sequences (e.g., a specific 16S rRNA gene). In this way, we can cause all cells of one species (or
group of species) to light up under a microscope. The remaining cells can be counter-stained using a
general nucleic acid stain like DAPI.

13
Q

Define Genomics, Genome, Chromosome and Plasmid

A

Genomics is the study of the genome (i.e., all the DNA) of a single organism.

  • Genome: all the genetic information in an organism, including chromosomes and plasmids
  • Chromosome: the largest piece of DNA in a cell; in bacteria/archaea, chromosomes are circular
  • Plasmid: accessory pieces of DNA, often containing “bonus” genes (like antibiotic resistance)
14
Q

What is the procedure for sequencing a genome? Define read depth and coverage.

A

Sequencing procedure:
1. Obtain a pure culture of a single species. Extract pure, high-quality DNA.
2. Sequence the DNA using an appropriate sequencing technology. Sequencing will produce
millions of fragments of the genome. (For example, you might get millions of 500 bp reads that
make up a 5,000,000 bp genome).
a. For sequencing, you have divided the genome into lots of smaller fragments, called
reads.
b. To ensure an accurate genome sequence, you want to get multiple reads that all cover
the same piece of DNA. This is called read depth.
c. To ensure an accurate genome sequence, you also want to make sure the entire
genome gets sequenced. However, no matter how hard you try, there might always be a
small section of the genome for which you do not obtain any reads. The percentage of
the genome that you obtain sequence for is called coverage.
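
Read depth and coverage, as defined above, can be computed directly once each read's position on the genome is known. The sketch below assumes reads are given as (start, length) pairs with 0-based starts. The tiny genome is invented for illustration, and the function names are hypothetical.

```python
def expected_depth(n_reads, read_length, genome_length):
    """Average depth expected if reads were spread evenly: N * L / G."""
    return n_reads * read_length / genome_length


def per_base_depth(genome_length, reads):
    """Count how many reads cover each position of the genome."""
    depth = [0] * genome_length
    for start, length in reads:
        for pos in range(max(start, 0), min(start + length, genome_length)):
            depth[pos] += 1
    return depth


def mean_depth(depth):
    """Average read depth across all positions."""
    return sum(depth) / len(depth)


def coverage_percent(depth):
    """Percentage of positions covered by at least one read."""
    return 100.0 * sum(1 for d in depth if d > 0) / len(depth)


# One million 500 bp reads over a 5,000,000 bp genome.
print(expected_depth(1_000_000, 500, 5_000_000))  # 100.0

# Toy 10 bp genome with three reads; the last four bases get no reads.
depth = per_base_depth(10, [(0, 4), (2, 4), (3, 3)])
print(depth)                    # [1, 1, 2, 3, 2, 2, 0, 0, 0, 0]
print(coverage_percent(depth))  # 60.0
```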

15
Q

What is the procedure for bioinformatic analysis?

A

1. Assemble the reads into contigs. (Because reads will overlap each other, the computer simply
looks for overlapping sequences and sticks them together).
2. Assemble the contigs into scaffolds. (Based on the orientation of contigs and their relationship
with each other, the computer lines up contigs that it thinks should be together and estimates
any gaps).
3. If enough data is available, scaffolds can be connected to reassemble the genome.
4. Annotation.
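
The "look for overlapping sequences and stick them together" idea in step 1 can be sketched as a toy greedy assembler. This is only an illustration of the principle, assuming error-free reads from one strand. Real assemblers use overlap graphs or de Bruijn graphs and handle sequencing errors, repeats, and both strands. The function names are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs  # nothing left to merge
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)]
        contigs.append(merged)


print(greedy_assemble(["ATGCGT", "CGTACC", "ACCGGA"]))  # ['ATGCGTACCGGA']
```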

16
Q

What is annotation?

A

Annotation. Once the genome has been assembled, different software can be used to annotate
the genome: identify which genes code for what.
a. Software can easily identify rRNA and tRNA genes because they are highly conserved.
b. Other software can predict open reading frames (i.e., protein-coding genes) by looking
for start/stop codons.
c. In eukaryotes, additional software is needed to detect and account for introns.
d. Protein-coding gene or amino acid sequences can then be compared to available
databases. Function can be predicted based on homology.
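
Point (b) above, predicting open reading frames from start/stop codons, can be illustrated with a minimal scanner. The sketch assumes the standard start codon ATG and the three standard stop codons, searches only the forward strand, and ignores introns, so it is closer to the prokaryotic case. Real gene predictors also use codon usage and other signals. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for forward-strand ORFs, stop codon included."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 'ATGAAATTTTAG')]
```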

17
Q

What are the challenges in genomics?

A

* Low depth or coverage. If your sequencing efforts produce regions of the genome with low
sequencing depth, or you have low overall coverage, it is difficult for genome assembly software
to turn your reads into genomes.
* Proteins of unknown function. Annotating genomes based on homology with other genomes or
known sequences is inherently limited – there are many proteins with unknown functions!

18
Q

Metagenomics

A

Sequencing the genomes of all the organisms in an environmental sample.

19
Q

What are two major advantages of metagenome sequencing over 16S amplicon sequencing?

A

(1) you obtain a taxonomic and functional profile, because you are including all protein-coding genes;
and (2) you can detect genes not amplified by current PCR primers.

20
Q

What are the similarities between 16S and metagenome sequencing?

A

DNA prep + sequencing + assembly into contigs + gene prediction + annotation

21
Q

Link metagenomics to taxonomic and functional genomics

A

A researcher can then explore the gene content of the contigs to determine the taxonomic diversity of
the sample (using marker genes, like the 16S rRNA gene) and the functional diversity (based on the
functional genes annotated in the sample).

22
Q

Define metagenome-assembled genomes. What does the process involve?

A

Additional software can be used to try to isolate individual genomes (metagenome-assembled genomes, or MAGs) from the DNA pool. This process is called binning because the
software essentially takes contigs and puts them into different “bins” if they were likely to come from
the same genome. Bins can then be assembled and annotated the same way you would assemble a
genome. Genome binning is a powerful tool for studying unculturable microorganisms.
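
To make the idea of "bins" concrete, here is a deliberately crude sketch that groups contigs by GC content alone. This is a stand-in technique chosen for brevity, not the method the card describes. Actual binning software combines richer signals such as sequence composition (e.g., tetranucleotide frequencies) and read coverage. The contig names, sequences, and bin width are invented.

```python
def gc_percent(seq):
    """GC content as a whole-number percentage (integer math avoids float edge cases)."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) // len(seq)


def bin_contigs(contigs, bin_width=25):
    """Group contig names into bins of `bin_width` percentage points of GC content."""
    bins = {}
    for name, seq in contigs.items():
        key = gc_percent(seq) // bin_width
        bins.setdefault(key, []).append(name)
    return bins


# Hypothetical contigs: two AT-rich, two GC-rich.
contigs = {
    "c1": "ATATATATAT",  # 0% GC
    "c2": "ATATATATGC",  # 20% GC
    "c3": "GCGCGCGCAT",  # 80% GC
    "c4": "GCGCGCGCGA",  # 90% GC
}
print(bin_contigs(contigs))  # {0: ['c1', 'c2'], 3: ['c3', 'c4']}
```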

23
Q

Define stable isotope probing + steps

A

Stable isotope probing is a DNA or RNA-based technique that allows us to distinguish the DNA of active
cells from inactive cells.
1. Prior to nucleic acid extraction, cells (or an environmental sample, such as soil or water) are
incubated with a 13C-labeled substrate. Depending on your research question, you might label
cells using a general heterotrophic substrate (like 13C-labeled glucose), a specific substrate of
interest (like a 13C-labeled hydrocarbon to look at oil-degrading bacteria), or something for
autotrophs (like 13C-labeled CO2).
2. Cells that are actively growing and dividing will incorporate 13C into their DNA and RNA, making
their nucleic acids heavier than inactive cells.
3. After DNA/RNA extraction, “heavy” nucleic acids can be separated from the “light” nucleic acids
using density gradient centrifugation. The researcher can then sequence the “heavy” and
“light” fractions separately.

24
Q

Define metatranscriptomics. What is the RNA requirement?

A

Metatranscriptomics is the study of all RNA in a whole microbial community. These techniques reveal
what genes are actually being transcribed, and thus give insight into what a cell/community is doing
rather than just what a cell/community can do.
The RNA needs to be reverse transcribed into DNA to be sequenced: because RNA is
much less stable than DNA, RNA is generally not loaded directly onto a sequencing machine.
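
At the sequence level, reverse transcription can be pictured as writing the DNA strand complementary to the RNA template. The sketch below only models the base pairing (A-T, U-A, G-C, C-G) and reports the first-strand cDNA in the 5' to 3' direction. It says nothing about the enzyme, primers, or reaction conditions, and the function name is hypothetical.

```python
# DNA base that pairs with each RNA base.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna):
    """Return the complementary DNA strand for an RNA sequence, written 5'->3'."""
    rna = rna.upper()
    try:
        complement = [RNA_TO_CDNA[base] for base in rna]
    except KeyError as err:
        raise ValueError(f"unexpected RNA base: {err.args[0]}") from None
    # The complement is antiparallel, so reverse it to read 5'->3'.
    return "".join(reversed(complement))


print(first_strand_cdna("AUGGCC"))  # GGCCAT
```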

25
Q

Define metaproteomics

A

The study of all proteins in either a single organism grown in pure culture (proteomics) or a whole
community of cells (metaproteomics). Proteins are extracted and then analyzed via a combination of
electrophoresis and mass spectrometry. These techniques are arguably a more direct measure of
community activity than transcriptomics, because some RNA molecules are not always translated into
proteins. However, proteomic techniques are much more difficult and expensive.

26
Q

Define metametabolomics

A

The study of all metabolites (not just proteins). This can include sugars, nucleotides, proteins, and lipids.
The value of these techniques is similar to that of proteomics: you are directly measuring the “output” of a
cell or community.

27
Q

What is microfluidics?

A

Microfluidic techniques allow us to isolate single cells from a sample and then sequence the
genome of each single cell. This avoids some of the bioinformatic challenges involved in binning
metagenomes: instead of sequencing all the DNA in a community and then trying to computationally
determine the genomes afterwards, we can now isolate the cells first and then sequence their genomes
individually.

28
Q

Name the steps of microfluidic single-cell genomics

A
  1. Isolate single cells (often using fluorescence-activated cell sorting [FACS] or some other type of
    microfluidics system).
    a. These two techniques isolate single cells. A solution containing suspended cells is
    essentially fed into a flow cytometer a few microliters at a time. The flow cytometer
    uses a laser to identify the size/shape of any particles in the solution. If that particle is a
    cell, it gets isolated into its own well or tube for DNA extraction.
  2. Multiple displacement amplification (MDA)
    a. “Random” PCR primers are used to amplify the entire genome. These primers (more
    than two) are specifically designed to bind to random sites on genomes to ensure that
    the entire genome gets amplified.
  3. Sequencing.
    a. The amplified genome can be sequenced and analyzed like any other genome.