Sequencing and Bioinformatics Flashcards by Aldo Rios

Read length

Number of bases sequenced from a DNA fragment (reported as a maximum or mean)

Read depth

Number of times a given nucleotide (nt) is read during sequencing. A high read depth reduces the effect of sequencing errors on the final base call
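
As a rough illustration, the average read depth over a target region can be estimated as the total number of bases sequenced divided by the length of the region. The function name and the run figures below are hypothetical, and the depth at any single position will vary around this average.

```python
def mean_read_depth(num_reads: int, mean_read_length: float, target_length: int) -> float:
    """Estimate average read depth as (reads x mean read length) / target length."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return num_reads * mean_read_length / target_length


# Hypothetical run: 1,000,000 reads of 150 bases over a 5,000,000-base genome.
depth = mean_read_depth(1_000_000, 150, 5_000_000)
print(depth)  # 30.0
```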

Reads per run

Number of sequences (reads) produced per run (usually reported as a maximum)

Accuracy (sequencing)

Percentage of bases an instrument calls correctly (usually reported as 100% - error rate);
errors include substitutions, base-specific bias, etc.

Time per run

Average time for one run on an HTS (High-Throughput Sequencing) instrument

Cost per million bases

Cost of sequencing one million bases; varies widely over time and between geographic regions

Short-read instruments

Reads of up to roughly 300-500 bp (e.g. Illumina and Ion Torrent sequencing)

Long-read instruments

Reads that can exceed 50 kbp (e.g. SMRT, Nanopore)

Pyrosequencing (Microfabricated picolitre reactors)

Emulsion PCR amplifies DNA on beads inside droplets;
Known nucleotides are flowed one at a time and then washed away;
An optical sensor captures the photons emitted when a nt is incorporated;
About 1 M reads occur simultaneously (i.e., parallel sequencing);
Reads are assembled into contigs (de novo);
96% - 98% accuracy

Ion Torrent (non-optical semiconductor device)

Amplification occurs on beads inside wells;
Nucleotides are flowed step-wise;
Chip detects the H+ ions released when DNA polymerase incorporates a nt, as a pH shift;
Reads occur simultaneously (i.e. parallel sequencing);
99.9% accuracy

Illumina sequencing

barcodes are placed on adaptors;
libraries prepared in 96 wells;
flow cell (glass slide) with lanes containing bound oligos;
bridge amplification (forward and reverse) repeated many times;
each nt has a characteristic fluorescence signal (sequencing by synthesis);
base call determined by emission wavelength and signal intensity;

Single-Molecule Real-Time (SMRT)

a single molecule is immobilized in a nanophotonic structure;
the emission wavelength is detected after laser excitation of the fluorophore;
each dNTP carries a different fluorophore and emits a different color;
very fast and cheap

Single-molecule nanopore DNA sequencing (Nanopore)

Does not require sample amplification;
Does not require fluorescent labelling;
ssDNA molecule passes through a protein nanopore;
an adaptor on the membrane detects the ionic current passing through the pore;
each nt produces a different ionic current;

Sanger sequencing (Manual Sanger dideoxy chain terminator DNA sequencing)

reactions happen inside microcapillaries;
a fluorescent ddNTP (H instead of OH on the 3' carbon of the sugar) stops chain extension;
a laser excites the fluorescent dye on each ddNTP;
each ddNTP emits at a different wavelength;
the wavelength indicates at which nt extension stopped;
output is an electropherogram;

RESCRIPt

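QIIME 2 plugin (REference Sequence annotation and CuRatIon Pipeline) for retrieving and curating reference sequence databases; not to be confused with ReScript, an open-source language that compiles to JavaScript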

QIIME2

"Quantitative Insights Into Microbial Ecology"
Open-source, community-developed bioinformatics pipeline

BOLD

Barcode of Life Data System;
A database of DNA barcode records that also has morphological, geographic, and taxonomic data for species

sequence filtering

selection of sequences by desired characteristics such as number of ambiguous bases, homopolymer length, and sequence length
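
A minimal sketch of such a filter, assuming sequences are plain strings. The function names and every threshold are hypothetical defaults chosen for illustration, not values taken from any particular pipeline.

```python
import re


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated character in seq."""
    runs = re.finditer(r"(.)\1*", seq)
    return max((len(m.group(0)) for m in runs), default=0)


def passes_filter(seq: str, min_len: int = 100, max_len: int = 1000,
                  max_ambiguous: int = 0, max_homopolymer: int = 8) -> bool:
    """Keep a sequence only if its length, ambiguity and homopolymers are in range."""
    seq = seq.upper()
    ambiguous = sum(1 for base in seq if base not in "ACGT")
    return (min_len <= len(seq) <= max_len
            and ambiguous <= max_ambiguous
            and longest_homopolymer(seq) <= max_homopolymer)
```

For example, `passes_filter("ACGT" * 30)` accepts a clean 120-base sequence, while the same call on `"ACGN" * 30` rejects it because of the ambiguous `N` bases.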

sequence dereplication

removal of duplicate sequences, keeping one representative of each
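
One simple way to picture dereplication is to keep the first record seen for each distinct sequence. This sketch assumes records are hypothetical `(id, sequence)` pairs and treats upper- and lower-case bases as the same; real tools may also track how many copies were merged.

```python
def dereplicate(records):
    """Keep the first record for each distinct sequence, preserving input order."""
    seen = set()
    unique = []
    for seq_id, seq in records:
        key = seq.upper()  # compare case-insensitively
        if key not in seen:
            seen.add(key)
            unique.append((seq_id, seq))
    return unique


# Hypothetical records: s2 duplicates s1 apart from letter case.
records = [("s1", "ACGT"), ("s2", "acgt"), ("s3", "ACGA")]
print(dereplicate(records))  # [('s1', 'ACGT'), ('s3', 'ACGA')]
```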

naive Bayes classifier object

A probabilistic algorithm used for classification; it is called naive because it assumes the features are independent of one another, an assumption we do not worry about in this case
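
To make the independence assumption concrete, here is a toy k-mer naive Bayes classifier. It is a sketch only, not the classifier used by QIIME 2 or any other tool: the function names, the equal class priors, the add-one smoothing, and the training data are all assumptions made for illustration.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(labelled_seqs, k=3):
    """Count k-mers per label from (label, sequence) pairs."""
    counts = {}
    for label, seq in labelled_seqs:
        counts.setdefault(label, Counter()).update(kmers(seq.upper(), k))
    return counts


def classify(seq, counts, k=3):
    """Return the label with the highest summed log-probability (equal priors assumed)."""
    vocab_size = 4 ** k  # number of possible k-mers over A, C, G, T
    scores = {}
    for label, kmer_counts in counts.items():
        total = sum(kmer_counts.values())
        # Naive step: treat k-mers as independent, so their log-probabilities add.
        scores[label] = sum(
            math.log((kmer_counts[km] + 1) / (total + vocab_size))  # add-one smoothing
            for km in kmers(seq.upper(), k)
        )
    return max(scores, key=scores.get)


# Hypothetical training data, far too small for real use.
model = train([("taxon_A", "AAAAAAAAAA"), ("taxon_B", "CCCCCCCCCC")])
print(classify("AAAAAC", model))  # taxon_A
```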

Cytochrome Oxidase I

A mitochondrial gene (COI) whose sequence is commonly used as a DNA barcode; it encodes a protein used in cellular respiration

16S rRNA

the RNA component of the small (30S) ribosomal subunit in prokaryotes, which functions in translation

bold R library

R package used to access sequences from BOLD

gaps (sequencing)

positions where nucleotides are missing from a sequence, for example because they were removed

.fasta

file extension for a text file containing nucleotide or amino acid sequences
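
In a FASTA file each record is a header line starting with `>` followed by one or more lines of sequence. The parser below is a minimal sketch for illustration (the function name and example records are hypothetical); it keeps only the first word of each header as the record ID.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record id: sequence}."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            current_id = header[0] if header else ""
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line  # sequences may wrap over several lines
    return records


# Hypothetical two-record file contents.
example = ">seq1 hypothetical sample\nACGT\nTTGA\n>seq2\nGGCC\n"
print(parse_fasta(example))  # {'seq1': 'ACGTTTGA', 'seq2': 'GGCC'}
```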

metadata

file that holds information accompanying the sequences, such as collection location and taxonomy

.csv

file extension of a table converted to text; stands for "comma separated values"

IUPAC characters

16-character code that allows the specification of ambiguous nucleotides
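
The code can be written out as a lookup table from each character to the bases it may stand for. In this sketch `U` is mapped to `T` as a simplification for DNA-style comparisons, and the helper name is hypothetical.

```python
# IUPAC nucleotide codes and the bases each one can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "U": "T",  # simplification: treat uracil like thymine
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def ambiguous_positions(seq):
    """Indices of characters that stand for more than one possible base.

    Raises KeyError for characters outside the table (e.g. gap symbols).
    """
    return [i for i, base in enumerate(seq.upper()) if len(IUPAC[base]) > 1]


print(ambiguous_positions("ACGTRYN"))  # [4, 5, 6]
```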

.qza

file extension used by QIIME 2 for artifacts (data files)

homopolymer

a run of the same nucleotide repeated consecutively (e.g. AAAAA)

leading/trailing

first/last nucleotides in a sequence

ambiguous nucleotides

nucleotide entry in a sequence that is not A, T, C, or G

alignment

Process to arrange sequences in order to identify regions of similarity

upstream/downstream

upstream: towards the 5' end; downstream: towards the 3' end

MAFFT

a program for performing multiple sequence alignment

sequence trimming

removing bits of sequence before and after a region of interest
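
As a sketch of the idea, the function below keeps only the part of a sequence that lies between two flanking motifs (for instance primer sites). The function name and example sequence are hypothetical, and it assumes exact matches; real trimming tools usually tolerate mismatches.

```python
def trim_to_region(seq, left_flank, right_flank):
    """Return the part of seq between the two flanks, or '' if either is missing."""
    start = seq.find(left_flank)
    if start == -1:
        return ""
    start += len(left_flank)  # begin just after the left flank
    end = seq.find(right_flank, start)
    if end == -1:
        return ""
    return seq[start:end]


# Hypothetical read with flanks GGAA and AAGG around the region of interest.
print(trim_to_region("TTTGGAACCGTAGCTAAGGTTT", "GGAA", "AAGG"))  # CCGTAGCT
```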
