Sequencing and Bioinformatics Flashcards
Read length
Number of bases sequenced from a DNA fragment (given as a maximum or mean)
Read depth
Number of times a nt is read during sequencing; a higher read depth reduces the impact of random sequencing errors
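Mean read depth is commonly estimated as the total number of sequenced bases divided by the genome (or target) size. A minimal Python sketch of that arithmetic and of per-position depth, assuming reads have already been placed at known start positions; the function names are illustrative:

```python
def mean_read_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean depth = total bases sequenced / genome size."""
    return num_reads * read_length / genome_size


def per_base_depth(read_starts: list[int], read_length: int, genome_size: int) -> list[int]:
    """Count how many reads cover each 0-based position of the genome."""
    depth = [0] * genome_size
    for start in read_starts:
        # A read starting at `start` covers read_length positions (clipped at the end).
        for pos in range(start, min(start + read_length, genome_size)):
            depth[pos] += 1
    return depth


# 1,000,000 reads of 150 bp over a 5 Mbp genome
print(mean_read_depth(1_000_000, 150, 5_000_000))  # 30.0
print(per_base_depth([0, 2, 4], 4, 10))  # [1, 1, 2, 2, 2, 2, 1, 1, 0, 0]
```

Real depth varies along the genome, which is why a per-position view is often reported alongside the mean.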
Reads per run
Number of sequences (reads) produced per run (usually given as a maximum)
Accuracy (sequencing)
Percentage of bases called correctly by an instrument (usually given as 100% minus the error rate);
errors include substitutions, base-specific bias, etc.
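The relationship between error rate and accuracy, and what an accuracy figure implies for a single read, can be sketched in Python. This assumes errors are independent and equally likely at every position, which real instruments do not strictly satisfy (for example, because of base-specific bias):

```python
def accuracy_from_error_rate(error_rate_percent: float) -> float:
    """Accuracy as usually reported: 100% minus the error rate."""
    return 100.0 - error_rate_percent


def expected_errors(read_length: int, accuracy_percent: float) -> float:
    """Expected number of incorrect base calls in one read, assuming each
    position has the same, independent chance of being wrong."""
    return read_length * (1.0 - accuracy_percent / 100.0)


print(accuracy_from_error_rate(0.1))         # 99.9
print(round(expected_errors(400, 96.0), 2))  # 16.0
print(round(expected_errors(150, 99.9), 2))  # 0.15
```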
Time per run
Average time for a run in an HTS (High-Throughput Sequencing) instrument
Cost per million bases
Varies widely over time and between geographic regions
Short-read instruments
300-500 bp reads (e.g., Illumina and Ion Torrent sequencing)
Long-read instruments
Reads that can exceed 50 kbp (e.g., SMRT, Nanopore)
Pyrosequencing (Microfabricated picolitre reactors)
Emulsion PCR on beads inside droplets;
Nucleotides of a known type are flowed one at a time and then washed away;
An optical sensor captures the photons emitted when a nt is incorporated;
About 1 million reads are sequenced simultaneously (i.e., parallel sequencing);
Reads are assembled into contigs (de novo);
96-98% accuracy
Ion Torrent (non-optical semiconductor device)
Amplification occurs on beads inside wells;
Nucleotides are flowed step-wise;
Chip detects the H+ ions released by DNA polymerase when it incorporates nts, as a pH shift;
Reads occur simultaneously (i.e., parallel sequencing);
99.9% accuracy
Illumina sequencing
barcodes are placed on adaptors;
libraries prepared in 96-well plates;
flow cell (glass slide) with lanes containing bound oligos;
bridge amplification (forward and reverse) repeated many times;
each nt has a characteristic fluorescence signal (sequencing by synthesis);
base call determined by emission wavelength and signal intensity;
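A highly simplified Python sketch of the base-calling idea above, assuming a hypothetical four-channel readout in which each base's dye is measured in its own channel (actual Illumina instruments and base-calling software are more involved, and some instruments use fewer channels):

```python
CHANNELS = ("A", "C", "G", "T")  # hypothetical: one fluorescence channel per base


def call_bases(cycles: list[tuple[float, float, float, float]]) -> tuple[str, list[float]]:
    """Call one base per sequencing cycle from per-channel intensities.

    The brightest channel gives the base; its share of the total signal is
    used here as a crude confidence score (an illustrative choice)."""
    calls, confidences = [], []
    for intensities in cycles:
        total = sum(intensities)
        best = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        calls.append(CHANNELS[best])
        confidences.append(intensities[best] / total if total else 0.0)
    return "".join(calls), confidences


seq, conf = call_bases([(900, 50, 30, 20), (40, 60, 800, 100), (30, 700, 650, 20)])
print(seq)   # AGC
print(conf)  # [0.9, 0.8, 0.5]
```

The third cycle shows how two similar intensities lower the confidence of a call.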
Single-Molecule Real-Time (SMRT)
a single molecule is immobilized in a nanophotonic structure;
emission wavelength is detected after laser excitation of the fluorophore;
each dNTP has a different fluorophore and emits a different color;
very fast and cheap
Single-molecule nanopore DNA sequencing (Nanopore)
Does not require sample amplification;
Does not require fluorescent labelling;
ssDNA molecule passes through a protein nanopore;
the ionic current passing through the pore is measured across the membrane;
each nt has a different ionic current;
Sanger sequencing (Sanger dideoxy chain-termination DNA sequencing)
reactions happen inside microcapillaries;
fluorescent ddNTP (H instead of OH on the 3' carbon of the deoxyribose) stops chain extension;
Laser excites the fluorescent dye on each ddNTP;
Each ddNTP dye emits at a different wavelength;
the wavelength indicates which nt terminated the chain;
output is an electropherogram;
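The logic of chain termination can be simulated in Python: each terminated fragment ends in a labelled ddNTP, and ordering the fragments by length (as capillary electrophoresis does) spells out the sequence. A minimal sketch, assuming a ddNTP is incorporated at every position in at least one copy; the template and function names are hypothetical:

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def chain_termination_fragments(synthesized: str, copies: int = 3) -> list[tuple[int, str]]:
    """Return (fragment length, terminal ddNTP) for every terminated copy."""
    fragments = []
    for i, base in enumerate(synthesized):
        # Extension stops at position i when a ddNTP (no 3'-OH) is added there.
        fragments.extend([(i + 1, base)] * copies)
    random.shuffle(fragments)  # the reaction produces fragments in no particular order
    return fragments


def read_electropherogram(fragments: list[tuple[int, str]]) -> str:
    """Separate fragments by length (shortest first) and read the dye of
    the terminal ddNTP at each length."""
    base_at_length = {}
    for length, base in sorted(fragments):
        base_at_length[length] = base
    return "".join(base_at_length[length] for length in sorted(base_at_length))


template = "TACGGATC"  # written 3'->5', so its complement reads 5'->3'
synthesized = "".join(COMPLEMENT[b] for b in template)
print(read_electropherogram(chain_termination_fragments(synthesized)))  # ATGCCTAG
```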
RESCRIPt
QIIME 2 plugin for obtaining, curating, and managing reference sequence and taxonomy databases
QIIME 2
“Quantitative Insights Into Microbial Ecology”
Open-source, community-developed bioinformatics pipeline
BOLD
Barcode of Life Data System;
A database of DNA barcode records that also holds morphological, geographic, and taxonomic data for species
sequence filtering
selection of sequences by desired characteristics such as the number of ambiguous bases, homopolymer length, and sequence length
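Filtering by these characteristics can be sketched in Python. The thresholds below are illustrative assumptions, not defaults of any particular tool:

```python
import re

NON_ACGT = re.compile(r"[^ACGT]")  # any ambiguous (non-ACGT) character


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated nucleotide."""
    longest = current = 0
    previous = ""
    for base in seq:
        current = current + 1 if base == previous else 1
        previous = base
        longest = max(longest, current)
    return longest


def passes_filters(seq: str, min_len: int = 100, max_len: int = 2000,
                   max_ambiguous: int = 5, max_homopolymer: int = 8) -> bool:
    """Keep a sequence only if its length, number of ambiguous bases, and
    longest homopolymer are all within the given limits."""
    seq = seq.upper()
    if not min_len <= len(seq) <= max_len:
        return False
    if len(NON_ACGT.findall(seq)) > max_ambiguous:
        return False
    return longest_homopolymer(seq) <= max_homopolymer


seqs = {"s1": "ACGTACGTAC", "s2": "ACGNNNNNNT", "s3": "AAAAAAAAAC", "s4": "ACG"}
kept = [name for name, s in seqs.items()
        if passes_filters(s, min_len=5, max_len=20, max_ambiguous=2, max_homopolymer=5)]
print(kept)  # ['s1']
```

Here s2 fails on ambiguous bases, s3 on homopolymer length, and s4 on length.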
sequence dereplication
removal of duplicate sequences, keeping one representative of each unique sequence
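Exact-match dereplication can be sketched in Python with a dictionary keyed by sequence. This assumes duplicates are identical after upper-casing; the record IDs are hypothetical, and real tools may also need to reconcile differing taxonomy labels among duplicates:

```python
from collections import Counter


def dereplicate(records: list[tuple[str, str]]) -> dict[str, tuple[str, int]]:
    """Collapse identical sequences.

    Returns {representative_id: (sequence, number_of_copies)}, using the
    first ID seen for each unique sequence as its representative."""
    counts = Counter()
    representative = {}
    for seq_id, seq in records:
        key = seq.upper()
        counts[key] += 1
        representative.setdefault(key, seq_id)
    return {rep_id: (seq, counts[seq]) for seq, rep_id in representative.items()}


records = [("a", "ACGT"), ("b", "acgt"), ("c", "GGCC")]
print(dereplicate(records))  # {'a': ('ACGT', 2), 'c': ('GGCC', 1)}
```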
naive Bayes classifier object
A probabilistic algorithm used for classification (here, assigning taxonomy to sequences); it is called naive because it assumes the features are independent of one another, an assumption that is not a concern in this case
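A from-scratch Python sketch of a multinomial naive Bayes classifier over k-mers (short overlapping subsequences), similar in spirit to k-mer-based taxonomy classifiers. It is not QIIME 2's implementation; the k-mer size, smoothing constant, training sequences, and labels are illustrative assumptions:

```python
import math
from collections import Counter, defaultdict


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping substrings of length k."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts (illustrative sketch)."""

    def __init__(self, k: int = 4, alpha: float = 1.0):
        self.k = k
        self.alpha = alpha  # additive (Laplace) smoothing for unseen k-mers

    def fit(self, seqs: list[str], labels: list[str]) -> "KmerNaiveBayes":
        self.label_counts = Counter(labels)
        self.kmer_counts = defaultdict(Counter)  # label -> k-mer counts
        self.vocab = set()
        for seq, label in zip(seqs, labels):
            words = kmers(seq, self.k)
            self.kmer_counts[label].update(words)
            self.vocab.update(words)
        self.totals = {lab: sum(c.values()) for lab, c in self.kmer_counts.items()}
        return self

    def predict(self, seq: str) -> str:
        n_train = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log P(label) + sum of log P(k-mer | label): treating the k-mers as
            # independent given the label is the "naive" assumption.
            score = math.log(count / n_train)
            denom = self.totals[label] + self.alpha * vocab_size
            for word in kmers(seq, self.k):
                score += math.log((self.kmer_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_seqs = ["AAAAAAAATA", "AAAAAATAAA", "GCGCGCGCGC", "GCGCGGCGCG"]
train_labels = ["taxon_A", "taxon_A", "taxon_B", "taxon_B"]
model = KmerNaiveBayes(k=4).fit(train_seqs, train_labels)
print(model.predict("AAAATAAAAA"))  # taxon_A
print(model.predict("CGCGCGCG"))    # taxon_B
```

Working in log space avoids numerical underflow when many k-mer probabilities are multiplied together.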
Cytochrome Oxidase I
A mitochondrial gene commonly used as a DNA barcode; it encodes a subunit of cytochrome c oxidase, an enzyme of the respiratory chain
16S rRNA
gene encoding the 16S rRNA of the small ribosomal subunit in prokaryotes, used in translation
bold R library
R package used to access sequences from BOLD
gaps (sequencing)
positions in an aligned sequence where nucleotides are missing relative to the other sequences (insertions/deletions), usually shown as a dash (-)
.fasta
file extension for a text file containing nucleotide or amino acid sequences
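A FASTA file stores each record as a header line beginning with ">" followed by one or more sequence lines. A minimal Python parser, assuming well-formed input and keeping only the first word of each header as the ID (libraries such as Biopython provide more complete parsers):

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record ID (first word after '>') to its sequence.

    Sequence lines wrapped across several lines are joined together."""
    records: dict[str, str] = {}
    seq_id = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records[seq_id] = "".join(chunks)
            words = line[1:].split()
            seq_id = words[0] if words else ""
            chunks = []
        else:
            chunks.append(line)
    if seq_id is not None:
        records[seq_id] = "".join(chunks)
    return records


example = ">seq1 optional description\nACGTACGT\nGGCC\n>seq2\nTTTT\n"
print(parse_fasta(example))  # {'seq1': 'ACGTACGTGGCC', 'seq2': 'TTTT'}
```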
metadata
file containing information that accompanies sequences, such as collection location and taxonomy
.csv
file extension of a table converted to text; stands for “comma separated values”
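Metadata are often kept as a CSV table with one row per sequence. A short Python sketch using the standard csv module to look up metadata by sequence ID; the column names and values are hypothetical (QIIME 2 metadata files are typically tab-separated instead):

```python
import csv
import io

# Hypothetical metadata table; column names and values are illustrative only.
metadata_csv = """id,location,taxon
seq1,site A,Genus sp. 1
seq2,site B,Genus sp. 2
"""


def load_metadata(text: str, id_column: str = "id") -> dict[str, dict[str, str]]:
    """Read CSV text into {id: {column: value}} using the header row as keys."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[id_column]: row for row in reader}


metadata = load_metadata(metadata_csv)
print(metadata["seq2"]["location"])  # site B
```

For a file on disk, the same function works on the text returned by reading the file with `newline=""`.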
IUPAC characters
16-character code for specifying ambiguous nucleotides (e.g., R = A or G, N = any base)
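The IUPAC nucleotide codes can be represented as a lookup table, which makes it easy to test whether a sequence matches a degenerate pattern (such as a degenerate primer). A Python sketch; the function names are illustrative:

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",  # U (RNA) treated as T here
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_matches(pattern: str, seq: str) -> bool:
    """True if each base of seq is allowed by the code at the same position of pattern."""
    if len(pattern) != len(seq):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern.upper(), seq.upper()))


def expand(pattern: str) -> list[str]:
    """List every unambiguous sequence a degenerate pattern can represent."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in pattern.upper()))]


print(iupac_matches("RYA", "GCA"))  # True
print(iupac_matches("RYA", "TCA"))  # False
print(expand("RY"))                 # ['AC', 'AT', 'GC', 'GT']
```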
.qza
file extension of QIIME 2 data artifacts (zip archives that store the data together with its provenance)
homopolymer
a run of the same nucleotide repeated consecutively (e.g., AAAAAA)
leading/trailing
first/last nucleotides in a sequence
ambiguous nucleotides
nucleotide position in a sequence that is not A, T, C, or G (e.g., N, R, Y)
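Leading and trailing ambiguous bases (often runs of N at the ends of a read) are commonly trimmed before counting the ambiguous bases that remain. A minimal Python sketch, assuming any character other than A, C, G, or T counts as ambiguous:

```python
def trim_ambiguous_ends(seq: str) -> str:
    """Remove leading and trailing characters that are not A, C, G, or T
    (e.g. runs of N), leaving internal ambiguous bases in place."""
    seq = seq.upper()
    start, end = 0, len(seq)
    while start < end and seq[start] not in "ACGT":
        start += 1
    while end > start and seq[end - 1] not in "ACGT":
        end -= 1
    return seq[start:end]


def count_ambiguous(seq: str) -> int:
    """Number of positions that are not A, C, G, or T."""
    return sum(1 for base in seq.upper() if base not in "ACGT")


trimmed = trim_ambiguous_ends("NNACGTRNNGTN")
print(trimmed)                   # ACGTRNNGT
print(count_ambiguous(trimmed))  # 3
```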
alignment
Process to arrange sequences in order to identify regions of similarity
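Pairwise global alignment with the Needleman-Wunsch dynamic-programming algorithm illustrates how gaps (-) are introduced to line up similar regions. MAFFT aligns many sequences at once using different, faster heuristics; this Python sketch handles only two sequences, and the scoring values are illustrative assumptions:

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str, int]:
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, s = needleman_wunsch("ACGTC", "ACTC")
print(aligned_a)  # ACGTC
print(aligned_b)  # AC-TC
print(s)          # 2
```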
upstream/downstream
upstream: towards 5’
downstream: towards 3’
MAFFT
a program for performing multiple sequence alignment
sequence trimming
removal of the portions of a sequence that lie before and after a region of interest
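Trimming to a region of interest can be sketched in Python by locating the forward primer and the reverse complement of the reverse primer, then keeping only what lies between them. This assumes exact, non-degenerate primer matches (tools such as cutadapt also tolerate mismatches and IUPAC codes); the primers and read are hypothetical:

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def trim_to_region(seq: str, fwd_primer: str, rev_primer: str) -> Optional[str]:
    """Return the region between the primers (primers removed), or None if
    either primer site is not found. Exact matching only."""
    seq = seq.upper()
    start = seq.find(fwd_primer.upper())
    if start == -1:
        return None
    start += len(fwd_primer)  # drop the forward primer and everything upstream
    end = seq.find(reverse_complement(rev_primer), start)
    if end == -1:
        return None
    return seq[start:end]  # drop the reverse-primer site and everything downstream


read = "TTTT" + "ACGA" + "GGGCCCAAA" + "TTGC" + "TTTT"
print(trim_to_region(read, fwd_primer="ACGA", rev_primer="GCAA"))  # GGGCCCAAA
```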