Sequencing and Bioinformatics Flashcards

1
Q

Read length

A

Number of bases sequenced for a DNA fragment (provided as a maximum or mean)

2
Q

Read depth

A

Number of times a nucleotide is read during sequencing. A high read depth reduces errors
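
As a rough illustration of per-position read depth, the sketch below counts how many reads cover each position of a short reference. The function name `depth_per_position` and the `(start, end)` read representation are assumptions made for this example, not part of any sequencing tool.

```python
def depth_per_position(reads, ref_length):
    """Count how many reads cover each reference position.

    reads: list of (start, end) pairs, 0-based with end exclusive
           (a hypothetical, simplified stand-in for aligned reads).
    """
    depth = [0] * ref_length
    for start, end in reads:
        # Clamp each read to the reference so stray coordinates are ignored.
        for pos in range(max(start, 0), min(end, ref_length)):
            depth[pos] += 1
    return depth


reads = [(0, 4), (2, 6), (3, 8)]
print(depth_per_position(reads, 8))
```

Positions covered by several reads can be called by consensus, which is why a higher depth tends to reduce errors.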

3
Q

Reads per run

A

Number of sequences produced per run (usually provided as a maximum)

4
Q

Accuracy (sequencing)

A

Percentage of bases an instrument calls correctly (usually provided as 100% - error rate);
errors include substitution, base-specific bias, etc.

5
Q

Time per run

A

Average time for a run in an HTS (High-Throughput Sequencing) instrument

6
Q

Cost per million bases

A

Varies widely over time and between geographic regions

7
Q

Short-read instruments

A

300-500 bp reads (e.g. Illumina and Ion sequencing)

8
Q

Long-read instruments

A

>50 kbp reads (e.g. SMRT, Nanopore)

9
Q

Pyrosequencing (Microfabricated picolitre reactors)

A

Emulsion PCR on beads inside droplets;
Known nucleotides are flowed in one at a time and washed away;
Optical sensor under the slide captures photons emitted when a nucleotide is incorporated;
1 M reads occur simultaneously (i.e., parallel sequencing);
Sequences are put together as contigs (de novo);
96% - 98% accuracy

10
Q

Ion Torrent (non-optical semiconductor device)

A

Amplification occurs on beads inside wells;
Nucleotides are flowed step-wise;
Chip detects H+ ions released when DNA polymerase incorporates a nucleotide, as a pH shift;
Reads occur simultaneously (i.e. parallel sequencing);
99.9% accuracy

11
Q

Illumina sequencing

A

barcodes are placed on adaptors;
libraries prepared in 96-well plates;
flow cell (glass slide) with lanes containing bound oligos;
bridge amplification (forward and reverse) repeated many times;
each nucleotide has a characteristic fluorescence signal (sequencing by synthesis);
base call determined by emission wavelength and signal intensity;

12
Q

Single-Molecule Real-Time (SMRT)

A

a single molecule is immobilized in a nanophotonic structure;
wavelength is detected after laser excitation of the fluorophore;
each dNTP has a different fluorophore and emits a different color;
very fast and cheap

13
Q

Single-molecule nanopore DNA sequencing (Nanopore)

A

Does not require sample amplification;
Does not require fluorescent labelling;
ssDNA molecule passes through a protein nanopore;
an adaptor on the membrane detects ionic current passing through pore;
each nt has a different ionic current;

14
Q

Sanger sequencing (Manual Sanger dideoxy chain terminator DNA sequencing)

A

reactions happen inside microcapillaries;
fluorescent ddNTP (H instead of OH on the 3' carbon of the sugar) makes chain extension stop;
Laser excites the fluorescent label on each ddNTP;
Each ddNTP emits at a different wavelength;
wavelength indicates at which nt amplification stopped;
output is electropherogram;

15
Q

RESCRIPt

A

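REference Sequence annotation and CuRatIon Pipeline; an open source QIIME 2 plugin for obtaining, curating, and evaluating reference sequence databases (not to be confused with ReScript, a language that compiles to JavaScript)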

16
Q

QIIME2

A

“Quantitative Insights Into Microbial Ecology”
Open source, community-developed bioinformatics pipeline

17
Q

BOLD

A

Barcode of Life Data System;
A database of DNA barcode records that also has morphological, geographic, and taxonomic data for species

18
Q

sequence filtering

A

selection of sequences by desired characteristics, such as number of ambiguous bases, homopolymer length, and sequence length
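
A minimal sketch of such a filter, assuming plain DNA strings. The function name `passes_filter` and all threshold values are hypothetical and chosen only for illustration; real pipelines expose their own parameters.

```python
import re


def passes_filter(seq, min_len=100, max_len=1000,
                  max_ambiguous=0, max_homopolymer=8):
    """Return True if seq meets the length, ambiguity, and homopolymer limits."""
    seq = seq.upper()
    if not (min_len <= len(seq) <= max_len):
        return False
    # Anything other than A, C, G, T counts as ambiguous here.
    ambiguous = sum(1 for base in seq if base not in "ACGT")
    if ambiguous > max_ambiguous:
        return False
    # Each regex match is one run of a single repeated character.
    longest_run = max(
        (len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0
    )
    return longest_run <= max_homopolymer


print(passes_filter("ACGT" * 30))        # length 120, no N, short runs
print(passes_filter("ACGT" * 30 + "N"))  # contains one ambiguous base
```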

19
Q

sequence dereplication

A

removal of duplicate sequences so that each distinct sequence is kept only once
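
One simple way to picture dereplication is to keep the first record seen for each distinct sequence. The sketch below assumes records are `(name, sequence)` pairs and that matching ignores case; `dereplicate` is a hypothetical helper, not a function from QIIME 2 or RESCRIPt.

```python
def dereplicate(records):
    """Keep the first (name, sequence) record for each distinct sequence."""
    seen = set()
    unique = []
    for name, seq in records:
        key = seq.upper()  # assumption: case differences are not meaningful
        if key not in seen:
            seen.add(key)
            unique.append((name, seq))
    return unique


records = [("a", "ACGT"), ("b", "acgt"), ("c", "ACGA")]
print(dereplicate(records))
```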

20
Q

naive Bayes classifier object

A

A probabilistic algorithm used for classification; it is called naive because it assumes that the features are independent of one another, an assumption that is not a concern in this case

21
Q

Cytochrome Oxidase I

A

A mitochondrial DNA sequence commonly used as a barcode; in nature it codes for a protein used in respiration

22
Q

16S rRNA

A

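RNA component of the small ribosomal subunit in prokaryotes, used in translation; its gene is commonly used as a marker for identifying bacteria and archaea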

23
Q

bold R library

A

R package used to access sequences from BOLD

24
Q

gaps (sequencing)

A

positions in a sequence where nucleotides are missing or have been removed

25
Q

.fasta

A

file extension for a text file containing a nucleotide or amino acid sequence
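
In a FASTA file each record starts with a header line beginning with `>`, followed by one or more lines of sequence. The sketch below reads that layout from a string; `parse_fasta` and the example records are made up for illustration.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]       # drop the leading ">"
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


example = ">seq1 example record\nACGT\nAC\n>seq2\nMKV\n"
print(parse_fasta(example))
```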

26
Q

metadata

A

file that has information accompanying sequences, such as collection location and taxonomy

27
Q

.csv

A

file extension of a table converted to text; stands for “comma separated values”

28
Q

IUPAC characters

A

16-character code that allows the specification of ambiguous nucleic acids
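
The sketch below lists the IUPAC nucleotide letters (counting A, C, G, T, U and the eleven ambiguity codes) and the bases each one can stand for. The names `IUPAC` and `count_ambiguous` are chosen for this example only.

```python
# IUPAC nucleotide codes mapped to the bases each letter can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def count_ambiguous(seq):
    """Count letters in seq that stand for more than one possible base."""
    return sum(1 for base in seq.upper() if len(IUPAC.get(base, "")) > 1)


print(len(IUPAC))
print(count_ambiguous("ACGTNRY"))
```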

29
Q

.qza

A

file extension used by QIIME 2 for artifacts (data files)

30
Q

homopolymer

A

a run of the same nucleotide repeated consecutively (e.g. AAAAAA)
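
To make the idea concrete, the sketch below finds the longest homopolymer run in a sequence. `longest_homopolymer` is a hypothetical helper name used only for this example.

```python
from itertools import groupby


def longest_homopolymer(seq):
    """Return (base, length) of the longest run of one repeated nucleotide."""
    best_base, best_len = "", 0
    # groupby yields one group per run of consecutive identical characters.
    for base, run in groupby(seq.upper()):
        run_len = len(list(run))
        if run_len > best_len:
            best_base, best_len = base, run_len
    return best_base, best_len


print(longest_homopolymer("ACGGGGTTA"))
```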

31
Q

leading/trailing

A

first/last nucleotides in a sequence

32
Q

ambiguous nucleotides

A

nucleotide entry in a sequence that is not A, T, C, or G

33
Q

alignment

A

Process to arrange sequences in order to identify regions of similarity

34
Q

upstream/downstream

A

upstream: towards 5’
downstream: towards 3’

35
Q

MAFFT

A

a program for performing multiple sequence alignment

36
Q

sequence trimming

A

removing bits of sequence before and after a region of interest
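
A minimal sketch of trimming, assuming the region of interest is bounded by two known flanking motifs (for example primer sites). The function name `trim_to_region` and the example sequences are hypothetical.

```python
def trim_to_region(seq, left_flank, right_flank):
    """Return the part of seq between two flanking motifs.

    Returns None if either flank cannot be found.
    """
    start = seq.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)             # region begins after the left flank
    end = seq.find(right_flank, start)   # search only past the left flank
    if end == -1:
        return None
    return seq[start:end]


print(trim_to_region("TTTGATCAAACCCGGGTACGG", "GATC", "GGGTAC"))
```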