1. Navigating Information About Nucleotide and Protein Sequences Flashcards

Question 1

Q

What is the difference between Bioinformatics and Computational Biology?

Answer

A

Bioinformatics: the goal is to build useful tools to analyze biological data (an engineering focus)
Computational biology: the goal is to learn new biology by using computational techniques (a science focus)

Question 2

Q

What does “demultiplex” mean?

Answer

A

When many samples are sequenced in the same run, their reads are obtained mixed together. Software separates them, typically using the index (barcode) sequence attached to each sample, so that we know which sequence comes from which sample. This process is demultiplexing.
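
As a rough illustration of the idea, the sketch below assigns reads to samples by looking at a barcode at the start of each read. The function name `demultiplex`, the barcode table, and the assumption that the barcode is inline at the 5' end are all hypothetical; real pipelines usually read the index separately and tolerate mismatches.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by sample using the barcode at the start of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        # Reads with an unknown barcode go to a catch-all bin.
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)

# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTGGGTTT", "TGCAAAACCC", "ACGTCCCAAA", "NNNNGGGGGG"]
groups = demultiplex(reads, barcodes, barcode_len=4)
```

Here `groups` maps each sample name to the reads assigned to it, with the barcode removed.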

Question 3

Q

A higher Q score indicates a ( greater / smaller ) probability of error.

Answer

A

Smaller. Since Q = -10 × log10(e), a higher Q corresponds to a lower estimated probability of error.

Question 4

Q

What is a FASTA file?

Answer

A

Text-based format for representing either nucleotide sequences or amino acid sequences in which nucleotides or amino acids are represented using single-letter codes.

Question 5

Q

Components of a FASTA file (each entry in a FASTA file consists of…)

Answer

A

A header line starting with ">" followed by a sequence identifier and a definition or description of the sequence
Lines of sequence data
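
The two components above can be read with a few lines of code. This is a minimal sketch, not a full parser: the names `parse_fasta` and `_entry` are made up for the example, and it assumes the identifier is the first whitespace-delimited word of the header line.

```python
def parse_fasta(text):
    """Yield (identifier, description, sequence) for each FASTA entry."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if header is not None:
                yield _entry(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _entry(header, chunks)

def _entry(header, chunks):
    # Assumption: identifier ends at the first space; the rest is description.
    identifier, _, description = header.partition(" ")
    return identifier, description, "".join(chunks)

# Hypothetical two-entry file.
example = """>seq1 hypothetical example protein
MKTAYIAK
QRQISFVK
>seq2
ACGTACGT
"""
records = list(parse_fasta(example))
```

Sequence data split over several lines is joined into a single string per entry.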

Question 6

Q

What is an accession number?

Answer

A

Unique identifier for a sequence record
Typical format: 1 letter + 5 digits, or 2 letters + 6 digits
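
The two formats on this card can be checked with a regular expression. The sketch below is only a format check under those two patterns; the optional `.version` suffix and the function name `looks_like_accession` are assumptions added for illustration, and other accession formats exist that it does not cover.

```python
import re

# 1 letter + 5 digits, or 2 letters + 6 digits, optionally followed by ".version".
ACCESSION = re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?")

def looks_like_accession(text):
    """Return True if text has one of the two formats described above."""
    return ACCESSION.fullmatch(text) is not None

checks = {s: looks_like_accession(s)
          for s in ["U12345", "AF123456", "AF123456.1", "NM_123456", "12345"]}
```

A match only says the string is well formed, not that a record with that accession exists.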

Question 7

Q

What is a flat file?

Answer

A

The plain-text record of a database entry. During sequence submission, the submitter has to provide the name of the sequence, the source, annotation, ORF, sequence, and translation product. All of this is displayed in the flat file.

Question 8

Q

Types of motif representations

Answer

A

Regular expressions (regex)
Profiles (matrices)
Logos

Question 9

Q

Regular expression for motifs (definition)

Answer

A

Defines a sequence pattern using the standard IUPAC one-letter amino acid codes, allowing for ambiguity at individual positions
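
To make this concrete, the sketch below translates a PROSITE-style pattern into a Python regular expression and searches a sequence with it. It handles only the basic syntax (`x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats); the helper names and the test sequence are hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Convert a basic PROSITE-style pattern to Python regex syntax."""
    regex = pattern.replace("-", "")          # position separators
    regex = regex.replace("x", ".")           # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} = anything but P
    regex = regex.replace("(", "{").replace(")", "}")   # (2,4) = repeat count
    return regex

def find_motif(pattern, sequence):
    """Return 0-based start positions of non-overlapping matches."""
    regex = prosite_to_regex(pattern)
    return [m.start() for m in re.finditer(regex, sequence)]

# N-glycosylation site pattern: N, not P, S or T, not P.
n_glyc = "N-{P}-[ST]-{P}"
regex = prosite_to_regex(n_glyc)
hits = find_motif(n_glyc, "MKNGTAANPSA")  # made-up sequence
```

The second candidate in the sequence (`NPSA`) is rejected because proline is excluded at the second position.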

Question 10

Q

Profile for motifs (definition)

Answer

A

Consists of a position weight matrix in which each position of the motif receives a score (probability) for each amino acid
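
A small sketch of how such a matrix could be built from aligned motif instances follows. It uses plain observed frequencies as the per-position probabilities; real profiles typically add pseudocounts and use log-odds scores, which are left out here. The function names and the example motif instances are hypothetical.

```python
from collections import Counter

def position_probability_matrix(aligned):
    """Return one {residue: probability} dict per motif position."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all motif instances must have the same length")
    matrix = []
    for column in zip(*aligned):
        counts = Counter(column)
        matrix.append({res: n / len(aligned) for res, n in counts.items()})
    return matrix

def score(matrix, candidate):
    """Multiply the per-position probabilities of the candidate's residues."""
    probability = 1.0
    for column, residue in zip(matrix, candidate):
        probability *= column.get(residue, 0.0)
    return probability

# Made-up aligned instances of a three-residue motif.
motif_instances = ["NGT", "NAS", "NGS", "NVT"]
ppm = position_probability_matrix(motif_instances)
```

With these instances, `score(ppm, "NGT")` is 1.0 × 0.5 × 0.5, while any candidate with an unseen residue scores zero, which is one reason pseudocounts are used in practice.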

Question 11

Q

Sequence logo for motifs (definition)

Answer

A

Graphical display of a multiple sequence alignment consisting of color-coded stacks of letters representing amino acids at successive points

Question 12

Q

What is a RefSeq?

Answer

A

It is the curated reference sequence for a given gene, transcript, or protein, taken from NCBI's Reference Sequence (RefSeq) database

Question 13

Q

What do each of the following accession numbers correspond to? (molecule types)
NC_123456
NG_123456
NM_123456
NR_123456
NP_123456

Answer

A

NC_: Complete genomic molecules (genomes, chromosomes…)
NG_: Incomplete genomic region (gDNA for a gene)
NM_: mRNA
NR_: Non-coding RNA
NP_: Protein
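
The prefix-to-molecule mapping on this card fits naturally in a lookup table. The sketch below covers only the five prefixes listed here; the function name `refseq_molecule_type` is hypothetical, and other RefSeq prefixes exist that are not included.

```python
# Only the five prefixes from this card; RefSeq defines others as well.
REFSEQ_PREFIXES = {
    "NC": "complete genomic molecule",
    "NG": "incomplete genomic region",
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
}

def refseq_molecule_type(accession):
    """Return the molecule type for a RefSeq-style accession, or None."""
    prefix, separator, _ = accession.partition("_")
    if not separator:
        return None  # no underscore, so not a RefSeq-style accession
    return REFSEQ_PREFIXES.get(prefix)

types = [refseq_molecule_type(a) for a in ["NM_123456", "NP_123456", "AF123456"]]
```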

Question 14

Q

What is a genome browser?

Answer

A

A program that provides a graphical interface for users to browse, search, retrieve, and analyze genomic sequence and annotation data

Question 15

Q

Most commonly used genome browsers (list)

Answer

A

UCSC Genome Browser
NCBI Genome Data Viewer

Question 16

Q

What is a BCL file?

Answer

A

It’s the binary file where base calls are stored for each cluster during a sequencing-by-synthesis run

Question 17

Q

What is a FASTQ file?

Answer

A

Text file that contains the sequence data from the clusters that pass the filter in a flow cell.
Each record contains a sequence identifier (starting with @), the sequence, a separator (+ sign), and the base call quality scores (Q), one per base
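
The four parts of a record can be read as follows. This is a minimal sketch that assumes each record occupies exactly four lines (sequences are not wrapped); the function name `parse_fastq` and the example reads are hypothetical.

```python
def parse_fastq(text):
    """Yield (identifier, sequence, quality_string) for each four-line record."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per record")
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], sequence, quality

# Made-up reads for illustration.
example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!5?I\n"
fastq_records = list(parse_fastq(example))
```

The quality string has one character per base, which is why the length check is included.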

Question 18

Q

Base call quality score (Q) (definition)

Answer

A

Often encoded as ASCII characters, it provides information regarding the reliability of a base call
Q = -10 × log10(e), where e is the estimated probability of the base call being wrong
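
The formula and the ASCII encoding can be tried out directly. The sketch below converts between Q and e and decodes a quality string; the offset of 33 (the common Phred+33 encoding) is an assumption, since the card does not say which encoding is used, and the function names are hypothetical.

```python
import math

def phred_from_error(e):
    """Q = -10 * log10(e)."""
    return -10 * math.log10(e)

def error_from_phred(q):
    """Inverse of the formula above: e = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def decode_quality(quality_string, offset=33):
    """Convert ASCII quality characters to Q scores (assumes Phred+33)."""
    return [ord(char) - offset for char in quality_string]

q_for_one_in_thousand = phred_from_error(0.001)
error_at_q20 = error_from_phred(20)
scores = decode_quality("!5?I")
```

An error probability of 1 in 1,000 corresponds to Q30 and Q20 to 1 in 100, so a higher Q means a smaller probability of error.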