1. Navigating Information About Nucleotide and Protein Sequences Flashcards
What is the difference between Bioinformatics and Computational Biology?
Bioinformatics: the goal is to build useful tools to analyze biological data - engineering
Computational biology: the goal is to learn new biology by using computational techniques - science
What does “demultiplex” mean?
When many samples are sequenced in the same run, their signals are obtained together. A computer separates these signals so that we know which sequence comes from which sample. This process is demultiplexing.
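A minimal sketch of the idea, assuming each read starts with a short sample-specific barcode. The barcode sequences, sample names, and the `demultiplex` function are hypothetical and only illustrate how reads might be sorted back to their samples:

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table; real barcodes depend on the library kit.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by the barcode found at the start of each read."""
    length = len(next(iter(barcodes)))  # assumes all barcodes share one length
    by_sample = defaultdict(list)
    for read in reads:
        # Reads whose prefix matches no known barcode are set aside.
        sample = barcodes.get(read[:length], "undetermined")
        by_sample[sample].append(read[length:])  # strip the barcode
    return dict(by_sample)


# Made-up reads for illustration.
reads = ["ACGTTTGACC", "TGCAGGATCC", "ACGTAAACCC", "NNNNGGGTTT"]
groups = demultiplex(reads)
```

Real demultiplexing tools usually also tolerate a small number of barcode mismatches, which this sketch does not.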
A higher Q score indicates a ( greater / smaller ) probability of error.
smaller
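The relationship can be made concrete with the standard Phred definition, Q = -10 · log10(P), where P is the probability that the base call is wrong. The function names below are illustrative, not part of any particular library:

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability P."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability P to a Phred quality score Q."""
    return -10 * math.log10(p)


# Each step of 10 in Q divides the error probability by 10.
p_q20 = phred_to_error_prob(20)  # about 1 error in 100 base calls
p_q30 = phred_to_error_prob(30)  # about 1 error in 1000 base calls
```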
What is a FASTA file?
Text-based format for representing either nucleotide sequences or amino acid sequences in which nucleotides or amino acids are represented using single-letter codes.
Components of a FASTA file (each entry in a FASTA file consists of…)
- ">" followed by a sequence identifier and a definition or description of the sequence
- Lines of sequence data
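The two components above can be read back with a few lines of code. This is a minimal sketch that assumes the identifier is the first word after ">" and the rest of the header line is the description; the example records are made up:

```python
def parse_fasta(text):
    """Yield (identifier, description, sequence) for each entry in FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield _entry(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        yield _entry(header, chunks)


def _entry(header, chunks):
    # Assumption: the identifier ends at the first space in the header line.
    identifier, _, description = header.partition(" ")
    return identifier, description, "".join(chunks)


# Made-up records for illustration.
example = """>seq1 hypothetical example protein
MKTAYIAK
QRQISFVK
>seq2 hypothetical example DNA
ACGTACGT
"""
records = list(parse_fasta(example))
```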
What is an accession number?
Unique identifier for a sequence record
1 letter + 5 digits or 2 letters + 6 digits
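As a rough check, the two formats on this card can be written as a regular expression. This sketch covers only those two formats (other accession formats exist), and the accessions used are made up:

```python
import re

# 1 uppercase letter + 5 digits, or 2 uppercase letters + 6 digits.
ACCESSION_RE = re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})$")


def looks_like_accession(text):
    """Return True if text matches one of the two formats on this card."""
    return ACCESSION_RE.match(text) is not None


# Made-up identifiers for illustration.
checks = {
    name: looks_like_accession(name)
    for name in ["U12345", "AF123456", "A1234", "NM_123456"]
}
```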
What is a flat file?
During sequence submission, the submitter has to provide the name of the sequence, the source, annotation, ORF, sequence and translation product. All of this is displayed in a flat file.
Types of motif representations
Regular expressions (regex)
Profiles (matrices)
Logos
Regular expression for motifs (definition)
Define a unique sequence pattern using the standard IUPAC one-letter amino acid code, allowing for ambiguities
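For example, the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} (N, then any residue except P, then S or T, then any residue except P) translates directly into a Python regular expression. The protein sequence below is made up for illustration:

```python
import re

# PROSITE-style N-{P}-[ST]-{P} written as a Python regex:
# {P} (anything but proline) becomes [^P]; [ST] stays a character class.
N_GLYCOSYLATION = re.compile(r"N[^P][ST][^P]")

# Made-up protein sequence for illustration.
sequence = "MKNGSALNPTQNVTR"

# Collect the 0-based start position and matched text of each hit.
matches = [(m.start(), m.group()) for m in N_GLYCOSYLATION.finditer(sequence)]
```

Note that `finditer` reports non-overlapping matches only; overlapping sites would need a lookahead-based pattern.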
Profile for motifs (definition)
Consists of a position-specific weight matrix in which each position of the motif receives a score (probability) for each amino acid
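A minimal sketch of how such a matrix might be built and used, assuming the scores are plain per-column frequencies taken from aligned motif instances. The instances are made up, and real profiles typically add pseudocounts and use log-odds scores instead:

```python
from collections import Counter


def build_profile(aligned):
    """Return one {residue: frequency} dict per alignment column."""
    n = len(aligned)
    return [
        {residue: count / n for residue, count in Counter(column).items()}
        for column in zip(*aligned)
    ]


def score(profile, candidate):
    """Multiply the per-position frequencies of the candidate's residues."""
    p = 1.0
    for column, residue in zip(profile, candidate):
        p *= column.get(residue, 0.0)  # unseen residues score 0 here
    return p


# Made-up aligned motif instances, all of the same length.
instances = ["NGSA", "NVTR", "NGTA", "NASA"]
profile = build_profile(instances)

good = score(profile, "NGSA")  # every residue was seen at its position
bad = score(profile, "NPSA")   # P never occurs at position 2
```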
Sequence logo for motifs (definition)
Graphical display of a multiple sequence alignment consisting of color-coded stacks of letters representing amino acids at successive points
What is a RefSeq?
It is the reference sequence for a given gene or protein
What does each of the following accession numbers correspond to? (molecule types)
NC_123456
NG_123456
NM_123456
NR_123456
NP_123456
NC_: Complete genomic molecules (genomes, chromosomes…)
NG_: Incomplete genomic region (gDNA for a gene)
NM_: mRNA
NR_: Non-coding RNA
NP_: Protein
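The mapping on this card can be expressed as a small lookup table. This sketch covers only the five prefixes listed here, and the function name is illustrative:

```python
# Prefix-to-molecule-type table, limited to the prefixes on this card.
REFSEQ_PREFIXES = {
    "NC": "complete genomic molecule",
    "NG": "incomplete genomic region",
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
}


def refseq_molecule_type(accession):
    """Return the molecule type implied by a RefSeq-style prefix."""
    prefix, separator, _ = accession.partition("_")
    if not separator:
        return "unknown"  # no underscore, so not a RefSeq-style accession
    return REFSEQ_PREFIXES.get(prefix, "unknown")


types = [refseq_molecule_type(a) for a in ["NM_123456", "NP_123456", "AF123456"]]
```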
What is a genome browser?
A program that provides a graphical interface for users to browse, search, retrieve and analyze genomic sequence and annotation data
Most commonly used genome browsers (list)
UCSC Genome Browser
NCBI Genome Data Viewer