Biodatabases Flashcards
things to consider when compiling a database
- How can other people access the data
- How can the information be used
- What to do with all the data
where is there a list of biological databases
Wikipedia
example of a genome browser
Ensembl
What is BLAST
Basic Local Alignment Search Tool
- one of the most widely used programs in computational biology
- gives access to many different databases; the major ones are nucleotide, nr (non-redundant protein database), and genome databases.
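in BLAST, what are bit score and expected match (E-value)?
Bit score - normalised measure of how good the match is between 2 sequences; independent of database size.
Expected match (E-value) - number of matches this good that you would expect to find by chance in a database of this size. The lower the value, the more significant the match.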
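The link between the two values can be sketched with the standard relation E = m * n * 2^(-S'), where S' is the bit score and m and n are the query and database lengths. This is a minimal sketch, not BLAST's own code: the function name `expect_value` and the example lengths are made up for illustration, and real BLAST uses effective (edge-corrected) lengths, which are ignored here.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least `bit_score`.

    Uses E = m * n * 2**(-S'), with m = query length, n = database length
    and S' = bit score. Effective-length corrections are ignored.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Same bit score, two hypothetical database sizes (in residues).
small = expect_value(bit_score=40.0, query_len=250, db_len=1_000_000)
large = expect_value(bit_score=40.0, query_len=250, db_len=100_000_000)
print(f"small database: E = {small:.3g}")
print(f"large database: E = {large:.3g}")
```

The bit score stays at 40 in both cases, but the chance expectation grows in proportion to the database size, which is why the same alignment can look less significant against a larger database.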
in BLAST report, what does the alignment line show
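shows the aligned AA sequences of the query and the subject (DNA is translated first when the program works at the protein level).
The middle line repeats the letter where the AAs are identical, puts + where the AAs differ but are functionally similar (conservative substitution), and is left blank where they differ.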
in BLAST, what is the positives score
% of aligned positions that are identical or functionally similar (conservative substitutions, shown as +).
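How the middle line and the two percentages relate can be sketched as below. This is an illustration only: `CONSERVATIVE_PAIRS` is a small hand-picked subset of the pairs that score above zero in BLOSUM62 (not the full matrix), and the function names and example sequences are hypothetical.

```python
# Illustrative subset of residue pairs that score > 0 in BLOSUM62.
# NOT the full matrix; a real tool would load the complete table.
CONSERVATIVE_PAIRS = {
    frozenset(pair)
    for pair in ("KR", "DE", "EQ", "IL", "IV", "LM", "FY", "WY", "ST", "DN")
}


def midline(query: str, subject: str) -> str:
    """Build the middle line of a protein alignment, BLAST-report style."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            out.append(" ")      # gap in either sequence
        elif q == s:
            out.append(q)        # identical residue: repeat the letter
        elif frozenset((q, s)) in CONSERVATIVE_PAIRS:
            out.append("+")      # different AA, similar properties
        else:
            out.append(" ")      # non-conservative mismatch
    return "".join(out)


def percent_identities_and_positives(query: str, subject: str):
    """Return (% identities, % positives) over the alignment length."""
    mid = midline(query, subject)
    identities = sum(1 for c in mid if c not in "+ ")
    positives = identities + mid.count("+")  # positives include identities
    return 100 * identities / len(mid), 100 * positives / len(mid)


print(midline("MKVLDW", "MRVIEG"))
print(percent_identities_and_positives("MKVLDW", "MRVIEG"))
```

In this toy alignment 2 of 6 positions are identical and 3 more are conservative, so positives (5 of 6) is higher than identities.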
5 BLAST algorithms
BLASTP - protein -> protein database
BLASTN - DNA -> DNA database
BLASTX - DNA, translated -> Protein database
TBLASTN - protein -> translated DNA database
TBLASTX - DNA, translated -> translated DNA database
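The five query/database combinations above can be written as a small lookup. This is a memory aid, not part of any BLAST package: the function `choose_blast_program` and its `compare_as_protein` flag are hypothetical names chosen for this sketch.

```python
BLAST_PROGRAMS = {
    # (query type, database type): program
    ("protein", "protein"): "blastp",
    ("dna", "dna"): "blastn",
    ("dna", "protein"): "blastx",    # query is translated
    ("protein", "dna"): "tblastn",   # database is translated
}


def choose_blast_program(query_type: str, db_type: str,
                         compare_as_protein: bool = False) -> str:
    """Pick a BLAST program from the query and database sequence types.

    `compare_as_protein` only matters for DNA vs DNA: it selects tblastx,
    where both sides are translated before comparison.
    """
    key = (query_type.lower(), db_type.lower())
    if key == ("dna", "dna") and compare_as_protein:
        return "tblastx"
    try:
        return BLAST_PROGRAMS[key]
    except KeyError:
        raise ValueError(f"unsupported combination: {key}") from None


print(choose_blast_program("DNA", "protein"))
print(choose_blast_program("dna", "dna", compare_as_protein=True))
```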
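which BLAST algorithm performs the largest search per query (most computationally intensive)?
TBLASTX - both the query and the database are translated in all 6 reading frames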
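tblastx is expensive because a DNA sequence has six possible reading frames (three per strand), and both the query and every database sequence are translated in all of them. A minimal sketch of six-frame translation with the standard genetic code follows; the function names are chosen for illustration, and ambiguous or unknown codons are simply shown as X.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become X."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna: str) -> dict:
    """Translate both strands in all three offsets (+1..+3, -1..-3)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frame_translations("ATGGCCATTGTAATGGGCCGC").items():
    print(name, protein)
```

Each DNA sequence becomes six protein sequences, so comparing translated query against translated database multiplies the work compared with a plain BLASTN or BLASTP search.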
What is UniProt
Universal Protein Resource
Listed on NCBI but also has its own website
Checked computationally
What is UniProt/TrEMBL
contains protein sequences associated with computationally generated annotation.
Unreviewed
UniProt/Swiss-Prot
high quality, manually annotated, non-redundant protein database. Reviewed.
Databases for species
many species have their own database
- FlyBase - Drosophila
- ZFIN - zebrafish genome
- WormBase - C. elegans
- Reef Genomics
why are 16S sequences used
16S gene - encodes 16S rRNA.
used to build phylogenies because this gene evolves slowly
what is MG-RAST
metagenomics analysis server