4. Databases and comparisons Flashcards

Question 1

Q

Databases with protein data

Answer

A

UniProt
InterPro
PRIDE
PDB

Question 2

Q

Databases for genomes and genes

Question 3

Q

Databases of sequence info

Answer

A

UniProtKB/Swiss-Prot (protein sequences, reviewed)
UniProtKB/TrEMBL (protein sequences, unreviewed)
EMBL (nucleotide sequences)
GenBank (nucleotide sequences)

Question 4

Q

Swiss-Prot

Answer

A

high level of annotation, e.g. function, domains, PTMs (post-translational modifications)
minimal level of redundancy
high-quality, manually reviewed annotation

Question 5

Q

EMBL (EMBL-EBI)

Answer

A

links to tools and resources, e.g. EMBOSS Needle, InterPro
sequences submitted directly by scientists
literature and patterns
little error checking

Question 6

Q

Ensembl

Answer

A

useful when analysing genomes
given a gene, you can find e.g. its mammalian homologues or all known SNPs

Question 7

Q

Why compare sequences?

Answer

A

identify a protein
search for homologues
study evolution

Question 8

Q

Methods for sequence comparisons

Answer

A

Diagonal (dot) plot
BLAST
FASTA
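In a diagonal plot (also called a dot plot), one sequence runs along each axis and a mark is placed wherever the residues at the two positions match, so similar regions show up as diagonal runs. A minimal sketch in Python, assuming plain identity as the match criterion; real dot-plot tools usually slide a window and require a minimum number of matches within it. The function names are hypothetical.

```python
def dot_plot(seq_a: str, seq_b: str) -> list[list[bool]]:
    """Cell (i, j) is True when seq_a[i] and seq_b[j] are identical."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(seq_a: str, seq_b: str) -> str:
    """Draw the plot as text: '*' for a match, '.' for a mismatch."""
    matrix = dot_plot(seq_a, seq_b)
    header = "  " + seq_b
    rows = [
        a + " " + "".join("*" if hit else "." for hit in row)
        for a, row in zip(seq_a, matrix)
    ]
    return "\n".join([header] + rows)


# Comparing a sequence with itself gives an unbroken main diagonal.
print(render("ACGTAC", "ACGTAC"))
```

The off-diagonal marks in this example come from the repeated "AC", which is the kind of internal repeat a dot plot makes easy to spot.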

Question 9

Q

FASTA

Answer

A

uses scoring matrices
uses k-tuples
- looks at more than one residue at a time
hashing
- builds a dictionary/lookup table of k-tuples
finds clusters of k-tuples (hits on the same diagonal)
- then applies dynamic programming
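The k-tuple hashing and clustering steps can be sketched in Python. This is a minimal illustration, assuming exact k-tuple matches and a simple count of hits per diagonal; real FASTA also rescores the best regions with a substitution matrix, joins them, and then runs the dynamic programming step, none of which is shown here. The function names are hypothetical.

```python
from collections import defaultdict


def ktuple_table(seq: str, k: int) -> dict[str, list[int]]:
    """Hash every k-tuple in seq to the list of positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return dict(table)


def diagonal_hits(query: str, target: str, k: int) -> dict[int, int]:
    """Count shared k-tuples per diagonal (target position - query position).

    A diagonal with many hits is a cluster worth extending and aligning.
    """
    table = ktuple_table(query, k)
    counts = defaultdict(int)
    for j in range(len(target) - k + 1):
        for i in table.get(target[j:j + k], []):
            counts[j - i] += 1
    return dict(counts)


hits = diagonal_hits("HARFYAAQIVL", "VDMAAQIAL", k=2)
best = max(hits, key=hits.get)
print(best, hits[best])
```

Here the shared 2-tuples AA, AQ and QI all fall on the same diagonal, which is exactly the kind of cluster FASTA would pass on to the alignment step.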

Question 10

Q

BLAST

Answer

A

faster than FASTA, and nowadays also more sensitive
uses “words” instead of k-tuples
- instead of requiring identical hits, a word match must reach a score threshold
generally looks at longer sequences
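Because BLAST uses a score threshold rather than identical hits, each query word is expanded into a “neighbourhood” of similar words that score at least the threshold against it, and the database is scanned for any of those words. A minimal sketch in Python, assuming a word length of 3 and a toy substitution scheme standing in for a real matrix such as BLOSUM62; SIMILAR_GROUPS, neighbourhood and the threshold value are hypothetical choices for illustration only.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Toy scoring scheme (an assumption, NOT BLOSUM62): identical residues score +5,
# residues in the same "similar" group score +2, anything else scores -1.
SIMILAR_GROUPS = ["ILV", "DE", "KR", "FWY", "ST", "NQ"]


def score_pair(a: str, b: str) -> int:
    if a == b:
        return 5
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return 2
    return -1


def word_score(w1: str, w2: str) -> int:
    """Sum of position-by-position substitution scores (no gaps)."""
    return sum(score_pair(a, b) for a, b in zip(w1, w2))


def neighbourhood(word: str, threshold: int) -> list[str]:
    """All same-length words that score >= threshold against `word`."""
    return [
        "".join(candidate)
        for candidate in product(AMINO_ACIDS, repeat=len(word))
        if word_score(word, "".join(candidate)) >= threshold
    ]


print(sorted(neighbourhood("LKD", threshold=11)))
```

Raising the threshold shrinks the neighbourhood (fewer, more exact seeds, faster); lowering it grows the neighbourhood (more seeds, more sensitive, slower).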

Question 11

Q

E-value

Answer

A

A parameter that describes the number of hits with at least the observed score that one can “expect” to see by chance when searching a database of a particular size. Essentially, the E-value describes the random background noise. The lower the E-value (the closer it is to zero), the more “significant” the match is.
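For BLAST-style searches the E-value can be computed from the normalised (bit) score S' as E = m·n·2^(-S'), where m is the query length and n the total database length in residues. A minimal sketch in Python, assuming plain sequence lengths; real BLAST uses effective lengths corrected for edge effects, which this ignores. The function name is hypothetical.

```python
def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least `bit_score`.

    E = m * n * 2**(-S'), with m = query length and n = database length,
    both in residues (no edge-effect correction).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Same bit score, database 10x larger: 10x more hits expected by chance.
small_db = evalue_from_bits(40.0, query_len=300, db_len=10_000_000)
large_db = evalue_from_bits(40.0, query_len=300, db_len=100_000_000)
print(f"{small_db:.2e} {large_db:.2e}")
```

This shows why the same alignment can look less significant when searched against a larger database: the E-value grows in proportion to the search space, while a higher bit score pushes it towards zero exponentially.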
