4. Databases and comparisons Flashcards
1
Q
Databases with protein data
A
UniProt
InterPro
PRIDE
PDB
2
Q
Databases for genome and genes
A
Ensembl
3
Q
Databases of sequence info
A
UniProt/Swiss-Prot
UniProt/TrEMBL
EMBL (nucleotide sequences)
GenBank
4
Q
Swiss-Prot
A
- high level of annotation, e.g. function, domains, post-translational modifications (PTMs)
- minimal level of redundancy
- high quality of annotation
5
Q
EMBL (EMBL-EBI)
A
- has links to tools and resources, e.g. EMBOSS Needle, InterPro
- sequences submitted directly by scientists
- sequences also taken from the literature and patents
- little error checking
6
Q
Ensembl
A
- useful when analysing genomes
- e.g. start from a gene and find its mammalian homologues, or all of its known SNPs
7
Q
Why compare sequences?
A
- identification of protein
- search for homologies
- evolution
8
Q
Methods for sequence comparisons
A
- Diagonal plot (dot plot)
- BLAST
- FASTA
9
Q
FASTA
A
- use scoring matrices
- use k-tuples
- looks at more than one residue at a time
- hashing
- make dictionary/table of k-tuples
- find clusters of k-tuples
- then you do dynamic programming
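
The hashing and clustering steps above can be sketched in Python. This is a minimal illustration, not the actual FASTA implementation: the function names `ktuple_index` and `diagonal_hits` are made up for this card, scoring matrices are left out, and the final dynamic-programming step is not shown. A diagonal with many hits marks a region worth aligning.

```python
from collections import defaultdict

def ktuple_index(seq, k):
    """Hash table: every k-tuple in seq -> list of start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def diagonal_hits(query, target, k=2):
    """Count identical k-tuple hits per diagonal (target_pos - query_pos)."""
    index = ktuple_index(target, k)
    diagonals = defaultdict(int)
    for q in range(len(query) - k + 1):
        # Look up each query k-tuple in the table built from the target.
        for t in index.get(query[q:q + k], []):
            diagonals[t - q] += 1
    return dict(diagonals)

# Toy sequences (assumption: chosen only for illustration).
hits = diagonal_hits("HEAGAWGHEE", "PAWHEAE", k=2)
best = max(hits, key=hits.get)  # diagonal with the largest cluster of hits
print(hits, best)
```

Only the best diagonals would then be passed on to the slower dynamic-programming alignment.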
10
Q
BLAST
A
- faster than FASTA, and nowadays also more sensitive
- uses “words” instead of k-tuples
- instead of requiring identical hits, a word hit only has to reach a score threshold
- generally it looks at longer sequences
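
The difference between identical hits and a score threshold can be sketched as follows. This is a hedged toy example, not BLAST itself: `toy_score` (+5 match, -1 mismatch) stands in for a real substitution matrix such as BLOSUM62, and the names `neighborhood` and `seed_hits` as well as the threshold values are assumptions made for this card.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_score(a, b, match=5, mismatch=-1):
    """Toy residue score (assumption): a real search uses a scoring matrix."""
    return match if a == b else mismatch

def neighborhood(word, threshold):
    """All words of the same length scoring >= threshold against word."""
    neighbors = []
    for candidate in product(AMINO_ACIDS, repeat=len(word)):
        score = sum(toy_score(a, b) for a, b in zip(word, candidate))
        if score >= threshold:
            neighbors.append("".join(candidate))
    return neighbors

def seed_hits(query, target, w=3, threshold=9):
    """(query_pos, target_pos) pairs where a target word is a neighbor."""
    hits = []
    for q in range(len(query) - w + 1):
        words = set(neighborhood(query[q:q + w], threshold))
        for t in range(len(target) - w + 1):
            if target[t:t + w] in words:
                hits.append((q, t))
    return hits

print(seed_hits("PQGE", "APEGE", threshold=9))   # similar words count as hits
print(seed_hits("PQGE", "APEGE", threshold=15))  # only identical words count
```

With the threshold set to the score of a perfect match, the search falls back to identical hits only. Lowering it lets similar words seed an alignment, which is where the extra sensitivity comes from.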
11
Q
E-value
A
A parameter that describes the number of hits one can “expect” to see by chance when searching a database of a particular size. Essentially, the E-value describes the random background noise. The lower the E-value, or the closer it is to zero, the more “significant” the match is.
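
The dependence on database size can be made concrete with the standard relation between a bit score S' and the E-value, E = m * n * 2^(-S'), where m is the query length and n is the total database length. The sketch below is illustrative only: the function name `expected_hits` and the example numbers are assumptions, and real BLAST uses corrected "effective" lengths rather than the raw ones.

```python
def expected_hits(bit_score, query_length, database_length):
    """E-value from a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)

# Same alignment score, two database sizes (example numbers are assumptions).
small_db = expected_hits(bit_score=40, query_length=250, database_length=10**6)
large_db = expected_hits(bit_score=40, query_length=250, database_length=10**9)
print(small_db, large_db)
```

The same score is 1000 times less significant in the database that is 1000 times larger, and each additional bit of score halves the E-value.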