Bioinformatics Flashcards
What are domains? What do domain families have in common?
Basic evolutionary units of proteins, which combine to enhance/extend the functional repertoire.
All members of the same domain family come from a common ancestor and share a common/similar function.
Members of a domain family also have:
- similar AA sequences; sequence identity decreases with divergence
- similar DNA sequences, but only for pairs of very closely related proteins (useful only over short evolutionary distances)
- proteins in the same domain family have the same fold
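Sequence identity, mentioned above, is the percentage of aligned positions where two sequences carry the same residue. Below is a minimal Python sketch. It assumes the two sequences are already aligned to equal length and skips gapped columns ('-'), which is one convention among several. The sequence fragments are made up.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped (an assumed convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned fragments, invented for illustration
print(percent_identity("MKV-LAGT", "MKVQLSGT"))
```

Here 6 of the 7 ungapped columns match. As two proteins diverge, more columns differ and the value falls.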
How do we classify domains?
Create a database of multiple sequence alignments -> use these to build a SEED alignment for each family (based on sequence similarity)
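The following toy sketch illustrates grouping sequences by similarity. It links any two sequences whose identity reaches a threshold and treats each connected group as a family (single-linkage clustering). This is not how Pfam builds its families, which relies on curation and HMM searches. The function names, sequences and 0.8 threshold are all hypothetical, and the sequences are assumed to be pre-aligned, ungapped and of equal length.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions; assumes equal-length, ungapped, pre-aligned sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_by_identity(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage clustering: two sequences are linked if identity >= threshold."""
    names = list(seqs)
    parent = {n: n for n in names}  # union-find forest, one tree per cluster

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if identity(seqs[a], seqs[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for n in names:
        clusters.setdefault(find(n), set()).add(n)
    return list(clusters.values())


# Hypothetical sequences: p1/p2 are near-identical, as are p3/p4
seqs = {"p1": "MKVLAGT", "p2": "MKVLSGT", "p3": "WWHHEEC", "p4": "WWHHEEN"}
print(cluster_by_identity(seqs, 0.8))
```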
What is the HMM?
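A (profile) hidden Markov model captures the defining characteristics of a protein domain family; it gives the probability of each AA at every position of the alignment, plus the probabilities of insertions and deletions.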
You manually curate the SEED alignment; the profile HMM is then built from it automatically and used to search sequence databases, producing the FULL alignment in Pfam.
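The core idea behind the profile can be sketched by counting the amino acids in each column of a SEED alignment, then scoring a new sequence against those column probabilities. This is a deliberately simplified, match-states-only profile. It has no insert/delete states, and it assumes a uniform background and +1 pseudocounts purely for illustration. It is therefore not a full profile HMM like those Pfam builds with HMMER.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(seed: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-column amino-acid probabilities from an ungapped SEED alignment."""
    length = len(seed[0])
    profile = []
    for col in range(length):
        # Pseudocounts ensure no amino acid gets probability zero
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in seed:
            counts[seq[col]] += 1
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile


def log_odds_score(profile: list[dict[str, float]], query: str) -> float:
    """Sum over columns of log2(P(aa | column) / P(aa | background)); uniform background assumed."""
    background = 1 / len(AMINO_ACIDS)
    return sum(math.log2(profile[i][aa] / background) for i, aa in enumerate(query))


seed = ["MKVLAG", "MKILAG", "MRVLSG"]  # hypothetical SEED alignment
profile = build_profile(seed)
print(round(profile[0]["M"], 3))  # column 1: 3 M's plus pseudocounts
print(log_odds_score(profile, "MKVLAG"), log_odds_score(profile, "WWWWWW"))
```

A sequence resembling the SEED scores positive because it is more likely under the profile than under the background. An unrelated sequence scores negative.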
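The E-value is the number of hits with a score at least as good as your sequence's that you would expect to see by chance when searching the database with the HMM; the lower the E-value, the more significant the match.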
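One way to see what an E-value means is to build it empirically. Score many random (e.g. shuffled) sequences to get a null distribution, estimate the chance that a random score is at least as high as your hit's, and multiply by the number of sequences searched. The minimal sketch below follows those assumptions, and its null scores and database size are invented. Real tools such as HMMER instead fit a statistical distribution to the scores, which also lets them estimate very small E-values that a finite null sample cannot resolve.

```python
def empirical_e_value(hit_score: float, null_scores: list[float], database_size: int) -> float:
    """E-value = P(random score >= hit_score) x number of sequences searched.

    The probability is estimated as the fraction of null (random-sequence) scores
    at least as high as the hit.
    """
    n_as_good = sum(1 for s in null_scores if s >= hit_score)
    return n_as_good * database_size / len(null_scores)


# Hypothetical scores of 10 shuffled sequences against the HMM
null = [1.0, 2.5, 3.0, 4.2, 5.1, 0.3, 2.2, 6.8, 1.9, 3.3]
print(empirical_e_value(6.0, null, database_size=1000))
print(empirical_e_value(3.0, null, database_size=1000))
```

The higher-scoring hit gets the lower E-value, so fewer matches that good are expected by chance.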
It is predicted that there are only ~10^2 to 10^3 folds in nature. What is the reason for this structural similarity amongst proteins?
- similarity in proteins due to divergence from a common ancestor
- convergent evolution and a limited number of folds
match the following:
divergent evolution, convergent evolution
with:
folds, chance
divergent = folds
convergent = chance
What does the domain structure database CATH stand for? What is it?
A hierarchy for classifying protein domains according to:
C - Class (secondary structure content)
A - Architecture (overall shape formed by the secondary structure elements, ignoring connectivity)
T - Topology (fold: shape plus connectivity of the secondary structure elements)
H - Homologous superfamily (domains grouped by evidence of a common ancestor)
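CATH identifiers write the four levels as a dotted code, C.A.T.H (e.g. 3.40.50.300). Because the levels are nested, two domains share a level only if all the levels above it match as well. The class and function names in the minimal sketch below are hypothetical, and the codes serve only to illustrate the format.

```python
from typing import NamedTuple, Optional

LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


class CathCode(NamedTuple):
    class_: int
    architecture: int
    topology: int
    homology: int

    @classmethod
    def parse(cls, code: str) -> "CathCode":
        """Parse a dotted 'C.A.T.H' string into its four integer levels."""
        parts = code.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected C.A.T.H, got {code!r}")
        return cls(*(int(p) for p in parts))


def deepest_shared_level(a: CathCode, b: CathCode) -> Optional[str]:
    """Walk down the hierarchy and stop at the first level where the codes differ."""
    shared = None
    for level, x, y in zip(LEVELS, a, b):
        if x != y:
            break
        shared = level
    return shared


print(deepest_shared_level(CathCode.parse("3.40.50.300"), CathCode.parse("3.40.50.720")))
print(deepest_shared_level(CathCode.parse("3.40.50.300"), CathCode.parse("2.40.50.300")))
```

The second pair shares no level even though its last three numbers match. The same number under a different class refers to a different architecture, topology and superfamily.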
Why might there be discrepancies between these three domain classifications?
- Pfam clan (Pfam's superfamily-level grouping of related families)
- CATH number
- SCOP superfamily
1. Differences in domain definitions (where the domain boundaries are drawn)
2. Different methods used to group domains into families/superfamilies/folds
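Point 1 can be made concrete. If two databases draw the boundaries of the "same" domain at different residues, their assignments only partly overlap. The minimal sketch below compares two boundary assignments given as inclusive residue ranges and reports their Jaccard overlap (shared residues divided by residues covered by either). The ranges are hypothetical.

```python
def residue_set(start: int, end: int) -> set[int]:
    """Residues start..end; inclusive numbering is assumed."""
    return set(range(start, end + 1))


def boundary_overlap(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Jaccard overlap of two domain assignments given as (start, end) ranges."""
    ra, rb = residue_set(*a), residue_set(*b)
    return len(ra & rb) / len(ra | rb)


# Hypothetical boundaries for the same domain in two databases
db1_domain = (10, 120)
db2_domain = (25, 150)
print(round(boundary_overlap(db1_domain, db2_domain), 2))
```

A value well below 1 means the two resources are effectively describing different units. Their family and superfamily assignments may then disagree even if their grouping methods were identical.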