Lecture 7: Protein Domain Structure: Comparison and Classification Flashcards
What is the general idea of a structural domain?
A segment within the protein that folds and can exist independently. Multiple domains are able to make up proteins.
What is a protein domain superfamily?
A group of protein domains that share a common evolutionary ancestor
Within superfamilies, how much of the structure is conserved?
At least 30-40% of structure is conserved.
What are the challenges when comparing protein structures?
Residue substitutions due to mutations.
Insertion and Deletions of residues.
While the core is usually conserved, there can still be significant differences outside the core.
What is the simplest way to score structural similarity?
Using the Root Mean Square Deviation (RMSD).
How do you get around insertions and deletions?
Only compare secondary structure, ignore the variable loop regions.
Use algorithms that explicitly handle insertions and deletions.
What are methods you can use to find secondary structure similarity?
Graph Theory
Contact Maps
How is graph theory used to score structural similarity?
Secondary structures are represented using nodes.
The edges represent distance and angles.
What are contact maps and how are they used to score structural similarity?
Contact maps are essentially distance maps that represent how far each residue is from other residues in the protein’s folded state.
Sections of contact maps for different proteins are then used as comparison metrics between them.
An RMSD score is output from this comparison, which is used as a scoring metric.
What is Foldseek and what are its basic principles?
Foldseek reduces a 3D structure to a 1D sequence.
It does this by defining an alphabet that describes 3D interactions.
Once sequence has been created, regular sequence comparison algorithms are used for comparison.
What is CATH?
It is a domain classification database.
CATH stands for Class, Architecture, Topology, Homologous Superfamily.
It has classified ~600,000 domains into ~5,000 domain superfamily.
What are the different classes?
1.Mostly Alpha
2.Mostly Beta
3.Mixed
4.Few Secondary Structures
How many architectures exist in CATH?
40+ (~41)
What is divergent evolution and convergent evolution?
Divergent evolution is when domains share the same ancestor.
Convergent evolution is where domains/proteins have the same shape because it was advantageous to have it.