CATH Flashcards
When was CATH established and what is the name of the other database? What is the purpose of CATH?
CATH was established in the 1990s, along with another domain database, SCOP. It was created by Christine Orengo and colleagues and is a free, open resource for classifying protein domains.
When CATH and SCOP were first published, there were only 3,000 solved protein structures. By 2015 there were over 100,000.
What is a domain?
Large proteins are composed of smaller, recognisable domains which recur in other proteins in various combinations. A domain can be defined by sequence (a sequence domain) or by structure (a structural domain). Proteins usually consist of two or more domains joined by one or more connecting segments. Domains usually consist of secondary structures or supersecondary structures and are around 150 +/- 50 residues in length.
How does CATH organise domains?
Domains are organised based on 4 criteria:
Class- domains are classified by their secondary structure content and usually fit into one of four groups: mainly alpha helix, mainly beta sheet, mixed alpha-beta (which covers both alternating alpha/beta and alpha plus beta folds) or few secondary structures
Architecture- domains are organised by the general orientation of their secondary structures, regardless of how those structures are connected
Topology- domains are organised by the connectivity between their secondary structures
Homology- domains are compared in terms of structure and sequence similarity and are placed in a homologous family at this stage if they are thought to be evolutionarily related to the other members of that family
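The four levels above are commonly written as a dotted four-number code, one number per level in C.A.T.H order. A minimal sketch of splitting such a code and finding the deepest level two domains share; the names CathCode, parse_cath_code and same_level are made up for this illustration, and the example codes are used only as sample input:

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    """One number per CATH level, in C.A.T.H order."""
    class_: int
    architecture: int
    topology: int
    homology: int


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted code such as '3.40.50.300' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated numbers, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def same_level(a: CathCode, b: CathCode) -> str:
    """Name the deepest level at which two domains are still grouped together."""
    names = ["none", "class", "architecture", "topology", "homology"]
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break  # levels are nested, so stop at the first difference
        depth += 1
    return names[depth]
```

Two domains whose codes match on the first three numbers but differ on the fourth share a topology but sit in different homologous families.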
By what other means are domains classified in CATH?
SOLID is a further set of levels, below the homology level, by which domains can be identified. The first four levels (S, O, L and I) group domains by sequence similarity and overlap, and the D level gives each domain a unique marker.
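As a rough illustration of how the S, O, L and I levels nest, the sketch below reports which levels two domains would share given their percent sequence identity. The cut-offs used (35, 60, 95 and 100 percent) are an assumption based on commonly quoted CATH values and are not given in these notes; the overlap requirement is ignored, and the function name is hypothetical.

```python
# Assumed identity cut-offs (percent) for the S, O, L and I levels.
# These values are NOT from the flashcards; check them against the CATH documentation.
SOLI_THRESHOLDS = [("S", 35.0), ("O", 60.0), ("L", 95.0), ("I", 100.0)]


def shared_soli_levels(percent_identity: float) -> list[str]:
    """Return the S/O/L/I levels at which two domains would cluster together."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must be between 0 and 100")
    # A pair that passes a stricter cut-off also passes every looser one.
    return [name for name, cutoff in SOLI_THRESHOLDS if percent_identity >= cutoff]
```

The D level is left out on purpose: it is a unique counter, not a similarity cut-off.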
What are the challenges associated with CATH?
The main challenge is providing consistent and accurate domain boundaries: boundaries between domains can be difficult to visualise
Describe the computational methods that can be used to identify domains in proteins.
There are two main databases, CATH and SCOP, and these use the sequences of domains and their structures to classify them according to the criteria described. The databases screen for sequence similarity, that is, similarity between the amino acids at aligned positions.
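A minimal sketch of one common sequence similarity measure, percent identity, assuming the two sequences have already been aligned and that gaps are written as "-". The function name is hypothetical, and tools differ on what they divide by; here only columns without a gap are counted.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned columns where both sequences have the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0  # nothing to compare, so report no identity
    return 100.0 * identical / compared
```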
For domains which are not similar in sequence, structural alignment algorithms can be used to try to match the structures.
The alignment is then given a score, taken from a scoring matrix, that rates how well the structures match.
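One simple way to put a number on a structural match is the root-mean-square deviation (RMSD) between equivalent atoms. This sketch is a stand-in for illustration and is not the scoring scheme CATH or SCOP use; it assumes the two structures are already superposed and that the residue equivalences are known, so the two coordinate lists line up one to one.

```python
import math

Coord = tuple[float, float, float]


def rmsd(coords_a: list[Coord], coords_b: list[Coord]) -> float:
    """Root-mean-square deviation between paired, already superposed coordinates."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures need the same number of paired atoms")
    if not coords_a:
        raise ValueError("at least one pair of atoms is required")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        # Squared distance between the two equivalent atoms.
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))
```

A lower value means the paired atoms lie closer together, so the structures match better.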