Protein Structural Bioinformatics Flashcards
SCOP
Structural Classification of Proteins
first classification software
How can proteins be classified?
By secondary structure:
α-helical - Secondary structure exclusively or almost exclusively α-helical
β-sheet - Secondary structure exclusively or almost exclusively β-sheet
α+β - α-helices and β-sheets separated in different parts of the molecule; absence of β-α-β supersecondary structure
α/β - Helices and sheets assembled from β-α-β units
α/β-linear - Line through centres of strands of sheet roughly linear
α/β-barrels - Line through centres of strands of sheet roughly circular
The SCOP database
Structural classes:
- Folds
- Superfamilies
- Families
Example class: Small proteins
- Fold: Cystine-knot cytokines
- Superfamily: Cystine-knot cytokines
- Family: Transforming growth factor beta
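The hierarchy on this card can be written down as a small data structure. The sketch below is illustrative only: it assumes the three entries under "Small proteins" are listed in fold, superfamily, family order, and the names `SCOP_LEVELS` and `label_lineage` are made up for this example rather than taken from any SCOP tool.

```python
# Levels of the hierarchy, from broadest to narrowest.
SCOP_LEVELS = ("class", "fold", "superfamily", "family")


def label_lineage(names):
    """Pair each name with its level; expects one name per level."""
    if len(names) != len(SCOP_LEVELS):
        raise ValueError("expected one name per level")
    return dict(zip(SCOP_LEVELS, names))


lineage = label_lineage([
    "Small proteins",
    "Cystine-knot cytokines",   # fold
    "Cystine-knot cytokines",   # superfamily (same name as the fold)
    "Transforming growth factor beta",
])

for level, name in lineage.items():
    print(f"{level:<12}{name}")
```

Note that the fold and the superfamily share a name here, which is why the same entry appears twice.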
What is used to define domains in proteins?
The Gō plot
- Calculate radius of spherical volume of protein
- Calculate distance from each alpha carbon of each amino acid to all the others
- If the distance is greater than the spherical radius, score “+”
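The three steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the "." symbol for unscored pairs, and the toy coordinates are all assumptions, and the radius is simply passed in because the card does not say how the protein's volume is obtained.

```python
import math


def sphere_radius(volume):
    """Radius of a sphere with the given volume (pure geometry)."""
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)


def go_plot(ca_coords, radius):
    """Return one text row per residue: '+' where the alpha-carbon
    distance exceeds the radius, '.' otherwise."""
    rows = []
    for a in ca_coords:
        row = ""
        for b in ca_coords:
            row += "+" if math.dist(a, b) > radius else "."
        rows.append(row)
    return rows


# Toy alpha-carbon coordinates (in Å): two compact groups of three
# residues placed 30 Å apart. These are invented, not real data.
toy_coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0),
    (30.0, 0.0, 0.0), (33.8, 0.0, 0.0), (30.0, 3.8, 0.0),
]

plot = go_plot(toy_coords, radius=10.0)  # radius chosen for the toy case
print("\n".join(plot))
```

In this toy case the pairs within each compact group stay unscored and form blocks along the diagonal, while pairs that span the two groups score "+". That block pattern is what suggests a domain boundary.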
Disadvantages of the Gō method
- Requires solved structure
- Domain boundaries not always clear
- Gō method now superseded by sequence-based algorithms
How Pfam builds domains
- Start with a high-quality protein structure (X-ray crystallography, good resolution, i.e. a low value in Å)
- BLAST PDB to find related protein structures
- Align these – maximise structural homology (meaning adjust alignment so that boundaries of secondary structural elements match)
- Build a statistical profile (Hidden Markov Model – HMM) of the “seed” alignment
How Pfam builds domains (continued)
- Use the HMM to query GenPept – hmmsearch
- Align the new hits to the HMM – hmmalign
- Rebuild the HMM to include the new hits – hmmbuild
- Repeat as desired, or until there are no new hits
- “Structure, structure, structure” (Alex Bateman, founder of Pfam)
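The search, align, rebuild loop and its stopping rule can be sketched as below. This is a toy model under stated assumptions: `search` is a hypothetical stand-in for the combined hmmbuild, hmmsearch and hmmalign steps (HMMER itself is not called), and `TOY_RELATED` is invented data saying which sequences each member can "find".

```python
def iterate_profile(seed, search, max_rounds=10):
    """Grow a set of members until a round finds nothing new."""
    members = set(seed)
    history = []  # new hits added in each productive round
    for _ in range(max_rounds):
        new_hits = set(search(members)) - members
        if not new_hits:
            break  # no new hits: stop iterating
        members |= new_hits  # "rebuild" with the new hits included
        history.append(sorted(new_hits))
    return members, history


# Invented relationships: A finds B, B finds C, D finds E.
TOY_RELATED = {"A": {"B"}, "B": {"C"}, "C": set(), "D": {"E"}, "E": set()}


def toy_search(members):
    """Hypothetical search: return everything any member can find."""
    found = set()
    for name in members:
        found |= TOY_RELATED.get(name, set())
    return found


members, history = iterate_profile({"A"}, toy_search)
print(sorted(members))
print(history)
```

Starting from the seed "A", the loop reaches "C" only through "B", which was itself added in an earlier round. The result therefore depends on every intermediate hit being correct.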
Disadvantages of Pfam
Each domain is defined by an HMM, and that HMM is only as good as the “seed” alignment used to construct it
Because the HMM building process is iterative, errors can be magnified
There are now so many domains in Pfam that curation is uneven
Pfam was designed to support the Human Genome Project, and viruses were under-represented
HOMSTRAD meaning
HOMologous STRucture Alignment Database
Not enough structures
Structure determination is far harder than sequencing
Illumina and MinION sequencing have made sequencing ultra-high-throughput
There is no equivalent technological leap forward for structural biology
The Structural Genomics Consortium