Lecture 5 Flashcards
The SCOP database
Structural Classification of Proteins Database developed in Cambridge
- Sorts proteins into a hierarchy: structural class -> fold -> superfamily -> family.
For example: Small proteins -> Cysteine-knot cytokines -> Cysteine-knot cytokines -> Transforming growth factor beta (more of a functional than a structural classification)
α-helical
β-sheet
α+β - α-helices and β-sheets in different parts of the protein, no β-α-β motifs
α/β - Helices and sheets assembled from β-α-β motifs
α/β-linear - Line through the centres of the strands of the sheet is roughly linear
α/β-barrels - Line through the centres of the strands of the sheet is roughly circular
Proteins with little/no secondary structure e.g. soft proteins
What does SCOP struggle with?
Domains
Many proteins are multi-domain and SCOP assumes they are single-domain
For example: clotting factors like factor XII:
Fn2-EGF-EGF-Fn1-Kr-SerPr
Factor IX:
Gla-EGF-EGF-SerPr
Difficult to classify evolutionary origins due to domain shuffling.
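A minimal sketch of why multi-domain proteins are awkward to classify: the two architectures above, written as ordered lists, share some domains but not others. The dictionary layout and the helper name shared_domains are illustrative assumptions, not part of SCOP.

```python
from collections import Counter

# Domain architectures exactly as listed in the notes (N- to C-terminus).
architectures = {
    "Factor XII": ["Fn2", "EGF", "EGF", "Fn1", "Kr", "SerPr"],
    "Factor IX": ["Gla", "EGF", "EGF", "SerPr"],
}


def shared_domains(a, b):
    """Return the domains found in both architectures with the count they share."""
    # Counter & Counter keeps each domain with the minimum of the two counts.
    return dict(Counter(a) & Counter(b))


common = shared_domains(architectures["Factor XII"], architectures["Factor IX"])
print(common)
```

Both proteins carry two EGF domains and a serine protease domain, yet differ in the rest, so no single fold label describes either protein as a whole.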
How do we define domains?
The Gō plot method
Defines domains from the distances of amino acids from the centre of the protein and whether they cluster away from that centre.
Conducting a Gō plot:
1. Calculate the radius of the spherical volume of the protein
2. Calculate the distance from the α-carbon of each amino acid to all the others
3. If a distance is greater than the spherical radius, score +.
Lines are drawn in a triangle to identify protein domains
Real proteins never give completely clean triangles
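The three steps above can be sketched in a few lines of Python. This is a toy illustration, not the published method: the notes do not say how the spherical radius is obtained, so the sketch assumes it is the largest distance from the centroid to any α-carbon, and the coordinates are made up.

```python
import math


def go_plot(ca_coords, radius=None):
    """Return a square matrix of '+' / '.' marks for a list of (x, y, z) C-alpha positions."""
    n = len(ca_coords)
    if radius is None:
        # Assumption: radius = largest centroid-to-C-alpha distance.
        centroid = tuple(sum(c[i] for c in ca_coords) / n for i in range(3))
        radius = max(math.dist(c, centroid) for c in ca_coords)
    # Score '+' where two residues are further apart than the radius.
    return [
        ["+" if math.dist(ca_coords[i], ca_coords[j]) > radius else "." for j in range(n)]
        for i in range(n)
    ]


# Hypothetical protein: two compact clusters of three residues, 20 units apart.
toy = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (20, 0, 0), (21, 0, 0), (20, 1, 0)]
for row in go_plot(toy):
    print("".join(row))
```

The matrix is symmetric, so one triangle is enough. Blocks of '.' along the diagonal are residues that sit close together (candidate domains), and the '+' block marks pairs from different clusters.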
What are the disadvantages of Gō plots?
- Requires solved structures
- Domain boundaries not always clear
- Gō method now superseded by sequence-based algorithms
Explain modular evolution of proteins
- Domain boundaries are usually at exon boundaries
- Not all exon boundaries are domain boundaries
- Genome rearrangements are important in evolution of new domain combinations
How does Pfam build domains?
- Start with a high-quality protein structure (X-ray crystallography, good resolution, i.e. a low value in ångströms).
- BLAST PDB to find related protein structures.
- Align these - maximise structural homology (adjust alignment so secondary structural element boundaries match).
- Build a statistical profile (Hidden Markov Model - HMM) of the ‘seed’ alignment.
Gaining a protein from PDB
Input sequence and protein sequence databank -> BLAST search
BLAST search -> Filter results (E<threshold) -> Multiple sequence alignment -> Position-specific scoring matrix
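The last arrow (multiple sequence alignment -> position-specific scoring matrix) can be illustrated with a small sketch. It assumes an ungapped toy alignment, a uniform background frequency, and a pseudocount of 1; real tools use more careful weighting, and the function names here are hypothetical.

```python
import math
from collections import Counter


def build_pssm(alignment, alphabet, pseudocount=1.0):
    """Return one dict per column mapping residue -> log2 odds score."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(alphabet)  # assumption: uniform background
    total = len(alignment) + pseudocount * len(alphabet)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        pssm.append({
            res: math.log2(((counts[res] + pseudocount) / total) / background)
            for res in alphabet
        })
    return pssm


def score(pssm, seq):
    """Sum the column scores for a sequence of the same length as the matrix."""
    return sum(column[res] for column, res in zip(pssm, seq))


toy_alignment = ["ACGT", "ACGA", "ACGT", "TCGT"]  # made-up DNA example
pssm = build_pssm(toy_alignment, "ACGT")
print(score(pssm, "ACGT"), score(pssm, "TTTT"))
```

A sequence that follows the conserved columns scores higher than one that does not, which is what lets the matrix pick out further family members.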
Hidden Markov modelling
Sequence
Showing patterns using sequence logos
The size of each nucleotide/amino acid letter illustrates how often it occurs at that position across a series of different molecules.
What software shows DNA/amino acid patterns using sequence logos?
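A common way to set the letter sizes in a sequence logo is to scale each letter by its frequency times the information content of the column (in bits). The sketch below assumes an ungapped alignment and leaves out the small-sample correction that logo software usually applies; the function name is hypothetical.

```python
import math
from collections import Counter


def logo_heights(alignment, alphabet_size):
    """Return one dict per column mapping residue -> letter height in bits."""
    heights = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        n = sum(counts.values())
        freqs = {res: c / n for res, c in counts.items()}
        # Shannon entropy of the column; 0 when a single residue is present.
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = math.log2(alphabet_size) - entropy
        heights.append({res: f * info for res, f in freqs.items()})
    return heights


toy_alignment = ["ACGT", "ACGA", "ACGT", "TCGT"]  # made-up DNA example
for column in logo_heights(toy_alignment, alphabet_size=4):
    print(column)
```

A fully conserved DNA column gives a single letter 2 bits tall, while a mixed column gives shorter letters stacked in proportion to their frequencies.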
Pfam on a larger scale
How does Pfam build domains?
Use the HMM to query GenPept – hmmsearch
Align the new hits to the HMM – hmmalign
Rebuild the HMM to include the new hits – hmmbuild
Repeat as desired, or until there are no new hits
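The search/align/rebuild loop above can be mimicked with a toy stand-in. This does not call HMMER: build_profile and search are hypothetical replacements for hmmbuild and hmmsearch, the "profile" is just the set of residues seen in each column, and the hmmalign step is skipped because all toy sequences have the same length.

```python
def build_profile(members):
    """Stand-in for hmmbuild: the set of residues seen in each column."""
    return [set(column) for column in zip(*members)]


def search(profile, database, max_mismatches=1):
    """Stand-in for hmmsearch: accept sequences with at most max_mismatches columns off-profile."""
    hits = []
    for seq in database:
        if len(seq) != len(profile):
            continue
        mismatches = sum(res not in allowed for res, allowed in zip(seq, profile))
        if mismatches <= max_mismatches:
            hits.append(seq)
    return hits


def iterate(seed, database, max_rounds=10):
    """Rebuild the profile with each round of new hits until no new hits appear."""
    members = list(seed)
    for _ in range(max_rounds):
        profile = build_profile(members)
        new_hits = [s for s in search(profile, database) if s not in members]
        if not new_hits:
            break
        members.extend(new_hits)
    return members


database = ["ACDF", "ACGF", "WCGF", "WYGF", "KLMN"]  # made-up sequences
print(iterate(["ACDE"], database))
```

In this toy run each round admits one more sequence, and the last one accepted (WYGF) shares no position with the seed ACDE, which gives a feel for how an iterative profile can drift away from its starting alignment.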
“Structure, structure, structure” (Alex Bateman, founder of Pfam)
Disadvantages of Pfam?
- Domains are defined by an HMM, and an HMM is only as good as the ‘seed’ alignment used to construct it.
- The HMM-building process is iterative, so errors can be magnified.
- Curation is uneven due to the number of domains in Pfam
- Viruses under-represented
HOMSTRAD
Homologous Structure Alignment Database
Used to collect good seed alignments
Used in construction of globin molecules
Not enough structures
- Structure determination is far harder than sequencing
- Illumina and MinION sequencing have made sequencing ultra-high-throughput
- No equivalent technological leap forward for structural biology
The structural genomics consortium