Structural bioinformatics Flashcards
Why do we need protein-protein docking?
PDB contains ~4,000 hetero-dimers (including duplicate copies for same complex)
Compare to number of entries ~170,000
Can one predict the structure of a complex starting with unbound components?
Unbound components can be experimental or high quality predicted structures
Need to be able to model limited conformational change.
Describe typical protein protein docking
Walk through protein docking step by step
- Global search to find good overlap of the protein surfaces
- Residue-residue interactions - score with an empirical residue pair potential. You need to check whether a given residue pair occurs in proteins more frequently than expected by chance
- Search for clusters of similar complex geometry with low energy. There are many more ways to be wrong than correct, so the correct solution will be found much more often than any individual wrong solution.
- Refinement - search for optimal combination of side-chain rotamers by energy calculation.
- Functional residue information - the function can give you info about the structure
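The residue pair scoring step can be sketched as a log-odds calculation: how often a pair is observed in contact compared with how often it would occur by chance. This is a minimal sketch, not the potential used by any particular docking program. The contact counts and background frequencies are made-up toy values, and the function names `pair_potential` and `score_interface` are hypothetical. Here a positive score means the pair is seen more often than chance.

```python
import math
from collections import Counter

def pair_potential(observed_pairs, residue_freq):
    """Log-odds score per residue pair: log(observed / expected by chance)."""
    # Treat pairs as unordered, so (K, E) and (E, K) are counted together.
    counts = Counter(tuple(sorted(p)) for p in observed_pairs)
    total = sum(counts.values())
    potential = {}
    for (a, b), n in counts.items():
        # Chance expectation from background frequencies;
        # factor 2 because an unordered pair a != b can arise two ways.
        expected = residue_freq[a] * residue_freq[b] * (1 if a == b else 2)
        potential[(a, b)] = math.log((n / total) / expected)
    return potential

def score_interface(contacts, potential):
    """Sum pair scores over the contacts of one docked pose; unseen pairs add 0."""
    return sum(potential.get(tuple(sorted(c)), 0.0) for c in contacts)

# Toy data (assumed, for illustration only).
observed = [("L", "L")] * 5 + [("K", "E")] * 4 + [("K", "K")] * 1
background = {"L": 0.3, "K": 0.4, "E": 0.3}

potential = pair_potential(observed, background)
pose_score = score_interface([("K", "E"), ("L", "L")], potential)
```

In this toy set L-L and K-E contacts come out favourable and K-K unfavourable, which is the kind of signal a pair potential is meant to capture.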
Template based prediction from homologous structure of a complex
X-ray structure of protein A’ complexed with protein B’
A’ is a homologue of A and B’ is a homologue of B
If the A/B interface is evaluated as favourable in 3D then predict that A interacts with B
Template based modelling - sequence search
Start with sequence protein A and protein B
Based on sequence similarity, search library of complexes in PDB for a complex A’ / B’ where A is homologous to A’ and B is homologous to B’
Align sequence A to A’ and B to B’
Sequence search via BLAST, PSI-BLAST or a more advanced statistical model known as a Hidden Markov Model (HMM)
Template based modelling - 2 model construction
On 3D structure of complex change sequence from A’ to A and B’ to B
Adjust any loops where there is an insertion or deletion
Refine complex
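The model construction step can be sketched as walking along a pairwise alignment: each aligned target residue inherits the position of its template residue, and gaps mark the insertions and deletions where loops need adjusting. This is a minimal sketch under stated assumptions. The function name `map_alignment` and the two aligned sequences are hypothetical, and real modelling tools also build coordinates, which is not attempted here.

```python
def map_alignment(target_aln, template_aln):
    """Map target residues onto template positions and collect indels."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping = []     # (target_index, target_residue, template_index or None)
    insertions = []  # target indices with no template residue to copy from
    deletions = []   # template indices with no matching target residue
    t_idx = tpl_idx = 0
    for t, p in zip(target_aln, template_aln):
        if t != "-" and p != "-":
            mapping.append((t_idx, t, tpl_idx))  # sequence changes from A' to A
        elif t != "-":
            mapping.append((t_idx, t, None))     # gap in template: insertion
            insertions.append(t_idx)
        elif p != "-":
            deletions.append(tpl_idx)            # gap in target: deletion
        if t != "-":
            t_idx += 1
        if p != "-":
            tpl_idx += 1
    return mapping, insertions, deletions

# Hypothetical alignment of target A onto template A'.
mapping, insertions, deletions = map_alignment("MKT-LVAG", "MRTELV-G")
```

The same walk would be repeated for B onto B’, and the listed insertions and deletions are the loop regions to rebuild before refinement.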
Template based modelling - 3 alternate model selection
Sometimes there can be several suitable templates as several have similar sequence identity
Construct several models
Score models (similar to ab initio docking)
Choose best model
NB this is one approach but several variations in template-based modelling
Coevolution and protein interactions
Concept of correlated mutations extends to homo and hetero complexes
AlphaFold-Multimer and Colab can consider complexes
Active area of research – results very encouraging
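One simple way to look for correlated mutations between two proteins is mutual information between a column of the alignment of A and a column of the alignment of B, with rows paired by species. This is a minimal sketch, assuming tiny made-up alignments. Mutual information is only one possible measure of coevolution, and it is not the method used inside AlphaFold-Multimer.

```python
import math
from collections import Counter

def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two paired alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2((c * n) / (count_a[x] * count_b[y]))
    return mi

def column(msa, i):
    """Extract column i from a list of equal-length aligned sequences."""
    return [seq[i] for seq in msa]

# Toy paired alignments (assumed): row k of each comes from the same species.
msa_a = ["AKL", "ARL", "AKL", "ARL"]
msa_b = ["DE", "EE", "DE", "EE"]

covarying = mutual_information(column(msa_a, 1), column(msa_b, 0))
conserved = mutual_information(column(msa_a, 0), column(msa_b, 0))
```

Here K/R in protein A always changes together with D/E in protein B, so that column pair scores high, while a fully conserved column carries no coevolution signal.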
What is the gene ontology?
A controlled vocabulary that can be applied to all organisms
Used to describe gene products - proteins and RNA - in any organism
All descriptions are supported by some level of evidence
How does GO work?
It captures information about 3 important features of function:
What does the gene product do?
Why does it perform these activities?
Where does it act?
Describe the 3 gene ontologies
Molecular Function = biochemical function
the tasks performed by individual gene products; examples are carbohydrate binding and GTPase activity
Biological Process = biological goal or objective (higher level function)
broad biological goals, such as mitosis or purine metabolism, that are accomplished by combinations of individual molecular functions.
Cellular Component = active location
subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme
Can you compare gene ontologies?
No, each branch is independent so you can’t really compare
Where do annotations come from?
Annotation source is important
It enables you to assess how confident the annotation is
GO associates annotations with an evidence code that indicates its source.
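Evidence codes can be used to filter annotations by how they were obtained. This is a minimal sketch: the grouping below covers only a subset of GO evidence codes and should be checked against the GO documentation, and the gene names and annotation tuples are made up for illustration.

```python
# Subset of GO evidence codes grouped by source (not a complete list).
EVIDENCE_GROUPS = {
    "experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational": {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA"},
    "author_statement": {"TAS", "NAS"},
    "curator_statement": {"IC", "ND"},
    "electronic": {"IEA"},  # assigned automatically, not reviewed by a curator
}

def evidence_group(code):
    """Return the group name for an evidence code, or 'unknown'."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return "unknown"

def filter_by_group(annotations, groups):
    """Keep annotations whose evidence code belongs to one of the given groups."""
    return [a for a in annotations if evidence_group(a[2]) in groups]

# Hypothetical annotations: (gene, GO term, evidence code).
annotations = [
    ("geneA", "GTPase activity", "IDA"),
    ("geneA", "nucleus", "IEA"),
    ("geneB", "carbohydrate binding", "ISS"),
]

high_confidence = filter_by_group(annotations, {"experimental"})
```

Restricting to experimental codes keeps only annotations backed by direct evidence, at the cost of covering far fewer gene products.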
Uses of GO
Enhanced predictors of protein function return prediction of GO terms
Common features in a set of over-expressed / under-expressed genes can be reported as belonging to a common GO group
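Whether a GO term is over-represented in a set of differentially expressed genes is commonly tested with a hypergeometric (one-sided Fisher) test. This is a minimal sketch using only the standard library. The function name `enrichment_p_value` and the gene counts are hypothetical, and a real analysis would also correct for testing many GO terms.

```python
from math import comb

def enrichment_p_value(total_genes, annotated, study_size, study_annotated):
    """P(X >= study_annotated) when drawing study_size genes at random.

    total_genes     : genes in the background set
    annotated       : background genes carrying the GO term
    study_size      : genes in the over/under-expressed set
    study_annotated : study genes carrying the GO term
    """
    upper = min(annotated, study_size)
    favourable = sum(
        comb(annotated, i) * comb(total_genes - annotated, study_size - i)
        for i in range(study_annotated, upper + 1)
    )
    return favourable / comb(total_genes, study_size)

# Made-up counts: 40 of 1000 background genes have the term,
# and 8 of the 50 over-expressed genes have it (2 expected by chance).
p = enrichment_p_value(1000, 40, 50, 8)
```

A small p-value suggests the term is shared by more genes in the set than chance would explain, which is what allows the set to be summarised by a common GO group.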
Why do you want to computationally predict protein function?
About 240M sequences are known, but determining the function of a protein experimentally can take several years