9. Protein structure prediction Flashcards
The structure
is determined by the amino acid sequence.
The folding will
correspond to the energy minimum (the lowest-energy conformation)
Why protein modelling?
- Structure is important for function
- Gap between known sequences and structures is huge
Common structures
α-helix
β-sheet
β-turn
random coil
Programs available for secondary structure assignment (from known 3D coordinates)
DSSP
STRIDE
Different approaches
- Statistical methods
- Knowledge-based methods
- Machine learning
- Consensus method
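A minimal sketch of the consensus idea, assuming each method returns a per-residue string over H (helix), E (strand) and C (coil). The function name, the example predictions and the tie-breaking rule (fall back to coil) are all illustrative assumptions, not how any particular consensus server works.

```python
from collections import Counter

def consensus_prediction(predictions):
    """Per-residue majority vote over equal-length secondary structure strings."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        winners = [state for state in counts if counts[state] == best]
        # Ties fall back to coil ("C"): an arbitrary choice made for this sketch.
        consensus.append(winners[0] if len(winners) == 1 else "C")
    return "".join(consensus)

# Hypothetical outputs of three different predictors for the same sequence
preds = ["HHHHCCEEE", "HHHCCCEEE", "HHHHCEEEC"]
print(consensus_prediction(preds))
```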
Evaluation of secondary structure prediction
Q3
Sov
Q3
fraction correctly predicted residues - Accuracy
Sov
Fractional overlap of segments - ability to pick up correct structure
How are Q3 and Sov used
They evaluate the prediction method itself, not your own prediction, because they compare predictions against already known structures
- one should use both Q3 and Sov for a good evaluation
Q3 equation
Q3 = (correctly predicted residues / total residues) × 100 %
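The Q3 formula can be sketched in a few lines, assuming the predicted and the observed secondary structure are given as equal-length strings over H, E and C. The function name and the example strings are made up for illustration.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# Hypothetical example: 12 residues, 10 of them predicted correctly
observed = "CHHHHHCCEEEC"
predicted = "CHHHHCCCEEEE"
print(f"Q3 = {q3(predicted, observed):.1f} %")
```

Note that Q3 counts residues one by one, so it does not tell whether whole segments were found; that is what Sov is for.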
Examples of machine learning methods
PSIPRED
PHD
Jnet
Membrane topology
predict whether the protein is bound to a membrane or not, and which regions span the membrane
Common characteristics of TM region (3)
- W or Y at the edges of the membrane
- about 20 hydrophobic residues inside the membrane
- Positive-inside rule: positively charged residues (K, R) are enriched on the cytoplasmic side
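The "about 20 hydrophobic residues" characteristic can be illustrated with a sliding-window hydropathy scan. This is a rough sketch using the Kyte-Doolittle scale; the window of 19 residues and the cutoff of 1.6 are commonly quoted defaults and are assumptions here, as are the function name and the toy sequence. Dedicated topology predictors do considerably more than this.

```python
# Kyte-Doolittle hydropathy values per amino acid
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return (start, mean hydropathy) for every window at or above the threshold."""
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KD[aa] for aa in segment) / window
        if mean >= threshold:
            hits.append((start, round(mean, 2)))
    return hits

# Toy sequence: charged flanks around a 20-residue hydrophobic stretch
seq = "MKRDE" + "LIVALLIVALLIVALLIVAL" + "KRDES"
for start, mean in hydrophobic_windows(seq):
    print(start, mean)
```

Windows that pass the cutoff are only candidate TM segments; the W/Y and positive-inside characteristics are not checked by this sketch.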
Tool to identify TM regions
TOPCONS
TOPCONS
combines different methods and makes a consensus prediction
- gives the predicted positions of the TM regions at the bottom of the output
3D structure prediction
- More complex than secondary
- Two main approaches
Two main approaches of 3D modelling
- Homology modelling
- Ab initio modelling
Homology modelling
- use known structure as template
- higher seq similarity -> better prediction
- alignment and template selection are very important
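To make "higher seq similarity" concrete, here is a small sketch that computes percent identity between a target and a template that are already aligned, with "-" as the gap character. The choice to divide by the number of gap-free columns is an assumption (several definitions of identity are in use), and the sequences are invented.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identical residues over the columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_target, aligned_template)
        if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)

# Hypothetical pairwise alignment of a target against a template
target = "MKT-AYIAK"
template = "MKTLAYLAR"
print(f"{percent_identity(target, template):.1f} % identity")
```

A score like this could be used to rank candidate templates, keeping in mind that the alignment itself has to be right for the number to mean anything.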
Methodology of homology modelling (entire 3D structure, 5 steps)
- Identify related structure
- Align target seq to template structure
- Generate “known” backbone and side chains
- Generate loops
- Refine
Template selection
- select optimal crystal structure
Common approaches for template selection
sequence similarity
homology (both of these: BLAST, Prosite, Pfam)
Fold recognition
Fold recognition (threading)
- structure more conserved than sequence
- compare to a known library of folds (CATH, SCOP)
- align sequence to a fold
- ENERGY CALCULATIONS
- does not require a similar sequence
Loops
- exposed regions are more variable than the protein core
- often important for protein function
- loops longer than 5 residues are hard to model
Loop modelling approaches for short, medium and long loops
short - analytical approach
medium - database approach
long - fragment based approach
Optimisation of model
- use ENERGY MINIMISATION to fix bad parts
- side chain clashes
- bad peptide bond angles
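Energy minimisation can be illustrated with a toy steepest-descent loop. This sketch assumes a single harmonic term that penalises a distance deviating from an ideal value; the ideal distance, force constant and step size are hypothetical numbers chosen so the loop converges. Real modelling packages minimise a full force field over all atoms.

```python
def energy(d, d0=3.8, k=10.0):
    """Harmonic penalty for a distance d deviating from its ideal value d0."""
    return k * (d - d0) ** 2

def gradient(d, d0=3.8, k=10.0):
    """Derivative of the harmonic energy with respect to d."""
    return 2.0 * k * (d - d0)

def steepest_descent(d, step=0.01, tol=1e-6, max_iter=10_000):
    """Move d downhill along the negative gradient until the gradient is near zero."""
    for _ in range(max_iter):
        g = gradient(d)
        if abs(g) < tol:
            break
        d -= step * g
    return d

# Start from a "clash": the two atoms are much closer than the ideal distance
start = 2.0
relaxed = steepest_descent(start)
print(f"distance {start:.2f} -> {relaxed:.2f}, "
      f"energy {energy(start):.2f} -> {energy(relaxed):.6f}")
```

Steepest descent only finds the nearest local minimum, which is why it suits fixing local problems such as clashes rather than refolding a bad model.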
Common errors in template-based 3D models (5)
- side chain packing
- distortions/shifts
- bad loops
- misalignment
- bad template
Ab initio modelling
- predict the structure with no prior knowledge other than the sequence
- used when template-modelling fails
- works best for small proteins
- great computational cost
CASP
- evaluation of 3D structure methods
- similarity of model to the native structure
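Similarity of a model to the native structure is usually expressed as a distance-based score. The sketch below computes RMSD and a simplified GDT_TS-style score over Cα coordinates, assuming the model has already been superimposed on the native structure. The real GDT_TS searches over many superpositions, so the fixed-superposition version here is a simplification; function names and coordinates are invented.

```python
import math

def distances(model, native):
    """Per-residue distances between matching atoms of two superimposed structures."""
    if len(model) != len(native) or not native:
        raise ValueError("structures must be non-empty and of equal length")
    return [math.dist(m, n) for m, n in zip(model, native)]

def rmsd(model, native):
    """Root-mean-square deviation over all matched atoms."""
    d = distances(model, native)
    return math.sqrt(sum(x * x for x in d) / len(d))

def gdt_ts_fixed(model, native, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Mean percentage of residues within each cutoff, for one fixed superposition."""
    d = distances(model, native)
    fractions = [sum(x <= c for x in d) / len(d) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Hypothetical C-alpha coordinates (in angstrom) for a four-residue fragment
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 0.5, 0.0), (7.6, 3.0, 0.0), (11.4, 0.0, 10.0)]
print(f"RMSD = {rmsd(model, native):.2f}, GDT_TS (fixed) = {gdt_ts_fixed(model, native):.1f}")
```

The example shows why the two scores differ: one badly placed residue dominates the RMSD, while the cutoff-based score still rewards the residues that are close.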
Function prediction
- predict the function
- use results from e.g.
  * SignalP/TargetP
  * secondary structure prediction
  * topology predictions
  * 3D predictions
- use machine learning methods to connect the results
- usually given as Gene Ontology terms