protein structure prediction Flashcards
1
Q
motivation for structure prediction
A
- inform about function
- guide rational drug design
- guide mutagenesis experiments
- help solve structures from experimental data (e.g. molecular replacement in crystallography)
- fundamental understanding of the chemistry of protein structure
2
Q
CASP
A
- Critical Assessment of protein Structure Prediction
- blind trial to evaluate different approaches
- target sequences are sent to predictors before the experimental coordinates are released
- held every 2 years, with manual evaluation by independent assessors
- run alongside a server-only (fully automated) prediction category
3
Q
ab initio energy calculations
A
- original idea to describe interactions between atoms
- search for the conformation of lowest energy
- energy minimisation methods, followed by molecular dynamics
- from first principles
- energy function needed first
4
Q
energy function
A
- potential energy of a protein in a particular conformation
- V = bond stretching + angle bending + dihedral (torsion) rotation + van der Waals (VDW) + electrostatic interactions
- molecular dynamics adds explicit water molecules
- energy minimisation adds ad hoc terms for hydrophobicity
  - or works in vacuo
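A minimal Python sketch of the individual terms in V, assuming common textbook functional forms: harmonic bond and angle terms, a cosine torsion term, a 12-6 Lennard-Jones term for VDW and Coulomb's law for electrostatics. The function names and all parameter values are illustrative assumptions, not taken from any particular force field; units are assumed to be Å, radians, kcal/mol and elementary charges.

```python
import math

COULOMB_K = 332.06  # approx. Coulomb constant in kcal*Å/(mol*e^2)


def bond_term(r, r0, k):
    """Harmonic bond stretching: k * (r - r0)^2."""
    return k * (r - r0) ** 2


def angle_term(theta, theta0, k):
    """Harmonic angle bending: k * (theta - theta0)^2 (radians)."""
    return k * (theta - theta0) ** 2


def dihedral_term(phi, v_n, n, gamma):
    """Cosine torsion term: (V_n / 2) * (1 + cos(n*phi - gamma))."""
    return 0.5 * v_n * (1 + math.cos(n * phi - gamma))


def vdw_term(r, epsilon, sigma):
    """12-6 Lennard-Jones: 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4 * epsilon * (sr6 ** 2 - sr6)


def electrostatic_term(r, q1, q2, dielectric=1.0):
    """Coulomb interaction between two point charges."""
    return COULOMB_K * q1 * q2 / (dielectric * r)


# Hypothetical values for one bond, one angle, one torsion and one non-bonded pair.
total = (
    bond_term(1.55, 1.53, 310.0)
    + angle_term(math.radians(112.0), math.radians(109.5), 50.0)
    + dihedral_term(math.radians(60.0), 1.4, 3, 0.0)
    + vdw_term(4.0, 0.1, 3.5)
    + electrostatic_term(4.0, 0.4, -0.4, dielectric=4.0)
)
```

A real energy sums these terms over every bond, angle, torsion and non-bonded atom pair in the protein.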
5
Q
energy minimisation
A
- start from x, y, z coordinates for each atom
- calculate the energy
- make small positional changes to find a path to the lowest energy conformation (the native state, assumed to have minimal free energy)
- some success with small proteins
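The minimisation loop can be sketched as steepest descent: repeatedly move the coordinates a small step downhill and stop once the energy no longer drops. This sketch assumes a toy two-variable energy in place of a real force field and uses finite-difference gradients; the step size and tolerance are arbitrary choices. Production minimisers use analytic gradients and methods such as conjugate gradients.

```python
def numerical_gradient(energy, coords, h=1e-6):
    """Central finite-difference estimate of dE/dx for each coordinate."""
    grad = []
    for i in range(len(coords)):
        plus = list(coords)
        minus = list(coords)
        plus[i] += h
        minus[i] -= h
        grad.append((energy(plus) - energy(minus)) / (2 * h))
    return grad


def steepest_descent(energy, coords, step=0.1, max_steps=10_000, tol=1e-10):
    """Take small downhill steps until the energy stops decreasing."""
    coords = list(coords)
    e = energy(coords)
    for _ in range(max_steps):
        grad = numerical_gradient(energy, coords)
        trial = [c - step * g for c, g in zip(coords, grad)]
        e_trial = energy(trial)
        if e - e_trial < tol:  # no longer going downhill: stop at this minimum
            break
        coords, e = trial, e_trial
    return coords, e


def toy_energy(c):
    """Hypothetical smooth energy with a single minimum at (2, -1)."""
    return (c[0] - 2.0) ** 2 + (c[1] + 1.0) ** 2


coords, e = steepest_descent(toy_energy, [0.0, 0.0])
```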
6
Q
issues with energy minimisation
A
- can get stuck in a local minimum
  - the search thinks it has found the lowest point, but there is a lower global minimum
  - it just can't get there
  - solve with molecular dynamics
    - simulate the protein as a moving object
    - momentum lets it overcome energy barriers
- the energy landscape is difficult to define
  - unsure if you are going up or down
- energy terms are difficult to define, so the calculation can be wrong
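The local-minimum problem and the momentum fix can be shown on a one-dimensional toy energy E(x) = x^4 - 3x^2 + x (an arbitrary choice, not a protein energy). It has a shallow local minimum near x ≈ 1.13 and a deeper global minimum near x ≈ -1.30, separated by a barrier of about 0.08 near x ≈ 0.17. Pure downhill steps from x = 2 stop in the local well; a frictionless velocity Verlet trajectory from the same start keeps its total energy E(2) = 6, well above the barrier, so it crosses into the global basin.

```python
def energy(x):
    """Toy double-well energy (hypothetical)."""
    return x ** 4 - 3 * x ** 2 + x


def force(x):
    """Force = -dE/dx."""
    return -(4 * x ** 3 - 6 * x + 1)


def gradient_descent(x, step=0.01, n_steps=5000):
    """Energy minimisation: always move downhill, so it stops at the nearest minimum."""
    for _ in range(n_steps):
        x += step * force(x)
    return x


def velocity_verlet(x, v=0.0, dt=0.01, n_steps=2000, mass=1.0):
    """Frictionless molecular dynamics; returns the trajectory of positions."""
    trajectory = [x]
    a = force(x) / mass
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt
        a_new = force(x) / mass
        v += 0.5 * (a + a_new) * dt
        a = a_new
        trajectory.append(x)
    return trajectory


x_min = gradient_descent(2.0)       # trapped near the local minimum (x ≈ 1.13)
trajectory = velocity_verlet(2.0)   # momentum carries it over the barrier
```

Plain frictionless dynamics never settles; to end up in the global well, kinetic energy would then be removed gradually, for example by cooling as in simulated annealing.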
7
Q
secondary structure prediction
A
- identify local structure elements
- alpha, beta, coil, sometimes turn
- 3 or 4 state prediction
- determines local 3D structure to an extent
- cannot be predicted reliably from a 7-residue sequence alone
  - the same 7-residue sequence can adopt completely different structures in different proteins
- algorithms look at a window of ~15 residues
- long-range effects are also involved
8
Q
secondary structure prediction
accuracy measure
A
- number of residues correctly predicted / number of residues considered
- Q3 = accuracy measure for 3-state prediction
- random result with equal numbers of each state = 33%
- in a protein dominated by helices (80:20), the best "random" prediction is all helix = 80%
- typical mix of 3 states: random result = 40%
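Q3 and the trivial "predict the commonest state everywhere" baseline can be computed directly from per-residue state strings. This is a minimal sketch assuming predictions and observations are given as equal-length strings over H/E/C; the function names are illustrative.

```python
from collections import Counter


def q3(predicted, observed):
    """Percentage of residues whose 3-state assignment is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def majority_baseline(observed):
    """Q3 obtained by predicting the single most common state for every residue."""
    _, count = Counter(observed).most_common(1)[0]
    return 100.0 * count / len(observed)


print(q3("HHHHEECCCC", "HHHCEECCCC"))   # 9 of 10 residues correct
print(majority_baseline("HHHHHHHHCC"))  # the 80:20 helix example: "all helix"
```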
9
Q
old single sequence methods
A
- simpler
- used as the basis for newer methods
- mainly based on rules obtained by counting residue frequencies in known structures
- empirical
- e.g. Chou-Fasman
10
Q
Chou-Fasman
A
- numerical residue scores derived from data, plus ad hoc rules
- based on secondary structure propensity
- score > 1 implies the residue occurs in helices more frequently than expected by chance
- table of alpha and beta propensities for all 20 amino acids
- Pro/Gly = helix breakers
- some residues have similar propensities
- scores can be greater than 1 (they are ratios, not probabilities)
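A helix propensity of this kind is the fraction of residue X found in helices divided by the fraction of all residues found in helices, which is why it can exceed 1. A minimal sketch assuming the known structures are supplied as a sequence string plus a matching state string (H = helix); the toy data is hypothetical and far too small for real statistics.

```python
from collections import Counter


def helix_propensities(sequence, states, helix="H"):
    """P(X) = (fraction of X residues in helices) / (fraction of all residues in helices)."""
    if len(sequence) != len(states):
        raise ValueError("sequence and states must be the same length")
    helix_total = states.count(helix)
    if helix_total == 0:
        raise ValueError("no helical residues to count")
    overall_helix_fraction = helix_total / len(sequence)
    residue_counts = Counter(sequence)
    helix_counts = Counter(aa for aa, s in zip(sequence, states) if s == helix)
    return {
        aa: (helix_counts[aa] / n) / overall_helix_fraction
        for aa, n in residue_counts.items()
    }


# Hypothetical toy data: both Ala are helical, one of the two Gly is.
props = helix_propensities("AAGG", "HHHC")
```

Here 3 of 4 residues are helical overall, so Ala (always helical) scores 1/0.75 ≈ 1.33 and Gly scores 0.5/0.75 ≈ 0.67.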
11
Q
helix breakers
A
- helices need backbone H bonds between NH and CO groups (CO of residue i to NH of residue i+4)
- Pro:
  - side chain bends back and bonds covalently to the backbone N, so there is no NH
  - no H bond can form
- Gly:
  - small residue
  - makes a cavity
  - packs poorly against the rest of the helix
12
Q
rules of Chou-Fasman
A
- helix if:
  - at least 4 out of 6 consecutive residues favour a helix (propensity > 1)
  - average helix propensity > 1 and > average beta strand propensity
- extend the helix in both directions until a Pro is found, or a run of 4 residues with average helix propensity < 1
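The helix rules can be sketched as a nucleation scan followed by extension. This is a simplified sketch, not the full published algorithm: the propensity values are hypothetical toy numbers (not the Chou-Fasman table), windows containing Pro are simply not allowed to nucleate, and the strand rules and helix/strand overlap resolution are ignored.

```python
# Hypothetical toy propensities, NOT the published Chou-Fasman values.
HELIX_P = {"A": 1.4, "E": 1.5, "L": 1.2, "G": 0.6, "S": 0.8, "P": 0.6}
BETA_P = {"A": 0.8, "E": 0.4, "L": 1.3, "G": 0.8, "S": 0.8, "P": 0.6}


def mean(values):
    return sum(values) / len(values)


def predict_helices(seq, pa=HELIX_P, pb=BETA_P, window=6, min_favoured=4):
    """Return a string with 'H' for predicted helix and '-' elsewhere."""
    n = len(seq)
    helix = [False] * n
    for start in range(n - window + 1):
        nucleus = seq[start:start + window]
        if "P" in nucleus:  # simplification: Pro never nucleates a helix
            continue
        favoured = sum(1 for aa in nucleus if pa[aa] > 1.0)
        avg_a = mean([pa[aa] for aa in nucleus])
        avg_b = mean([pb[aa] for aa in nucleus])
        if favoured < min_favoured or avg_a <= 1.0 or avg_a <= avg_b:
            continue
        left, right = start, start + window  # helix covers seq[left:right]
        # Extend right until Pro or the last 4 residues average below 1.
        while (right < n and seq[right] != "P"
               and mean([pa[aa] for aa in seq[right - 3:right + 1]]) >= 1.0):
            right += 1
        # Extend left with the same stopping rule.
        while (left > 0 and seq[left - 1] != "P"
               and mean([pa[aa] for aa in seq[left - 1:left + 3]]) >= 1.0):
            left -= 1
        for i in range(left, right):
            helix[i] = True
    return "".join("H" if h else "-" for h in helix)


print(predict_helices("GGSGAEAEALPSG"))
```

In this toy example the helix nucleates in the A/E-rich stretch, stops at the Pro on the right, and stops on the left where the Gly/Ser-rich tetrapeptide averages below 1.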
13
Q
stereochemical methods
A
- recognise patterns of hydrophobic residues that favour secondary structures
- empirical
- enhanced by inspection of structures
- no longer used but pattern concept still important
- difficult to program
- original Q3 ~ 60%
- later improved by machine learning, e.g. neural networks
14
Q
stereochemical methods
alpha vs beta
A
- alpha:
  - 3.6 residues per turn (100° per residue)
  - amphipathic pattern consistent with a helix
  - helical wheel plot
    - one side hydrophobic, other side hydrophilic
- beta:
  - can be buried
    - sheet with helices on either side: run of hydrophobic residues
  - can be on the surface
    - stacked pair of beta sheets (Ig fold)
    - bottom sheet: residues alternate hydrophobic/hydrophilic (one face packs against the other sheet, the other faces solvent)
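A helical wheel places residue i at i × 100° around a circle, and the amphipathic pattern can be quantified as a hydrophobic moment: the length of the vector sum of per-residue hydrophobicities pointing in those directions. This sketch assumes a crude binary scale (+1 for a hypothetical hydrophobic set, -1 otherwise) instead of a published hydrophobicity scale.

```python
import math

HYDROPHOBIC = set("AVILMFWC")  # simplified binary classification (assumption)


def wheel_angles(n, degrees_per_residue=100.0):
    """Angular position of each residue on a helical wheel (3.6 residues per turn)."""
    return [(i * degrees_per_residue) % 360 for i in range(n)]


def hydrophobic_moment(sequence, degrees_per_residue=100.0):
    """Magnitude of the summed hydrophobicity vectors around the wheel.

    Large values mean hydrophobic residues cluster on one face (amphipathic);
    values near zero mean they are spread evenly around the helix.
    """
    x = y = 0.0
    for i, aa in enumerate(sequence):
        h = 1.0 if aa in HYDROPHOBIC else -1.0
        angle = math.radians(i * degrees_per_residue)
        x += h * math.cos(angle)
        y += h * math.sin(angle)
    return math.hypot(x, y)


# Hypothetical 18-residue peptide: Leu on the face where cos(angle) > 0, Lys opposite.
amphipathic = "".join(
    "L" if math.cos(math.radians(100 * i)) > 0 else "K" for i in range(18)
)
```

An all-Leu 18-mer (exactly 5 turns) gives a moment of essentially zero, while the designed peptide gives a large one.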
15
Q
artificial neural networks
A
- loosely modelled on computation in the brain
- input signal and a set of nodes
- weights on the nodes are adjusted so that the input signal gives the correct output signal (alpha/beta)
- training: input and answer are known, so only the weights need to be found
- once the weights are known, output can be produced for a new sequence
- improved with MSAs
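A minimal sketch of the train-then-predict idea, using a single-layer multi-class perceptron (the simplest trainable network) rather than the multi-layer networks real predictors use. Each residue is encoded as a one-hot window of 15 residues, matching the ~15-residue window from the secondary structure card; the tiny training set is hypothetical, built so that the state depends only on the centre residue.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = "HEC"  # helix, strand, coil


def one_hot_window(seq, centre, half_width=7):
    """Encode a (2*half_width + 1)-residue window centred on `centre`.

    Positions past either end of the sequence are left as all-zero vectors.
    """
    features = []
    for pos in range(centre - half_width, centre + half_width + 1):
        vec = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
            vec[AMINO_ACIDS.index(seq[pos])] = 1.0
        features.extend(vec)
    return features


def predict(weights, x):
    """Return the state whose weight vector gives the highest score."""
    return max(STATES, key=lambda s: sum(w * xi for w, xi in zip(weights[s], x)))


def train_perceptron(examples, n_features, max_epochs=100):
    """On each mistake, reward the true state's weights and penalise the guessed state's."""
    weights = {s: [0.0] * n_features for s in STATES}
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            guess = predict(weights, x)
            if guess != y:
                mistakes += 1
                for i, xi in enumerate(x):
                    weights[y][i] += xi
                    weights[guess][i] -= xi
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights


# Hypothetical toy training data with known answers.
seq = "VVVVVGGGGGAAAAA"
ss = "EEEEECCCCCHHHHH"
examples = [(one_hot_window(seq, i), ss[i]) for i in range(len(seq))]
weights = train_perceptron(examples, n_features=len(examples[0][0]))
predicted = "".join(predict(weights, x) for x, _ in examples)
```

Once trained, `predict(weights, one_hot_window(new_seq, i))` gives a state for any residue of a new sequence.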
16
Q
use of MSAs
A
- average Q3 ≈ 80%
- nearly all alpha helices identified, and most beta strands
- short edge beta strands poorly predicted
- errors in defining the precise ends of elements
- programs:
  - PSIPRED
  - Jnet (neural networks)
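What an MSA adds is a per-position picture of which residues are tolerated. A minimal sketch of a column frequency profile, a simplified stand-in for the PSI-BLAST profiles (PSSMs) that programs like PSIPRED take as input; real profiles also apply sequence weighting and pseudocounts, which are omitted here. The alignment is hypothetical.

```python
from collections import Counter


def column_profile(alignment, gap="-"):
    """Residue frequencies at each alignment column, ignoring gaps."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        residues = [s[col] for s in alignment if s[col] != gap]
        counts = Counter(residues)
        n = len(residues)
        profile.append({aa: c / n for aa, c in counts.items()} if n else {})
    return profile


# Hypothetical three-sequence alignment.
profile = column_profile(["ACD", "ACE", "GC-"])
```

Each column's frequencies (rather than a single residue identity) then become the network input for that position.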
17
Q
homology modelling
A
- infer structural similarity from sequence homology
- search the query sequence against a library of sequences with known structures
- also called comparative or template-based modelling
- most accurate structure prediction method when a good template exists
18
Q
process of homology modelling
A
- match the query to database sequences
- use the best match to predict the fold
  - typically needs >20% identity (e.g. a PSI-BLAST hit)
- can use multiple sequences/structures
  - helps gap placement
- fit the query sequence onto the template and align the main chain
- add and adjust side chains to create the predicted 3D structure
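The ">20% identity" threshold refers to sequence identity over an alignment. A minimal sketch, assuming identity is counted over aligned positions where neither sequence has a gap; other conventions divide by the length of the shorter sequence instead, which gives different numbers.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Identical residues as a percentage of gap-free aligned positions."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (a, b) for a, b in zip(aligned_query, aligned_template)
        if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


identity = percent_identity("AC-DE", "ACGDF")  # 3 identical out of 4 aligned pairs
usable_template = identity > 20.0
```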
19
Q
loop regions in homology modelling
A
- copy backbone coordinates directly for structurally conserved regions
- model variable (loop) regions in one of 2 ways:
  - database search of the PDB
    - find a short fragment that best matches the loop and transplant it
  - energy minimisation to find geometrically viable conformations for ~5-residue regions
- loops of <6 residues are generally well predicted
- long loops are poorly predicted
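The database-search route can be sketched as: keep library fragments with the right loop length, then pick the one whose end-to-end distance best matches the gap between the two anchor Cα atoms in the model. The fragment library, its identifiers and the single-distance criterion are all hypothetical simplifications; real loop searches also compare anchor geometry and check for clashes.

```python
import math
from dataclasses import dataclass


@dataclass
class LoopFragment:
    fragment_id: str   # hypothetical identifier
    length: int        # number of loop residues
    end_to_end: float  # distance in Å between the flanking C-alpha atoms


def best_loop(library, loop_length, ca_before, ca_after):
    """Fragment of the right length whose span best matches the anchor gap."""
    target = math.dist(ca_before, ca_after)
    candidates = [f for f in library if f.length == loop_length]
    if not candidates:
        return None
    return min(candidates, key=lambda f: abs(f.end_to_end - target))


library = [
    LoopFragment("loopA", 5, 9.0),
    LoopFragment("loopB", 5, 10.5),
    LoopFragment("loopC", 4, 10.0),
]
choice = best_loop(library, 5, (0.0, 0.0, 0.0), (6.0, 8.0, 0.0))  # anchors 10 Å apart
```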
20
Q
homology modelling
side chain packing
A
- specify side-chain dihedral (chi) angles
- build up a library of allowed angles
  - a series of rotamers
  - each has a position and a likelihood
- use an algorithm to decide which rotamers fit well together
  - remove clashes to find the best combination
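Choosing rotamers that fit well together can be framed as minimising a score: rotamer unlikeliness (-log probability) plus pairwise clash penalties. This sketch assumes precomputed clash penalties and searches every combination exhaustively, which only works for a handful of residues; real packers use smarter searches such as dead-end elimination. Rotamer names, probabilities and penalties are hypothetical.

```python
import itertools
import math


def pack_side_chains(rotamers, clash_penalty):
    """Exhaustively score every combination of rotamers (lower is better).

    rotamers: one list per residue of (rotamer_name, probability) pairs.
    clash_penalty: maps frozenset({name_a, name_b}) to an energy penalty.
    """
    best_combo, best_score = None, math.inf
    for combo in itertools.product(*rotamers):
        score = sum(-math.log(p) for _, p in combo)
        for (a, _), (b, _) in itertools.combinations(combo, 2):
            score += clash_penalty.get(frozenset((a, b)), 0.0)
        if score < best_score:
            best_combo, best_score = combo, score
    return [name for name, _ in best_combo], best_score


# Hypothetical two-residue example: the most likely rotamers r1a and r2a clash.
rotamers = [[("r1a", 0.7), ("r1b", 0.3)], [("r2a", 0.6), ("r2b", 0.4)]]
clashes = {frozenset(("r1a", "r2a")): 10.0}
names, score = pack_side_chains(rotamers, clashes)
```

The clash forces the second residue into its less likely rotamer, giving the combination r1a + r2b.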
21
Q
homology modelling
refinement
A
- improve predicted structure with energy minimisation
- remove bumps (steric clashes)
- molecular mechanics to calculate energy
- CASP:
  - only a few groups managed to improve models
  - refinement often makes them worse
  - resolving bumps can drag atoms a long way out of position
  - sometimes better to leave them
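Finding the bumps is the first step of refinement. A minimal sketch that flags non-bonded atom pairs closer than a distance cutoff; the uniform 3.0 Å cutoff, atom names and coordinates are assumptions, and real checks use per-atom-type van der Waals radii.

```python
import itertools
import math


def find_clashes(atoms, cutoff=3.0, bonded=frozenset()):
    """Return (name_a, name_b, distance) for non-bonded pairs closer than `cutoff` Å.

    atoms: list of (name, (x, y, z)); bonded: set of frozenset({name_a, name_b}).
    """
    clashes = []
    for (name_a, xyz_a), (name_b, xyz_b) in itertools.combinations(atoms, 2):
        if frozenset((name_a, name_b)) in bonded:
            continue  # covalently bonded atoms are expected to be close
        d = math.dist(xyz_a, xyz_b)
        if d < cutoff:
            clashes.append((name_a, name_b, d))
    return clashes


# Hypothetical atoms: N1-C1 is a bond, O9 sits too close to both of them.
atoms = [
    ("N1", (0.0, 0.0, 0.0)),
    ("C1", (1.5, 0.0, 0.0)),
    ("O9", (0.0, 2.0, 0.0)),
    ("C7", (8.0, 0.0, 0.0)),
]
bumps = find_clashes(atoms, bonded={frozenset(("N1", "C1"))})
```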
22
Q
fold recognition
A
- also called threading
- extends remote homologue recognition beyond PSI-BLAST and template-based modelling
- uses 3D information from the template library as well as searching the sequence
- recognises the fold as well as the sequence
- compares sequence vs sequence with HMMs, as well as structure vs structure relationships
  - HHsearch
  - HHpred
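The core threading idea, scoring how well a query sequence fits a template's 3D environment, can be illustrated with a toy 3D-1D score. This is not HHsearch's HMM-HMM comparison: it assumes each template position is labelled buried (B) or exposed (E), uses a hypothetical hydrophobic set, rewards hydrophobic residues in buried positions and polar residues in exposed ones, and only tries ungapped placements.

```python
HYDROPHOBIC = set("AVILMFWC")  # simplified classification (assumption)


def environment_score(residue, env):
    """+1 if the residue suits the environment, -1 otherwise (toy scoring)."""
    buried = env == "B"
    hydrophobic = residue in HYDROPHOBIC
    return 1 if buried == hydrophobic else -1


def thread(query, template_env):
    """Best ungapped placement of the query along the template: (offset, score)."""
    if len(query) > len(template_env):
        raise ValueError("query longer than template")
    best_offset, best_score = None, None
    for offset in range(len(template_env) - len(query) + 1):
        score = sum(
            environment_score(aa, env)
            for aa, env in zip(query, template_env[offset:])
        )
        if best_score is None or score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


placement = thread("LLL", "EEBBBEE")  # hydrophobic query should sit in the buried core
```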
23
Q
Phyre2
A
- fold recognition pipeline (web server)
- single domain:
  - HHblits builds an MSA for the query
  - PSIPRED predicts secondary structure
  - HMM built from the sequence profile and predicted structure, compared with HMMs of known structures
  - carry over aligned regions from the template
  - loop modelling
  - add side chains
- multiple domains need to be combined
  - pieced together with ab initio methods
  - final model with linked domains