ab initio structure prediction Flashcards
1
Q
ab initio methods
A
- template-free
- used when no template is available or none can be found
- 3 methods:
- all-atom molecular dynamics
- simulate the protein as it folds
- fragment approach
- contact prediction from multiple sequence alignments
2
Q
Rosetta
A
- fragment approach
- match short sections of the query sequence to fragments of known protein structures
- 9-residue segments
- template-based search algorithm
- fit the fragments together into a 3D structure
- build up the model from overlapping fragments with predicted local structure
- create a trial model
- if the structure doesn’t work, remodel it
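A minimal sketch of fragment assembly by Monte Carlo, in the spirit of Rosetta but not its actual code. Assumptions: a model is a list of (phi, psi) backbone torsions per residue, the chain starts extended, `fragment_library[pos]` holds candidate 9-residue torsion windows for each start position, and `energy` is any scoring function supplied by the caller.

```python
import math
import random

FRAG_LEN = 9  # fragment length, matching the 9-residue segments above


def insert_fragment(conf, start, fragment):
    """Return a copy of conf with the torsions from start onward replaced by fragment."""
    new = list(conf)
    new[start:start + len(fragment)] = fragment
    return new


def assemble(n_res, fragment_library, energy, n_steps=2000, kT=1.0, seed=0):
    """Monte Carlo fragment assembly (simplified sketch, not Rosetta's code).

    conf: list of (phi, psi) backbone torsions, one pair per residue (assumption).
    fragment_library[pos]: candidate torsion windows for the window starting at pos.
    energy(conf): hypothetical score to minimise (lower is better).
    """
    rng = random.Random(seed)
    conf = [(-150.0, 150.0)] * n_res  # hypothetical extended-chain starting model
    current_e = energy(conf)
    best, best_e = conf, current_e
    for _ in range(n_steps):
        # pick a window and swap in a randomly chosen fragment to make a trial model
        start = rng.randrange(0, n_res - FRAG_LEN + 1)
        trial = insert_fragment(conf, start, rng.choice(fragment_library[start]))
        trial_e = energy(trial)
        # Metropolis criterion: accept improvements, occasionally accept worse models
        if trial_e <= current_e or rng.random() < math.exp((current_e - trial_e) / kT):
            conf, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = conf, current_e
    return best, best_e
```

Rejecting most uphill moves is the "if the structure doesn’t work, remodel it" step: a bad trial model is simply discarded and another fragment is tried.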
3
Q
Rosetta
identification of good trial structures
A
- initially uses a low-resolution energy function
- side chains represented by a single centroid pseudoatom
- major contributions:
- hydrophobic burial, beta strand pairing, steric overlap, specific residue interactions
- forms a coarse structure
- then refine with rotamer-based side-chain representations
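A toy version of the low-resolution stage, with one centroid per residue. It scores only two of the contributions listed above (steric overlap and hydrophobic burial); the hydrophobic set, distance thresholds and weights are all hypothetical, and this is not Rosetta's score function.

```python
import math

HYDROPHOBIC = set("AVILMFWC")  # assumption: a simple hydrophobic residue set
CLASH_DIST = 3.5    # hypothetical minimum centroid-centroid distance (Å)
BURIAL_DIST = 10.0  # hypothetical neighbour radius used to judge burial (Å)


def centroid_energy(sequence, centroids, w_clash=10.0, w_burial=1.0):
    """Toy low-resolution score; lower is better.

    Penalises steric overlap between centroid pseudoatoms and rewards
    hydrophobic residues that have many neighbours (i.e. are buried).
    """
    n = len(sequence)
    clash = 0.0
    burial = 0
    for i in range(n):
        neighbours = 0
        for j in range(n):
            if i == j:
                continue
            d = math.dist(centroids[i], centroids[j])
            # count each overlapping pair once (j > i)
            if j > i and d < CLASH_DIST:
                clash += (CLASH_DIST - d) ** 2
            if d < BURIAL_DIST:
                neighbours += 1
        if sequence[i] in HYDROPHOBIC:
            burial += neighbours
    return w_clash * clash - w_burial * burial
```

Because each residue is a single point, this is cheap enough to evaluate thousands of trial models; the slower rotamer-based refinement is only applied to the coarse structures that survive.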
4
Q
Rosetta
model choosing
A
- ab initio generates many possible models
- choose the most "popular" one:
- calculate the distance (e.g. RMSD) between each pair of models
- the correct one has the largest number of similar models at small distances
- some aspects are usually correct, but large regions can be wrong
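A minimal sketch of "choose the most popular model". To stay dependency-free it uses distance-matrix RMSD (dRMSD) as the model-to-model distance; this is a swapped-in, superposition-free stand-in for coordinate RMSD, which would first need an optimal superposition. The 2.0 Å cutoff is hypothetical, and models are assumed to be lists of CA coordinates of equal length.

```python
import math


def drmsd(model_a, model_b):
    """Distance-matrix RMSD: compares intra-model CA-CA distances.

    Unaffected by translation or rotation, so no superposition is needed.
    """
    n = len(model_a)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            da = math.dist(model_a[i], model_a[j])
            db = math.dist(model_b[i], model_b[j])
            total += (da - db) ** 2
            pairs += 1
    return math.sqrt(total / pairs)


def choose_most_popular(models, cutoff=2.0):
    """Index of the model with the most other models within cutoff (first wins ties)."""
    counts = []
    for i, a in enumerate(models):
        counts.append(sum(1 for j, b in enumerate(models)
                          if j != i and drmsd(a, b) < cutoff))
    return max(range(len(models)), key=lambda i: counts[i])
```

The chosen model sits at the centre of the largest cluster, on the reasoning that many independent runs converging on similar structures is a better sign than any single low energy.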
5
Q
contact prediction
A
- create a contact map from inter-residue distances
- mark x (or 1) where two residues' atoms are closer than a set cutoff distance
- prediction is better when an MSA is used to find common patterns of complementary (correlated) changes
- many sequences are needed for a strong signal
- aligning thousands of sequences helps remove noise
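A sketch of both halves of this card: building a binary contact map from known coordinates, and scoring a pair of MSA columns for correlated changes. Assumptions: one representative atom per residue, an 8 Å cutoff (a common convention, used here for illustration), and plain mutual information as the covariation score; real contact predictors use more sophisticated statistics to suppress noise.

```python
import math
from collections import Counter


def contact_map(coords, cutoff=8.0):
    """Binary contact map: 1 where two residues are closer than cutoff (Å)."""
    n = len(coords)
    return [[1 if math.dist(coords[i], coords[j]) < cutoff else 0
             for j in range(n)] for i in range(n)]


def mutual_information(msa, i, j):
    """Mutual information between alignment columns i and j.

    msa: list of equal-length aligned sequences (strings).
    High values suggest complementary changes at the two positions.
    """
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    n = len(msa)
    p_i, p_j = Counter(col_i), Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        f_ab = count / n
        mi += f_ab * math.log(f_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi
```

With only a handful of sequences, chance co-occurrences also give non-zero scores, which is why many aligned sequences are needed before the signal stands out from the noise.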
6
Q
contact prediction
anti-parallel beta strand
A
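- residues adjacent in the chain are always in contact, giving the leading diagonal
- if, e.g., residues 1 and 12 are close, this indicates how the chain folds in 3D space
- this produces off-diagonal contacts (a feature of the 3D fold); for anti-parallel strands they run perpendicular to the leading diagonal
- use these contacts to build up the structure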
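A small sketch of reading an anti-parallel pair off a contact map. When residue i pairs with j, i+1 pairs with j-1, and so on, so i + j stays constant along the off-diagonal run. The hairpin map here is a hypothetical idealised example, and the minimum sequence separation and run length are assumptions.

```python
from collections import defaultdict


def toy_hairpin_map(n=12):
    """Hypothetical idealised hairpin: chain neighbours plus residue i paired with n-1-i."""
    cmap = [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]
    for i in range(n // 2):
        cmap[i][n - 1 - i] = cmap[n - 1 - i][i] = 1
    return cmap


def render(cmap):
    """Draw the map with x for a contact and . for no contact."""
    return "\n".join("".join("x" if c else "." for c in row) for row in cmap)


def off_diagonal_contacts(cmap, min_separation=3):
    """Contacts (i, j), i < j, between residues far apart in sequence."""
    n = len(cmap)
    return [(i, j) for i in range(n) for j in range(i + min_separation, n) if cmap[i][j]]


def antiparallel_ladders(contacts, min_length=3):
    """Group contacts by i + j; runs with constant i + j are anti-parallel pairings."""
    groups = defaultdict(list)
    for i, j in contacts:
        groups[i + j].append((i, j))
    return {s: sorted(pairs) for s, pairs in groups.items() if len(pairs) >= min_length}
```

Printing `render(toy_hairpin_map())` shows the band of x along the leading diagonal plus a second line of x running from the top-right corner back towards the diagonal; that second line is the off-diagonal signal used as a restraint when building the fold.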
7
Q
contact prediction
limitations
A
- limited to proteins with large numbers of homologues
- >5000 ideally
- which often means a template is available anyway
- can still be useful for solving parts of a structure
- should improve as databases grow
- useful for membrane proteins, which are difficult to crystallise
8
Q
future of structure prediction
A
- increasing number of determined structures
- more templates
- more fragments for ab initio
- increasing number of sequences
- better MSAs and profiles
- better information capture for template-based modelling
- increasing viability of contact prediction
9
Q
empirical prediction algorithms
A
- establish training dataset
- e.g. sequences with known structures
- ensure no duplicates due to homology
- a non-redundant set prevents bias
- learn rules and parameters
- evaluate on a testing set not used in training
- no homology between training and testing sets is important
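A minimal sketch of building a non-redundant set with a greedy identity filter. Assumptions: sequences are already aligned to the same length, identity is the fraction of identical non-gap positions, and the 30% threshold is hypothetical.

```python
def identity(a, b):
    """Fraction of identical, non-gap positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def non_redundant(sequences, max_identity=0.3):
    """Greedy filter: keep a sequence only if it is below max_identity to every kept one."""
    kept = []
    for seq in sequences:
        if all(identity(seq, k) < max_identity for k in kept):
            kept.append(seq)
    return kept
```

The greedy order matters: the first member of each homologous group is kept and its close relatives are dropped, so near-duplicates cannot end up on both sides of a training/testing split.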
10
Q
jack-knifing
A
- cross-validation
- split the database into training and testing sets
- start with a set of non-homologous data
- remove one or several items to form the testing set
- learn on the rest and evaluate on the test data
- repeat with different test data
- get the mean and variation of accuracy
- statistical analysis to compare methods (t-test)
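A sketch of the procedure above as k-fold cross-validation, plus a paired t statistic for comparing two methods scored on the same folds. The `train` and `evaluate` callables are hypothetical placeholders for any prediction method; setting `n_folds = len(data)` gives leave-one-out. A p-value would still need the t distribution with n - 1 degrees of freedom (for example via SciPy's `ttest_rel`).

```python
import math
import random
import statistics


def jackknife(data, train, evaluate, n_folds=5, seed=0):
    """Hold out each fold once as the test set and train on the rest.

    train(training_items) -> model; evaluate(model, test_items) -> accuracy.
    Returns (mean accuracy, standard deviation, per-fold accuracies).
    """
    items = list(data)
    random.Random(seed).shuffle(items)
    folds = [items[k::n_folds] for k in range(n_folds)]
    scores = []
    for k, test in enumerate(folds):
        training = [x for m, fold in enumerate(folds) if m != k for x in fold]
        model = train(training)
        scores.append(evaluate(model, test))
    return statistics.mean(scores), statistics.stdev(scores), scores


def paired_t(scores_a, scores_b):
    """Paired t statistic for two methods evaluated on the same folds.

    Raises ZeroDivisionError if every per-fold difference is identical.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
```

Pairing the scores fold by fold removes the variation caused by some test sets simply being easier than others, which makes the comparison between methods fairer.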
11
Q
pitfalls of jack-knifing
A
- multiple methods available
- with proteins, testing and training data often share homology
- bias is harder to detect as algorithms become more complex
- best to test on new data not available during development
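A sketch combining the two safeguards on this card: test only on entries released after a cutoff date (new data), and also drop any new entry that is still homologous to the training data. The record layout (name, aligned sequence, release date), the identity measure and the 30% threshold are all assumptions for illustration.

```python
from datetime import date


def identity(a, b):
    """Fraction of identical, non-gap positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x == y and x != "-") / len(a)


def blind_test_set(entries, cutoff, max_identity=0.3):
    """Split (name, aligned_sequence, release_date) records into train/test names.

    Test entries must be released on or after cutoff AND be below
    max_identity to every training sequence (no homology leakage).
    """
    train = [(name, seq) for name, seq, released in entries if released < cutoff]
    test = [name for name, seq, released in entries
            if released >= cutoff
            and all(identity(seq, t) < max_identity for _, t in train)]
    return [name for name, _ in train], test
```

The date split alone is not enough: a structure released after the cutoff can still be a close homologue of one used in development, so the identity check is kept as well.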