Molecular modelling Flashcards
Molecular modelling - definition
Use of a ………………………….. to study ………….., …………., …………….. and ……………. of molecules.
Use of a theoretical model to study structure, energy, dynamics and reactivity of molecules.
Molecular modelling - types (examples)
- …………………….. (………………..)
- …………………… (…………, ………….., …………….)
- …………………….. (……………………………………………).
- Visual/physical (stick model)
- Mathematical (physics, kinetics, reactivity)
- Computational (as mathematical but larger scale)
Molecular modelling - uses
- ……………………………………
- …………………………………
- ………………………………………………………….
- ………………….
- Structural determination
- Structure visualisation
- Calculating forces between molecules
- Experimental
Potential energy surface
U - energy associated with a configuration of a mechanical system which can be converted to work
Potential energy and molecule shape
Most likely shape of a molecule has …………………………………………., ie. exists at …………………… …………… of plot of ………………………………………………
Most likely shape of a molecule has the lowest potential energy, ie. exists at a global energy minimum of plot of potential energy vs geometry.
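A minimal sketch of the idea above: scan a made-up one-dimensional potential energy curve with two minima and pick the global one. The function `toy_potential`, the grid range and the number of grid points are illustrative assumptions, not data for any real molecule.

```python
def toy_potential(x):
    """Hypothetical 1D double-well potential (arbitrary units)."""
    return x**4 - 3 * x**2 + x


def global_minimum(potential, lo, hi, n_points):
    """Grid scan: evaluate the potential at evenly spaced geometries
    and return the (geometry, energy) pair with the lowest energy."""
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    best_x = min(grid, key=potential)
    return best_x, potential(best_x)


x_min, u_min = global_minimum(toy_potential, -2.0, 2.0, 4001)
print(f"global minimum near x = {x_min:.2f}, U = {u_min:.2f}")
```

The curve also has a shallower local minimum at positive x; the scan ignores it because only the lowest energy counts.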
Potential energy functions - most detailed ⇒ least detailed
Required to calculate PES.
Ab initio QM; semi-empirical QM; molecular mechanics/force fields; scoring functions.
Ab initio QM - basis set
Fitted functions correlated to orbitals for use in the Schrödinger equation.
At the Hartree-Fock level, neglects instantaneous e⁻-e⁻ correlation (assumes each e⁻ moves under the average influence of all other e⁻).
Ab initio QM - Limitations
- Takes a long time
- Calculation complexity scales as $M^4$ for $M$ atomic orbitals (AOs)
- Supercomputers can manage systems ∼1-2k atoms
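To make the fourth-power scaling concrete, here is a small arithmetic sketch. The helper name `relative_cost` is hypothetical, and it assumes cost is exactly proportional to the fourth power of the number of AOs, which real programs only approximate.

```python
def relative_cost(m_old, m_new, exponent=4):
    """Cost ratio when the number of atomic orbitals goes from m_old
    to m_new, assuming cost is proportional to M**exponent."""
    return (m_new / m_old) ** exponent


# Doubling the number of AOs
print(relative_cost(100, 200))
```

Under this assumption, doubling the number of AOs multiplies the cost by 16.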
Ab initio QM - Advantages
- Few approximations/parameters
- Accuracy improves systematically
- Can be applied to reactions as well as structure
Ab initio QM - Methods
(most accurate ⇒ least)
With electron correlation: Configuration interaction; CASSCF; DFT
With no electron correlation: Unrestricted Hartree Fock; Hartree Fock
Semi empirical QM - basis set
Similar to ab initio but uses a modified Hamiltonian operator, neglecting some interactions and replacing others with parameters.
Semi empirical QM - Limitations
- Accuracy cannot systematically improve
- Some cases may need correction (common for peptide bonds)
- Reactions can be studied but may lack accuracy
Semi empirical QM - Advantages
- Includes some electron correlation
- Much faster than ab initio
- Systems can be 2-3k atoms
- Using distributed processing systems (eg. Folding@home), large structures of >10k atoms, such as proteins, can be modelled
Semi empirical QM - Methods
(least accurate ⇒ most; listed chronologically)
- CNDO (1965)
- MNDO (1977)
- PM3 (1989)
- SAM1 (1993)
- PM6 (2007).
Molecular mechanics (“force field”) - basis set
Treats each atom (nucleus plus electrons) as a single particle, with atoms connected by compressible springs.
Molecular mechanics (“force field”) - Bonded terms
- Apply to atoms up to 3 covalent bonds apart
- Bond stretching (d), angle bending (θ)
- Rotations about single bonds/dihedral angle (ϕ)
- $U_{\text{bond}}(d) + U_{\text{angle}}(\theta) + U_{\text{dihedral}}(\phi)$
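The card gives only the sum of the bonded terms, not their functional forms. The sketch below assumes harmonic springs for bonds and angles and a cosine term for dihedrals, which are common choices; the parameter names (`k_d`, `d0`, `k_theta`, `theta0`, `v_n`, `n`, `gamma`) and the numbers are hypothetical, not taken from a real force field.

```python
import math


def u_bond(d, k_d, d0):
    """Harmonic bond stretch: energy rises as d departs from d0."""
    return 0.5 * k_d * (d - d0) ** 2


def u_angle(theta, k_theta, theta0):
    """Harmonic angle bend (angles in radians)."""
    return 0.5 * k_theta * (theta - theta0) ** 2


def u_dihedral(phi, v_n, n, gamma=0.0):
    """Periodic torsion term with n minima per full rotation."""
    return 0.5 * v_n * (1 + math.cos(n * phi - gamma))


def u_bonded(d, theta, phi, params):
    """Sum of the three bonded terms for one bond, one angle, one dihedral."""
    return (u_bond(d, *params["bond"])
            + u_angle(theta, *params["angle"])
            + u_dihedral(phi, *params["dihedral"]))


# Illustrative parameters only (not from a real force field)
params = {
    "bond": (300.0, 1.53),
    "angle": (60.0, math.radians(109.5)),
    "dihedral": (1.4, 3),
}
print(u_bonded(1.53, math.radians(109.5), math.radians(60.0), params))
```

At the reference bond length and angle, with a staggered dihedral, every term sits at its minimum, so the printed energy is essentially zero.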
Molecular mechanics (“force field”) - Nonbonded interactions
- Operate over a longer range, $r_{ab}$
- Electrostatic energy ($r^{-1}$): uses Coulomb's law. Treats charges $q_a$, $q_b$ as fitted constants
- $U_{\text{elec}}(r_{ab}) = \sum_{a,b}^{\text{atoms}} \frac{q_a q_b}{r_{ab}}$
- Van der Waals: close-range attraction ($r^{-6}$) and repulsion ($r^{-12}$)
- $U_{\text{vdw}}(r_{ab}) = \sum_{a,b}^{\text{atoms}} \left( \frac{A}{r_{ab}^{12}} - \frac{C}{r_{ab}^{6}} \right)$
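The two nonbonded sums can be sketched as below. This is a simplified illustration: it uses a single `a` and `c` for every pair and absorbs all unit constants into the charges, both of which are assumptions made for brevity. The atom representation `(charge, (x, y, z))` is also hypothetical.

```python
import math
from itertools import combinations


def u_elec(q_a, q_b, r_ab):
    """Coulomb term q_a * q_b / r_ab (unit constants absorbed into charges)."""
    return q_a * q_b / r_ab


def u_vdw(a, c, r_ab):
    """12-6 term: A / r^12 repulsion minus C / r^6 attraction."""
    return a / r_ab**12 - c / r_ab**6


def u_nonbonded(atoms, a, c):
    """Sum both terms over every unique pair of atoms.
    Each atom is (charge, (x, y, z))."""
    total = 0.0
    for (q_a, pos_a), (q_b, pos_b) in combinations(atoms, 2):
        r_ab = math.dist(pos_a, pos_b)
        total += u_elec(q_a, q_b, r_ab) + u_vdw(a, c, r_ab)
    return total


# Two opposite unit charges one distance unit apart
pair = [(1.0, (0.0, 0.0, 0.0)), (-1.0, (1.0, 0.0, 0.0))]
print(u_nonbonded(pair, a=1.0, c=2.0))
```

With these toy values the pair sits at the bottom of the 12-6 well, so both terms are attractive.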
Molecular mechanics (“force field”) - Total potential energy
$U_{\text{total}}(d, \theta, \phi, r_{ab}) = U_{\text{bond}}(d) + U_{\text{angle}}(\theta) + U_{\text{dihedral}}(\phi) + U_{\text{elec}}(r_{ab}) + U_{\text{vdw}}(r_{ab})$
Molecular mechanics (“force field”) - Approximations/parameters
- MM treats atoms as single particles, not as independent electrons/nuclei
- Inputs: d, θ, ϕ, $r_{ab}$
- Parameters: A, C, q. Accuracy of calculation depends on accuracy of parameters
Molecular mechanics (“force field”) - Limitations
- Needs new parameters for each molecule ID
- Cannot treat chemical reactions
Molecular mechanics (“force field”) - Advantages
- Can treat very large systems (up to ~1 million atoms)
- Can be accurate if used correctly (ie. with accurate parameters)
Scoring functions - types
- First Principles
- Empirical
- Knowledge-based
Scoring: first principles
Physics based approach, considering molecular mechanics force field (VdW, electrostatic effects).
$U_{\text{total}}(r_{ab}) = \sum_{a,b}^{\text{atoms}} \frac{q_a q_b}{r_{ab}} + \sum_{a,b}^{\text{atoms}} \left( \frac{A}{r_{ab}^{12}} - \frac{C}{r_{ab}^{6}} \right)$
Scoring: first principles - Limitations
- Can be slow to calculate
- Depends on reliable parameters
- Force fields provide potential energy, not free energy
Scoring: empirical
Computes binding free energy ΔG
C are constants, f are functions of distance ΔR & angle Δα between interacting groups on protein & ligand.
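The equation for this card did not survive in the text. A generic weighted-sum form consistent with the description (constants $C$ multiplying functions $f$ of $\Delta R$ and $\Delta\alpha$) might look like the following. The index $i$ over interaction types and the constant offset $C_0$ are assumptions, and the original card may have used a specific published function.

```latex
\Delta G_{\text{bind}} = C_0 + \sum_i C_i \, f_i(\Delta R, \Delta\alpha)
```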
Scoring: empirical - Parametrising functions
- Training set: ligand-receptor (LR) complexes with crystal structures and experimentally measured binding affinities ΔGbind
- The scoring function is used to predict ΔGbind for the training-set structures, and the constants C are adjusted until the predictions match the experimental values
- Performance can be tested using other sets of known data (test sets)
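The train-then-test procedure above can be sketched with a one-feature linear scoring function. Everything here is a toy assumption: the single feature (a hydrogen-bond count), the helper names `fit_constants` and `predict`, and the numbers, which are synthetic values generated from known constants rather than measured affinities.

```python
def fit_constants(features, dg_exp):
    """Least-squares fit of dG = c0 + c1 * feature for a single feature."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(dg_exp) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, dg_exp))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


def predict(c0, c1, feature):
    """Predicted binding free energy for one complex."""
    return c0 + c1 * feature


# Synthetic training set (generated from c0 = -5, c1 = -4; not real data)
train_hbonds = [1, 2, 3, 4]
train_dg = [-9.0, -13.0, -17.0, -21.0]
c0, c1 = fit_constants(train_hbonds, train_dg)

# Synthetic test set: complexes not used in the fit
test_hbonds = [5, 6]
test_dg = [-25.0, -29.0]
errors = [abs(predict(c0, c1, x) - y) for x, y in zip(test_hbonds, test_dg)]
print(c0, c1, max(errors))
```

Real empirical functions fit several constants at once over many interaction terms; the single-feature case is used here only because it has a closed-form solution.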
Scoring: empirical - Advantages
- Fast
- Calculates free energy
Scoring: knowledge-based potentials
- Generates a free energy function ΔGbind for each pair X-Y of protein and ligand atom types
- Extracted from set of known protein-ligand X-ray structures from PDB
- Assumes a Boltzmann distribution of protein atoms around ligands
- Averaged over all structures in database, eg. N,O pairs can be expected to be distance r apart:
- $\Delta G_{\text{bind}}^{\text{N-O}}(r_{\text{N-O}}) = -RT \ln g(r_{\text{N-O}})$, where g = distribution of that pair distance in the database
- Total ΔGbind is the sum of the pair ΔGbind energies over all atom pairs
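A minimal sketch of the inverse Boltzmann step and the final sum. It assumes `g` has already been extracted from the database as a ratio relative to some reference state (1.0 meaning "no more frequent than the reference"); the function names and the default temperature are hypothetical choices.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol K)


def pair_free_energy(g_r, temperature=298.15):
    """Inverse Boltzmann relation: dG(r) = -RT ln g(r)."""
    return -R * temperature * math.log(g_r)


def total_binding_score(pair_g_values, temperature=298.15):
    """Sum the pair contributions over all protein-ligand atom pairs."""
    return sum(pair_free_energy(g, temperature) for g in pair_g_values)


# One pair seen twice as often as the reference, one half as often
print(total_binding_score([2.0, 0.5]))
```

Distances that occur more often than the reference (g > 1) give a favourable negative contribution, and rarer ones (g < 1) give a penalty.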
Scoring: knowledge-based - Limitations
- Boltzmann statistics for distribution of atom pairs not realistic
- Difficult to define reference state
Scoring: knowledge-based - Advantages
- Fast
- Calculates free energy
- More general than empirical approaches
Computational docking
- Predicts how ligand L binds to a given 3D structure of receptor R.
- Geometry of bound ligand = binding mode/docked pose.
Computational docking - conformational search methods
- Systematic/grid based
- Genetic algorithm
- Graph theory
- Extant knowledge can provide a starting point/bias predictions.
Computational docking - search process
- Characterise active site (where, shape etc)
- Conformational search of ligand in active site (= one generation)
- Score ligand poses using scoring functions
Computational docking - genetic algorithm
- Conformation data stored on “chromosome” (eg. side-chain dihedral angles, coordinates in active site)
- Initial population ranked using scoring function. Lowest energy conformations retained, highest removed.
- Conformations not removed share information by cross-over (AA/BB → AB/BA swap) or mutation (random data point change).
- New conformations scored.
(eg. pop = 7. 1,2 retained. 6,7 removed. 3-5 share information with 1,2 producing new 3-7. Pop = old 1,2, new 3-7.)
Over successive generations, conformations get progressively lower in energy and so should move closer to the experimental binding mode.
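The retain/remove/cross-over/mutate cycle can be sketched as follows. This is a toy illustration, not a docking program: the chromosome holds only dihedral angles, and `score` is a hypothetical stand-in for a real scoring function that is zero at an assumed target pose. Population size, number retained, and mutation rate are arbitrary assumptions.

```python
import math
import random


def score(chromosome, target):
    """Toy stand-in for a docking scoring function: lower is better,
    zero when every dihedral matches the (hypothetical) target pose."""
    return sum(1 - math.cos(a - t) for a, t in zip(chromosome, target))


def crossover(parent_a, parent_b, rng):
    """Single-point cross-over: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def mutate(chromosome, rng, rate=0.2):
    """Randomly replace some angles with new random values."""
    return [rng.uniform(-math.pi, math.pi) if rng.random() < rate else a
            for a in chromosome]


def evolve(target, pop_size=20, n_keep=4, generations=50, seed=0):
    rng = random.Random(seed)
    n = len(target)
    population = [[rng.uniform(-math.pi, math.pi) for _ in range(n)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # Rank by score; the lowest-energy conformations are retained.
        population.sort(key=lambda c: score(c, target))
        best_per_generation.append(score(population[0], target))
        elite = population[:n_keep]
        # The worst n_keep are removed; the rest share information with the elite.
        parents = population[:pop_size - n_keep]
        children = []
        while len(children) < pop_size - n_keep:
            child = crossover(rng.choice(elite), rng.choice(parents), rng)
            children.append(mutate(child, rng))
        population = elite + children
    population.sort(key=lambda c: score(c, target))
    best_per_generation.append(score(population[0], target))
    return population[0], best_per_generation


target_pose = [0.5, -1.0, 2.0, 1.5]
best, history = evolve(target_pose)
print(history[0], history[-1])
```

Because the retained conformations are carried over unchanged, the best score in `history` can never get worse from one generation to the next.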
Computational docking - testing docking
- RL complex with experimentally known 3D structure selected from database
- L removed from R and redocked
- Root-mean-square distance (rmsd) determines whether the geometry is reproduced:
$\text{rmsd} = \sqrt{\frac{1}{N_{\text{atoms}}} \sum_{i=1}^{N_{\text{atoms}}} d_i^2}$
- where $d_i$ = distance between equivalent atoms i in the superimposed structures
- rmsd < 2 Å acceptable for small ligands
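The rmsd check can be sketched as below, assuming the two structures are already superimposed and list their atoms in the same order. The coordinates are made up for illustration, and the function does no superposition itself.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between equivalent atoms of two
    already-superimposed structures (lists of (x, y, z) tuples)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("structures must have the same non-zero number of atoms")
    squared = [math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up coordinates in Å: the docked pose is shifted 1 Å along z
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]
value = rmsd(crystal, docked)
print(value, "acceptable" if value < 2.0 else "not reproduced")
```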
Computational docking - limitations
- Proteins are flexible and often undergo induced fit upon binding; this cannot be accounted for
- Water in/around binding site may change entropy/enthalpy values