There exist many computational methods that are available depending on the length/time scale we are interested in. Classes in order of increasing ... scale and decreasing ... scale they investigate Quantum/... ... : includes atoms, electrons in an explicit solvent using ... ... (Schrodinger). Accurate but computationally unfeasible for \>... atoms. ... -...: all/most atoms included in an explicit solvent but instead uses...as a method of reference, which describe atoms via.... Coarse-grained: group 4-5 heavy atoms into beads, in an explicit or ... solvent using MD. Far less interactions to compute but at the cost of accuracy. ... -...-... : interaction sites grouped, comprising of many atoms, generally proteins/peptides, using an implicit solvent with ... dynamics. ... : Materials represented as a continuous mass in an implicit solvent, using ... ... (instead of QM/EOM).

There exist many computational methods that are available depending on the length/time scale we are interested in. Classes in order of increasing length scale and decreasing time scale they investigate Quantum/Ab initio: includes atoms, electrons in an explicit solvent using quantum mechanics (Schrodinger). Accurate but computationally unfeasible for \>3 atoms. All-atom: all/most atoms included in an explicit solvent but instead uses MD as a method of reference, which describe atoms via EOMs. Coarse-grained: group 4-5 heavy atoms into beads, in an explicit or implicit solvent using MD. Far less interactions to compute but at the cost of accuracy. Supra-coarse-grained: interaction sites grouped, comprising of many atoms, generally proteins/peptides, using an implicit solvent with stochastic dynamics. Continuum: Materials represented as a continuous mass in an implicit solvent, using continuous dynamics (instead of QM/EOM).

Define rare events in the context of molecular simulations.

Processes that involve time scales much longer than what we can computationally afford. Definition relative to our method of choice.

(IMP) How is system free energy related to the timescale problem.

Our phase space if a collection of all configuration’s position and momenta. A rare event is likely to be separated from our starting configuration by a high free energy barrier that cannot be overcome using MD/MC alone in a feasible timescale. Instead we are likely to spend all/most of our simulation time trapped in that local minima.

(IMP) Briefly describe the two main methods of enhanced sampling.

Free energy-based methods: characterize PES of system E.g. umbrella sampling, replica exchange MD, metadynamics Choice of reaction coordinate/dof important and will affect free E of system. No direct info about kinetics of the process Path sampling-based methods: explore all possible pathways E.g. transition path/interface sampling, forward flux sampling More suited for processes with many possible pathways between No direct info about the thermodynamics of the system.

What is an order parameter?

An order parameter or collective variable, allows the different configurations/states of a system to be distinguished Allows system to be driven from A to B (or v.v) through application of enhanced sampling method of choice.

What are the requirements of an OP? what is the difficulty in this requirement?

Must be differentiable with respect to atomic position This is simple in the cases illustrated but can be very challenging for complex systems that is simple enough to compute but accurately represents dynamics. Can also be difficult when no prior knowledge of configurational space is known.

What is affected by our OP choice in free energy-based methods.

Our choice of OP determines the resulting free energy surface (FES) This multidimensional hyper surface is re-mapped on to a simple coarse-grained FES (1D/2D)

What is problem with this new FES?

Can often be an oversimplification as only representative of the chosen order parameter. The true FES, which represents all configurational space does not equal the FES we are sampling, constrained to our OP of choice

(IMP) ML methods are good at ... data, but poor are ... predictions. In other words, ML can give accurate prediction ... ... ... ... (filling the gaps), but poor at predicting data ... of it. The ... /... of data accessible and how we ... it is much more important than the algorithm we feed it in to.

(IMP) ML methods are good at interpolating data, but poor are extrapolating predictions. In other words, ML can give accurate prediction within a data set (filling the gaps), but poor at predicting data outside of it. The amount/quality of data accessible and how we describe it is much more important than the algorithm we feed it in to.

What are the pros and cons of ML-based interatomic potentials?

Pros: fast, long large scale simulations with quasi quantum chemistry accuracy (if data set is good Cons: Not easy to craft dataset, can take years to improve iteratively.

Key notes Flashcards by Tim Lafosse

There exist many computational methods that are available depending on the length/time scale we are interested in.
Classes in order of increasing … scale and decreasing … scale they investigate
Quantum/… … : includes atoms, electrons in an explicit solvent using … … (Schrodinger). Accurate but computationally unfeasible for >… atoms.
… -…: all/most atoms included in an explicit solvent but instead uses…as a method of reference, which describe atoms via….
Coarse-grained: group 4-5 heavy atoms into beads, in an explicit or … solvent using MD. Far less interactions to compute but at the cost of accuracy.
… -…-… : interaction sites grouped, comprising of many atoms, generally proteins/peptides, using an implicit solvent with … dynamics.
… : Materials represented as a continuous mass in an implicit solvent, using … … (instead of QM/EOM).

There exist many computational methods that are available depending on the length/time scale we are interested in.
Classes in order of increasing length scale and decreasing time scale they investigate
Quantum/Ab initio: includes atoms, electrons in an explicit solvent using quantum mechanics (Schrodinger). Accurate but computationally unfeasible for >3 atoms.
All-atom: all/most atoms included in an explicit solvent but instead uses MD as a method of reference, which describe atoms via EOMs.
Coarse-grained: group 4-5 heavy atoms into beads, in an explicit or implicit solvent using MD. Far less interactions to compute but at the cost of accuracy.
Supra-coarse-grained: interaction sites grouped, comprising of many atoms, generally proteins/peptides, using an implicit solvent with stochastic dynamics.
Continuum: Materials represented as a continuous mass in an implicit solvent, using continuous dynamics (instead of QM/EOM).

How well did you know this?

Not at all

Perfectly

Define rare events in the context of molecular simulations.

Processes that involve time scales much longer than what we can computationally afford.
Definition relative to our method of choice.

How well did you know this?

Not at all

Perfectly

What are enhanced sampling techniques

Computational methods that allow one to overcome the timescale problem by sampling the phase space of our system to a greater extent. We can use MD or MC to do this.

How well did you know this?

Not at all

Perfectly

(IMP) How is system free energy related to the timescale problem.

Our phase space if a collection of all configuration’s position and momenta.
A rare event is likely to be separated from our starting configuration by a high free energy barrier that cannot be overcome using MD/MC alone in a feasible timescale. Instead we are likely to spend all/most of our simulation time trapped in that local minima.

How well did you know this?

Not at all

Perfectly

(IMP) Briefly describe the two main methods of enhanced sampling.

Free energy-based methods: characterize PES of system
E.g. umbrella sampling, replica exchange MD, metadynamics
Choice of reaction coordinate/dof important and will affect free E of system.
No direct info about kinetics of the process
Path sampling-based methods: explore all possible pathways
E.g. transition path/interface sampling, forward flux sampling
More suited for processes with many possible pathways between
No direct info about the thermodynamics of the system.

How well did you know this?

Not at all

Perfectly

What is an order parameter?

An order parameter or collective variable, allows the different configurations/states of a system to be distinguished
Allows system to be driven from A to B (or v.v) through application of enhanced sampling method of choice.

How well did you know this?

Not at all

Perfectly

What are the requirements of an OP? what is the difficulty in this requirement?

Must be differentiable with respect to atomic position
This is simple in the cases illustrated but can be very challenging for complex systems that is simple enough to compute but accurately represents dynamics.
Can also be difficult when no prior knowledge of configurational space is known.

How well did you know this?

Not at all

Perfectly

What is affected by our OP choice in free energy-based methods.

Our choice of OP determines the resulting free energy surface (FES)
This multidimensional hyper surface is re-mapped on to a simple coarse-grained FES (1D/2D)

How well did you know this?

Not at all

Perfectly

What is problem with this new FES?

Can often be an oversimplification as only representative of the chosen order parameter.
The true FES, which represents all configurational space does not equal the FES we are sampling, constrained to our OP of choice

How well did you know this?

Not at all

Perfectly

(IMP) How does statistical mechanics relate to our FES assumption?

Stat mech confirms true FES ≠ our FES according to this OP, as uses configurational partition function, Z which does not contain info about kinetics/ dynamics of the system and depends only on particles position in the system only.

How well did you know this?

Not at all

Perfectly

(PPQ) How can we use commitor analysis to probe the accuracy of our OP?

Select a value, OP* of OP corresponding to a putative transition state (maximum)
Sample an ensemble of configurations characterized by OP*
Run several MD simulations for each, varying velocity (making them statistically independent)
Find probability of OP* configuration making to it to B
Plot histogram of these probabilities.

How well did you know this?

Not at all

Perfectly

(PPQ) Which of these histograms indicates a more accurate OP?

(a) indicates no matter where you start you always end up with same probability of ending up in A or B, NO indication this is a TS
(b) indicates probability of ending up in B increasing as you move further along coordinate, 0.5 at middle. Representative of the putative TS
- Therefore, b is better.

How well did you know this?

Not at all

Perfectly

(IMP)

… … … is a path based enhanced sampling method where instead of computing an overall (v.low) … , path is divided into a series of …
Each … has an increasing value of our …, … each corresponding to a possible … with said value of …
As λ increases likelihood of going from … to … , rather than back to A, …

(IMP)

Forward flux sampling is a path based enhanced sampling method where instead of computing an overall (v.low) probability, path is divided into a series of interfaces
Each interface has an increasing value of our OP, λ, each corresponding to a possible configuration with said value of λ
As λ increases likelihood of going from A to B, rather than back to A, increases

How well did you know this?

Not at all

Perfectly

(IMP) How are trajectories generating in FFS?

Begin by looking at fluctuations of a long unbiased MD run
At each interface λ_i large # of trial molecular dynamics runs done
The few that reach the next interface λ_i+1 are used as a starting point to reach the following interface.

How well did you know this?

Not at all

Perfectly

(IMP)

ML methods are good at … data, but poor are … predictions.
In other words, ML can give accurate prediction … … … … (filling the gaps), but poor at predicting data … of it.
The … /… of data accessible and how we … it is much more important than the algorithm we feed it in to.

(IMP)

ML methods are good at interpolating data, but poor are extrapolating predictions.
In other words, ML can give accurate prediction within a data set (filling the gaps), but poor at predicting data outside of it.
The amount/quality of data accessible and how we describe it is much more important than the algorithm we feed it in to.

How well did you know this?

Not at all

Perfectly

Why is there a need for ML-based interatomic potentials in simulations?

Classical forcefields lack the detail to accurately reproduce complex systems, as functional forms can be very limiting.
The timescales these systems exist in are also far too large for quantum calculations.

How well did you know this?

Not at all

Perfectly

What are the pros and cons of ML-based interatomic potentials?

Pros: fast, long large scale simulations with quasi quantum chemistry accuracy (if data set is good
Cons: Not easy to craft dataset, can take years to improve iteratively.

How well did you know this?

Not at all

Perfectly

What is the process in which ML is used in drug discovery

Study These Flashcards

Dataset of small molecules attained
Descriptors used to encode dataset of structures that can be processed by an algorithm.
ML algorithm used to optimise dataset
Desired property (e.g. solubility) predicted

What is a descriptor?

Study These Flashcards

A mathematical object (vector) that contains information about the system, encoded to be readily fed into a ML algorithm. Often, must satisfy specific properties like permutation invariance (same descriptor when exchanging identical atoms)
Is the most crucial step in ML drug design

Give an example of a simple and a complex descriptor

Study These Flashcards

Simple: molecular weight
Complex: Largest eigenvalue of adjacency matrix, where 1 is connected and 0 is not, giving a unique # corresponding to a unique matrix.

What is the main problem with ML?

Study These Flashcards

Understanding the interactions underpinning the PES of a system (ML potentials) or structure function relation between drug and potency (ML drug discovery) is very difficult as ML is a black box with many hidden layers between input and output.

(IMP) Label this neural network

Study These Flashcards

What are weights in a neural network and how do they aid the formation of further layers?

Study These Flashcards

Weights are numbers assigned to input nodes to form the first hidden layer.
The value of y of the third node in the first hidden layer is a linear combination of descriptors and all connected weights

How is data fed in to a neural network?

Study These Flashcards

Dataset generated (e.g. N molecules and their solubility values)
Descriptor values assigned to molecules (e.g. largest eigenvalue λⁱ_max of the adjacency matrix of the ith molecule)
Values represent input neurons/nodes forming the input layer

(PPQ) What is the role of the activation function in artificial NN’s, give an example of one to support your choice?

* Activation function are non-linear functions (e.g. the sigmoidal function) that transform the linear combinations into non-linear objects from one hidden layer to the next. * These linear combinations (e.g. relation between descriptors and solubility) are smoothed out, giving non-linear capabilities

Give the general equation describing the value a node in the second hidden layer in a NN

* Activation function only present in the hidden layers (=1 for input)

(IMP) How can we solve the issue of poor output values due to initial input weights?

* Use Backpropagation, where a cost function, f_cost is calculated. * This is the sum of output layer errors and target values (e.g. experimental solubility), written in terms of weights. * Its derivatives are used to improve initial guess of weights to assign to minimise f_cost on next iteration.

(IMP) What is the purpose of the hidden layers?

* Connect the results of our input nodes and weights and combines them further additional layers to fully optimise all input values. * More hidden layers, more nodes, more flexible functional form is, up to the point where overfitting begins

* An **...** represents each set of output values generated. * At the end of each **...** we compute the **...** **...** and use its derivatives to optimise the **...** * We stop this when our model is good enough that the **...** value of our molecule of choice generates an accurate enough output value.

* An **epoch** represents each set of output values generated. * At the end of each **epoch** we compute the **cost function** and use its derivatives to optimise the **weights.** * We stop this when our model is good enough that the **descriptor** value of our molecule of choice generates an accurate enough output value.

What are Gaussian processes?

Mathematical objects which can be used to fit data through regressions via the generalization to infinite dimensions of a normal Gaussian distribution.

(IMP) What are Kernel functions (K)?

* The **covariance matrix** of our GP defines the **shape of ensemble of Gaussians** in space. * A **functional form** for it must be written in terms of **hyper-parameters that can be optimised.** * This mathematical expression is required as the covariance is an arbitrary set of numbers in a matrix * For each element i,j of the covariance matrix can write an expression called a kernel, which is a function of the x_i, x_j descriptor point y_i, y_j in our dataset.

(IMP) Choosing the Kernel functional form can be very challenging. Give an example of a common choice

* The radial basis function (RBF) kernel * A measure of the similarity between the two descriptors (i.e. between two molecules), as it is a function of the distance between them

(IMP) * The hyper-parameter L is the quantity **...** according to the **...** **...** **...** to obtain our ML model using **...** **.** * It quantifies the **...** **...** by which two descriptors are close or not, giving the **...** of the resulting GP. * Like weights in NN’s

(IMP) * The hyper-parameter L is the quantity **optimised** according to the **log marginal likelihood** to obtain our ML model using **GP’s.** * It quantifies the **length scale** by which two descriptors are close or not, giving the **smoothness** of the resulting GP. * Like weights in NN’s

* (IMP) What is overfitting in ML

* After **many regressions** our **error** with respect to the cost function of our log marginal likelihood is very **small** giving numerically sounds results relative to our input data. * However, will reach a point where so much fitting is done that a resulting graph like L=0.3 forms which gives very poor predictions in practice.

(IMP) How can we solve the issue of overfitting?

* Split our data into training and test sets. * The training set (~80%) is used to build a model and the test set (~20%) is used to evaluate its predictive capabilities.

(IMP) What would overfitting indicate about training/test data

* If error in training set is very small and the error in its predictive capabilities is high, the data is overfit. * The opposite would indicate underfit data.

(IMP) Why is it easier to spot overfitting in NNs than GPs?

* NN split in to epochs, so when divergence in error between error in test and training data occurs, can move back to epoch before this. * Overfitting in GPs more difficult to detect/fix.

* Good numerical **...** of ML model with respect to the training set does NOT ensure the **...** of its **...** capabilities (IMP -sketch) * Even if going through all points **(****...**≈ 0),**...** in between these points may be large,

* Good numerical **accuracy** of ML model with respect to the training set does NOT ensure the **accurac**y of its **predictive** capabilities (IMP -sketch) * Even if going through all points **(f_cost** ≈ 0), **error** in between these points may be large,

* What is the process of breaking down a molecular structure in to descriptors we can feed in to our ML model?

* Convert 3D structure to 2D * Decompose 2D structure in a way that can be made into an adjacency matrix * Diagonalize this matric to get eigenvalues that can be used as a principle eigenvalue in descriptors

* What are cliques and why are they useful?

* Cliques are the subunits that comprise all the molecules in our training set, allowing us to make a CG rep of each molecule of these clique components.

* One hot encoding is a process by which categorical variables are converted into a form that could be provided to ML algorithms to do a better job in prediction * Each jth clique is represented by a **...** containing **...** elements i.e. in 100 cliques each clique is represented by a **...** with **...** elements * These N_clq **...** are all equal to **...** except one; the one element corresponding to the j-th clique. * A molecule is represented by the **...** of all its cliques.

* One hot encoding is a process by which categorical variables are converted into a form that could be provided to ML algorithms to do a better job in prediction * Each jth clique is represented by a **vector** containing **N_clq** elements i.e. in 100 cliques each clique is represented by a **vector** with **100** elements * These N_clq **elements** are all equal to **0** except one; the one element corresponding to the j-th clique. * A molecule is represented by the **sum** of all its cliques.

* What is a pro and con of using this coarse-grained representation

* Pro: Highlights the importance of different functional groups, reducing noise * Con: sacrificing some detail, which scales poorly with the amount of data being used.

(IMP) * Once **...** cannot be improved anymore, choice of **...** can be tuned to best suit **...** * ARD **...** uses a **...** of different **...** **,** one for each **...** in ensemble. This is useful as different **...** can have different **...** **...**. * Gives us an idea of which **...** matter most.

* Once **descriptor** cannot be improved anymore, choice of **kernel** can be tuned to best suit **descriptors** * ARD **kernel** uses a **combination** of different **kernels,** one for each **descriptor** in ensemble. This is useful as different **descriptors** can have different **length** **scales**. * Gives us an idea of which **descriptors** matter most.

Key notes Flashcards

(43 cards)