Lecture 5 Flashcards

Question 1

Q

Free energy and path sampling methods are two enhanced sampling techniques already discussed; briefly describe an additional one.

Answer

A

Seeded molecular dynamics: insert a nucleus of (approximately) critical size at the start, then simulate from that configuration rather than waiting for a nucleus to form spontaneously.

Question 2

Q

What is the purpose of enhanced sampling in the context of ice nucleation?

Answer

A

Speed up the simulation so that the rare nucleation event can actually be observed
Avoid tampering with the natural evolution of the system, so that the true dynamics and mechanism are obtained
Obtain the microscopic mechanism and kinetics, and from them a nucleation rate.

Question 3

Q

The first step in enhanced sampling is to choose an order parameter. What would be a sensible order parameter to describe nucleation?

Answer

A

Number of water molecules in the largest ice nucleus
The path(s) from A (liquid) to B (crystal) are then described in terms of this parameter, and FFS (forward flux sampling) is used to compute P(B|A), the probability that a trajectory leaving A reaches B before returning to A.
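A minimal Python sketch of this order parameter. It assumes each molecule has already been classified as ice-like or liquid-like (the structural criterion used for that is not shown) and that the pairs of neighboring molecules within some cutoff are already known. The function and argument names are hypothetical; the function returns the number of molecules in the largest connected cluster of ice-like molecules.

```python
from collections import defaultdict, deque


def largest_ice_nucleus(is_ice: list[bool], neighbor_pairs: list[tuple[int, int]]) -> int:
    """Number of molecules in the largest connected cluster of ice-like molecules.

    is_ice[i]: whether molecule i was classified as ice-like (assumed given).
    neighbor_pairs: (i, j) pairs of molecules within a neighbor cutoff (assumed given).
    """
    # Connect only pairs in which both molecules are ice-like.
    adj = defaultdict(list)
    for i, j in neighbor_pairs:
        if is_ice[i] and is_ice[j]:
            adj[i].append(j)
            adj[j].append(i)

    seen = set()
    largest = 0
    for start in range(len(is_ice)):
        if not is_ice[start] or start in seen:
            continue
        # Breadth-first search to measure the cluster containing `start`.
        size = 0
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        largest = max(largest, size)
    return largest


# Toy example: two ice-like clusters, {0, 1, 2} and {4, 5}.
is_ice = [True, True, True, False, True, True, False, False]
pairs = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (6, 7)]
print(largest_ice_nucleus(is_ice, pairs))  # 3
```

In an FFS run, this value would be recomputed along each trajectory and compared against the interface positions.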

Question 4

Q

What are some difficulties that need to be accounted for when simulating ice nucleation?

Answer

A

Many structural degrees of freedom
Density change between the liquid and the solid
Nucleation may have multiple steps (multiple barriers)

Question 5

Q

The rate obtained from FFS of ice nucleation is 11 orders of magnitude off; what can we still learn from it?

Answer

A

Insight into the nucleation mechanism the algorithm follows, revealing structural (topological) forms that would otherwise not have been anticipated.
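The FFS rate referred to in this card is usually written as k_AB = Φ_{A,0} × ∏_i P(λ_{i+1} | λ_i): the flux of trajectories leaving A and crossing the first interface λ_0, multiplied by the probability of successively reaching each next interface until B. A minimal Python sketch with hypothetical names and made-up numbers; it works with log10 values, because a product of many small interface probabilities can underflow ordinary floating point, and because "orders of magnitude" comparisons then become simple subtractions.

```python
import math


def ffs_log10_rate(flux_a0: float, interface_probs: list[float]) -> tuple[float, float]:
    """Return (log10 P(B|A), log10 k_AB) for a forward flux sampling run.

    flux_a0: flux of trajectories leaving A and crossing interface lambda_0
        (per unit time, and per unit volume for a nucleation rate).
    interface_probs: P(lambda_{i+1} | lambda_i) for each successive interface,
        i.e. the fraction of trial runs from lambda_i that reach lambda_{i+1}
        before falling back into A.
    """
    if flux_a0 <= 0:
        raise ValueError("flux must be positive")
    if not interface_probs or any(not 0.0 < p <= 1.0 for p in interface_probs):
        raise ValueError("each interface probability must lie in (0, 1]")
    # Sum logarithms instead of multiplying the probabilities directly.
    log10_p = sum(math.log10(p) for p in interface_probs)
    log10_rate = math.log10(flux_a0) + log10_p
    return log10_p, log10_rate


# Made-up numbers, purely illustrative.
log10_p, log10_k = ffs_log10_rate(2.0, [0.1, 0.2, 0.5])
print(log10_p, log10_k)
```

Subtracting an experimental log10 rate from `log10_k` gives the discrepancy directly in orders of magnitude.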

Question 6

Q

What is the process of breaking a molecular structure down into descriptors that we can feed into our ML model?

Answer

A

Convert the 3D structure to a 2D (graph) structure
Decompose the 2D structure so that it can be written as an adjacency matrix
Diagonalize this matrix to get its eigenvalues; the principal (largest) eigenvalue can then be used in the descriptors
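A minimal NumPy sketch of the last two steps, assuming the 2D structure is the hydrogen-suppressed molecular graph with unweighted bonds (real descriptors may weight the matrix by atom or bond type, which is not shown). The helper names and the example molecules are illustrative. For a symmetric matrix, `np.linalg.eigvalsh` returns the eigenvalues in ascending order, so the principal eigenvalue is the last one.

```python
import numpy as np


def adjacency_matrix(n_atoms: int, bonds: list[tuple[int, int]]) -> np.ndarray:
    """Symmetric 0/1 adjacency matrix of a molecular graph (the 2D structure)."""
    A = np.zeros((n_atoms, n_atoms))
    for i, j in bonds:
        A[i, j] = A[j, i] = 1.0
    return A


def principal_eigenvalue(A: np.ndarray) -> float:
    """Largest eigenvalue of a symmetric matrix.

    eigvalsh returns eigenvalues in ascending order, so take the last one.
    """
    return float(np.linalg.eigvalsh(A)[-1])


# Illustrative: heavy-atom graph of ethanol, C-C-O, hydrogens dropped.
ethanol = adjacency_matrix(3, [(0, 1), (1, 2)])
print(round(principal_eigenvalue(ethanol), 4))  # 1.4142 (sqrt(2) for a 3-atom chain)

# Illustrative: a six-membered ring, e.g. the carbon skeleton of benzene.
ring = adjacency_matrix(6, [(k, (k + 1) % 6) for k in range(6)])
print(round(principal_eigenvalue(ring), 4))  # 2.0
```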

Question 7

Q

What are cliques and why are they useful?

Answer

A

Cliques are the subunits from which all the molecules in our training set are built, allowing us to make a coarse-grained (CG) representation of each molecule in terms of these clique components.
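A minimal Python sketch of collecting a clique vocabulary from a training set and coarse-graining each molecule into a list of clique indices. It assumes each molecule has already been decomposed into cliques; the fragment labels below are hypothetical placeholders, not the cliques used in the lecture.

```python
def build_clique_vocabulary(training_set: list[list[str]]) -> dict[str, int]:
    """Give every distinct clique in the training set an index (order of first appearance)."""
    vocab: dict[str, int] = {}
    for cliques in training_set:
        for clique in cliques:
            if clique not in vocab:
                vocab[clique] = len(vocab)
    return vocab


def coarse_grain(cliques: list[str], vocab: dict[str, int]) -> list[int]:
    """CG representation: the molecule as the list of indices of its cliques."""
    return [vocab[clique] for clique in cliques]


# Hypothetical training set; each molecule is already split into clique labels.
training_set = [
    ["CH3", "CH2", "OH"],
    ["CH3", "C=O", "OH"],
    ["ring6", "OH"],
]
vocab = build_clique_vocabulary(training_set)
print(vocab)  # {'CH3': 0, 'CH2': 1, 'OH': 2, 'C=O': 3, 'ring6': 4}
print(coarse_grain(training_set[1], vocab))  # [0, 3, 2]
```

A molecule containing a clique that never appeared in training raises `KeyError` here, reflecting that the CG representation only covers subunits present in the training set.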

Question 8

Q

One-hot encoding is a process by which categorical variables are converted into a form that can be provided to ML algorithms to improve their predictions
Each j-th clique is represented by a … containing … elements, e.g. with 100 cliques, each clique is represented by a … with … elements
These N_clq … are all equal to … except one: the element corresponding to the j-th clique.
A molecule is represented by the … of all its cliques.

Answer

A

One-hot encoding is a process by which categorical variables are converted into a form that can be provided to ML algorithms to improve their predictions
Each j-th clique is represented by a vector containing N_clq elements, e.g. with 100 cliques, each clique is represented by a vector with 100 elements
These N_clq elements are all equal to 0 except one: the element corresponding to the j-th clique.
A molecule is represented by the sum of all its cliques.
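A minimal NumPy sketch of this one-hot encoding and summation, assuming cliques are numbered 0 to N_clq - 1 (0-based, unlike the "j-th" wording). The function names are hypothetical.

```python
import numpy as np


def one_hot(j: int, n_clq: int) -> np.ndarray:
    """Vector of n_clq elements, all 0 except a 1 at index j (0-based)."""
    v = np.zeros(n_clq, dtype=int)
    v[j] = 1
    return v


def molecule_vector(clique_indices: list[int], n_clq: int) -> np.ndarray:
    """Represent a molecule as the sum of the one-hot vectors of its cliques."""
    total = np.zeros(n_clq, dtype=int)
    for j in clique_indices:
        total += one_hot(j, n_clq)
    return total


n_clq = 100
v = molecule_vector([5, 2, 1], n_clq)
print(v.shape, v.sum(), np.nonzero(v)[0])  # (100,) 3 [1 2 5]
```

Because addition is commutative, listing a molecule's cliques in a different order gives the same vector, and a clique that occurs twice contributes a 2 to its element.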

Question 9

Q

What is a pro and a con of using this coarse-grained representation?

Answer

A

Pro: highlights the importance of different functional groups, reducing noise
Con: sacrifices some structural detail, so it scales poorly with the amount of training data used.

Question 10

Q

(IMP) Assign molecular descriptors for the following molecules

Question 11

Q

(IMP) Assign molecular descriptors for the following molecule

Answer

A

[5 2 1]; N.B. the cliques can be combined in any order, e.g. [2 5 1]

Question 12

Q

(IMP)

Once the … cannot be improved any further, the choice of … can be tuned to best suit the …
The ARD (automatic relevance determination) … uses a … of different …, one for each … in the ensemble. This is useful because different … can have different … ….
This gives us an idea of which … matter most.

Answer

A

Once the descriptor cannot be improved any further, the choice of kernel can be tuned to best suit the descriptors
The ARD (automatic relevance determination) kernel uses a combination of different kernels, one for each descriptor in the ensemble. This is useful because different descriptors can have different length scales.
This gives us an idea of which descriptors matter most.
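A minimal NumPy sketch of an ARD kernel, assuming the squared-exponential (RBF) form k(x, x') = σ² exp(-½ Σ_d (x_d - x'_d)² / ℓ_d²); other base kernels can be given per-descriptor length scales in the same way. Writing the exponential of a sum as a product shows the "one kernel per descriptor" view: it is a product of 1D RBF kernels, each with its own length scale ℓ_d. A very long ℓ_d makes the kernel almost insensitive to descriptor d, so after fitting (e.g. by maximizing the GP marginal likelihood, not shown), 1/ℓ_d can be read as a rough relevance score. Names and numbers are illustrative.

```python
import numpy as np


def ard_rbf(x, y, length_scales, variance=1.0):
    """ARD squared-exponential kernel:
    k(x, y) = variance * exp(-0.5 * sum_d ((x_d - y_d) / l_d) ** 2)."""
    x, y, ls = (np.asarray(a, dtype=float) for a in (x, y, length_scales))
    return float(variance * np.exp(-0.5 * np.sum(((x - y) / ls) ** 2)))


def ard_rbf_as_product(x, y, length_scales, variance=1.0):
    """The same kernel written as a product of one 1D RBF kernel per descriptor."""
    k = variance
    for xd, yd, ld in zip(x, y, length_scales):
        k *= np.exp(-0.5 * ((xd - yd) / ld) ** 2)
    return float(k)


# Three hypothetical descriptors; the second has a very long length scale,
# so differences in it barely change the kernel value.
length_scales = [1.0, 100.0, 0.5]
x = [1.0, 0.0, 2.0]
y = [1.5, 3.0, 2.0]
print(ard_rbf(x, y, length_scales))
print(ard_rbf_as_product(x, y, length_scales))
print("relevance ~ 1/l:", [1.0 / l for l in length_scales])
```

For a library version, scikit-learn's `RBF` kernel becomes anisotropic (one length scale per feature) when `length_scale` is given as an array.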


Have an understanding of the basics of: neural networks; Gaussian processes (12 cards)