Lecture 5 Flashcards
- Have an understanding of the basics of: neural networks, Gaussian processes
1
Q
- Free energy and path sampling methods are two enhanced sampling techniques already discussed. Briefly describe an additional one.
A
- Seeded molecular dynamics: insert a pre-formed nucleus of critical size into the system at the start and simulate from there
2
Q
What is the purpose of enhanced sampling in the context of ice nucleation?
A
- Speed up our simulation so we can observe the actual nucleation event
- Avoid tampering with the natural evolution of the system, so that the true dynamics and mechanism are obtained
- Obtain the microscopic mechanism and kinetics, and from them a nucleation rate.
3
Q
The first step in enhanced sampling is to assign order parameters. What would be a sensible order parameter to describe nucleation?
A
- Number of water molecules within the largest ice nucleus
- The path(s) from A (liquid) to B (crystal) are then described in terms of this parameter, and the probability P(B|A) is estimated with forward flux sampling (FFS)
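A minimal sketch of this order parameter, under stated assumptions: each molecule has already been labelled ice-like or liquid-like by some local criterion (not specified in these notes), two ice-like molecules belong to the same nucleus if they are closer than a cutoff (the 3.5 value is a placeholder), and periodic boundaries are ignored. The function names are hypothetical. The second function shows the usual FFS bookkeeping, where P(B|A) is the product of the conditional probabilities of reaching each interface from the previous one.

```python
import numpy as np

def largest_cluster_size(positions, is_ice_like, cutoff=3.5):
    """Number of molecules in the largest connected cluster of ice-like molecules.

    Assumptions of this sketch: ice-like labels are given, no periodic
    boundary conditions, and a simple distance cutoff defines neighbours.
    """
    positions = np.asarray(positions, dtype=float)
    idx = np.flatnonzero(np.asarray(is_ice_like, dtype=bool))
    if idx.size == 0:
        return 0
    pts = positions[idx]
    # All pairwise distances between ice-like molecules
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    adjacent = dist < cutoff
    visited = np.zeros(len(pts), dtype=bool)
    largest = 0
    for start in range(len(pts)):
        if visited[start]:
            continue
        # Depth-first search over one connected cluster
        stack = [start]
        visited[start] = True
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in np.flatnonzero(adjacent[node] & ~visited):
                visited[nb] = True
                stack.append(nb)
        largest = max(largest, size)
    return largest

def crossing_probability(interface_probs):
    """P(B|A) as the product of interface-to-interface crossing probabilities."""
    return float(np.prod(interface_probs))

# Toy example: three ice-like molecules in a chain, one isolated ice-like
# molecule, and one liquid-like molecule that is ignored.
positions = [[0, 0, 0], [3, 0, 0], [6, 0, 0], [20, 0, 0], [9, 0, 0]]
labels = [True, True, True, True, False]
print(largest_cluster_size(positions, labels))  # 3
```

In an FFS run, the interfaces would be placed at increasing values of this cluster size between the liquid (A) and the crystal (B).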
4
Q
What are some difficulties that need to be accounted for in a system describing ice nucleation?
A
- Many structural degrees of freedom
- Density changing from liquid to solid
- Nucleation may have multiple steps (multiple barriers)
5
Q
- The rate we obtain from FFS of ice nucleation is 11 orders of magnitude off. What can we still learn from it?
A
- We gain insight into the mechanism the algorithm follows, which reveals topological forms that would otherwise not have been anticipated.
6
Q
- What is the process for breaking down a molecular structure into descriptors we can feed into our ML model?
A
- Convert 3D structure to 2D
- Decompose the 2D structure so that it can be written as an adjacency matrix
- Diagonalize this matrix to get its eigenvalues; the principal eigenvalue can then be used as a descriptor
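The last two steps can be sketched as follows. This is an illustration under assumptions not stated in the notes: atoms are graph nodes, every bond has weight 1, and "principal eigenvalue" is taken to mean the largest eigenvalue of the adjacency matrix. The function names are hypothetical.

```python
import numpy as np

def adjacency_matrix(n_atoms, bonds):
    """Symmetric adjacency matrix of a molecular graph (unweighted bonds)."""
    A = np.zeros((n_atoms, n_atoms))
    for i, j in bonds:
        A[i, j] = A[j, i] = 1.0
    return A

def principal_eigenvalue(A):
    """Largest eigenvalue of a symmetric matrix (eigvalsh sorts ascending)."""
    return float(np.linalg.eigvalsh(A)[-1])

# Three atoms in a chain (e.g. the carbon skeleton of propane): 0-1-2
A = adjacency_matrix(3, [(0, 1), (1, 2)])
print(round(principal_eigenvalue(A), 4))  # 1.4142
```

A three-membered ring with the same atoms gives a principal eigenvalue of 2, so the value does distinguish the two connectivities.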
7
Q
- What are cliques and why are they useful?
A
- Cliques are the subunits that make up all the molecules in our training set. They allow us to build a coarse-grained (CG) representation of each molecule in terms of its clique components.
8
Q
- One-hot encoding is a process by which categorical variables are converted into a form that can be provided to ML algorithms to improve prediction
- Each j-th clique is represented by a … containing … elements, e.g. with 100 cliques each clique is represented by a … with … elements
- These Nclq … are all equal to … except one: the element corresponding to the j-th clique.
- A molecule is represented by the … of all its cliques.
A
- One-hot encoding is a process by which categorical variables are converted into a form that can be provided to ML algorithms to improve prediction
- Each j-th clique is represented by a vector containing Nclq elements, e.g. with 100 cliques each clique is represented by a vector with 100 elements
- These Nclq elements are all equal to 0 except one: the element corresponding to the j-th clique, which is equal to 1.
- A molecule is represented by the sum of all its cliques.
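A small sketch of this encoding, assuming a fixed clique vocabulary of size Nclq. The clique labels below are hypothetical placeholders, not cliques from the lecture, and the function names are illustrative.

```python
import numpy as np

def one_hot(j, n_clq):
    """Vector of n_clq zeros with a single 1 at position j."""
    v = np.zeros(n_clq, dtype=int)
    v[j] = 1
    return v

def encode_molecule(cliques, vocabulary):
    """Sum the one-hot vectors of all cliques that make up a molecule."""
    index = {name: j for j, name in enumerate(vocabulary)}
    vec = np.zeros(len(vocabulary), dtype=int)
    for c in cliques:
        vec += one_hot(index[c], len(vocabulary))
    return vec

vocabulary = ["clq_A", "clq_B", "clq_C", "clq_D"]  # hypothetical labels, Nclq = 4
molecule = ["clq_A", "clq_B", "clq_B"]             # clique decomposition of one molecule
print(encode_molecule(molecule, vocabulary))       # [1 2 0 0]
```

Because the vectors are summed, the molecular descriptor counts how many times each clique appears, and the order in which the cliques are listed does not matter.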
9
Q
- What is a pro and a con of using this coarse-grained representation?
A
- Pro: Highlights the importance of different functional groups, reducing noise
- Con: Sacrifices some detail, which scales poorly with the amount of data being used.
10
Q
(IMP) Assign molecular descriptors for the following molecules
A
11
Q
(IMP) Assign molecular descriptors for the following molecule
A
- [5 2 1]; N.B. can combine cliques in any order e.g. [2 5 1]
12
Q
(IMP)
- Once the … cannot be improved anymore, the choice of … can be tuned to best suit the …
- The ARD … uses a … of different …, one for each … in the ensemble. This is useful as different … can have different … ….
- Gives us an idea of which … matter most.
A
- Once the descriptor cannot be improved anymore, the choice of kernel can be tuned to best suit the descriptors
- The ARD (automatic relevance determination) kernel uses a combination of different kernels, one for each descriptor in the ensemble. This is useful as different descriptors can have different length scales.
- Gives us an idea of which descriptors matter most.
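One common form of this idea, shown here as a sketch and not necessarily the exact kernel used in the lecture, is the ARD squared-exponential kernel: k(x, x') = variance * exp(-0.5 * sum_d (x_d - x'_d)^2 / l_d^2). It is a product of one-dimensional kernels, one per descriptor d, each with its own length scale l_d. The function name and the example values are assumptions.

```python
import numpy as np

def ard_rbf_kernel(X1, X2, length_scales, variance=1.0):
    """ARD squared-exponential kernel matrix between rows of X1 and X2.

    length_scales holds one length scale per descriptor (column).
    """
    ls = np.asarray(length_scales, dtype=float)
    Z1 = np.asarray(X1, dtype=float) / ls  # rescale each descriptor separately
    Z2 = np.asarray(X2, dtype=float) / ls
    sq_dist = ((Z1[:, None, :] - Z2[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist)

# Two descriptors: the first has a short length scale, the second a very long one
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
K = ard_rbf_kernel(X, X, length_scales=[1.0, 100.0])
```

In this toy example a unit change in the first descriptor lowers the kernel value noticeably, while the same change in the second descriptor leaves it close to 1. A descriptor with a very long fitted length scale therefore has little influence on the prediction, which is how the fitted length scales indicate which descriptors matter most.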