Lecture 5 Flashcards

-have an understanding of the basics of: -Neural networks -Gaussian processes

1
Q
  • Free energy and path sampling methods are two enhanced sampling techniques already discussed. Briefly describe an additional one.
A
  • Seeded molecular dynamics: insert a nucleus of critical size at the start of the simulation and follow its evolution from there
2
Q

What is the purpose of enhanced sampling in the context of ice nucleation?

A
  • Speed up our simulation so we can observe the actual nucleation event
  • Avoid tampering with the natural evolution of the system, so that the true dynamics and mechanism are obtained
  • Attain the microscopic mechanism and kinetics to obtain a nucleation rate.
3
Q

The first step in enhanced sampling is to assign order parameters. What would be a sensible order parameter to describe nucleation?

A
  • Number of water molecules within the largest ice nucleus
  • The path(s) from A (liquid) to B (crystal) are then described in terms of this parameter, and the probability P(B|A) is computed with forward flux sampling (FFS)
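In FFS the order parameter is divided into a series of interfaces, and P(B|A) is the product of the conditional probabilities of reaching each interface from the previous one. The rate is then the flux through the first interface multiplied by P(B|A). A minimal sketch follows; the function name `ffs_rate` and all numbers are hypothetical and chosen only for illustration, not taken from the lecture.

```python
import math

def ffs_rate(initial_flux, interface_probs):
    """Rate estimate: flux through the first interface times P(B|A)."""
    # P(B|A) is the product of the conditional crossing probabilities
    # P(lambda_{i+1} | lambda_i). Summing logs avoids numerical underflow
    # when many small probabilities are multiplied together.
    log_p_b_given_a = sum(math.log(p) for p in interface_probs)
    return initial_flux * math.exp(log_p_b_given_a)

# Hypothetical values, for illustration only.
flux = 2.0                 # crossings of the first interface per unit time
probs = [0.5, 0.1, 0.2]    # P(lambda_{i+1} | lambda_i) for each interface
rate = ffs_rate(flux, probs)
```

Because the rate is a product of many estimated probabilities, small errors at each interface compound, which is one reason FFS rates can be off by orders of magnitude.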
4
Q

What are some difficulties that need to be accounted for in a system describing ice nucleation?

A
  • Many structural degrees of freedom
  • Density changing from liquid to solid
  • Nucleation may have multiple steps (multiple barriers)
5
Q
  • The rate we obtain from FFS of ice nucleation is 11 orders of magnitude off. What can we still learn from it?
A
  • Gain insight into the mechanism the algorithm follows, which leads to topological forms that would not otherwise have been proposed.
6
Q
  • What is the process of breaking down a molecular structure into descriptors we can feed into our ML model?
A
  • Convert 3D structure to 2D
  • Decompose 2D structure in a way that can be made into an adjacency matrix
  • Diagonalize this matrix to obtain its eigenvalues, from which the principal eigenvalue can be used in descriptors
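The last two steps can be sketched with NumPy. The molecule here (the three-carbon skeleton of propane, hydrogens ignored) and the variable names are assumptions made for illustration; the lecture's exact decomposition is not given in these cards.

```python
import numpy as np

# Hypothetical example: carbon skeleton C1-C2-C3 as a 2D graph.
# Entry (i, j) is 1 if atoms i and j are bonded, 0 otherwise.
adjacency = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
], dtype=float)

# The adjacency matrix of an undirected graph is symmetric, so eigvalsh
# applies; it returns real eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(adjacency)

# The principal (largest) eigenvalue is one scalar that can enter a descriptor.
principal_eigenvalue = eigenvalues[-1]
```

For this three-atom chain the eigenvalues are -sqrt(2), 0 and sqrt(2), so the principal eigenvalue is sqrt(2).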
7
Q
  • What are cliques and why are they useful?
A
  • Cliques are the subunits that comprise all the molecules in our training set, allowing us to build a coarse-grained (CG) representation of each molecule from these clique components.
8
Q
  • One hot encoding is a process by which categorical variables are converted into a form that could be provided to ML algorithms to do a better job in prediction
  • Each j-th clique is represented by a ____ containing ____ elements, e.g. with 100 cliques each clique is represented by a ____ with ____ elements
  • These Nclq ____ are all equal to ____ except one; the one element corresponding to the j-th clique.
  • A molecule is represented by the ____ of all its cliques.
A
  • One hot encoding is a process by which categorical variables are converted into a form that could be provided to ML algorithms to do a better job in prediction
  • Each j-th clique is represented by a vector containing Nclq elements, e.g. with 100 cliques each clique is represented by a vector with 100 elements
  • These Nclq elements are all equal to 0 except one: the element corresponding to the j-th clique, which is equal to 1.
  • A molecule is represented by the sum of all its cliques.
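A minimal sketch of this encoding in plain Python. The function names `one_hot` and `encode_molecule`, the dictionary size of 4 and the example molecule are hypothetical, chosen only to illustrate the idea.

```python
def one_hot(j, n_clq):
    """Vector of n_clq zeros with a single 1 at the position of clique j."""
    vec = [0] * n_clq
    vec[j] = 1
    return vec

def encode_molecule(clique_indices, n_clq):
    """Represent a molecule as the element-wise sum of its cliques' one-hot vectors."""
    total = [0] * n_clq
    for j in clique_indices:
        for k, value in enumerate(one_hot(j, n_clq)):
            total[k] += value
    return total

# Hypothetical molecule made of clique 0 once and clique 2 twice,
# drawn from a dictionary of 4 cliques.
n_clq = 4
descriptor = encode_molecule([0, 2, 2], n_clq)  # [1, 0, 2, 0]
```

Because the vectors are summed, the result counts how often each clique occurs and does not depend on the order in which the cliques are listed.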
9
Q
  • What is a pro and a con of using this coarse-grained representation?
A
  • Pro: Highlights the importance of different functional groups, reducing noise
  • Con: Sacrifices some detail, which scales poorly with the amount of data being used.
10
Q

(IMP) Assign molecular descriptors for the following molecules

A
11
Q

(IMP) Assign molecular descriptors for the following molecule

A
  • [5 2 1]; N.B. can combine cliques in any order e.g. [2 5 1]
12
Q

(IMP)

  • Once ____ cannot be improved anymore, choice of ____ can be tuned to best suit ____
  • ARD ____ uses a ____ of different ____, one for each ____ in ensemble. This is useful as different ____ can have different ____.
  • Gives us an idea of which ____ matter most.
A
  • Once the descriptor cannot be improved any further, the choice of kernel can be tuned to best suit the descriptors
  • The ARD (automatic relevance determination) kernel uses a combination of different kernels, one for each descriptor in the ensemble. This is useful because different descriptors can have different length scales.
  • This gives us an idea of which descriptors matter most.
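One common way to realise this is a product of one-dimensional squared-exponential (RBF) kernels, each with its own length scale. The sketch below assumes that form; the cards do not say which base kernel the lecture used, and the function names and numbers are hypothetical.

```python
import math

def rbf_1d(a, b, length_scale):
    """Squared-exponential kernel on a single descriptor."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def ard_kernel(x, y, length_scales):
    """Product of 1D kernels, one per descriptor, each with its own length scale."""
    k = 1.0
    for a, b, ell in zip(x, y, length_scales):
        k *= rbf_1d(a, b, ell)
    return k

# Two hypothetical molecules described by two descriptors each.
x = [0.0, 0.0]
y = [1.0, 1.0]

# Equal length scales: both descriptors influence the similarity.
k_both = ard_kernel(x, y, [1.0, 1.0])

# A very large length scale on the second descriptor makes the kernel
# almost insensitive to it, i.e. that descriptor is "irrelevant".
k_second_irrelevant = ard_kernel(x, y, [1.0, 1e6])
```

After the length scales are fitted to data, a small length scale marks a descriptor the model is sensitive to and a large one marks a descriptor that matters little, which is how ARD ranks descriptors.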