Lecture 4 Flashcards
- Have an understanding of the basics of: neural networks and Gaussian processes
1
Q
- What are two common types of ML algorithms and how do they differ, briefly?
A
- Neural networks and Gaussian processes
- NNs are more suited to specific applications; however, the two can be equivalent under certain conditions
2
Q
(IMP) Label this neural network
A
- [Diagram: neural network with its input layer, hidden layers, output layer, nodes, and weights labelled]
4
Q
- How is data fed into a neural network?
A
- Dataset generated (e.g. N molecules and their solubility values)
- Descriptor values assigned to molecules (e.g. the largest eigenvalue λᵢ of the adjacency matrix of the i-th molecule)
- Values represent input neurons/nodes forming the input layer
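A minimal sketch of how such a dataset might be assembled (all numbers and the second descriptor are illustrative placeholders, not from the lecture): a descriptor is computed for each molecule, and a molecule's descriptor values become its input-layer nodes.

```python
import numpy as np

# One possible descriptor: the largest eigenvalue of a molecule's adjacency matrix
adjacency = np.array([[0, 1, 1],
                      [1, 0, 0],
                      [1, 0, 0]])                  # illustrative 3-atom molecule
lam_max = np.max(np.linalg.eigvalsh(adjacency))

# Dataset of N molecules: each row of X holds the descriptor values that feed
# the input layer; y holds the corresponding (made-up) solubility targets
X = np.array([[lam_max, 3.0],
              [1.20,    2.0],
              [2.05,    4.0]])
y = np.array([-2.1, 0.3, -0.8])
```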
5
Q
- What are weights in a neural network and how do they aid the formation of further layers?
A
- Weights are numbers assigned to the connections leaving the input nodes; combining them with the inputs forms the first hidden layer.
- The value y of a given node in the first hidden layer is a linear combination of the descriptors and all the weights connected to it (see the sketch below).
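A minimal sketch of this linear combination for a single hidden node (descriptor values and weights are arbitrary placeholders):

```python
import numpy as np

x = np.array([1.7, 0.5, 2.3])     # descriptor values in the input layer
w = np.array([0.1, -0.4, 0.8])    # weights on the connections into one hidden node
y_node = np.dot(w, x)             # value of that node: a linear combination of
                                  # the descriptors and all connected weights
```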
6
Q
(IMP) What is the purpose of the hidden layers?
A
- They take the results of the input nodes and weights and combine them further through additional layers, so that all input values are fully exploited in the fit.
- The more hidden layers and nodes, the more flexible the functional form, up to the point where overfitting begins.
7
Q
(PPQ) What is the role of the activation function in artificial NNs? Give an example of one to support your answer.
A
- Activation functions are non-linear functions (e.g. the sigmoidal function) that transform the linear combinations passed from one hidden layer to the next into non-linear objects.
- These linear combinations (e.g. the relation between descriptors and solubility) are smoothed out, giving the model non-linear capabilities.
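A short sketch of the sigmoidal activation acting on such a linear combination (the numerical values are placeholders):

```python
import numpy as np

def sigmoid(z):
    # Sigmoidal activation: maps any linear combination into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.dot(np.array([0.1, -0.4, 0.8]), np.array([1.7, 0.5, 2.3]))  # linear part
a = sigmoid(z)                                                      # non-linear output
```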
8
Q
Give the general equation describing the value of a node in the second hidden layer of a NN
A
- y_j = f( Σ_k w_jk x_k ), where the x_k are the node values of the first hidden layer, the w_jk are the connecting weights, and f is the activation function.
- The activation function is only present in the hidden layers (f = 1 for the input layer).
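A sketch of that relation as a forward pass (layer sizes, weights, and descriptor values are arbitrary; the input layer has no activation applied):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x  = np.array([1.7, 0.5, 2.3])     # input layer: descriptors, activation = 1 (identity)
W1 = np.random.randn(4, 3)         # weights: input layer -> first hidden layer
W2 = np.random.randn(4, 4)         # weights: first hidden layer -> second hidden layer

h1 = sigmoid(W1 @ x)               # first hidden layer values
h2 = sigmoid(W2 @ h1)              # second hidden layer: h2_j = f(sum_k W2_jk * h1_k)
```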
9
Q
Our output values (e.g. molecular solubility) from our neural network are initially very poor. Why is this?
A
- The weights are chosen randomly, and all further propagation in our network depends on the linear combination of these weights with the input.
10
Q
(IMP) How can we solve the issue of poor output values due to initial input weights?
A
- Use backpropagation, where a cost function, f_cost, is calculated.
- This is the sum of the errors between the output-layer values and the target values (e.g. experimental solubilities), written in terms of the weights.
- Its derivatives are used to improve the initial guess of the weights, so that f_cost is minimised on the next iteration (see the sketch below).
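A minimal sketch of one such update for a single linear output node with a squared-error cost (the real network and cost function from the lecture are more involved; all values are placeholders):

```python
import numpy as np

x = np.array([1.7, 0.5, 2.3])          # descriptors of one molecule
t = -2.1                               # target value, e.g. experimental solubility
w = np.random.randn(3)                 # randomly initialised weights

y_out = np.dot(w, x)                   # output of a single linear node
f_cost = (y_out - t) ** 2              # cost: squared error between output and target
grad = 2.0 * (y_out - t) * x           # derivative of f_cost with respect to each weight
w = w - 0.01 * grad                    # improved guess of the weights for the next iteration
```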
11
Q
- An … represents each set of output values generated.
- At the end of each … we compute the … … and use its derivatives to optimise the …
- We stop this when our model is good enough that the … value of our molecule of choice generates an accurate enough output value.
A
- An epoch represents each set of output values generated.
- At the end of each epoch we compute the cost function and use its derivatives to optimise the weights.
- We stop this when our model is good enough that the descriptor value of our molecule of choice generates an accurate enough output value.
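A sketch of this loop for a toy linear model (dataset, learning rate, and stopping threshold are made up), just to show where the epoch, cost function, and weight update sit:

```python
import numpy as np

X = np.array([[1.7, 0.5], [2.3, 0.1], [0.9, 1.2]])   # descriptors for 3 molecules
t = np.array([-2.1, 0.3, -0.8])                       # target solubilities
w = np.random.randn(2)                                # random initial weights

for epoch in range(100):
    y_out = X @ w                                     # outputs for the whole dataset
    f_cost = np.sum((y_out - t) ** 2)                 # cost at the end of this epoch
    grad = 2.0 * X.T @ (y_out - t)                    # derivatives of the cost
    w -= 0.01 * grad                                  # optimise the weights
    if f_cost < 1e-3:                                 # stop once the model is good enough
        break
```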
12
Q
What are Gaussian processes?
A
- Mathematical objects that can be used to fit data through regression; they are the generalisation of the normal (Gaussian) distribution to infinitely many dimensions.
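A sketch of this idea (the squared-exponential kernel used here is an assumption): sampling a Gaussian over many input points at once approximates the infinite-dimensional picture, with each sample being a whole function.

```python
import numpy as np

x = np.linspace(-3, 3, 100)                                  # many input points
cov = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)          # covariance between every pair
mean = np.zeros(100)

# Each sample is a whole function evaluated at the 100 points
functions = np.random.multivariate_normal(mean, cov + 1e-8 * np.eye(100), size=5)
```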
13
Q
- Describe the features of a 2D (bivariate) normal distribution
A
- The covariance tells us how the two dimensions vary with respect to one another
- The mean gives the average (central) point of the distribution
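A small sketch of a bivariate normal with made-up numbers:

```python
import numpy as np

mean = np.array([0.0, 1.0])      # centre of the distribution
cov  = np.array([[1.0, 0.8],     # off-diagonal 0.8: the two dimensions co-vary strongly
                 [0.8, 1.0]])

samples = np.random.multivariate_normal(mean, cov, size=1000)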
14
Q
- How can a Gaussian process be improved, in the same way we optimised the weights in a NN?
A
- Bayesian inference improves a prior GP distribution (our initial guess) according to the information provided in the dataset (conditioning)
- This is similar to assigning weights in a NN, where the model likewise depends on some parameters, e.g. the elements of the covariance matrix
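A minimal sketch of conditioning using the standard GP regression formulas (the kernel choice and data values are assumptions, not taken from the lecture):

```python
import numpy as np

def rbf(a, b, length=1.0):
    # Squared-exponential kernel between two sets of 1-D points
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

x_train = np.array([-1.0, 0.0, 1.0])      # descriptor values with known targets
y_train = np.array([0.2, 0.9, 0.1])
x_test  = np.linspace(-2, 2, 50)          # points where we want predictions

K    = rbf(x_train, x_train) + 1e-8 * np.eye(3)   # covariance of the training points
K_s  = rbf(x_train, x_test)                        # covariance train vs test
K_ss = rbf(x_test, x_test)

K_inv = np.linalg.inv(K)
post_mean = K_s.T @ K_inv @ y_train                # posterior (conditioned) mean
post_cov  = K_ss - K_s.T @ K_inv @ K_s             # posterior covariance
```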
15
Q
(IMP) What are Kernel functions (K)?
A
- The covariance matrix of our GP defines the shape of the ensemble of Gaussians in space.
- A functional form for it must be written in terms of hyper-parameters that can be optimised.
- This mathematical expression is required because otherwise the covariance is just an arbitrary set of numbers in a matrix.
- For each element (i, j) of the covariance matrix we can write an expression, called a kernel, which is a function of the descriptor points x_i, x_j (with targets y_i, y_j) in our dataset.
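A sketch using the squared-exponential kernel, one common choice (the descriptor values are placeholders), with amplitude and length-scale as the hyper-parameters to be optimised:

```python
import numpy as np

def kernel(x_i, x_j, amplitude=1.0, length=1.0):
    # Squared-exponential kernel: one expression for each covariance element (i, j)
    return amplitude ** 2 * np.exp(-0.5 * (x_i - x_j) ** 2 / length ** 2)

x = np.array([0.5, 1.2, 2.0, 3.1])          # descriptor values of the dataset
n = len(x)
K = np.array([[kernel(x[i], x[j]) for j in range(n)] for i in range(n)])
```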