Lecture 4 Flashcards

Have an understanding of the basics of:
  • Neural networks
  • Gaussian processes

1
Q
  • What are two common types of ML algorithms and how do they differ, briefly?
A
  • Neural networks and Gaussian processes
  • NN’s are more suited to specific applications; however, the two can be shown to be equivalent under certain conditions
2
Q

(IMP) Label this neural network

A
4
Q
  • How is data fed into a neural network?
A
  • A dataset is generated (e.g. N molecules and their solubility values)
  • Descriptor values are assigned to the molecules (e.g. the largest eigenvalue λ_i^max of the adjacency matrix of the i-th molecule)
  • These values form the input neurons/nodes of the input layer (see the sketch below)
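A minimal sketch of this setup in Python (shapes and values are illustrative placeholders, not lecture data):

```python
import numpy as np

# Hypothetical dataset: N molecules, each described by d descriptor values
# (e.g. the largest adjacency-matrix eigenvalue, molecular weight, ...).
N, d = 100, 3
X = np.random.rand(N, d)   # descriptor matrix: one row per molecule
y = np.random.rand(N)      # target property, e.g. solubility values

# Each row of X supplies the input-node values for one molecule.
x_input = X[0]             # input-layer values for the first molecule
```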
5
Q
  • What are weights in a neural network and how do they aid the formation of further layers?
A
  • Weights are numbers assigned to the connections from the input nodes; combining the inputs with these weights forms the first hidden layer.
  • The value y of the third node in the first hidden layer is a linear combination of the descriptors and all the weights connected to it (see the formula below)
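Written out in standard notation (labels assumed, since the lecture’s own are not reproduced here), the value of node j in the first hidden layer is

$$ y_j = \sum_i w_{ij}\, x_i $$

where the x_i are the descriptor (input-node) values and w_{ij} is the weight on the connection from input node i to hidden node j.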
6
Q

(IMP) What is the purpose of the hidden layers?

A
  • They connect the results of our input nodes and weights, and combine them through further additional layers to make full use of all the input values.
  • The more hidden layers and nodes there are, the more flexible the functional form is, up to the point where overfitting begins
7
Q

(PPQ) What is the role of the activation function in artificial NN’s? Give an example of one to support your answer.

A
  • Activation functions are non-linear functions (e.g. the sigmoidal function) that transform the linear combinations passed from one hidden layer to the next into non-linear objects.
  • These linear combinations (e.g. the relation between descriptors and solubility) are smoothed out, giving the network its non-linear capabilities (see the sketch below)
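A minimal sketch of a sigmoidal activation acting on a node’s linear combination (values are illustrative, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    # Sigmoidal activation: maps any real number smoothly into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 1.2, -0.3])   # illustrative descriptor values
w = np.array([0.1, -0.4, 0.7])   # illustrative weights
y_node = sigmoid(np.dot(w, x))   # non-linear node value passed to the next layer
```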
8
Q

Give the general equation describing the value of a node in the second hidden layer of a NN

A
  • The activation function is only present in the hidden layers (it is taken as 1 for the input layer); the equation is written out below
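The equation itself appears to have been given as a figure; a standard form consistent with the answer (notation assumed) is

$$ y_j^{(2)} = f\!\left( \sum_k w_{jk}^{(2)}\, y_k^{(1)} \right) $$

where f is the activation function, the y_k^{(1)} are the node values of the first hidden layer, and the w_{jk}^{(2)} are the weights connecting the two layers; for the input layer f is replaced by 1.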
9
Q

Our output values (e.g. molecular solubility) from our neural network are very poor initially. Why is this?

A
  • The weights are chosen randomly, and all further propagation in our network depends on the linear combination of these weights with the input.
10
Q

(IMP) How can we solve the issue of poor output values due to initial input weights?

A
  • Use backpropagation, in which a cost function f_cost is calculated.
  • This is the sum of the errors between the output-layer values and the target values (e.g. experimental solubility), written in terms of the weights.
  • Its derivatives are used to improve the initial guess of the weights, so as to minimise f_cost on the next iteration (see the expressions below).
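In symbols (the squared-error form and the learning rate η are common choices assumed here, not quotes from the lecture):

$$ f_{\text{cost}} = \sum_n \left( y_n^{\text{out}} - y_n^{\text{target}} \right)^2, \qquad w \;\leftarrow\; w - \eta\, \frac{\partial f_{\text{cost}}}{\partial w} $$

Each weight is nudged against the gradient of the cost, which is what drives f_cost down from one iteration to the next.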
11
Q
  • An _____ represents each set of output values generated.
  • At the end of each _____ we compute the _____ and use its derivatives to optimise the _____.
  • We stop this when our model is good enough that the _____ value of our molecule of choice generates an accurate enough output value.
A
  • An epoch represents each set of output values generated.
  • At the end of each epoch we compute the cost function and use its derivatives to optimise the weights.
  • We stop this when our model is good enough that the descriptor value of our molecule of choice generates an accurate enough output value.
12
Q

What are Gaussian processes?

A
  • Mathematical objects that can be used to fit data through regression, via the generalisation of the normal Gaussian distribution to infinitely many dimensions.
13
Q
  • Describe the features of a 2D (bi-variate) normal distribution
A
  • The covariance tells us how similar the two dimensions are with respect to one another
  • The mean tells us the average point within the distribution; both are written out below
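In standard notation (assumed rather than quoted), a bi-variate normal is specified by a mean vector and a covariance matrix:

$$ \boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & c \\ c & \sigma_2^2 \end{pmatrix}, \qquad c = \operatorname{cov}(x_1, x_2) $$

The off-diagonal element c measures how the two dimensions vary with respect to one another, while μ locates the centre of the distribution.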
14
Q
  • How can Gaussian processes be improved, as we did with the weights in NN’s?
A
  • Bayesian inference improves a prior GP distribution guess according to the information provided in the dataset (conditioning); see the formulas below
  • This is similar to assigning weights in a NN, where our model is also dependent on some parameters, e.g. the elements of the covariance matrix
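For reference, the standard (noise-free) conditioning formulas for the posterior mean and variance at a test point x_* are (standard results assumed here, not quoted from the lecture):

$$ \mu_* = \mathbf{k}_*^{\top} K^{-1} \mathbf{y}, \qquad \sigma_*^2 = k(x_*, x_*) - \mathbf{k}_*^{\top} K^{-1} \mathbf{k}_* $$

where K is the covariance matrix over the training descriptors, k_* collects the covariances between x_* and the training points, and y holds the training targets.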
15
Q

(IMP) What are kernel functions (K)?

A
  • The covariance matrix of our GP defines the shape of the ensemble of Gaussians in space.
  • A functional form for it must be written in terms of hyper-parameters that can be optimised.
  • This mathematical expression is required because the covariance matrix is otherwise just an arbitrary set of numbers.
  • For each element (i, j) of the covariance matrix we can write an expression called a kernel, which is a function of the descriptor points x_i, x_j (of the data points y_i, y_j) in our dataset, as below
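In symbols, each matrix element is generated by the kernel function:

$$ K_{ij} = k(x_i, x_j) $$

so choosing the functional form of k (and its hyper-parameters) fixes the entire covariance matrix.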
16
Q

(IMP) Choosing the kernel functional form can be very challenging. Give an example of a common choice

A
  • The radial basis function (RBF) kernel
  • It is a measure of the similarity between two descriptors (i.e. between two molecules), as it is a function of the distance between them; its usual form is given below
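The usual form (standard, with L the length-scale hyper-parameter of the next card):

$$ k(x_i, x_j) = \exp\!\left( -\frac{(x_i - x_j)^2}{2L^2} \right) $$

Descriptors that are close on the scale of L give k ≈ 1 (very similar molecules); distant ones give k ≈ 0.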
17
Q

(IMP)

  • The hyper-parameter L is the quantity _____ according to the _____ to obtain our ML model using _____.
  • It quantifies the _____ by which two descriptors are close or not, giving the _____ of the resulting GP.
  • Like weights in NN’s
A

(IMP)

  • The hyper-parameter L is the quantity optimised according to the log marginal likelihood to obtain our ML model using GP’s.
  • It quantifies the length scale by which two descriptors are close or not, giving the smoothness of the resulting GP.
  • Like weights in NN’s
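A minimal sketch of this optimisation using scikit-learn (the library choice and data are assumptions for illustration): fitting the GP maximises the log marginal likelihood over the kernel hyper-parameters, including L.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(0.0, 5.0, 8).reshape(-1, 1)  # illustrative descriptor values
y = np.sin(X).ravel()                        # illustrative target values

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
gp.fit(X, y)  # optimises the length scale L via the log marginal likelihood

print(gp.kernel_)                            # fitted kernel with optimised L
print(gp.log_marginal_likelihood_value_)     # the maximised objective
```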
18
Q
  • Discuss the resulting differences in the hyper-parameters used for these GPs
A
  • L = 0.3: the model goes through every training point, but uses too many narrow Gaussians; in trying to be too precise it produces large errors between the points.
  • L = 1: goes through all the points and does not overfit.
  • L = 3: the curve is too smooth, averaging poorly over the points as a result of underfitting.
19
Q
  • (IMP) What is overfitting in ML?
A
  • After many regressions, the error with respect to our cost function (or log marginal likelihood) is very small, giving numerically sound results relative to our input data.
  • However, a point is reached where so much fitting has been done that a curve like the L = 0.3 case forms, which gives very poor predictions in practice.
20
Q

(IMP) How can we solve the issue of overfitting?

A
  • Split our data into training and test sets.
  • The training set (~80%) is used to build a model and the test set (~20%) is used to evaluate its predictive capabilities.
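A minimal sketch of the split with scikit-learn (assumed tooling; the data are placeholders):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 3)   # illustrative descriptor matrix
y = np.random.rand(100)      # illustrative target values

# ~80% of the data builds the model; the held-out ~20% tests its predictions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
```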
21
Q

(IMP) What would overfitting indicate about training/test data

A
  • If the error on the training set is very small but the error in the model’s predictive capabilities (on the test set) is high, the model is overfit.
  • The opposite would indicate an underfit model.
22
Q

(IMP) Why is it easier to spot overfitting in NNs than GPs?

A
  • NN training is split into epochs, so when the test error starts to diverge from the training error we can simply move back to the epoch before this occurred (sketched below).
  • Overfitting in GPs is more difficult to detect and fix.
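A schematic of that roll-back idea (pseudo-code: initialise_weights, train_one_epoch, and error_on_test_set are hypothetical helpers, not a real API):

```python
# Early stopping: keep the weights from the epoch just before the test error
# begins to rise while the training error is still falling.
max_epochs = 100
weights = initialise_weights()             # hypothetical helper
best_error, best_weights = float("inf"), None
for epoch in range(max_epochs):
    weights = train_one_epoch(weights)     # hypothetical: one training pass
    test_err = error_on_test_set(weights)  # hypothetical: predictive error
    if test_err < best_error:
        best_error, best_weights = test_err, weights
    else:
        break  # divergence detected: revert to best_weights
```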
23
Q
  • Good numerical _____ of ML model with respect to the training set does NOT ensure the _____ of its _____ capabilities (IMP - sketch)
  • Even if going through all points (_____ ≈ 0), _____ in between these points may be large
A
  • Good numerical accuracy of the ML model with respect to the training set does NOT ensure the accuracy of its predictive capabilities (IMP - sketch)
  • Even if the model goes through all the points (f_cost ≈ 0), the error in between these points may be large.