Formulas Flashcards
What is the formula for mean squared error?
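A minimal sketch of mean squared error, MSE = (1/n) sum_i (t_i - z_i)^2. It assumes the common 1/n averaging convention; some courses use 1/2 or 1/(2n) instead. The function name is illustrative.

```python
def mean_squared_error(targets, outputs):
    """MSE = (1/n) * sum_i (t_i - z_i)^2, assuming 1/n averaging."""
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs must have the same length")
    n = len(targets)
    return sum((t - z) ** 2 for t, z in zip(targets, outputs)) / n

print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # (0 + 0 + 4) / 3
```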
What is the sigmoid function?
What is the function for the hyperbolic tangent?
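A naive sketch of both activations: sigmoid(x) = 1 / (1 + e^(-x)) and tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)). It also checks the identity tanh(x) = 2 sigmoid(2x) - 1. This version can overflow for very large |x|; it is meant only to make the formulas concrete.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: (e^x - e^(-x)) / (e^x + e^(-x))."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

# tanh is a rescaled, shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
x = 0.7
print(sigmoid(0.0))  # 0.5
print(tanh(x), 2 * sigmoid(2 * x) - 1, math.tanh(x))
```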
What are the two equations used to update the weights using Momentum?
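One common form of the two momentum equations is delta_w <- alpha * delta_w - eta * dE/dw, then w <- w + delta_w. Course conventions vary, and some fold the learning rate in differently. The parameter names `lr` and `alpha` are assumptions. The sketch applies these two equations to minimise E(w) = w^2.

```python
def momentum_step(w, delta_w, grad, lr=0.1, alpha=0.9):
    """One momentum update (one common convention):

    delta_w <- alpha * delta_w - lr * dE/dw
    w       <- w + delta_w
    """
    delta_w = alpha * delta_w - lr * grad
    w = w + delta_w
    return w, delta_w

# Minimise E(w) = w^2, whose gradient is dE/dw = 2w.
w, delta_w = 5.0, 0.0
for _ in range(200):
    w, delta_w = momentum_step(w, delta_w, grad=2 * w)
print(w)  # close to the minimum at 0
```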
What is the equation for the running average of the gradients used in Adam?
What is the equation for the squared gradients used in Adam?
How is each parameter updated when using Adam?
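A sketch of the three Adam cards above for a single scalar parameter. It keeps a running average of gradients (m) and of squared gradients (v), bias-corrects both, then updates the parameter. The defaults (lr, beta1, beta2, eps) are the commonly quoted ones, assumed here rather than taken from the course.

```python
import math

def adam_step(w, m, v, grad, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t counts steps from 1."""
    m = beta1 * m + (1 - beta1) * grad          # running average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# On the first step m_hat = grad and v_hat = grad**2, so the step size is about lr.
w, m, v = adam_step(1.0, 0.0, 0.0, grad=3.0, t=1, lr=0.01)
print(w)  # approximately 0.99
```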
What is Bayes' rule?
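A small worked sketch of Bayes' rule, P(A|B) = P(B|A) P(A) / P(B). The evidence P(B) is expanded with the law of total probability. The numbers are hypothetical and only illustrate the arithmetic.

```python
def bayes_posterior(prior, likelihood, likelihood_if_not):
    """P(A|B) = P(B|A) P(A) / P(B), with P(B) from the law of total probability:
    P(B) = P(B|A) P(A) + P(B|not A) P(not A)."""
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% base rate, 90% true-positive rate, 5% false-positive rate.
p = bayes_posterior(prior=0.01, likelihood=0.9, likelihood_if_not=0.05)
print(p)  # roughly 0.15
```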
What is the formula for the entropy of a discrete probability distribution?
What is the formula for KL-divergence for two probability distributions?
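A sketch of the two discrete formulas: H(p) = -sum_i p_i log p_i and KL(p || q) = sum_i p_i log(p_i / q_i). Base 2 (bits) is assumed as the default; use base e for nats. Terms with p_i = 0 are skipped, following the usual 0 log 0 = 0 convention.

```python
import math

def entropy(p, base=2.0):
    """H(p) = -sum_i p_i log p_i; terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

def kl_divergence(p, q, base=2.0):
    """KL(p || q) = sum_i p_i log(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

print(entropy([0.5, 0.5]))                      # 1.0 (one bit)
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))    # 0.0
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))    # positive, and not symmetric
```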
What is the formula for the entropy of a continuous probability distribution?
What is the formula for the KL-divergence between two continuous probability distributions?
What is the entropy of a Gaussian distribution?
What is the entropy of a d-dimensional Gaussian distribution?
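A sketch of the Gaussian entropy formulas in nats: H = 0.5 ln(2 pi e sigma^2) in one dimension and H = 0.5 ln((2 pi e)^d det(Sigma)) in d dimensions. To stay dependency-free, the d-dimensional version assumes a diagonal covariance, so det(Sigma) is the product of the variances. The mean does not appear because entropy does not depend on it.

```python
import math

def gaussian_entropy(sigma):
    """Differential entropy (nats) of N(mu, sigma^2): 0.5 * ln(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def mvn_entropy_diag(variances):
    """Entropy (nats) of a d-dimensional Gaussian: 0.5 * ln((2*pi*e)^d * det(Sigma)).

    Assumes a diagonal covariance, so det(Sigma) is the product of the variances.
    """
    d = len(variances)
    det_sigma = math.prod(variances)
    return 0.5 * math.log((2 * math.pi * math.e) ** d * det_sigma)

print(gaussian_entropy(1.0))          # about 1.419 nats
print(mvn_entropy_diag([1.0, 1.0]))   # twice the 1-D value
```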
What is the KL-divergence between two d-dimensional multivariate Gaussian distributions?
What is the Wasserstein distance between two multivariate Gaussian distributions?
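The general closed forms for these two cards are:

KL(N0 || N1) = 0.5 [ tr(Sigma1^-1 Sigma0) + (mu1 - mu0)^T Sigma1^-1 (mu1 - mu0) - d + ln(det Sigma1 / det Sigma0) ]

W2^2 = ||mu0 - mu1||^2 + tr(Sigma0 + Sigma1 - 2 (Sigma1^(1/2) Sigma0 Sigma1^(1/2))^(1/2))

The sketch below assumes diagonal covariances, so both reduce to per-dimension sums without matrix routines. For diagonal (commuting) covariances, the trace term in W2 becomes sum_i (sigma0_i - sigma1_i)^2.

```python
import math

def kl_gaussian_diag(mu0, var0, mu1, var1):
    """KL(N0 || N1) in nats for Gaussians with diagonal covariances.

    Per dimension: 0.5 * (var0/var1 + (mu1 - mu0)^2 / var1 - 1 + ln(var1/var0)).
    """
    return 0.5 * sum(
        v0 / v1 + (m1 - m0) ** 2 / v1 - 1 + math.log(v1 / v0)
        for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1)
    )

def wasserstein2_gaussian_diag(mu0, var0, mu1, var1):
    """2-Wasserstein distance between Gaussians with diagonal covariances."""
    mean_term = sum((m0 - m1) ** 2 for m0, m1 in zip(mu0, mu1))
    cov_term = sum((math.sqrt(v0) - math.sqrt(v1)) ** 2 for v0, v1 in zip(var0, var1))
    return math.sqrt(mean_term + cov_term)

print(kl_gaussian_diag([0.0], [1.0], [1.0], [1.0]))  # 0.5
print(wasserstein2_gaussian_diag([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0]))  # 5.0
```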
What is the cross entropy error for a binary classification task?
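A sketch of binary cross entropy, E = -[t log z + (1 - t) log(1 - z)], averaged over examples. Averaging is assumed here; some courses sum instead. The `eps` clipping is an added safeguard so log() stays finite, not part of the formula itself.

```python
import math

def binary_cross_entropy(targets, outputs, eps=1e-12):
    """Mean of -[t log z + (1 - t) log(1 - z)] over the examples.

    eps clips outputs away from 0 and 1 so log() stays finite.
    """
    total = 0.0
    for t, z in zip(targets, outputs):
        z = min(max(z, eps), 1 - eps)
        total += -(t * math.log(z) + (1 - t) * math.log(1 - z))
    return total / len(targets)

print(binary_cross_entropy([1, 0], [0.5, 0.5]))  # ln 2, about 0.693
```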
What is the equation for the Gaussian distribution?
What is the equation for the multivariate Gaussian distribution?
For softmax, what is Prob(i)?
For softmax, what is log Prob(i)?
What is the equation for the gradient using softmax?
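A sketch of the three softmax cards. Prob(i) = exp(z_i) / sum_j exp(z_j) and log Prob(i) = z_i - log sum_j exp(z_j). For the gradient card, this assumes the usual derivative of log Prob(i) with respect to the logits, d log Prob(i) / d z_j = delta_ij - Prob(j). Subtracting max(z) before exponentiating is a standard stability trick and does not change the result.

```python
import math

def log_softmax(z):
    """log Prob(i) = z_i - log(sum_j exp(z_j)), computed stably by subtracting max(z)."""
    z_max = max(z)
    log_sum = z_max + math.log(sum(math.exp(zj - z_max) for zj in z))
    return [zi - log_sum for zi in z]

def softmax(z):
    """Prob(i) = exp(z_i) / sum_j exp(z_j)."""
    return [math.exp(lp) for lp in log_softmax(z)]

def grad_log_prob(z, i):
    """d log Prob(i) / d z_j = delta_ij - Prob(j), for every j."""
    probs = softmax(z)
    return [(1.0 if j == i else 0.0) - pj for j, pj in enumerate(probs)]

z = [2.0, 1.0, 0.1]
print(softmax(z))
print(grad_log_prob(z, 0))  # components sum to 0
```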
What is the weight decay equation?
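A sketch of one common form of weight decay. It adds a penalty (lambda / 2) sum_j w_j^2 to the loss E, which adds lambda * w to each weight's gradient. The 1/2 factor and the names `lam` and `lr` are assumptions; some texts omit the 1/2.

```python
def weight_decay_penalty(weights, lam):
    """Penalty term (lam / 2) * sum_j w_j^2 added to the loss E."""
    return 0.5 * lam * sum(w ** 2 for w in weights)

def sgd_step_with_weight_decay(w, grad, lr, lam):
    """Gradient of E + (lam/2) w^2 is dE/dw + lam * w, so w shrinks toward 0 each step."""
    return w - lr * (grad + lam * w)

print(weight_decay_penalty([1.0, 2.0], lam=0.1))                   # approximately 0.25
print(sgd_step_with_weight_decay(1.0, grad=0.0, lr=0.1, lam=0.1))  # approximately 0.99
```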
What is the formula for the mean?
What is the formula for the variance?
What is the equation used for batch normalisation?
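A sketch tying together the mean, variance and batch normalisation cards for one feature over a mini-batch. It computes x_hat = (x - mu_B) / sqrt(var_B + eps), then y = gamma * x_hat + beta. It uses the population variance (1/n), as is usual for batch statistics. The `eps` default is an assumption.

```python
import math

def mean(xs):
    """Arithmetic mean: (1/n) * sum_i x_i."""
    return sum(xs) / len(xs)

def variance(xs):
    """Population variance (1/n); some texts use 1/(n-1) for the sample variance."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature over a mini-batch, then scale and shift.

    x_hat = (x - mu_B) / sqrt(var_B + eps);  y = gamma * x_hat + beta
    """
    mu, var = mean(xs), variance(xs)
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in xs]

ys = batch_norm([1.0, 2.0, 3.0, 4.0])
print(mean(ys), variance(ys))  # approximately 0 and 1
```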
What is the loss function for Neural Style Transfer?
What is the true value (V*) of the current state?
What is the formula for Q*(s,a)?
What is the formula for the fitness of a policy?
What are we trying to minimise with Double Q-learning?
What is the formula for the Advantage function?
What is the equation for GANs?
What is the threshold activation function (in the context of this course)?
Output 1 if the input is greater than the bias, 0 if it is less than the bias.
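A sketch of the threshold activation as described on the card. The card does not say what happens when the input exactly equals the bias; this version maps that case to 0 as an assumption. The weights and inputs are hypothetical.

```python
def threshold_activation(weighted_input, bias):
    """Output 1 if the weighted input exceeds the bias (threshold), else 0.

    Equality is mapped to 0 here; the card does not specify that case.
    """
    return 1 if weighted_input > bias else 0

# Hypothetical perceptron with two inputs.
weights, inputs, bias = [0.5, 0.5], [1.0, 1.0], 0.7
s = sum(w * x for w, x in zip(weights, inputs))
print(threshold_activation(s, bias))  # 1, since 1.0 > 0.7
```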
In a deterministic environment, with a learning rate of 1, what is the Q-learning update rule?
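With a learning rate of 1 in a deterministic environment, the Q-learning update collapses to Q(s, a) <- r + gamma * max_b Q(s', b). A sketch with a dictionary-backed Q table follows. The states, actions, rewards and the default of 0 for unseen entries are all hypothetical.

```python
def q_update_deterministic(Q, s, a, r, s_next, actions, gamma=0.9):
    """Q-learning with learning rate 1 in a deterministic environment:
    Q(s, a) <- r + gamma * max_b Q(s_next, b)."""
    Q[(s, a)] = r + gamma * max(Q.get((s_next, b), 0.0) for b in actions)
    return Q

# Hypothetical two-state example; unseen entries default to 0.
Q = {("B", "left"): 10.0, ("B", "right"): 2.0}
q_update_deterministic(Q, "A", "right", r=1.0, s_next="B", actions=["left", "right"])
print(Q[("A", "right")])  # 1 + 0.9 * 10 = 10.0
```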
Write the formula for activation Z of the node at location (j, k) in the ith filter of a Convolutional Neural Network which is connected by weights K to all nodes in an M x N window from the L filters (or channels) in the previous layer, assuming the bias weights are included in the activation function g().
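The activation on this card can be written as Z^i_{j,k} = g( sum_l sum_m sum_n K^i_{l,m,n} V^l_{j+m,k+n} ), where l runs over the L input channels and (m, n) over the M x N window. The sketch computes one such activation using nested lists. It assumes stride 1, no padding and 0-based window offsets; index conventions differ between texts.

```python
def conv_activation(V, K_i, j, k, g=lambda s: s):
    """Z^i_{j,k} = g( sum_l sum_m sum_n K_i[l][m][n] * V[l][j+m][k+n] ).

    V   : previous layer, indexed V[l][row][col] for L channels
    K_i : weights of filter i, indexed K_i[l][m][n] over an M x N window
    g   : activation (bias assumed folded into g, as the card states)
    Assumes stride 1, no padding and 0-based window offsets.
    """
    L, M, N = len(K_i), len(K_i[0]), len(K_i[0][0])
    s = sum(
        K_i[l][m][n] * V[l][j + m][k + n]
        for l in range(L) for m in range(M) for n in range(N)
    )
    return g(s)

# Hypothetical: 2 channels of 4x4 ones, a 2 x 3 window of ones -> sum is L*M*N = 12.
V = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
K_i = [[[1.0] * 3 for _ in range(2)] for _ in range(2)]
print(conv_activation(V, K_i, j=1, k=1))  # 12.0
```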
What is the formula for the number of free parameters in a CNN layer?
F x (1 + L x M x N), where F is the number of filters, M x N is the filter size, and L is the number of input channels
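The parameter count above, as a one-line function. Each of the F filters has L x M x N weights plus one bias. The example layer is hypothetical.

```python
def cnn_layer_params(F, L, M, N):
    """Free parameters = F * (1 + L*M*N): each filter has L*M*N weights plus 1 bias."""
    return F * (1 + L * M * N)

# Example: 32 filters of size 3 x 3 over an RGB (L = 3) input.
print(cnn_layer_params(F=32, L=3, M=3, N=3))  # 896
```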
What are the functions that describe an LSTM?
What is the variational autoencoder (VAE) trained to maximise?
For GANs, what is the formula for V(G, D)?
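The GAN value function is V(G, D) = E_{x ~ p_data}[log D(x)] + E_{z ~ p_z}[log(1 - D(G(z)))]. D maximises it and G minimises it. The sketch replaces the expectations with sample averages over hypothetical discriminator outputs. When D outputs 0.5 everywhere (the theoretical equilibrium), the value is -2 ln 2 = -ln 4.

```python
import math

def gan_value(d_real, d_fake):
    """Sample-average estimate of V(G, D) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples
    d_fake: discriminator outputs D(G(z)) on generated samples
    D tries to maximise this value; G tries to minimise it.
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# With D = 0.5 everywhere the value is -2 ln 2 (about -1.386).
print(gan_value([0.5, 0.5], [0.5, 0.5]))
```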
What is the formula for number of weights per filter?
(filter width) x (filter height) x (input depth) + 1 (for bias)
Number of neurons in a layer?
(output width) x (output height) x (depth)
Number of connections into the neurons in a layer?
(num neurons) x (connections per neuron / filter weights - bias)
Number of independent parameters?
(num filters) x (num weights)
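A worked sketch of the four counting cards above for a hypothetical convolutional layer: a 32 x 32 x 3 input, 10 filters of 5 x 5, stride 1 and no padding. It uses the standard output-size formula (W - F + 2P) / S + 1. It counts the bias as one of each neuron's incoming connections. If the connections card is meant to exclude the bias, use `weights_per_filter - 1` in that line instead.

```python
def conv_output_size(in_size, filter_size, stride=1, padding=0):
    """Standard output width/height: (W - F + 2P) / S + 1 (assumes it divides evenly)."""
    return (in_size - filter_size + 2 * padding) // stride + 1

# Hypothetical layer: 32 x 32 x 3 input, 10 filters of 5 x 5, stride 1, no padding.
in_w, in_h, in_depth = 32, 32, 3
f_w, f_h, num_filters = 5, 5, 10

weights_per_filter = f_w * f_h * in_depth + 1            # includes 1 bias
out_w = conv_output_size(in_w, f_w)
out_h = conv_output_size(in_h, f_h)
num_neurons = out_w * out_h * num_filters                # output depth = number of filters
connections = num_neurons * weights_per_filter           # counts the bias as a connection
independent_params = num_filters * weights_per_filter    # weights are shared across positions

print(weights_per_filter, num_neurons, connections, independent_params)
# 76 7840 595840 760
```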