Week 2 (Neural Nets) Flashcards
What is a perceptron (and what activation function does it use)
What is an MLP (and how is its activation function different)
What things need to be defined for an MLP
How does the NN forward pass step work?
What is the NN backward pass step?
How is the error calculated for a neural network, and how is it used to calculate the gradient
How is the gradient calculated for hidden layers (i.e. backpropagation)
What do a_j, z_j and h represent in neural networks
What is the typical delta_k for a neural network, i.e. the error term on the output layer. Then, generalising, what is the formula for delta_j
What is the backpropagation formula for delta_j
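For reference, a worked form of these in the usual textbook notation (a sketch assuming a squared-error loss and an identity output activation; the output-layer delta differs for other loss/activation pairings):

```latex
\[
\delta_k = y_k - t_k,
\qquad
\delta_j = h'(a_j) \sum_k w_{kj}\,\delta_k,
\qquad
\frac{\partial E}{\partial w_{ji}} = \delta_j z_i
\]
```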
What is the general backpropagation playbook
How can backpropagation be made more efficient
Cache the delta terms already computed for later layers (i.e. those closer to the output), since they are reused when computing the gradients of earlier layers.
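A minimal sketch of this caching idea in code (illustrative, not the course's implementation; assumes sigmoid hidden units, an identity output unit, squared-error loss, and biases omitted for brevity):

```python
import numpy as np

def backward(weights, activations, target):
    # activations[0] is the input x; activations[l + 1] is the sigmoid
    # output of weights[l] @ activations[l] (identity at the final layer).
    delta = activations[-1] - target          # output-layer delta: y - t
    grads = [None] * len(weights)
    for l in reversed(range(len(weights))):
        # dE/dW_l is outer(delta_{l+1}, z_l): the cached delta is reused
        # here and again below, so each delta is computed exactly once.
        grads[l] = np.outer(delta, activations[l])
        if l > 0:
            z = activations[l]
            # delta_l = h'(a_l) * (W_l^T delta_{l+1}),
            # with h'(a) = z(1 - z) for the sigmoid.
            delta = (weights[l].T @ delta) * z * (1.0 - z)
    return grads
```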
How does gradient descent work
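For reference, the basic update rule (eta is the learning rate):

```latex
\[
\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \,\nabla_{\mathbf{w}} E\!\left(\mathbf{w}^{(t)}\right)
\]
```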
What are some methods to reduce overfitting of neural networks
Dropout, early stopping, and regularisation (e.g. L2 weight decay)
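A minimal sketch of the first of these, inverted dropout as it is commonly implemented (function name and keep probability are illustrative):

```python
import numpy as np

def dropout(z, keep_prob=0.8, training=True, rng=None):
    # Inverted dropout: during training, zero each unit with probability
    # (1 - keep_prob) and scale the survivors by 1/keep_prob so the
    # expected activation is unchanged; at test time it is a no-op.
    if not training:
        return z
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(z.shape) < keep_prob
    return z * mask / keep_prob
```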
What is the formula for the sigmoid function
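For reference, the standard form, together with its derivative (which the vanishing-gradient card below hinges on, since it is at most 1/4):

```latex
\[
\sigma(x) = \frac{1}{1 + e^{-x}},
\qquad
\sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr) \le \tfrac{1}{4}
\]
```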
What is the vanishing and exploding gradient problem
What is gradient clipping and what is it used for
What are saturating and non-saturating activation functions
What are residual connections
How does early stopping work
Does parameter initialisation matter?
Yes: if all weights start identical (e.g. all zero), the units in a layer receive identical gradients and never learn distinct features, and badly scaled initial weights make vanishing/exploding gradients worse.
How does weight decay work (and what is the L2 regularisation term)
Where are bias values stored in neural networks
In the weight matrix: each layer's input is augmented with a constant 1 neuron, and each bias is the weight on that connection.
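A tiny sketch of the trick (illustrative numbers): appending a constant 1 to the layer input lets a single matrix hold both weights and biases.

```python
import numpy as np

# Layer with 3 inputs and 2 outputs; the last column of W holds the biases.
W = np.array([[0.1, 0.2, 0.3,  0.5],    # weights of unit 1, bias  0.5
              [0.4, 0.5, 0.6, -1.0]])   # weights of unit 2, bias -1.0

x = np.array([1.0, 2.0, 3.0])
x_aug = np.append(x, 1.0)               # the constant-1 "bias neuron"

a = W @ x_aug                           # same as W[:, :3] @ x + W[:, 3]
print(a)                                # [1.9  2.2]
```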