Lecture 6 Flashcards
When is a model not a deep learning one?
When the input units are directly connected to the outputs, without any layers (of transformation) between them.
What are the four common activation functions?
- Sigmoid
- ReLU
- Softplus
- Tanh
What does the Sigmoid activation function do?
Ensures that the output is bound between 0 and 1.
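A minimal sketch of the sigmoid, assuming NumPy; the test values are illustrative.

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)); output always lies between 0 and 1
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ~[0.00005, 0.5, 0.99995]
```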
What does the ReLU activation function do?
Outputs max(0, x): the output is bound between 0 and the input value, so there is no negative output.
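A minimal ReLU sketch, assuming NumPy; the example inputs are illustrative.

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x); negative inputs are clipped to zero
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
```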
What are the problems with the ReLU function?
Its derivative is not continuous: ReLU is not differentiable at zero, which can make calculating the gradient awkward.
What does the Softplus activation function do?
It is a smooth, continuous version of ReLU.
As negative inputs approach zero, the function ascends smoothly instead of switching on abruptly at zero.
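A minimal Softplus sketch, assuming NumPy; softplus(x) = ln(1 + eˣ), and the example inputs are illustrative.

```python
import numpy as np

def softplus(x):
    # softplus(x) = ln(1 + e^x); smooth everywhere, approaches ReLU for large |x|
    return np.log1p(np.exp(x))

print(softplus(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067, 0.693, 5.0067]
```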
What does the Tanh activation function do?
Ensures the output is bound between -1 and 1.
When is Tanh (activation function) useful?
When you want your network to be insensitive to inputs far from zero and more sensitive to inputs close to zero.
The steep slope around zero gives the model a strong signal about how close a value is to 0.
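A small illustration of that sensitivity, assuming NumPy; the sample inputs are made up.

```python
import numpy as np

# tanh is steep near zero (sensitive) and nearly flat far from zero (insensitive)
x = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])
print(np.tanh(x))  # ~[-0.9999, -0.462, 0.0, 0.462, 0.9999]
```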
Hidden layer
Any layer in a Neural Network wherein units are not directly connected to the outputs
Uses of Gradient Descent
- Learning the weights in computation graphs
- Calculating the weights leading into units in the output layer
- Calculating the weights leading into units in the hidden layers
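A minimal sketch of one gradient-descent loop for a single weight, using the squared-error loss from the next card; the names and values are illustrative assumptions, not the lecture's exact setup.

```python
# Gradient descent on one weight of a linear unit with loss (y - y_hat)^2
w, lr = 0.0, 0.1          # weight and learning rate (assumed values)
x, y = 2.0, 4.0           # one training example: input and target

for step in range(50):
    y_hat = w * x                    # forward pass
    grad = 2.0 * (y_hat - y) * x     # d/dw of (y - y_hat)^2
    w -= lr * grad                   # move against the gradient

print(w)  # converges toward 2.0, since y = 2 * x
```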
What is the Loss function?
(y − ŷ)²
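A one-line worked example with made-up values:

```python
y, y_hat = 3.0, 2.5
loss = (y - y_hat) ** 2   # squared error = 0.25
print(loss)
```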
Vanishing Gradient
When the “error” signals are extinguished altogether as they are back-propagated through a deep network with many layers.
How does a Vanishing Gradient occur?
- The gradient is very close to zero or exactly zero, so changing the weights leading into unit j has a negligible effect on its output.
- Floating point precision limitations of the hardware
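A small illustration of the first point, assuming sigmoid activations (whose derivative never exceeds 0.25); the layer count is arbitrary.

```python
# Back-propagated signal through many sigmoid layers (illustrative only):
# each layer multiplies the gradient by at most 0.25.
grad = 1.0
for layer in range(30):
    grad *= 0.25          # upper bound on the sigmoid derivative
print(grad)               # ~8.7e-19 -- effectively zero
```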
How do you solve the Vanishing Gradient problem?
By renormalizing the vector of gradients as you propagate it back through the network, so you ‘boost’ the gradients.
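A hedged sketch of that renormalization idea, assuming NumPy; the target norm and epsilon are illustrative choices, not the lecture's exact procedure.

```python
import numpy as np

def renormalize(grad, target_norm=1.0, eps=1e-12):
    # Rescale the gradient vector so its norm equals target_norm,
    # boosting gradients that have shrunk toward zero.
    norm = np.linalg.norm(grad)
    return grad * (target_norm / (norm + eps))

g = np.array([1e-9, -2e-9, 3e-9])   # a vanishingly small gradient
print(renormalize(g))                # same direction, rescaled to unit length
```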
What are the implications of a Vanishing Gradient?
The network will eventually freeze and no longer improve, even if there are more layers that need to have the loss back-propagated.