Activation Functions Flashcards
What are the advantages of linear activation functions?
Outputs are continuous and proportional to the input, rather than being limited to just 0 and 1.
What is a disadvantage of linear activation functions?
Gradient descent is ineffective because the derivative of a linear function is a constant, irrespective of the input. Also, any combination of linear functions is itself linear, so stacking linear layers collapses the network into a single linear layer (see the sketch below).
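A minimal NumPy sketch of the collapse argument (the shapes and weights are arbitrary illustrations, not from the flashcards): two linear layers applied in sequence are exactly equivalent to one linear layer whose weight matrix is the product of the two.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # first "layer" weights (illustrative shapes)
W2 = rng.normal(size=(2, 4))   # second "layer" weights
x = rng.normal(size=3)         # an arbitrary input

two_linear_layers = W2 @ (W1 @ x)   # two linear layers, no activation in between
single_layer = (W2 @ W1) @ x        # one layer with the combined weight matrix
print(np.allclose(two_linear_layers, single_layer))  # True: the stack collapses
```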
What are the advantages of the sigmoid function?
1/(1 + e^-x)
- Unlike the step function, sigmoid has a smooth curve, so its output can be interpreted as the probability of belonging to a class.
- Outputs are normalised between 0 and 1.
What are the disadvantages of the sigmoid function?
1/(1 + e^-x)
- Vanishing gradient problem: for very large or very small inputs the output barely changes, so the gradient is close to zero and learning is slow.
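A small NumPy sketch to make the saturation concrete (the sample inputs are arbitrary): as x grows, the output creeps towards 1 and barely moves, which is why the gradient vanishes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for x in [2.0, 5.0, 10.0, 20.0]:
    print(x, sigmoid(x))
# 2.0   ~0.8808
# 5.0   ~0.9933
# 10.0  ~0.9999546
# 20.0  ~0.999999998  -> outputs barely change, so the gradient is nearly zero
```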
What are the advantages of the tanh function?
(e^x - e^-x ) / (e^x + e^-x )
- Also S-shaped, giving smooth, continuous outputs.
- Outputs range between -1 and 1, so they are zero-centred.
- Tanh's gradient can reach a maximum of 1, whereas sigmoid's can only reach 0.25.
What are the disadvantages of the tanh function?
(e^x - e^-x ) / (e^x + e^-x )
- Vanishing gradient
- Computationally expensive due to the exponentials
How are tanh and sigmoid related?
tanh(x) = 2 * sigmoid(2x) - 1
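A quick numerical check of the identity (NumPy assumed; the test points are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 11)
print(np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0))  # True
```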
what is the derivative of sigmoid?
sig(x)(1 - sig(x))
What is the derivative of tanh?
1 - tanh(x)^2
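Both derivative cards can be sanity-checked with a central finite difference; a minimal sketch (test points and step size chosen arbitrarily):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

h = 1e-6
for x in [-2.0, 0.0, 1.5]:
    sig_numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    tanh_numeric = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)
    print(np.isclose(sig_numeric, sigmoid(x) * (1 - sigmoid(x))),   # sig(x)(1 - sig(x))
          np.isclose(tanh_numeric, 1 - np.tanh(x) ** 2))            # 1 - tanh(x)^2
# prints True True at each test point
```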
What are the advantages of the relu function?
- Simple and cheap to compute, leading to fast convergence
- No vanishing gradient problem for positive inputs, since the gradient there is a constant 1
What are the disadvantages of the relu function?
- Negative inputs always produce 0, which can cause a neuron to become permanently inactive; this is known as dying relu.
- Not a zero-centred function, and the derivative at zero does not exist.
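A minimal sketch of relu and its gradient (NumPy assumed): the gradient is 0 for every negative input, which is why a neuron whose inputs stay negative stops learning (dying relu). Returning 0 at exactly x = 0 is an implementation convention, since the true derivative is undefined there.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise; the value at exactly 0 is a convention
    return (x > 0).astype(float)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))       # [0. 0. 0. 2.]
print(relu_grad(x))  # [0. 0. 0. 1.] -> no gradient flows for negative inputs
```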
What is leaky relu?
Like relu, but negative inputs are multiplied by a small slope (e.g. 0.01) instead of being set to 0.
What are the advantages of Leaky Relu?
- easy to compute
- no vanishing gradient
- no dying relu
What are the disadvantages of Leaky Relu?
Not zero-centred.
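A small sketch of leaky relu (NumPy assumed; the slope 0.01 is a common choice for the hyperparameter, not a fixed part of the definition):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # negative inputs are scaled by alpha instead of being zeroed out
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(leaky_relu(x))  # [-0.03  -0.005  0.     2.   ]
```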
What are the advantages of ELU?
- Smoother than leaky relu around x = 0 (no kink)
- no dying relu
- no vanishing gradient
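For completeness, a small sketch of ELU (NumPy assumed; alpha = 1 is the usual default): it behaves like relu for positive inputs and smoothly saturates towards -alpha for very negative inputs, avoiding the hard zero of relu.

```python
import numpy as np

def elu(x, alpha=1.0):
    # x for x > 0, alpha * (exp(x) - 1) for x <= 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(elu(x))  # approx [-0.95  -0.393  0.     2.   ]
```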