Activation Functions in FCNs Flashcards
What does backpropagation compute in a neural network?
It computes the gradient of the loss with respect to each weight by applying the chain rule backwards through the network; gradient descent then uses these gradients to update the weights.
What is the role of gradient descent in training neural networks?
It minimizes the loss function by updating weights iteratively.
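A minimal sketch of one gradient descent loop, assuming a single weight, one data point, and a squared-error loss (the values and learning rate here are illustrative only):

```python
# Toy setup (illustrative values): fit y = w * x to one data point.
x, y_true = 2.0, 6.0
w = 0.0     # initial weight
lr = 0.1    # learning rate (assumed value)

for step in range(20):
    y_pred = w * x                    # forward pass
    loss = (y_pred - y_true) ** 2     # squared-error loss
    grad = 2 * (y_pred - y_true) * x  # dLoss/dw via the chain rule
    w -= lr * grad                    # gradient descent update

print(w)  # approaches 3.0, the weight that minimizes the loss
```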
Why are activation functions necessary in neural networks?
They introduce non-linearity, enabling the network to model complex patterns beyond linear relationships.
What is the limitation of using a purely linear activation function?
No matter how many layers are stacked, the network collapses to a single linear transformation, so it can only learn what a simple linear (regression) model can.
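A small numerical check of this collapse, assuming two arbitrary weight matrices chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))  # first "layer" weights (arbitrary)
W2 = rng.standard_normal((2, 4))  # second "layer" weights (arbitrary)
x = rng.standard_normal(3)

# Two stacked linear layers with no activation in between...
deep = W2 @ (W1 @ x)
# ...are exactly equivalent to one linear layer with weights W2 @ W1.
shallow = (W2 @ W1) @ x

print(np.allclose(deep, shallow))  # True
```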
What advantages do non-linear activation functions provide?
They make stacked layers genuinely deep: with non-linearities between layers, each additional layer adds representational power instead of collapsing into a single linear map.
What is the primary issue with the sigmoid function in deep learning?
It suffers from the vanishing gradient problem for very large or very small input values.
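A short sketch showing how the sigmoid's gradient shrinks as the input moves away from zero (the input values are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # derivative of the sigmoid

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, sigmoid_grad(x))  # gradient falls toward 0 as |x| grows
```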
When is the softmax function used?
It is used in multi-class classification to convert logits into probability distributions.
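A minimal, numerically stable softmax over a vector of logits (the example logits are arbitrary):

```python
import numpy as np

def softmax(logits):
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())  # probabilities that sum to 1
```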
How does the tanh function differ from sigmoid?
It maps inputs to the range (-1, 1), so its outputs are zero-centered, which often helps optimization compared with sigmoid's (0, 1) range.
What problem does tanh suffer from?
It still suffers from the vanishing gradient problem for extreme input values.
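A brief sketch contrasting tanh's zero-centered range with its near-zero gradient at extreme inputs (illustrative values):

```python
import numpy as np

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2  # derivative of tanh

xs = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(np.tanh(xs))    # outputs stay within (-1, 1) and are zero-centered
print(tanh_grad(xs))  # gradient is near 0 at the extremes (vanishing gradient)
```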
What is the mathematical definition of ReLU?
f(x)=max(0,x), meaning it outputs zero for negative inputs and the input itself for positive values.
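ReLU in one line of NumPy, matching the definition above:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)  # zero for negative inputs, identity for positive

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0. 0. 0. 1.5]
```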
What issue does ReLU suffer from?
The “dying ReLU” problem: if a neuron’s pre-activation becomes negative for all inputs, its output and gradient are always zero, so it stops learning.
How does Leaky ReLU address the dying ReLU problem?
It allows a small, non-zero gradient for negative inputs.
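A minimal Leaky ReLU sketch; the slope of 0.01 for negative inputs is a common default, not a required value:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Negative inputs keep a small slope alpha, so the gradient never becomes exactly zero.
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-2.0, 0.0, 3.0])))  # [-0.02  0.    3.  ]
```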
ReLU
Rectified Linear Unit, a non-linear activation function
Which activation function is best for hidden layers in deep networks?
ReLU is the most common choice because it is cheap to compute and mitigates the vanishing gradient problem.
Which activation function is preferred in the output layer for binary classification?
Sigmoid
Which activation function is used for multi-class classification?
Softmax
Why must activation functions be differentiable?
So that backpropagation can compute gradients through them, which gradient descent needs in order to update the network’s weights.
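A one-neuron sketch of why the activation’s derivative matters: the chain rule multiplies it into every weight gradient (the numbers and the choice of sigmoid here are purely illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative single neuron: y = sigmoid(w * x), squared-error loss.
x, target = 1.5, 1.0
w = 0.2

z = w * x
y = sigmoid(z)
dloss_dy = 2 * (y - target)             # derivative of the loss w.r.t. the output
dy_dz = sigmoid(z) * (1 - sigmoid(z))   # derivative of the activation (must exist)
dz_dw = x
grad_w = dloss_dy * dy_dz * dz_dw       # chain rule: the weight gradient backprop computes

w -= 0.1 * grad_w                       # one gradient descent step
print(grad_w, w)
```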