Foundations Flashcards
Even with Xavier and Kaiming initialization, it can happen by chance that a neural network's weights are initialized in such a way that the network is unable to learn anything useful.
True
If a pre-trained model is used and no new weights are added, we do not need Xavier and Kaiming initialization at all.
True
It is sufficient for the mean and variance of the distribution of output values to average out to zero and one, respectively, across multiple initializations; individual initializations may deviate from these values.
True
Which tensors can be added to each other?
Tensors with the same shape (how the shape is written, with commas or spaces, does not matter)
All standard weight operations can be expressed as matrix multiplications, which is why neural network operations are so efficient when executed on GPUs.
True
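A minimal sketch (assuming PyTorch; the sizes are made up for illustration) showing that a fully connected layer's weight operation is a single matrix multiplication plus a bias add:

```python
import torch

# Hypothetical sizes: a batch of 4 inputs with 3 features, a layer with 5 outputs.
x = torch.randn(4, 3)   # input batch
w = torch.randn(3, 5)   # weight matrix
b = torch.randn(5)      # bias vector

out = x @ w + b         # the whole layer is one matrix multiplication plus a bias add
print(out.shape)        # torch.Size([4, 5])
```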
Gradient of bias
2 ⋅ (out − target) / n (for an MSE loss averaged over n values)
Gradient of weight
Gradient of bias x input
Gradient of input
Gradient of bias x weight
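The three formulas above can be checked against autograd. A minimal sketch, assuming a single linear layer out = x @ w + b with an MSE loss averaged over the n output values; all tensor names are illustrative:

```python
import torch

x = torch.randn(10, 3)
w = torch.randn(3, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
target = torch.randn(10, 1)

out = x @ w + b
loss = ((out - target) ** 2).mean()
loss.backward()                              # autograd gradients for comparison

with torch.no_grad():
    n = out.numel()
    grad_out = 2 * (out - target) / n        # the "gradient of bias" term per output value
    grad_b = grad_out.sum(0)                 # gradient of bias (summed over the batch)
    grad_w = x.t() @ grad_out                # gradient of bias x input
    grad_x = grad_out @ w.t()                # gradient of bias x weight

print(torch.allclose(grad_b, b.grad), torch.allclose(grad_w, w.grad))  # True True
```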
Elementwise arithmetic for tensors
- with tensors, all basic operators (+, -, *, /, >, <, ==) are applied elementwise
- both tensors involved in the calculation need to have the same shape
- the operation is executed for every pair of elements at the same position in the two tensors, so the result is a tensor with the same shape as the input tensors (see the sketch below)
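A minimal sketch (assuming PyTorch) of elementwise behaviour for two same-shaped tensors:

```python
import torch

a = torch.tensor([1., 2., 3.])
b = torch.tensor([10., 20., 30.])

print(a + b)   # tensor([11., 22., 33.])   -- each pair of elements is added
print(a * b)   # tensor([10., 40., 90.])   -- elementwise product, not a matrix product
print(a < b)   # tensor([True, True, True]) -- comparisons are elementwise too
```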
Kaiming Initialization
When using a ReLU activation, a scaling factor of √(2/n_input) preserves the standard deviation of the activations across layers
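A minimal sketch (assuming PyTorch; layer count and sizes are arbitrary) of how the √(2/n_input) factor keeps activations from shrinking layer after layer:

```python
import torch

n_input = 512
x = torch.randn(1000, n_input)                        # activations start with std ~1

for _ in range(10):
    # Kaiming scaling: standard-normal weights times sqrt(2 / n_input)
    w = torch.randn(n_input, n_input) * (2 / n_input) ** 0.5
    x = torch.relu(x @ w)

print(x.std())   # stays in a stable range (~0.8 here) instead of collapsing toward 0
```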
Xavier Initialization
- scales the initial weights by a suitable factor (1/√n_input) so that the standard deviation of the output values is preserved
- passing the values through the activation function may alter the mean and standard deviation again, causing the values to vanish or explode (see the sketch after this list)
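A minimal sketch (assuming PyTorch; sizes are arbitrary) of the 1/√n_input scaling keeping the statistics stable through purely linear layers, and of an activation changing them again:

```python
import torch

n_input = 512
x = torch.randn(1000, n_input)                     # mean ~0, std ~1

for _ in range(10):
    # Xavier scaling: standard-normal weights times 1/sqrt(n_input)
    w = torch.randn(n_input, n_input) / n_input ** 0.5
    x = x @ w                                      # purely linear layers

print(x.mean().item(), x.std().item())             # mean stays near 0, std stays near 1

# an activation shifts the statistics again:
a = torch.sigmoid(x)
print(a.mean().item(), a.std().item())             # mean ~0.5, std well below 1
```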
What is calculated during backpropagation?
Gradients
What do gradients show?
Gradients indicate how the network should adjust its parameters to minimize the loss, not directly the quality of the network.
What is forward pass?
The process of passing input data through the layers of a neural network to produce an output (e.g., predictions or logits).
What is backpropagation?
The process of computing gradients of the loss function with respect to the network's parameters (weights and biases) using the chain rule of calculus.
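A minimal sketch (assuming PyTorch; the tiny network and its sizes are made up) putting the two together: the forward pass produces predictions and a loss, and backpropagation fills in the gradients for every parameter:

```python
import torch

x = torch.randn(8, 4)                        # input batch
target = torch.randn(8, 1)

w1 = torch.randn(4, 16, requires_grad=True)  # parameters of a one-hidden-layer network
b1 = torch.zeros(16, requires_grad=True)
w2 = torch.randn(16, 1, requires_grad=True)
b2 = torch.zeros(1, requires_grad=True)

# forward pass: input -> layers -> output
hidden = torch.relu(x @ w1 + b1)
out = hidden @ w2 + b2
loss = ((out - target) ** 2).mean()

# backpropagation: gradients of the loss w.r.t. every parameter, via the chain rule
loss.backward()
print(w1.grad.shape, b1.grad.shape, w2.grad.shape, b2.grad.shape)
```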