Foundations Flashcards

1
Q

Even with Xavier or Kaiming initialization, it can happen by chance that the weights of a neural network are initialized in such a way that the network is unable to learn anything useful.

A

True

2
Q

If a pre-trained model is used and no new weights are added, Xavier or Kaiming initialization is not needed at all.

A

True

3
Q

It is sufficient for the mean and variance of the distribution of output values to average out to zero and one, respectively, across multiple initializations; in individual cases these values may deviate.

A

True

4
Q

Which tensors can be added to each other?

A

Tensors with the same shape; how the shape is written (with commas or spaces) does not matter.

5
Q

All standard weight operations can be expressed as matrix multiplications, which is what makes neural network operations so efficient when executed on GPUs.

A

True
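
A minimal sketch (assuming PyTorch; the sizes and names are illustrative) of a linear layer expressed as a single matrix multiplication over a whole batch:

  import torch

  batch = torch.randn(64, 784)     # 64 samples with 784 input features each
  weights = torch.randn(784, 10)   # weight matrix of a linear layer
  bias = torch.zeros(10)

  # the whole layer is one matrix multiplication plus a broadcast bias
  out = batch @ weights + bias     # shape: (64, 10)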

6
Q

Gradient of bias

A

2 · (out − target) / n (for an MSE loss averaged over n samples)

7
Q

Gradient of weight

A

Gradient of bias x input

8
Q

Gradient of input

A

Gradient of bias x weight
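
A minimal sketch (assuming PyTorch, a single weight and bias, and an MSE loss over n samples; all names are illustrative) tying together the three gradient cards above and checking them against autograd:

  import torch

  n = 100
  inp = torch.randn(n)
  target = torch.randn(n)
  w, b = torch.tensor(0.5), torch.tensor(0.1)

  out = inp * w + b                    # forward pass of one linear "neuron"
  grad_out = 2. * (out - target) / n   # "gradient of bias": 2 * (out - target) / n, per sample
  grad_b = grad_out.sum()              # summed over the batch for the single bias
  grad_w = (grad_out * inp).sum()      # gradient of weight: gradient of bias x input
  grad_inp = grad_out * w              # gradient of input: gradient of bias x weight

  # quick check against autograd
  w2, b2 = w.clone().requires_grad_(True), b.clone().requires_grad_(True)
  ((inp * w2 + b2 - target) ** 2).mean().backward()
  assert torch.isclose(grad_w, w2.grad) and torch.isclose(grad_b, b2.grad)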

9
Q

Elementwise arithmetic for tensors

A
  • With tensors, all basic operators (+, -, *, /, >, <, ==) are applied elementwise
  • Both tensors involved in the calculation need to have the same shape
  • The operation is executed for every pair of elements at the same position in the two tensors; the result is therefore a tensor with the same shape as the input tensors (see the sketch below)
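
A minimal sketch (assuming PyTorch; the values are illustrative) of elementwise tensor arithmetic:

  import torch

  a = torch.tensor([1., 2., 3.])
  b = torch.tensor([10., 20., 30.])

  a + b   # tensor([11., 22., 33.])    -> pairs of elements at the same position
  a * b   # tensor([10., 40., 90.])
  a < b   # tensor([True, True, True]) -> comparisons are elementwise as well
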
10
Q

Kaiming Initialization

A

When using a ReLU activation, the scaling factor √(2 / n_input) preserves the standard deviation of the layer outputs.
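
A minimal sketch (assuming PyTorch; layer width and depth are illustrative) checking that the √(2 / n_input) scaling keeps the activation statistics in a sane range across many ReLU layers:

  import torch, math

  n = 256
  x = torch.randn(1_000, n)

  for _ in range(20):                             # 20 Kaiming-initialised layers + ReLU
      w = torch.randn(n, n) * math.sqrt(2. / n)   # the sqrt(2 / n_input) scaling factor
      x = torch.relu(x @ w)

  print(x.mean(), x.std())   # stays in a sane range instead of vanishing or exploding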

11
Q

Xavier Initialization

A
  • A suitable scaling factor is 1/√n_input
  • Passing the values through the activation function may alter the mean and standard deviation again, causing the values to vanish or explode (see the sketch below)
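
A minimal sketch (assuming PyTorch and tanh as the activation; width and depth are illustrative) showing that the 1/√n_input scaling alone does not protect the statistics once an activation is applied:

  import torch, math

  n = 256
  x = torch.randn(1_000, n)
  print(x.std())                             # roughly 1 before the layers

  for _ in range(20):
      w = torch.randn(n, n) / math.sqrt(n)   # Xavier-style 1 / sqrt(n_input) scaling
      x = torch.tanh(x @ w)                  # the activation alters mean and std again

  print(x.mean(), x.std())                   # the std has drifted away from 1
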
12
Q

What is calculated during backpropagation?

A

Gradients

13
Q

What do gradients show?

A

Gradients indicate how the network should adjust its parameters to minimize the loss, not directly the quality of the network.

14
Q

What is a forward pass?

A

The process of passing input data through the layers of a neural network to produce an output (e.g., predictions or logits).

15
Q

What is backpropagation?

A

The process of computing gradients of the loss function with respect to the network’s parameters (weights and biases) using the chain rule of calculus.
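
A minimal sketch (assuming PyTorch autograd; the tiny model and data are illustrative) of one forward pass followed by backpropagation:

  import torch

  x = torch.randn(32, 10)                      # a batch of inputs
  target = torch.randn(32, 1)

  w = torch.randn(10, 1, requires_grad=True)   # parameters we want gradients for
  b = torch.zeros(1, requires_grad=True)

  out = x @ w + b                              # forward pass: inputs -> predictions
  loss = ((out - target) ** 2).mean()          # MSE loss

  loss.backward()                              # backpropagation via the chain rule
  print(w.grad.shape, b.grad.shape)            # gradients w.r.t. weights and bias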
