Foundations Flashcards

1
Q

What’s broadcasting?

A

It enables element-wise operations on arrays or tensors of different shapes without explicitly copying or reshaping data. It plays a critical role in simplifying the implementation of mathematical operations in high-dimensional data processing and deep learning frameworks.
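
A minimal PyTorch sketch (the tensors are chosen purely for illustration):

    import torch
    m = torch.ones(3, 4)     # shape (3, 4)
    v = torch.arange(4.)     # shape (4,)
    out = m + v              # v is broadcast along dim 0 -> result shape (3, 4)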

2
Q

What’s the mathematical formula of the first part of a neural network?

A

out = Σᵢ xᵢ · wᵢ + b

x = input
w = weight
b = bias
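
In code this is just a weighted sum of the inputs plus a bias; a minimal plain-Python sketch (the numbers are illustrative):

    x = [0.5, -1.0, 2.0]    # inputs
    w = [0.1, 0.4, -0.2]    # weights
    b = 0.3                 # bias
    out = sum(xi * wi for xi, wi in zip(x, w)) + b    # -> -0.45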

3
Q

What's a scalar?

A

A scalar is a single number (a rank-0 tensor). During broadcasting with a scalar, the scalar can be thought of as being expanded to a tensor of the same shape as the input tensor.
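
A minimal PyTorch sketch of scalar broadcasting:

    import torch
    t = torch.tensor([[1., 2.], [3., 4.]])
    out = t * 2.0    # the scalar 2.0 behaves like a (2, 2) tensor filled with 2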

4
Q

What are the two criteria that tell us whether two tensors are broadcastable?

A
  • Two tensors of different ranks are broadcastable if the lengths of the axes of the lower-ranked tensor
    match the lengths of the trailing axes of the higher-ranked tensor
    – Two axis lengths match in the sense of broadcasting if they are equal or either one of them is 1

trailing axes: the n last dimensions, where n is the rank of the (smaller) tensor you're broadcasting with; see the sketch below
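
A small PyTorch sketch of the rule (shapes chosen for illustration):

    import torch
    a = torch.ones(5, 3, 4)    # higher-ranked tensor
    b = torch.ones(3, 1)       # trailing axes (3, 1) vs (3, 4): 3 == 3, and 1 broadcasts over 4
    c = a + b                  # OK, result shape (5, 3, 4)
    # torch.ones(2, 4) + a     # would fail: 2 is neither equal to 3 nor equal to 1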

5
Q

What does the “expand_as” method do?

A

It virtually expands a tensor (e.g. c) to have the same shape as another tensor (e.g. m) without copying any data; the result is a broadcast view.
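
A minimal PyTorch sketch (c and m are illustrative names, as in the card):

    import torch
    m = torch.ones(3, 4)
    c = torch.tensor([1., 2., 3., 4.])
    c_exp = c.expand_as(m)    # view with shape (3, 4); no data is copied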

6
Q

What does the method “unsqueeze()” do?

A

It adds an axis of length one at the specified position.
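
A minimal PyTorch sketch:

    import torch
    v = torch.tensor([1., 2., 3.])    # shape (3,)
    row = v.unsqueeze(0)              # shape (1, 3)
    col = v.unsqueeze(1)              # shape (3, 1)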

7
Q

Why is it important to initialize the weights properly?

A

To keep the activations in a stable range: if the standard deviation grows from layer to layer, the values may overflow (explode), and if the weights are too small, the values may eventually vanish.
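
A small PyTorch sketch of the problem (50 layers of unscaled random weights; sizes are illustrative):

    import torch
    x = torch.randn(512)
    for _ in range(50):
        x = x @ torch.randn(512, 512)    # no scaling of the weights
    print(x.std())                       # typically overflows to inf/nan: the activations exploded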

8
Q

What are the different types of initialization techniques?

A
  • Xavier: scaling factor 1/√n, n = # of inputs
  • Kaiming: scaling factor √(2/n), which accounts for the ReLU non-linearity (see the sketch below)
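
A minimal sketch of both scaling rules done by hand (PyTorch also ships torch.nn.init.xavier_normal_ and torch.nn.init.kaiming_normal_):

    import math, torch
    n_in, n_out = 784, 50
    w_xavier  = torch.randn(n_in, n_out) / math.sqrt(n_in)        # std ≈ 1/√n
    w_kaiming = torch.randn(n_in, n_out) * math.sqrt(2 / n_in)    # std ≈ √(2/n)
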
9
Q

What disadvantage does the Xavier initialization have compared to Kaiming?

A

The Xavier method doesn't preserve the standard deviation when a ReLU activation follows: the ReLU zeroes out the negative values, roughly halving the variance, which Kaiming's extra factor of 2 compensates for.
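
A quick numerical illustration (a PyTorch sketch; sizes are illustrative):

    import math, torch
    x = torch.randn(512)
    w = torch.randn(512, 512) / math.sqrt(512)    # Xavier scaling
    y = torch.relu(x @ w)
    print(y.std())    # roughly 0.58 instead of ≈ 1: the ReLU shrinks the std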

10
Q

Which of the following statements are true about weight initialization? (Multiple Choice)
1. The ReLU activation function preserves the distribution of values by leaving the majority of them unchanged (the linear part) and mapping values below zero to zero, which is acceptable given the desired mean of zero.
2. It is satisfactory to have the mean and variance of the distribution of output values average out to zero and one, respectively, across multiple initializations. In individual cases, these values may deviate.
3. If a pre-trained model is used and no new weights are added, we do not need Xavier and Kaiming initialization at all.
4. In larger networks, the initialization process is relatively less critical due to the involvement of numerous random numbers. As a result, the likelihood of individual numbers impacting the overall outcome is mitigated.
5. Even with Xavier and Kaiming initialization, it can occur by chance that the weights of a neural network are initialized in such a way that the network is unable to learn anything useful.

A

2,3,5

11
Q

Which of the following statements is true about ANNs? (Multiple Choice)
1. All standard weight operations can be expressed as matrix multiplications. This makes neural network operations so efficient when executed on GPUs.
2. A single neuron cannot be implemented in plain Python; PyTorch or a similar deep learning library is required.
3. It is not possible to express the weights of a layer in a single matrix because the biases have to be separated from the input weights.
4. If one could obtain a fast enough GPU, while using only plain Python code, one could beat PyTorch’s CPU execution time for matrix multiplication.

A

1

12
Q

How do we know if two tensors are broadcastable?

A

All their dimensions, compared position by position from the trailing end, are compatible: the two sizes are either equal, or one of them is 1 (in which case that dimension is broadcast).

13
Q

Why is PyTorch generally faster than plain Python for deep learning tasks?

A) It has a more intuitive API
B) It uses functions implemented in C/C++
C) It has better visualization tools
D) It requires less memory

A

Answer: B) It uses functions implemented in C/C++

14
Q

Which of the following is a common issue that can occur with improper weight initialization in a neural network?

A) Faster convergence
B) Overfitting
C) Vanishing or exploding gradients
D) Reduced model complexity

A

Answer: C) Vanishing or exploding gradients

15
Q

If a neural network has 3 layers with weights initialized using a normal distribution with a mean of 0 and a variance of 1, what is the expected variance of the output for each layer?

A

Answer:

The variance of the output remains approximately 1 per layer only if the weights are initialized properly, i.e. scaled according to the input dimension (variance 1/n rather than 1), and assuming no activation or normalization layers that affect the output variance.
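
A quick numerical check (a PyTorch sketch; sizes are illustrative):

    import math, torch
    n = 256
    x = torch.randn(n)
    for _ in range(3):                                  # three layers
        x = x @ (torch.randn(n, n) / math.sqrt(n))      # weight variance 1/n
    print(x.var())    # stays close to 1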

16
Q

Explain the concept of broadcasting in the context of neural network operations. Provide an example of how it can be used in matrix operations.

A

Answer:
Broadcasting is a technique that allows numpy or PyTorch to perform element-wise operations on arrays of different shapes by automatically expanding the smaller array to match the shape of the larger one. This is useful for efficient computation without the need for manual array resizing.

Example:
If A=np.array([[1,2],[3,4]]) and
B=np.array([1,2]), broadcasting allows you to perform C=A+B, resulting in C=np.array([[2,4],[4,6]]).

17
Q

Discuss the importance of proper weight initialization in deep learning models. What issues might arise from poor initialization?

A

Answer:
Proper weight initialization is crucial because it affects the convergence speed and stability of a deep learning model during training. Poor initialization can lead to problems such as vanishing or exploding gradients, where gradients either become too small to propagate back effectively or grow too large, causing numerical instability. Good initialization techniques, like Xavier or Kaiming initialization, aim to maintain a stable variance of outputs and gradients throughout the network.

18
Q

You implemented a neural network but noticed that the gradients of the weights are either too large or too small during training. What could be the possible reasons, and how would you address this issue?

A

Answer:
The issue could be due to improper weight initialization, leading to vanishing or exploding gradients. To address this, one could use initialization methods like Xavier (Glorot) or Kaiming (He) initialization, which take into account the number of input and output units to maintain gradient flow. Additionally, gradient clipping can be used to prevent gradients from becoming too large.
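
A minimal PyTorch sketch of both remedies (the layer, data, and max_norm value are illustrative):

    import torch
    layer = torch.nn.Linear(128, 64)
    torch.nn.init.kaiming_normal_(layer.weight)    # Kaiming (He) initialization
    x = torch.randn(32, 128)
    loss = layer(x).pow(2).mean()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(layer.parameters(), max_norm=1.0)    # cap the gradient norm before the optimizer step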

19
Q

Explain the difference between the forward pass and the backward pass in a neural network. Why is the backward pass critical for model training?

A

Answer:
The forward pass in a neural network involves computing the output predictions from the input data by passing it through the network layers sequentially. The backward pass, also known as backpropagation, involves computing the gradients of the loss function with respect to the model parameters, enabling the model to learn by updating its weights. The backward pass is critical for training because it allows the model to minimize the loss function by adjusting the weights in the direction that reduces the error.
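
A minimal PyTorch sketch of one forward and backward pass (the model, data, and learning rate are illustrative):

    import torch
    model = torch.nn.Linear(10, 1)
    x, y = torch.randn(8, 10), torch.randn(8, 1)
    pred = model(x)                                  # forward pass
    loss = torch.nn.functional.mse_loss(pred, y)
    loss.backward()                                  # backward pass: fills p.grad for every parameter
    with torch.no_grad():
        for p in model.parameters():
            p -= 0.01 * p.grad                       # weight update in the direction that reduces the error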