Neural Network Basics II Flashcards
What's image data augmentation?
Random resize and crop, rotation, stretching, and color distortions.
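A minimal sketch of such an augmentation pipeline, assuming torchvision is installed; the specific parameter values are arbitrary examples, not recommendations:

```python
# Illustrative augmentation pipeline (torchvision assumed); values are examples only.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),                                    # random resize and crop
    transforms.RandomRotation(degrees=15),                                # random rotation
    transforms.RandomAffine(degrees=0, shear=10),                         # stretching / shearing
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2), # color distortions
    transforms.ToTensor(),
])
# augmented = augment(pil_image)  # applied freshly each time a batch is drawn
```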
What's the softmax equation?
softmax(z)_i = exp(z_i) / sum_j exp(z_j), where the sum runs over all M classes
z_i = raw network output (logit) for class i
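A small NumPy sketch of this formula (subtracting the max is only for numerical stability and does not change the result):

```python
import numpy as np

def softmax(z):
    """Turn a vector of raw outputs (logits) into probabilities."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())     # subtract max for numerical stability
    return e / e.sum()          # exp(z_i) / sum_j exp(z_j)

print(softmax([2.0, 1.0, 0.1]))  # e.g. [0.659 0.242 0.099]
```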
What are mini-batches? What's the recommended size?
A set of items processed together in one step; the number of items is the "batch size", e.g. 16 or 32, usually a power of 2.
Usually as many items as fit onto the GPU at once (including parameters and gradients).
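A sketch of how a shuffled dataset can be split into mini-batches (plain NumPy, illustrative only):

```python
import numpy as np

def minibatches(X, y, batch_size=32, seed=0):
    """Yield shuffled (X, y) mini-batches of the given batch size."""
    idx = np.random.default_rng(seed).permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

X, y = np.random.rand(512, 10), np.random.rand(512)
print(sum(1 for _ in minibatches(X, y, batch_size=64)))  # 8 mini-batches per epoch
```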
What is stochastic gradient descent (SGD)?
Gradient descent using mini-batches instead of the complete dataset
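A minimal SGD loop over mini-batches for linear regression (NumPy sketch; data, learning rate, and batch size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=512)

w, lr, batch_size = np.zeros(3), 0.1, 64
for epoch in range(20):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):   # one gradient step per mini-batch
        b = idx[start:start + batch_size]
        err = X[b] @ w - y[b]
        grad = X[b].T @ err / len(b)             # gradient of the mean squared error
        w -= lr * grad                           # SGD update
print(w)  # close to [1.5, -2.0, 0.5]
```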
List practical techniques to improve model performance:
- Using the learning rate finder
- Freezing and unfreezing layers
- Discriminative learning rates
- Choosing the right number of epochs
- Trying different model depths
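A sketch of how several of these techniques combine in a fastai-style fine-tuning workflow; fastai is assumed to be installed, and the dataset, epoch counts, and learning rates are only examples:

```python
from fastai.vision.all import *

def is_cat(fname):                 # label images by filename convention (cats start uppercase)
    return fname[0].isupper()

path = untar_data(URLs.PETS) / "images"             # downloads a small sample dataset
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42, label_func=is_cat,
    item_tfms=Resize(460), batch_tfms=aug_transforms(size=224))

learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.lr_find()                                     # learning rate finder
learn.fit_one_cycle(3, 3e-3)                        # pretrained layers frozen: train the head
learn.unfreeze()                                    # unfreeze for full fine-tuning
learn.fit_one_cycle(6, lr_max=slice(1e-6, 1e-4))    # discriminative learning rates
```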
Which of the following statements is true about data augmentation? (Multiple Choice)
1. Each image in the training dataset is transformed exactly once in order to reduce overfitting.
2. In each batch, the images are transformed slightly differently.
3. Data augmentation transformations only change the input and don’t change the label.
4. Presizing is a data augmentation technique that resizes all images to a resolution of 460 by 460 pixels.
5. For each data type, data augmentation techniques are different.
2,3,5
Which statements are true about mini-batches and SGD? (Multiple Choice)
1. Each mini-batch contains the same items from the dataset, which stabilizes the training process and thus balances the randomness of stochastic gradient descent.
2. The complete dataset is stored on the GPU and mini-batches are extracted from it to speed up the training process.
3. A mini-batch can be distinguished from the larger midi-batch and maxi-batch by the number of items per batch.
4. Given the size of the dataset, the batch size and the number of completed epochs, you can compute the number of steps performed during gradient descent.
5. Using mini-batches instead of individual items increases the reliability of the gradient estimation.
4,5
Which statements are true about modifications to the learning rate during the training process? (Multiple Choice)
1. When a network layer is “frozen”, you cannot perform predictions using it.
2. A learning rate schedule refers to learning rates that differ between network layers.
3. Freezing the pretrained layers at the beginning of a fine-tuning process reduces the risk of undoing the structures learned during the pretraining.
4. The learning rate finder extracts the history of learning rates used by a learner after it was trained.
3
What is the primary benefit of presizing in data augmentation?
A) Reduces the size of the dataset
B) Improves the quality and speeds up data augmentation transformations
C) Increases the number of epochs needed for training
D) Enhances the learning rate
Answer: B) Improves the quality and speeds up data augmentation transformations
What's the formula for the Cross-Entropy Loss?
For a single item:
-ln(softmax probability of the correct class)
For n items:
-(1/n) * sum over all items of ln(softmax probability of the correct class)
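A NumPy sketch of this loss for a mini-batch of logits (illustrative values):

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean negative log softmax probability of the correct class."""
    logits = np.asarray(logits, dtype=float)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)              # softmax per item
    correct = probs[np.arange(len(targets)), targets]     # probability of the true class
    return -np.mean(np.log(correct))                      # -(1/n) * sum ln(p_correct)

logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
print(cross_entropy(logits, np.array([0, 1])))            # loss for 2 items
```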
Which of the following is a practical technique to determine the optimal learning rate for training a neural network?
A) Cross-validation
B) Learning rate finder
C) Early stopping
D) Regularization
Answer: B) Learning rate finder
Given a neural network output for a 4-class classification problem as
[2.5,1.0,0.5,−1.0], compute the class probabilities using the softmax function.
P1 ≈ 0.72, P2 ≈ 0.16, P3 ≈ 0.10, P4 ≈ 0.02
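These values can be checked quickly with NumPy (sketch):

```python
import numpy as np

z = np.array([2.5, 1.0, 0.5, -1.0])
p = np.exp(z) / np.exp(z).sum()   # softmax
print(p.round(2))                 # [0.72 0.16 0.1  0.02]
```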
Assume you have a mini-batch size of 64 and a dataset of 512 samples. How many mini-batches are needed to complete one epoch?
8
Write the Cross-Entropy Loss formula
CrossEntropy = -(1/n) * sum_j ln(q_j) over all n items, where q_j is the predicted softmax probability of the correct class for item j
Describe the process and benefits of freezing and unfreezing layers in transfer learning.
Freezing layers means keeping the weights of the initial layers of a pre-trained model unchanged while training the top layers on new data. This leverages pre-learned features for the new task. Unfreezing allows fine-tuning by adjusting weights of both initial and top layers, which can improve model performance on the new dataset. This approach speeds up training and often results in better accuracy with less overfitting.
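A PyTorch sketch of this workflow, assuming torchvision (>= 0.13 for the weights argument); the 10-class head is just an example:

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="DEFAULT")        # pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 10)    # new head for a 10-class task

# Freeze: keep pretrained weights unchanged, train only the new head
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("fc")

# ... after training the head for a few epochs, unfreeze for full fine-tuning
for p in model.parameters():
    p.requires_grad = True
```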