Neural Network Basics II Flashcards
What's image data augmentation?
Random resize and crop, rotation, stretching, and color distortions.
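A minimal sketch of such an augmentation pipeline, assuming torchvision is installed; the specific parameter values are arbitrary examples, not recommendations:

```python
# Illustrative augmentation pipeline (torchvision assumed); values are examples only.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),                                    # random resize and crop
    transforms.RandomRotation(degrees=15),                                # random rotation
    transforms.RandomAffine(degrees=0, shear=10),                         # stretching / shearing
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2), # color distortions
    transforms.ToTensor(),
])
# augmented = augment(pil_image)  # applied freshly each time a batch is drawn
```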
What's the softmax equation?
softmax(z)_i = exp(z_i) / sum_j exp(z_j), where the sum runs over all M classes
z_i = raw network output (logit) for class i
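A small NumPy sketch of this formula (subtracting the max is only for numerical stability and does not change the result):

```python
import numpy as np

def softmax(z):
    """Turn a vector of raw outputs (logits) into probabilities."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())     # subtract max for numerical stability
    return e / e.sum()          # exp(z_i) / sum_j exp(z_j)

print(softmax([2.0, 1.0, 0.1]))  # e.g. [0.659 0.242 0.099]
```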
What are mini-batches? What's the recommended size?
A set of items processed together in one step; the number of items is the "batch size", e.g. 16 or 32, usually a power of 2.
Usually as many items as fit onto the GPU at once (including parameters and gradients).
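A sketch of how a shuffled dataset can be split into mini-batches (plain NumPy, illustrative only):

```python
import numpy as np

def minibatches(X, y, batch_size=32, seed=0):
    """Yield shuffled (X, y) mini-batches of the given batch size."""
    idx = np.random.default_rng(seed).permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

X, y = np.random.rand(512, 10), np.random.rand(512)
print(sum(1 for _ in minibatches(X, y, batch_size=64)))  # 8 mini-batches per epoch
```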
What is stochastic gradient descent (SGD)?
Gradient descent using mini-batches instead of the complete dataset
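A minimal SGD loop over mini-batches for linear regression (NumPy sketch; data, learning rate, and batch size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=512)

w, lr, batch_size = np.zeros(3), 0.1, 64
for epoch in range(20):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):   # one gradient step per mini-batch
        b = idx[start:start + batch_size]
        err = X[b] @ w - y[b]
        grad = X[b].T @ err / len(b)             # gradient of the mean squared error
        w -= lr * grad                           # SGD update
print(w)  # close to [1.5, -2.0, 0.5]
```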
List practical techniques to improve model performance:
- Using the learning rate finder
- Freezing and unfreezing layers
- Discriminative learning rates
- Choosing the right number of epochs
- Trying different model depths
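A sketch of how several of these techniques combine in a fastai-style fine-tuning workflow; fastai is assumed to be installed, and the dataset, epoch counts, and learning rates are only examples:

```python
from fastai.vision.all import *

def is_cat(fname):                 # label images by filename convention (cats start uppercase)
    return fname[0].isupper()

path = untar_data(URLs.PETS) / "images"             # downloads a small sample dataset
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42, label_func=is_cat,
    item_tfms=Resize(460), batch_tfms=aug_transforms(size=224))

learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.lr_find()                                     # learning rate finder
learn.fit_one_cycle(3, 3e-3)                        # pretrained layers frozen: train the head
learn.unfreeze()                                    # unfreeze for full fine-tuning
learn.fit_one_cycle(6, lr_max=slice(1e-6, 1e-4))    # discriminative learning rates
```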
Which of the following statements is true about data augmentation? (Multiple Choice)
1. Each image in the training dataset is transformed exactly once in order to reduce overfitting.
2. In each batch, the images are transformed slightly differently.
3. Data augmentation transformations only change the input and don’t change the label.
4. Presizing is a data augmentation technique that resizes all images to a resolution of 460 by 460 pixels.
5. For each data type, data augmentation techniques are different.
2,3,5
Which statements are true about mini-batches and SGD? (Multiple Choice)
1. Each mini-batch contains the same items from the dataset, which stabilizes the training process and thus balances the randomness of stochastic gradient descent.
2. The complete dataset is stored on the GPU and mini-batches are extracted from it to speed up the training process.
3. A mini-batch can be distinguished from the larger midi-batch and maxi-batch by the number of items per batch.
4. Given the size of the dataset, the batch size and the number of completed epochs, you can compute the number of steps performed during gradient descent.
5. Using mini-batches instead of individual items increases the reliability of the gradient estimation.
4,5
Which statements are true about modifications to the learning rate during the training process? (Multiple Choice)
1. When a network layer is “frozen”, you cannot perform predictions using it.
2. A learning rate schedule refers to learning rates that differ between network layers.
3. Freezing the pretrained layers at the beginning of a fine-tuning process reduces the risk of undoing the structures learned during the pretraining.
4. The learning rate finder extracts the history of learning rates used by a learner after it was trained.
3
What is the primary benefit of presizing in data augmentation?
A) Reduces the size of the dataset
B) Improves the quality and speeds up data augmentation transformations
C) Increases the number of epochs needed for training
D) Enhances the learning rate
Answer: B) Improves the quality and speeds up data augmentation transformations
What's the formula for the Cross-Entropy Loss?
For a single item:
-ln(softmax probability of the correct class)
For n items:
-(1/n) * sum over all items of ln(softmax probability of the correct class)
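A NumPy sketch of this loss for a mini-batch of logits (illustrative values):

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean negative log softmax probability of the correct class."""
    logits = np.asarray(logits, dtype=float)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)              # softmax per item
    correct = probs[np.arange(len(targets)), targets]     # probability of the true class
    return -np.mean(np.log(correct))                      # -(1/n) * sum ln(p_correct)

logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
print(cross_entropy(logits, np.array([0, 1])))            # loss for 2 items
```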
Which of the following is a practical technique to determine the optimal learning rate for training a neural network?
A) Cross-validation
B) Learning rate finder
C) Early stopping
D) Regularization
Answer: B) Learning rate finder
Given a neural network output for a 4-class classification problem as
[2.5,1.0,0.5,−1.0], compute the class probabilities using the softmax function.
P1 ≈ 0.72, P2 ≈ 0.16, P3 ≈ 0.10, P4 ≈ 0.02
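These values can be checked quickly with NumPy (sketch):

```python
import numpy as np

z = np.array([2.5, 1.0, 0.5, -1.0])
p = np.exp(z) / np.exp(z).sum()   # softmax
print(p.round(2))                 # [0.72 0.16 0.1  0.02]
```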
Assume you have a mini-batch size of 64 and a dataset of 512 samples. How many mini-batches are needed to complete one epoch?
8
Write the Cross-Entropy Loss formula
CrossEntropy = -(1/n) * sum_j ln(q_j) over all n items, where q_j is the predicted softmax probability of the correct class for item j
Describe the process and benefits of freezing and unfreezing layers in transfer learning.
Freezing layers means keeping the weights of the initial layers of a pre-trained model unchanged while training the top layers on new data. This leverages pre-learned features for the new task. Unfreezing allows fine-tuning by adjusting weights of both initial and top layers, which can improve model performance on the new dataset. This approach speeds up training and often results in better accuracy with less overfitting.
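A PyTorch sketch of this workflow, assuming torchvision (>= 0.13 for the weights argument); the 10-class head is just an example:

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="DEFAULT")        # pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 10)    # new head for a 10-class task

# Freeze: keep pretrained weights unchanged, train only the new head
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("fc")

# ... after training the head for a few epochs, unfreeze for full fine-tuning
for p in model.parameters():
    p.requires_grad = True
```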