Neural Network Basics II Flashcards

1
Q

What's image data augmentation?

A

Randomly transforming the training images so the model sees a slightly different version of each input, e.g. random resize and crop, rotation, stretching, and color distortions.
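
A minimal sketch of such transforms with torchvision (the specific transform choices and parameter values are illustrative, not part of the card):

```python
# Illustrative image augmentations with torchvision (assumed available).
from torchvision import transforms

train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224),               # random resize and crop
    transforms.RandomRotation(degrees=15),           # random rotation
    transforms.RandomAffine(degrees=0, shear=10),    # stretching / shearing
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),  # color distortion
    transforms.ToTensor(),
])
# augmented = train_tfms(pil_image)  # applied anew every time the image is loaded
```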

2
Q

What's the softmax equation?

A

$\text{softmax}(z)_i = \dfrac{\exp(z_i)}{\sum_{j=1}^{M} \exp(z_j)}$

where $z_i$ is the $i$-th network output and $M$ is the number of classes.
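
A small NumPy sketch of this formula; subtracting max(z) is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(z):
    """softmax(z)_i = exp(z_i) / sum_j exp(z_j) over M outputs."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # shift by max(z) for numerical stability
    return e / e.sum()

# Example: softmax([2.0, 1.0, 0.1]) returns probabilities that sum to 1.
```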

3
Q

What are mini-batches? What's the recommended size?

A

A mini-batch is a small set of training items processed together; the number of items is the "batch size", e.g. 16 or 32, and is usually a power of 2.

In practice, use as many items as will fit on the GPU at once (together with the model parameters and gradients).
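
A minimal PyTorch sketch of drawing mini-batches; `train_ds` is a hypothetical Dataset standing in for your data:

```python
# Sketch: iterating over mini-batches with PyTorch's DataLoader.
from torch.utils.data import DataLoader

train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)  # train_ds is hypothetical

for xb, yb in train_dl:  # each iteration yields one mini-batch of inputs and labels
    ...                  # forward pass, loss, backward pass, optimizer step
```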

4
Q

What is stochastic gradient descent?

A

Gradient descent using mini-batches instead of the complete dataset
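
A sketch of one epoch of mini-batch SGD in PyTorch, assuming `model`, `loss_fn`, and `train_dl` already exist (hypothetical names):

```python
import torch

opt = torch.optim.SGD(model.parameters(), lr=0.01)  # model is assumed to exist

for xb, yb in train_dl:        # one parameter update per mini-batch, not per full dataset
    preds = model(xb)
    loss = loss_fn(preds, yb)  # loss_fn is assumed, e.g. cross-entropy
    loss.backward()            # gradients estimated from this mini-batch only
    opt.step()
    opt.zero_grad()
```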

5
Q

List practical techniques to improve model performance:

A
• Using the learning rate finder
• Freezing and unfreezing layers
• Discriminative learning rates
• Choosing the right number of epochs
• Trying different model depths
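
These cards follow the fastai workflow, so here is a hedged sketch of how the techniques above might look there, assuming `learn` is an existing vision Learner (epoch counts and learning rates are illustrative):

```python
# Sketch of a typical fastai fine-tuning recipe.
learn.lr_find()                                    # learning rate finder: loss vs. learning rate
learn.freeze()                                     # train only the new head first
learn.fit_one_cycle(3, 3e-3)                       # pick a sensible number of epochs
learn.unfreeze()                                   # then fine-tune the whole network
learn.fit_one_cycle(6, lr_max=slice(1e-6, 1e-4))   # discriminative learning rates per layer group
```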
6
Q

Which of the following statements is true about data augmentation? (Multiple Choice)
1. Each image in the training dataset is transformed exactly once in order to reduce overfitting.
2. In each batch, the images are transformed slightly differently.
3. Data augmentation transformations only change the input and don’t change the label.
4. Presizing is a data augmentation technique that resizes all images to a resolution of 460 by 460 pixels.
5. For each data type, data augmentation techniques are different.

A

2,3,5

7
Q

Which statements are true about mini-batches and SGD? (Multiple Choice)
1. Each mini-batch contains the same items from the dataset, which stabilizes the training process and thus balances the randomness of stochastic gradient descent.
2. The complete dataset is stored on the GPU and mini-batches are extracted from it to speed up the training process.
3. A mini-batch can be distinguished from the larger midi-batch and maxi-batch by the number of items per batch.
4. Given the size of the dataset, the batch size and the number of completed epochs, you can compute the number of steps performed during gradient descent.
5. Using mini-batches instead of individual items increases the reliability of the gradient estimation.

A

4,5

8
Q

Which statements are true about modifications to the learning rate during the training process? (Multiple Choice)
1. When a network layer is “frozen”, you cannot perform predictions using it.
2. A learning rate schedule refers to learning rates that differ between network layers.
3. Freezing the pretrained layers at the beginning of a fine-tuning process reduces the risk of undoing the structures learned during the pretraining.
4. The learning rate finder extracts the history of learning rates used by a learner after it was trained.

A

3

9
Q

What is the primary benefit of presizing in data augmentation?

A) Reduces the size of the dataset
B) Improves the quality and speeds up data augmentation transformations
C) Increases the number of epochs needed for training
D) Enhances the learning rate

A

Answer: B) Improves the quality and speeds up data augmentation transformations
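
A hedged sketch of presizing in fastai: a large per-item resize first, then augmentation plus a final resize applied to the whole batch on the GPU. The 460/224 sizes follow the common fastai example:

```python
# Sketch: fastai-style presizing inside a DataBlock.
from fastai.vision.all import (DataBlock, ImageBlock, CategoryBlock, Resize,
                               aug_transforms, get_image_files, parent_label)

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,                             # list image files under a path
    get_y=parent_label,                                    # label = parent folder name
    item_tfms=Resize(460),                                 # presize each item to 460x460
    batch_tfms=aug_transforms(size=224, min_scale=0.75),   # augment + resize per batch
)
```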

10
Q

What's the formula for the cross-entropy loss?

A

For a single example:
$L = -\ln(p)$, where $p$ is the softmax probability of the correct class.
For $n$ examples:
$L = -\frac{1}{n}\sum_{i=1}^{n} \ln(p_i)$, the mean of the per-example losses.
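
A small NumPy sketch of both cases; the function names and the `probs`/`targets` layout are illustrative:

```python
import numpy as np

def cross_entropy_single(probs, target):
    """One example: -ln of the softmax probability of the correct class."""
    return -np.log(probs[target])

def cross_entropy_batch(probs, targets):
    """n examples: mean of the per-example -ln(softmax prob of correct class)."""
    n = len(targets)
    return -np.mean(np.log(probs[np.arange(n), targets]))

# probs: softmax outputs, shape (n, num_classes); targets: correct class indices, shape (n,).
```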

11
Q

Which of the following is a practical technique to determine the optimal learning rate for training a neural network?

A) Cross-validation
B) Learning rate finder
C) Early stopping
D) Regularization

A

Answer: B) Learning rate finder

12
Q

Given a neural network output for a 4-class classification problem as
[2.5,1.0,0.5,−1.0], compute the class probabilities using the softmax function.

A

P1 ≈ 0.72, P2 ≈ 0.16, P3 ≈ 0.10, P4 ≈ 0.02
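
Worked step (values rounded): exp(2.5) ≈ 12.18, exp(1.0) ≈ 2.72, exp(0.5) ≈ 1.65, exp(−1.0) ≈ 0.37; their sum is ≈ 16.92, and dividing each exponential by this sum gives the probabilities above.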

13
Q

Assume you have a mini-batch size of 64 and a dataset of 512 samples. How many mini-batches are needed to complete one epoch?

A

8
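
In general, assuming the dataset size divides evenly by the batch size: mini-batches per epoch = dataset size / batch size = 512 / 64 = 8.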

14
Q

Write the Cross-Entropy Loss formula

A

$\text{CrossEntropy}(p, q) = -\frac{1}{n}\sum_{j=1}^{n} \ln(q_j)$, where $q_j$ is the predicted (softmax) probability of the correct class for example $j$ and $n$ is the number of examples.

15
Q

Describe the process and benefits of freezing and unfreezing layers in transfer learning.

A

Freezing layers means keeping the weights of the initial layers of a pre-trained model unchanged while training the top layers on new data. This leverages pre-learned features for the new task. Unfreezing allows fine-tuning by adjusting weights of both initial and top layers, which can improve model performance on the new dataset. This approach speeds up training and often results in better accuracy with less overfitting.
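
A hedged PyTorch sketch of the mechanics; the pretrained ResNet-18 and the 10-class head are purely illustrative choices:

```python
# Sketch: freeze a pretrained backbone, train a new head, then unfreeze to fine-tune.
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="DEFAULT")      # pretrained backbone (illustrative choice)

for p in model.parameters():                    # freeze: keep pretrained weights fixed
    p.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 10)  # new head for the new task (trainable by default)

# ... train the head for a few epochs ...

for p in model.parameters():                    # unfreeze: fine-tune the whole network,
    p.requires_grad = True                      # typically with a lower learning rate
```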

16
Q

During training, you observe that the training loss decreases while the validation loss increases. What is this indicative of, and how might you address it?

A

This indicates overfitting, where the model is performing well on the training data but poorly on unseen validation data. To address overfitting, techniques such as regularization (L2 or dropout), data augmentation, reducing model complexity, and early stopping can be used.
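
A hedged PyTorch sketch of two of these remedies, dropout in the model and L2 regularization via the optimizer's weight decay; the architecture and values are illustrative:

```python
import torch
import torch.nn as nn

# Dropout regularization inside an illustrative classifier.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes activations during training
    nn.Linear(256, 10),
)

# L2 regularization via weight decay in the optimizer.
opt = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```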

17
Q

How does using mini-batches in Stochastic Gradient Descent (SGD) impact the training process of neural networks?

A

Using mini-batches in SGD helps to reduce the variance of the parameter updates, leading to more stable and faster convergence compared to using a single example per update. It also makes better use of computational resources by allowing parallel processing of multiple samples, improving training efficiency and scalability.