7. Model Building Flashcards
What is data parallelism?
Data parallelism is when the dataset is split into parts that are assigned to parallel compute nodes or graphics processing units (GPUs). A mini-batch of data is sent to every node, each node computes gradients on its slice as usual, and the gradients are sent back to the main node to be aggregated.
What are the two strategies in data parallelism?
Synchronous Training: Each accelerator (GPU) has a complete copy of the model and is trained solely on a different part of the data; gradients are aggregated at the end of every step before the weights are updated.
Asynchronous Training: Workers don’t have to wait for each other; each worker trains independently over its input data and updates the shared variables asynchronously.
What is “all-reduce sync” strategy good for?
It is great for Tensor Processing Units (TPUs) and one-machine multi-GPU setups.
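The all-reduce step can be sketched in plain Python: every worker contributes its local gradient and every worker receives the same averaged result. The gradient values below are hypothetical; real systems use fast collectives (e.g., NCCL or TPU interconnects) rather than Python lists.

```python
# All-reduce in data parallelism: average per-parameter gradients
# across workers so every worker applies the same update.
def all_reduce_mean(worker_grads):
    """Return the element-wise mean of the workers' gradient vectors."""
    num_workers = len(worker_grads)
    num_params = len(worker_grads[0])
    return [
        sum(g[i] for g in worker_grads) / num_workers
        for i in range(num_params)
    ]

# Each worker computed gradients on its own mini-batch (hypothetical values).
grads = [
    [0.2, -0.4],  # worker 0
    [0.4, -0.2],  # worker 1
    [0.6, -0.6],  # worker 2
]

avg = all_reduce_mean(grads)  # identical result on every worker
```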
What is tf.distribute.Strategy used for?
It is a TensorFlow API to distribute training across multiple GPUs, multiple machines, or TPUs.
What are the TensorFlow distributed training strategies?
MirroredStrategy: Synchronous distributed training on multiple GPUs on one machine.
CentralStorageStrategy: Synchronous training but with no mirroring; model variables are kept on the CPU, and operations are mirrored across the local GPUs.
MultiWorkerMirroredStrategy: Synchronous distributed training across multiple workers (machines), each potentially with multiple GPUs.
TPUStrategy: Synchronous distributed training on multiple TPU cores.
ParameterServerStrategy: Some machines are designated as workers and some as parameter servers.
Hints: Monkeys Climb More Than Pandas.
What is model parallelism?
In model parallelism, the model itself is partitioned into parts (unlike data parallelism, where the data is split). Each part of the model is placed on a separate GPU. It can be used to train a model that is too large to fit on a single GPU.
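The idea can be sketched in plain Python: the "devices" here are just labels in comments, and the layer weights are hypothetical; in TensorFlow you would place layers on devices with `tf.device`.

```python
# Model parallelism: the model's layers are split across devices,
# and activations flow from one device to the next.
def dense(x, weights):
    """A toy fully connected layer: one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in weights]

# First half of the model lives on "GPU 0", second half on "GPU 1"
# (hypothetical weights chosen so the math is easy to follow).
layer_on_gpu0 = [[1.0, 0.0], [0.0, 1.0]]   # identity layer
layer_on_gpu1 = [[2.0, 0.0], [0.0, 2.0]]   # doubling layer

x = [3.0, 4.0]
h = dense(x, layer_on_gpu0)   # computed on "GPU 0"
y = dense(h, layer_on_gpu1)   # activations transferred, then "GPU 1"
```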
What is the best architecture for synchronous distribution training for TensorFlow?
All-reduce architecture
What is the best architecture for asynchronous distribution training for TensorFlow?
Parameter server architecture
What are the tools for deploying TensorFlow models?
TensorFlow Serving for servers
TFLite for mobile devices
TensorFlow.js for browsers
What are Convolutional Neural Networks usually used for?
Image classification
What are Recurrent Neural Networks usually used for?
They are designed to operate on sequences of data. They can be used for text classification or prediction of values in a sequence, e.g., with a long short-term memory (LSTM) network. They can also be used for time series and speech recognition.
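A single recurrent step can be sketched in plain Python. Scalar weights are used for brevity and are hypothetical; real RNN cells use weight matrices.

```python
import math

# One step of a toy RNN: the new hidden state depends on the current
# input AND the previous hidden state, which carries context forward.
def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    return math.tanh(w_x * x_t + w_h * h_prev + b)

# Process a short sequence; h accumulates information about earlier inputs.
h = 0.0
for x_t in [1.0, 0.5, -1.0]:
    h = rnn_step(x_t, h)
```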
What do you use to train a neural network?
Stochastic gradient descent
What is the goal of training a neural network?
To find a set of weights and biases that have low loss
What is loss in a neural network used for?
The loss is used to calculate the gradients.
Gradients are used to update the weights of the neural network.
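The loop described above (loss gives gradients, gradients update weights) can be sketched with stochastic gradient descent on a one-weight model. The data and learning rate are hypothetical.

```python
# SGD on a one-weight linear model y = w * x with MSE loss.
# For one sample, loss = (w*x - y)**2, so d(loss)/dw = 2 * x * (w*x - y).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relation: y = 2x
w = 0.0
lr = 0.05

for _ in range(200):
    for x, y in data:  # one sample at a time: "stochastic"
        grad = 2 * x * (w * x - y)  # loss is used to calculate the gradient
        w -= lr * grad              # gradient is used to update the weight
```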
What are the outputs of regression, binary classification and multiclass classification?
Numerical
Binary
Single-label multiclass
What are the activation functions of regression, binary classification and multiclass classification?
One node with a linear activation unit
Sigmoid activation unit
Softmax activation function
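These three activations can be sketched in plain Python (the formulas are standard; the helper names are our own):

```python
import math

def linear(x):
    """Regression output: identity activation, any real number."""
    return x

def sigmoid(x):
    """Binary classification: squashes a logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    """Multiclass: converts logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```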
What are the loss functions of regression, binary classification and multiclass classification?
MSE
Binary cross-entropy; also hinge loss and squared hinge loss (in Keras)
Categorical cross-entropy and sparse categorical cross-entropy
When do you use sparse categorical cross-entropy and categorical cross-entropy?
Both apply to mutually exclusive classes; the difference is the label format. Use sparse categorical cross-entropy when your labels are integers (e.g., 2) and categorical cross-entropy when your labels are one-hot encoded (e.g., [0, 0, 1]). When one sample can have multiple labels, use binary cross-entropy instead.
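The label-format distinction can be checked in plain Python: both losses give the same number when the one-hot vector encodes the same class as the integer label. The predicted probabilities below are hypothetical.

```python
import math

def categorical_ce(one_hot, probs):
    """Cross-entropy with a one-hot label, e.g. [0, 0, 1]."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_ce(label, probs):
    """Cross-entropy with an integer label, e.g. 2."""
    return -math.log(probs[label])

probs = [0.1, 0.2, 0.7]                       # hypothetical model output
loss_a = categorical_ce([0, 0, 1], probs)     # one-hot label
loss_b = sparse_categorical_ce(2, probs)      # same class as an integer
```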