Chapter 4: Fundamentals of ML Flashcards
Four broad categories of ML
- Supervised Learning
- Unsupervised Learning
- Self-Supervised Learning
- Reinforcement Learning
Why is validation data helpful?
helps you tune the hyperparameters of the model: the number of layers, the size of the layers, etc.
information leak
every time you tune your model using the validation data, some information about the validation data leaks into your model
Why don’t you evaluate ML models with the training data?
After a certain number of epochs (which you can’t predict ahead of time) the model will start to overfit to the training data
How do you evaluate a ML model?
By splitting off some data to evaluate it on called validation data
What is the goal when training machine learning models? models that do what?
Generalize well
When you evaluate machine learning models, what are you evaluating?
Their ability to generalize
How many sets of data should you use to train and evaluate a model?
- Train, Validate, Test
hyperparameters vs parameters
hyperparameters = the number or size of layers in a neural network; parameters = the weights of each layer
Why does developing a neural network require the number of data sets it does?
Because training data sets the parameters of the model, and validation data tunes the hyperparameters
How are the hyperparameters of the model tuned?
using the performance of the model on the validation data as a feedback signal
Classic model evaluation recipes
simple hold-out validation, K-fold validation, iterated K-fold validation with shuffling
Simple hold-out validation
Set aside a fraction of your data as a validation set (with a separate test set kept for the end), train on the remaining data, and evaluate on the held-out validation set
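A minimal hold-out-split sketch; the array, sizes, and the `build_model` helper below are placeholder assumptions for illustration:

```python
import numpy as np

# Placeholder data standing in for a full (already shuffled) dataset.
data = np.random.random((1000, 16))

num_validation_samples = 200
validation_data = data[:num_validation_samples]   # held out for evaluation
training_data = data[num_validation_samples:]     # used to fit the model

# Typical use (build_model is a hypothetical helper):
#   model = build_model()
#   model.fit(training_data, ...)
#   validation_score = model.evaluate(validation_data, ...)
```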
What might stop you from using simple hold-out validation?
If there’s too little data available. You can check for this by seeing whether different rounds of shuffling before splitting produce very different model performance
K-fold cross Validation
Split the data into K equal partitions. For each partition, train the model on the remaining K - 1 partitions and evaluate it on the held-out partition. The model's evaluation score is the average of the K evaluation scores; the individual models are discarded once their scores are computed
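A sketch of the K-fold loop; the data, targets, and `build_model` helper are assumptions, not from the original text:

```python
import numpy as np

# Placeholder data and targets.
data = np.random.random((1000, 16))
targets = np.random.random((1000,))

k = 4
fold_size = len(data) // k
validation_scores = []

for fold in range(k):
    # The current fold is held out; everything else is training data.
    val_data = data[fold * fold_size:(fold + 1) * fold_size]
    val_targets = targets[fold * fold_size:(fold + 1) * fold_size]
    train_data = np.concatenate(
        [data[:fold * fold_size], data[(fold + 1) * fold_size:]])
    train_targets = np.concatenate(
        [targets[:fold * fold_size], targets[(fold + 1) * fold_size:]])

    # model = build_model()            # hypothetical helper returning a fresh model
    # model.fit(train_data, train_targets, ...)
    # validation_scores.append(model.evaluate(val_data, val_targets))

# Final score: the average across the k folds.
# validation_score = np.mean(validation_scores)
```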
Iterated K-fold cross Validation with shuffling
Applying K-fold cross-validation multiple times, shuffling the data before each new split. The final score is the average of the scores from each run of K-fold validation
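A sketch of the outer loop, assuming a hypothetical `run_k_fold()` helper that performs one pass of the K-fold procedure above:

```python
import numpy as np

# Placeholder data and targets.
data = np.random.random((1000, 16))
targets = np.random.random((1000,))

num_iterations = 3
all_scores = []

for _ in range(num_iterations):
    permutation = np.random.permutation(len(data))   # reshuffle before each new split
    data, targets = data[permutation], targets[permutation]
    # all_scores.extend(run_k_fold(data, targets, k=4))   # hypothetical helper

# final_score = np.mean(all_scores)
```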
How do you evaluate your model if you have very little data?
Iterated K-fold cross Validation with shuffling
How should you evaluate your model if its performance varies considerably across different train/test splits?
K-fold cross validation
How can you help ensure Data Representativeness in your evaluation?
Using data shuffling before you split
Why can redundancy in your data be a problem for validation?
Because if duplicate samples get split across the training and test data, your model ends up being partially trained on its test data
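One way to guard against this is to drop exact duplicate rows before splitting; a minimal sketch with placeholder data:

```python
import numpy as np

# Placeholder integer-valued samples, some of which will repeat.
data = np.random.randint(0, 2, size=(1000, 8))

deduplicated = np.unique(data, axis=0)   # keeps one copy of each distinct row
print(len(data), "->", len(deduplicated))
```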
What are some examples of data preprocessing?
Vectorization, Normalization, handling missing values, feature extraction
data vectorization
the process of turning all inputs and targets into tensors of floating-point data, since that is the form a neural network requires
value normalization
Normalizing each feature so that its mean is 0 and its standard deviation is 1 is a common method, though it isn’t always strictly necessary
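A feature-wise normalization sketch; the arrays are placeholders, and the statistics are computed on the training data only:

```python
import numpy as np

# Placeholder training and test arrays.
train_data = np.random.random((1000, 16))
test_data = np.random.random((200, 16))

mean = train_data.mean(axis=0)
std = train_data.std(axis=0)

train_data = (train_data - mean) / std
test_data = (test_data - mean) / std   # reuse the training statistics
```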
feature engineering
the process by which a human uses their deep understanding of a problem to express it in a simpler way, making it easier for a neural network to solve
Why can good feature engineering still be helpful for deep learning models?
lets you solve the problem with far less data
What is the fundamental tension in machine learning?
The tension between optimization and generalization
Underfit model
When there is still progress to be made in generalization of the model via optimization on the training data
Best way to prevent a model from learning irrelevant or misleading patterns in the training data?
Get more training data
How can you fight overfitting with limited data?
put constraints on what information the model is able to store or how much it is able to store
Regularization
The process of fighting overfitting by putting constraints on the information a model learns
A model’s capacity
The number of learnable parameters in a model
What is the simplest way to prevent overfitting via regularization?
by reducing the size of the model: the number of learnable parameters in the model
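A sketch of what reducing capacity can look like in Keras; the architecture and layer sizes are illustrative, not prescribed by the text:

```python
from tensorflow import keras
from tensorflow.keras import layers

# The same binary-classification architecture with fewer units per layer.
original_model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

smaller_model = keras.Sequential([
    layers.Dense(4, activation="relu"),
    layers.Dense(4, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
```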
Why does reducing the memorization capacity of a network help prevent overfitting?
It won’t be able to memorize the mapping as easily, so to minimize its loss it will have to resort to compressed representations that maximize predictive power
How does model capacity affect its losses?
Bigger networks minimize their training loss much faster as they fit the training data, but they also start overfitting much earlier, so their validation loss grows larger sooner
Weight regularization
adding a cost to the loss function of a network associated with having large weights
L1 regularization
the cost added to the loss function by large weight values is proportional to the absolute value of the weight coefficients (the L1 norm of the weights)
L2 regularization
the cost added is proportional to the square of the value of the weight coefficients (the L2 norm of the weights). Also called weight decay
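A sketch of adding weight regularization in Keras; the model, layer sizes, and the 0.001 coefficient are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# L2 penalty: each weight adds 0.001 * weight_value ** 2 to the total loss.
model = keras.Sequential([
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(1, activation="sigmoid"),
])

# L1 and combined penalties are also available:
#   regularizers.l1(0.001)
#   regularizers.l1_l2(l1=0.001, l2=0.001)
```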
Dropout
One of the most common and effective regularization techniques: randomly dropping (setting to zero) a number of the output features of a layer during training
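A dropout sketch in Keras; the architecture and the 0.5 rate are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Each Dropout layer randomly zeroes 50% of the previous layer's
# output features during training.
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
```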
Success metric for balanced-classification problems?
Accuracy and area under the receiver operating characteristic curve (ROC AUC)
Success metric for imbalanced-classification problems?
Precision, Recall
Success metric for ranking problems and multilabel classification?
Mean average precision
How should data be formatted for a neural network
- Data in tensors
- Data should be scaled to small values
- If data is heterogeneous, it should be normalized
- Often want to do some feature engineering
Statistical Power
A model has statistical power if it is capable of beating a dumb baseline
requirements for a loss function?
- needs to be computable given only a mini-batch of data
- needs to be differentiable (so you can use backpropagation to train your model)
common last-layer activations
Sigmoid (binary classification), softmax (multiclass classification), none (regression)
common loss functions
Binary crossentropy (binary classification), categorical crossentropy (multiclass classification), MSE (regression)
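A sketch of pairing the last-layer activation with a matching loss in Keras; the model, optimizer, and layer sizes are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Binary classification: sigmoid last layer + binary crossentropy.
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Other common pairings:
#   softmax last layer + "categorical_crossentropy"  (multiclass classification)
#   no activation      + "mse"                        (scalar regression)
```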
How do you know when you’ve reached overfitting?
When the model’s performance on the validation data starts to degrade
Steps in building neural network
- Define the problem; assemble a dataset
- Choose a measure of success
- Decide on an evaluation protocol
- Prepare your data
- Choose the last-layer activation, loss function, and optimizer configuration
- Scale up until the model overfits
- Regularize the model; tune hyperparameters
final step before testing on test data?
Train a final version of your model on the combined training and validation data. If its score on the test data turns out significantly worse than its score on the validation data, your validation procedure wasn’t reliable or you began to overfit to your validation data