Chapter 4: Fundamentals of ML Flashcards
Four broad categories of ML
- Supervised Learning
- Unsupervised Learning
- Self-Supervised Learning
- Reinforcement Learning
Why is validation data helpful?
helps you tune the hyperparameters of the model: the number of layers, the size of the layers, etc.
information leak
every time you tune your model using the validation data, some information about the validation data leaks into your model
Why don’t you evaluate ML models with the training data?
After a certain number of epochs (which you can’t predict ahead of time) the model will start to overfit to the training data
How do you evaluate a ML model?
By splitting off some data to evaluate it on called validation data
What is the goal when training machine learning models? models that do what?
Generalize well
When you evaluate machine learning models, what are you evaluating?
Their ability to generalize
How many sets of data should you use to train and evaluate a model?
- Train, Validate, Test
hyperparameters vs parameters
hyperparameters = the number or size of layers in a neural network; parameters = the weights of each layer
Why does developing a neural network require the number of data sets it does?
Because training data sets the parameters of the model, and validation data tunes the hyperparameters
How are the hyperparameters of the model tuned?
using the performance of the model on the validation data as a feedback signal
Classic model evaluation recipes
simple hold-out validation, K-fold validation, iterated K-fold validation with shuffling
Simple hold-out validation
Set aside a fraction of your data as a validation set (with a separate test set kept for the end), train on the remaining data, and evaluate on the held-out validation set
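A minimal hold-out-split sketch; the array, sizes, and the `build_model` helper below are placeholder assumptions for illustration:

```python
import numpy as np

# Placeholder data standing in for a full (already shuffled) dataset.
data = np.random.random((1000, 16))

num_validation_samples = 200
validation_data = data[:num_validation_samples]   # held out for evaluation
training_data = data[num_validation_samples:]     # used to fit the model

# Typical use (build_model is a hypothetical helper):
#   model = build_model()
#   model.fit(training_data, ...)
#   validation_score = model.evaluate(validation_data, ...)
```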
What might stop you from using simple hold-out validation?
If there’s too little data available. You can check for this by seeing whether different rounds of shuffling before splitting produce very different model performance
K-fold cross Validation
Split the data into K equal partitions. For each partition, train the model on the remaining K - 1 partitions and evaluate it on the held-out partition. The model's evaluation score is the average of the K evaluation scores; the individual models are discarded once their scores are computed
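A sketch of the K-fold loop; the data, targets, and `build_model` helper are assumptions, not from the original text:

```python
import numpy as np

# Placeholder data and targets.
data = np.random.random((1000, 16))
targets = np.random.random((1000,))

k = 4
fold_size = len(data) // k
validation_scores = []

for fold in range(k):
    # The current fold is held out; everything else is training data.
    val_data = data[fold * fold_size:(fold + 1) * fold_size]
    val_targets = targets[fold * fold_size:(fold + 1) * fold_size]
    train_data = np.concatenate(
        [data[:fold * fold_size], data[(fold + 1) * fold_size:]])
    train_targets = np.concatenate(
        [targets[:fold * fold_size], targets[(fold + 1) * fold_size:]])

    # model = build_model()            # hypothetical helper returning a fresh model
    # model.fit(train_data, train_targets, ...)
    # validation_scores.append(model.evaluate(val_data, val_targets))

# Final score: the average across the k folds.
# validation_score = np.mean(validation_scores)
```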
Iterated K-fold cross Validation with shuffling
Applying K-fold cross-validation multiple times, shuffling the data before each new split. The final score is the average of the scores from each run of K-fold validation
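A sketch of the outer loop, assuming a hypothetical `run_k_fold()` helper that performs one pass of the K-fold procedure above:

```python
import numpy as np

# Placeholder data and targets.
data = np.random.random((1000, 16))
targets = np.random.random((1000,))

num_iterations = 3
all_scores = []

for _ in range(num_iterations):
    permutation = np.random.permutation(len(data))   # reshuffle before each new split
    data, targets = data[permutation], targets[permutation]
    # all_scores.extend(run_k_fold(data, targets, k=4))   # hypothetical helper

# final_score = np.mean(all_scores)
```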
How do you evaluate your model if you have very little data?
Iterated K-fold cross Validation with shuffling
How should you evaluate your model if its performance varies considerably across different train/test splits?
K-fold cross validation
How can you help ensure Data Representativeness in your evaluation?
Using data shuffling before you split
Why can redundancy in your data be a problem for validation?
Because if duplicate samples get split across the training and test data, your model ends up being partially trained on its test data
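One way to guard against this is to drop exact duplicate rows before splitting; a minimal sketch with placeholder data:

```python
import numpy as np

# Placeholder integer-valued samples, some of which will repeat.
data = np.random.randint(0, 2, size=(1000, 8))

deduplicated = np.unique(data, axis=0)   # keeps one copy of each distinct row
print(len(data), "->", len(deduplicated))
```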
What are some examples of data preprocessing?
Vectorization, Normalization, handling missing values, feature extraction
data vectorization
the process of turning all inputs and targets into tensors of floating-point data, since that is the form a neural network requires
value normalization
Normalizing each feature so that its mean is 0 and its standard deviation is 1 is a common method, though it isn’t always strictly necessary
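A feature-wise normalization sketch; the arrays are placeholders, and the statistics are computed on the training data only:

```python
import numpy as np

# Placeholder training and test arrays.
train_data = np.random.random((1000, 16))
test_data = np.random.random((200, 16))

mean = train_data.mean(axis=0)
std = train_data.std(axis=0)

train_data = (train_data - mean) / std
test_data = (test_data - mean) / std   # reuse the training statistics
```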
feature engineering
the process by which a human uses their deep understanding of a problem to express it in a simpler way, making it easier for a neural network to solve
Why can good feature engineering still be helpful for deep learning models?
lets you solve the problem with far less data
What is the fundamental tension in machine learning?
The tension between optimization and generalization
Underfit model
When there is still progress to be made in generalization of the model via optimization on the training data
Best way to prevent a model from learning irrelevant or misleading patterns in the training data?
Get more training data
How can you fight overfitting with limited data?
put constraints on what information the model is able to store or how much it is able to store
Regularization
The process of fighting overfitting by putting constraints on the information a model learns
A model’s capacity
The number of learnable parameters in a model
What is the simplest way to prevent overfitting via regularization?
by reducing the size of the model: the number of learnable parameters in the model
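A sketch of what reducing capacity can look like in Keras; the architecture and layer sizes are illustrative, not prescribed by the text:

```python
from tensorflow import keras
from tensorflow.keras import layers

# The same binary-classification architecture with fewer units per layer.
original_model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

smaller_model = keras.Sequential([
    layers.Dense(4, activation="relu"),
    layers.Dense(4, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
```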
Why does reducing the memorization capacity of a network help prevent overfitting?
It won’t be able to memorize the mapping as easily, so to minimize its loss it will have to resort to compressed representations that maximize predictive power
How does model capacity affect its losses?
Bigger networks minimize their training loss much faster as they fit the training data, but they also start overfitting much earlier, so their validation loss grows larger sooner
Weight regularization
adding a cost to the loss function of a network associated with having large weights
L1 regularization
the cost added to the loss function by large weight values is proportional to the absolute value of the weight coefficients (the L1 norm of the weights)
L2 regularization
the cost added is proportional to the square of the value of the weight coefficients (the L2 norm of the weights). Also called weight decay
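A sketch of adding weight regularization in Keras; the model, layer sizes, and the 0.001 coefficient are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# L2 penalty: each weight adds 0.001 * weight_value ** 2 to the total loss.
model = keras.Sequential([
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(1, activation="sigmoid"),
])

# L1 and combined penalties are also available:
#   regularizers.l1(0.001)
#   regularizers.l1_l2(l1=0.001, l2=0.001)
```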
Dropout
One of the most common and effective regularization techniques: randomly dropping (setting to zero) a number of the output features of a layer during training
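A dropout sketch in Keras; the architecture and the 0.5 rate are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Each Dropout layer randomly zeroes 50% of the previous layer's
# output features during training.
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
```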
Success metric for balanced-classification problems?
Accuracy and area under the receiver operating characteristic curve (ROC AUC)
Success metric for imbalanced-classification problems?
Precision, Recall
Success metric for ranking problems and multilabel classification?
Mean average precision
How should data be formatted for a neural network
- Data in tensors
- Data should be scaled to small values
- If data is heterogeneous, it should be normalized
- Often want to do some feature engineering
Statistical Power
A model has statistical power if it is capable of beating a dumb baseline
requirements for a loss function?
- needs to be computable given only a mini-batch of data
- needs to be differentiable (so you can use backpropagation to train your model)
common last-layer activations
Sigmoid (binary classification), softmax (multiclass classification), none (regression)
common loss functions
Binary crossentropy (binary classification), categorical crossentropy (multiclass classification), MSE (regression)
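A sketch of pairing the last-layer activation with a matching loss in Keras; the model, optimizer, and layer sizes are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Binary classification: sigmoid last layer + binary crossentropy.
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Other common pairings:
#   softmax last layer + "categorical_crossentropy"  (multiclass classification)
#   no activation      + "mse"                        (scalar regression)
```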
How do you know when you’ve reached overfitting?
When the model’s performance on the validation data starts to degrade
Steps in building neural network
- Define the problem; assemble a dataset
- Choose a measure of success
- Decide on an evaluation protocol
- Prepare your data
- Choose the last-layer activation, loss function, and optimizer configuration
- Scale up until the model overfits
- Regularize the model; tune hyperparameters
final step before testing on test data?
Train a final version of your model on the combined training and validation data. If its score on the test data turns out significantly worse than its score on the validation data, your validation procedure wasn’t reliable or you began to overfit to your validation data