Chapter 4: Fundamentals of ML Flashcards

1
Q

Four broad categories of ML

A
  1. Supervised Learning
  2. Unsupervised Learning
  3. Self-Supervised Learning
  4. Reinforcement Learning
2
Q

Why is validation data helpful?

A

It helps you tune the hyperparameters of the model: the number of layers, the size of each layer, and so on.

3
Q

information leak

A

Every time you tune your model using the validation data, some information about the validation data leaks into your model.

4
Q

Why don’t you evaluate ML models with the training data?

A

After a certain number of epochs (which you can't predict ahead of time), the model starts to overfit the training data, so training metrics no longer reflect how well it generalizes.

5
Q

How do you evaluate an ML model?

A

By splitting off some of the data, called validation data, and evaluating the model on it.

6
Q

What is the goal when training machine learning models? Models that do what?

A

Generalize well

7
Q

When you evaluate machine learning models, what are you evaluating?

A

Their ability to generalize

8
Q

How many sets of data should you use to train and evaluate a model?

A
Three: a training set, a validation set, and a test set.
9
Q

hyperparameters vs parameters

A
Hyperparameters: architecture-level choices such as the number or size of layers in the network.
Parameters: the weights of each layer, learned during training.
10
Q

Why does developing a neural network require the number of data sets it does?

A

Because the training data sets the parameters (weights) of the model, the validation data tunes the hyperparameters, and the test data provides a final unbiased evaluation.

11
Q

How are the hyperparameters of the model tuned?

A

Using the performance of the model on the validation data as a feedback signal.

12
Q

Classic model evaluation recipes

A

Simple hold-out validation, K-fold validation, and iterated K-fold validation with shuffling.

13
Q

Simple hold-out validation

A

Set apart a test set and a validation set, train on the remaining data, and evaluate on the validation set.
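
A minimal NumPy sketch of this recipe; `data` and `labels` are assumed to be same-length arrays, and `get_model` is a hypothetical factory that builds a fresh, untrained model:

```python
import numpy as np

num_validation_samples = 10_000

# Shuffle first so the split is representative.
perm = np.random.permutation(len(data))
data, labels = data[perm], labels[perm]

# Hold out a validation set; the rest is training data.
val_data, val_labels = data[:num_validation_samples], labels[:num_validation_samples]
train_data, train_labels = data[num_validation_samples:], labels[num_validation_samples:]

model = get_model()  # hypothetical model factory
model.fit(train_data, train_labels)
validation_score = model.evaluate(val_data, val_labels)
```

A test set would be split off the same way before any of this, and touched only once at the very end.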

14
Q

What might stop you from using simple hold-out validation?

A

Having too little data available. You can detect this when different rounds of random shuffling before splitting produce very different model performance.

15
Q

K-fold cross-validation

A

Split the data into K equal partitions. For each partition, train a fresh model on the other K-1 partitions and evaluate it on the held-out one. The model's evaluation score is the average of the K scores; each trained model is discarded after producing its score.
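
A sketch of the K-fold loop in NumPy, under the same assumptions as before (hypothetical `get_model`; arrays `data` and `labels`):

```python
import numpy as np

k = 4
fold_size = len(data) // k
validation_scores = []

for fold in range(k):
    # Hold out one partition for validation...
    val_data = data[fold_size * fold : fold_size * (fold + 1)]
    val_labels = labels[fold_size * fold : fold_size * (fold + 1)]
    # ...and train on the remaining k-1 partitions.
    train_data = np.concatenate([data[:fold_size * fold],
                                 data[fold_size * (fold + 1):]])
    train_labels = np.concatenate([labels[:fold_size * fold],
                                   labels[fold_size * (fold + 1):]])
    model = get_model()  # a fresh model per fold, discarded afterwards
    model.fit(train_data, train_labels)
    validation_scores.append(model.evaluate(val_data, val_labels))

validation_score = np.average(validation_scores)  # final evaluation score
```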

16
Q

Iterated K-fold cross-validation with shuffling

A

Applying K-fold cross-validation multiple times, shuffling the data before each new split. The final score is the average of the scores from each run of the K-fold validation.
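
One way to sketch this is with scikit-learn's `RepeatedKFold`, which reshuffles before every repetition (again assuming a hypothetical `get_model` and arrays `data`/`labels`):

```python
from sklearn.model_selection import RepeatedKFold

# 3 shuffled repetitions of 4-fold validation = 12 train/evaluate runs.
rkf = RepeatedKFold(n_splits=4, n_repeats=3, random_state=0)
scores = []
for train_idx, val_idx in rkf.split(data):
    model = get_model()  # hypothetical: a fresh model for every run
    model.fit(data[train_idx], labels[train_idx])
    scores.append(model.evaluate(data[val_idx], labels[val_idx]))

final_score = sum(scores) / len(scores)  # average over all runs
```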

17
Q

How do you evaluate your model if you have very little data?

A

Iterated K-fold cross-validation with shuffling

18
Q

How should you evaluate your model if it is exhibiting considerable variance across different train/test splits?

A

K-fold cross-validation

19
Q

How can you help ensure Data Representativeness in your evaluation?

A

By randomly shuffling the data before you split it.

20
Q

Why can redundancy in your data be a problem for validation?

A

Because if duplicate samples get split between the training and test data, your model is effectively evaluated on part of its training data.

21
Q

What are some examples of data preprocessing?

A

Vectorization, normalization, handling missing values, and feature extraction.

22
Q

data vectorization

A

Turning data into tensors of floating-point values: all inputs and targets in a neural network must be tensors of floating-point data.
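
For example, multi-hot encoding lists of word indices into a float tensor is one common vectorization scheme (a sketch, not the only option):

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    """Multi-hot encode lists of indices into a (samples, dimension) float32 tensor."""
    results = np.zeros((len(sequences), dimension), dtype="float32")
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.0  # set the positions of the present indices to 1
    return results

x = vectorize_sequences([[3, 5], [9, 9999]])  # shape (2, 10000), dtype float32
```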

23
Q

value normalization

A

Making sure each feature takes small, comparable values. Normalizing each feature so that its mean is 0 and its standard deviation is 1 is a common method, but it isn't always strictly necessary.
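
In NumPy, assuming `x_train` and `x_test` are (samples, features) arrays; note the statistics come from the training data only:

```python
# Compute per-feature statistics on the training data only,
# then apply the same transformation to the test data.
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)

x_train = (x_train - mean) / std
x_test = (x_test - mean) / std  # never use test-set statistics
```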

24
Q

feature engineering

A

The process by which a human makes a problem easier for a neural network, using their deep understanding of the problem to express it in a simpler way.

25
Q

Why can good feature engineering still be helpful for deep learning models?

A

It lets you solve the problem with far less data.

26
Q

What is the fundamental tension in machine learning?

A

The tension between optimization and generalization

27
Q

Underfit model

A

When there is still progress to be made in the model's generalization via further optimization on the training data.

28
Q

Best way to prevent a model from learning irrelevant or misleading patterns in the training data?

A

Get more training data

29
Q

How can you fight overfitting with limited data?

A

By putting constraints on what information the model is able to store, or on how much information it is able to store.

30
Q

Regularization

A

The process of fighting overfitting by putting constraints on the information a model learns

31
Q

A model’s capacity

A

The number of learnable parameters in a model

32
Q

What is the simplest way to prevent overfitting via regularization?

A

By reducing the size of the model, i.e. the number of learnable parameters in it.
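
A minimal Keras sketch; the layer sizes are illustrative, assuming a binary-classification setup:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Higher-capacity model: fits the training data faster, overfits sooner.
original_model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Lower-capacity model: fewer learnable parameters, less able to memorize.
smaller_model = keras.Sequential([
    layers.Dense(4, activation="relu"),
    layers.Dense(4, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
```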

33
Q

Why does reducing the memorization capacity of a network help prevent overfitting?

A

With its capacity reduced, the network can't learn the training mapping by rote memorization as easily, so to minimize its loss it has to resort to compressed representations that maximize predictive power.

34
Q

How does model capacity affect its losses?

A

Bigger networks minimize their training loss much faster as they fit the training data, but they also start overfitting much earlier, so their validation loss degrades sooner.

35
Q

Weight regularization

A

Adding a cost to the loss function of a network that is associated with having large weights.

36
Q

L1 regularization

A

The cost added to the loss function by large weight values is proportional to the absolute value of the weight coefficients (the L1 norm of the weights).

37
Q

L2 regularization

A

The cost added is proportional to the square of the value of the weight coefficients (the L2 norm of the weights). Also called weight decay.
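
In Keras, the weight regularization from the last three cards is added per layer via `kernel_regularizer`; a sketch with illustrative coefficient values:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    # L2 (weight decay): adds 0.002 * weight**2 per coefficient to the loss.
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.002)),
    # L1, or both penalties at once, work the same way.
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l1_l2(l1=0.001, l2=0.001)),
    layers.Dense(1, activation="sigmoid"),
])
```

The penalty is added only at training time, so the model's training loss will sit above its loss at inference.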

38
Q

Dropout

A

One of the most common and effective regularization techniques: randomly dropping out (setting to zero) a number of output features of a layer during training.
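
In Keras, dropout is introduced as a layer applied to the outputs of the layer before it (a sketch):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),  # zeroes out 50% of the previous outputs, training only
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
```

The rate (here 0.5) is the fraction of features zeroed out; at inference the layer does nothing.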

39
Q

Success metric for balanced-classification problems?

A

Accuracy and area under the receiver operating characteristic curve (ROC AUC).

40
Q

Success metric for imbalanced-classification problems?

A

Precision and recall
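
A sketch covering this card and the previous one, using scikit-learn; `y_true` (binary labels), `y_pred` (hard 0/1 predictions), and `y_scores` (predicted probabilities) are assumed:

```python
from sklearn.metrics import (accuracy_score, roc_auc_score,
                             precision_score, recall_score)

# Balanced classes: accuracy and ROC AUC are informative.
accuracy = accuracy_score(y_true, y_pred)
roc_auc = roc_auc_score(y_true, y_scores)

# Imbalanced classes: accuracy is misleading; use precision and recall.
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
```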

41
Q

Success metric for ranking problems and multilabel classification?

A

Mean average precision

42
Q

How should data be formatted for a neural network?

A
  1. Data should be in tensors
  2. Data should be scaled to small values
  3. If the data is heterogeneous, it should be normalized
  4. You often want to do some feature engineering
43
Q

Statistical Power

A

A model has statistical power if it is capable of beating a dumb baseline.

44
Q

requirements for a loss function?

A
  1. It needs to be computable given only a mini-batch of data
  2. It needs to be differentiable (so you can use backpropagation to train your model)

45
Q

common last-layer activations

A

Softmax, sigmoid, and none (for regression)

46
Q

common loss functions

A

Binary crossentropy, categorical crossentropy, MSE
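
A sketch of how these pair up with the last-layer activations from the previous card, in Keras:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Binary classification: sigmoid + binary crossentropy.
binary = keras.Sequential([layers.Dense(1, activation="sigmoid")])
binary.compile(optimizer="rmsprop", loss="binary_crossentropy")

# Multiclass, single-label: softmax + categorical crossentropy.
multiclass = keras.Sequential([layers.Dense(10, activation="softmax")])
multiclass.compile(optimizer="rmsprop", loss="categorical_crossentropy")

# Regression: no output activation + mean squared error.
regression = keras.Sequential([layers.Dense(1)])
regression.compile(optimizer="rmsprop", loss="mse")
```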

47
Q

How do you know when you've reached overfitting?

A

When the model's performance on the validation data starts to degrade.

48
Q

Steps in building a neural network

A
  1. Define the problem; assemble a dataset
  2. Choose a measure of success
  3. Decide on an evaluation protocol
  4. Prepare your data
  5. Choose the last-layer activation, loss function, and optimizer configuration
  6. Scale up until the model overfits
  7. Regularize the model; tune hyperparameters
49
Q

final step before testing on test data?

A

Evaluate your validation procedure by training a final model on the combined training and validation data. If its score on the test data is significantly worse than on the validation data, your validation process wasn't reliable or you began to overfit to your validation data.