Lecture 10 - Model Evaluation Flashcards
What does it mean when a model has a high bias?
The model does not match the training data closely enough to be useful.
Bias - Limited flexibility
What does it mean if a model has a high variance?
It means that it matches the training data too closely.
Variance - sensitivity to the specific training set
What does it mean if we’re too “fit”?
If we are too “fit”, the model conforms too closely to this one data set, so it can’t generalize to new data.
What is the bias-variance trade off?
It is the effort to minimize the two sources of error, bias and variance, that prevent supervised learning algorithms from generalizing beyond their training set.
What is the irreducible error?
The bias-variance decomposition is composed of three terms: bias, variance, and irreducible error. The irreducible error is the noise inherent in the data itself, which no model can remove.
error = bias² + variance + irreducible error (for squared-error loss)
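A minimal sketch (assuming NumPy; not from the lecture) that estimates the three terms empirically: refit a deliberately rigid straight line to many noisy samples of a sine curve, then measure the squared bias and the variance of the fits against the true function.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin                    # the true function we are trying to learn
sigma = 0.3                   # noise level: irreducible error = sigma**2
x_test = np.linspace(0, np.pi, 50)

# Refit a straight line (a high-bias, low-flexibility model) on many
# randomized training sets drawn from the same population.
preds = []
for _ in range(500):
    x = rng.uniform(0, np.pi, 20)
    y = f(x) + rng.normal(0, sigma, 20)
    coefs = np.polyfit(x, y, deg=1)
    preds.append(np.polyval(coefs, x_test))
preds = np.array(preds)

bias_sq = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)  # bias² term
variance = np.mean(preds.var(axis=0))                     # variance term
print(bias_sq, variance, sigma ** 2)                      # the three terms
```

Raising `deg` makes the model more flexible: bias² falls and variance rises, which is exactly the trade-off described above.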
How do you detect overfitting?
Use a separate set of holdout data. We split the labelled data into two collections: training and evaluation.
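A minimal sketch of such a split, assuming scikit-learn (the lecture does not name a library); `load_iris` is just a stand-in dataset.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# Hold out 20% of the labelled data; the model never trains on it.
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.2, random_state=42)
```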
What are the two important properties for detecting overfitting using holdout data?
The data were not used in the training, so they can’t have been memorized.
They have labels, so we can measure the model’s accuracy without additional labelling costs.
What is cross-validation and why is it important?
It allows us to see how our model does, on average, across a number of randomized trials.
This average will tend toward the model’s performance on the population as a whole.
Why do we need to be careful using train-test splits?
We don’t want to end up with all of one class in the training set and none of it in the evaluation set.
How do we make sure that we don’t end up with all one class in our training set?
Make several random splits.
We do this with k-fold cross-validation, where k might be 3, 5, or 10 splits (sketched below).
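A minimal sketch of 5-fold cross-validation, again assuming scikit-learn; `StratifiedKFold` additionally preserves class proportions in every fold, which guards against the all-one-class problem from the previous card.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=folds)
print(scores.mean())  # average accuracy across the randomized trials
```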
What is underfitting?
It means our model has not captured the complexity present in our training data.
You will see excellent performance on the training data and much worse performance on the test data if…
…the model is overfit.
If you see poor performance on both the train and the test sets…
…the model is underfit.
Training performance is almost always better than
test performance
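A minimal sketch of the diagnosis, assuming scikit-learn: an unpruned decision tree is flexible enough to memorize the training set, so its training score is near perfect while its test score is lower.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print(model.score(X_tr, y_tr))  # training accuracy: typically ~1.0
print(model.score(X_te, y_te))  # test accuracy: lower; a large gap = overfit
```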
What can you do if your model is overfit?
Collect more data.
Try a simpler model.
Apply regularization (sketched below).
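A minimal sketch of the regularization remedy, assuming scikit-learn: Ridge regression adds an L2 penalty on the coefficients, and a larger alpha shrinks them harder, trading a little bias for less variance.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=20, noise=10.0,
                       random_state=0)
# alpha controls the penalty strength: larger alpha = stronger regularization.
model = Ridge(alpha=1.0).fit(X, y)
```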