6_Validation Set Flashcards by Julien Heck

Another Partition

The previous module introduced partitioning a data set into a training set and a test set. This partitioning enabled you to train on one set of examples and then to test the model against a different set of examples. With two partitions, the workflow could look as follows.

In the figure, “Tweak model” means adjusting anything about the model you can dream up—from changing the learning rate, to adding or removing features, to designing a completely new model from scratch. At the end of this workflow, you pick the model that does best on the test set.

How well did you know this?

Not at all

Perfectly

Another Partition

Dividing the data set into two sets is a good idea, but not a panacea. You can greatly reduce your chances of overfitting by partitioning the data set into the three subsets shown in the following figure:

How well did you know this?

Not at all

Perfectly

Another Partition

Use the validation set to evaluate results from the training set. Then, use the test set to double-check your evaluation after the model has “passed” the validation set. The following figure shows this new workflow.

In this improved workflow:

Pick the model that does best on the validation set.
Double-check that model against the test set.

This is a better workflow because it creates fewer exposures to the test set.

How well did you know this?

Not at all

Perfectly

6_Validation Set Flashcards

(3 cards)