6_Validation Set Flashcards
Another Partition
The previous module introduced partitioning a data set into a training set and a test set. This partitioning enabled you to train on one set of examples and then to test the model against a different set of examples. With two partitions, the workflow could look as follows.
In the figure, “Tweak model” means adjusting anything about the model you can dream up—from changing the learning rate, to adding or removing features, to designing a completely new model from scratch. At the end of this workflow, you pick the model that does best on the test set.
Another Partition
Dividing the data set into two sets is a good idea, but not a panacea. You can greatly reduce your chances of overfitting by partitioning the data set into the three subsets shown in the following figure:
Another Partition
Use the validation set to evaluate results from the training set. Then, use the test set to double-check your evaluation after the model has “passed” the validation set. The following figure shows this new workflow.
In this improved workflow:
- Pick the model that does best on the validation set.
- Double-check that model against the test set.
This is a better workflow because it creates fewer exposures to the test set.