Cross-Validation Flashcards
cross-validation vs. train-test split (1 mark)
Cross-validation extends the train-test split approach to model scoring (or "model validation"). Compared to train_test_split, cross-validation gives you a more reliable measure of your model's quality, though it takes longer to run.
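For contrast, here is a minimal sketch of the single score a train-test split produces, assuming scikit-learn; the synthetic dataset and random-forest model are illustrative choices, not part of the original flashcard:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset (illustrative only)
X, y = make_regression(n_samples=1000, n_features=10, random_state=0)

# A single 80/20 split yields exactly one quality score,
# which depends on which rows happened to land in the holdout
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_valid, model.predict(X_valid)))
```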
cross-validation
In cross-validation, we run our modeling process on different subsets of the data to get multiple measures of model quality. For example, we could begin by dividing the data into 5 pieces, each 20% of the full dataset; in this case, we say we have broken the data into 5 "folds."
how does cross-validation work?
- We run an experiment called Experiment 1, which uses the first fold as a holdout set and everything else as training data. This gives us a measure of model quality based on a 20% holdout set, much as we would get from a simple train-test split.
- We then run Experiment 2, holding out data from the second fold (and using everything except the second fold to train the model). This gives us a second estimate of model quality. We repeat this process, using every fold once as the holdout, so that 100% of the data is used as a holdout at some point.
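A minimal sketch of this fold-by-fold procedure using scikit-learn's cross_val_score; the synthetic dataset, model, and MAE metric are illustrative assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=1000, n_features=10, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0)

# cv=5 divides the data into 5 folds; each fold serves as the
# holdout exactly once, so we get 5 experiments in total.
# scikit-learn reports errors as negated scores, hence the -1.
scores = -1 * cross_val_score(model, X, y, cv=5,
                              scoring='neg_mean_absolute_error')
print("MAE per fold:", scores)        # one score per experiment
print("Average MAE:", scores.mean())  # single summary measure
```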
Trade-offs Between Cross-Validation and Train-Test Split
- Cross-validation gives a more accurate measure of model quality, which is especially important if you are making a lot of modeling decisions. However, it takes longer to run, because it fits the model once for each fold and therefore does more total work.
- On small datasets, the extra computational burden of running cross-validation isn't a big deal. These are also the problems where model quality scores would be least reliable with a single train-test split. So, if your dataset is smaller, you should run cross-validation.
- A simple train-test split is sufficient for larger datasets. It will run faster, and you may have enough data that there is little need to reuse some of it as a holdout.
- If your model takes a couple of minutes or less to run, it's probably worth switching to cross-validation. If your model takes much longer to run, cross-validation may slow down your workflow more than it's worth.
- You can run cross-validation and see whether the scores for each experiment are close. If each experiment yields similar results, a train-test split is probably sufficient (see the first sketch after this list).
- Using cross-validation gives much better measures of model quality, with the added benefit of cleaning up our code when paired with a pipeline (see the second sketch below).
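A minimal sketch of that agreement check, reusing the `scores` array from the cross_val_score sketch earlier; the 5% threshold is an arbitrary illustration, not a standard rule:

```python
# Reuses `scores` from the cross_val_score sketch above.
# If the per-fold scores barely differ, a single train-test split
# would likely have told a similar story.
relative_spread = scores.std() / scores.mean()
print("Relative spread across folds:", relative_spread)
if relative_spread < 0.05:  # 5% threshold is an arbitrary illustration
    print("Fold scores agree closely; a train-test split is probably enough.")
else:
    print("Fold scores vary; cross-validation is the safer choice.")
```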
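The code-cleanup benefit comes from bundling preprocessing and the model into one object, so cross-validation can re-fit the whole workflow on each fold with no manual bookkeeping of separate training and validation sets. A minimal sketch, where the imputer-plus-forest pipeline is an illustrative choice:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_regression(n_samples=1000, n_features=10, random_state=0)

# One object holds the full workflow: preprocessing + model.
my_pipeline = Pipeline(steps=[
    ('preprocessor', SimpleImputer()),
    ('model', RandomForestRegressor(n_estimators=50, random_state=0)),
])

# No X_train/X_valid variables to track; cross_val_score handles
# the splitting and re-fitting for every fold.
scores = -1 * cross_val_score(my_pipeline, X, y, cv=5,
                              scoring='neg_mean_absolute_error')
print("Average MAE:", scores.mean())
```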