Cross Validation Flashcards

1
Q

cross-validation vs train test split ( 1 mark)

A

Cross-validation extends the train-test-split approach to model scoring (also called "model validation"). Compared to train_test_split, cross-validation gives you a more reliable measure of your model's quality, though it takes longer to run.
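A minimal sketch of the comparison, using scikit-learn; the toy dataset and LinearRegression model are illustrative assumptions, not part of the card. A single split yields one score that depends on which rows land in the test set; cross-validation yields one score per fold at correspondingly higher cost.

```python
# Compare one train/test-split score with cross-validated scores.
# Dataset and model are illustrative assumptions.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Single split: one score, fast, but sensitive to which rows form the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
single_score = LinearRegression().fit(X_train, y_train).score(X_test, y_test)

# Cross-validation: five scores (one per fold), at roughly five times the cost.
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5)

print(f"single split R^2: {single_score:.3f}")
print(f"cv mean R^2:      {cv_scores.mean():.3f} (from {len(cv_scores)} folds)")
```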

2
Q

cross-validation

A

In cross-validation, we run our modeling process on different subsets of the data to get multiple measures of model quality.

3
Q

how does cross-validation work?

A
  • First, divide the data into pieces, or "folds" (for example, 5 folds of 20% each).
  • Experiment 1 uses the first fold as a holdout set and everything else as training data. This gives us a measure of model quality based on a 20% holdout set, much as we got from the simple train-test split.
  • We then run a second experiment, holding out the second fold and using everything except the 2nd fold for training the model. This gives us a second estimate of model quality.
  • We repeat this process, using every fold once as the holdout. Putting this together, 100% of the data is used as a holdout at some point.
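The fold-by-fold procedure above can be sketched with scikit-learn's KFold; the dataset and model here are illustrative assumptions. With 5 folds, each experiment trains on 80% of the rows and scores on the remaining 20%, and every row serves as holdout exactly once.

```python
# Sketch of the fold-by-fold experiments described above (5 folds,
# so each holdout set is 20% of the data). Dataset/model are assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
held_out = np.zeros(len(X), dtype=bool)
scores = []

for i, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    # "Experiment i": train on everything except fold i, score on fold i.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
    held_out[test_idx] = True

print(scores)
# Every row was used as holdout at some point, so held_out is all True.
print(held_out.all())
```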

4
Q

Trade-offs Between Cross-Validation and Train-Test Split

A
  • Cross-validation gives a more accurate measure of model quality, which is especially important if you are making a lot of modeling decisions. However, it takes longer to run, because it trains one model per fold, so it does more total work.
  • On small datasets, the extra computational burden of running cross-validation isn't a big deal. These are also the problems where model quality scores would be least reliable with a train-test split. So, if your dataset is smaller, you should run cross-validation.
  • A simple train-test split is sufficient for larger datasets. It will run faster, and you may have enough data that there's little need to re-use some of it for holdout.
  • If your model takes a couple of minutes or less to run, it's probably worth switching to cross-validation. If your model takes much longer to run, cross-validation may slow down your workflow more than it's worth.
  • You can run cross-validation and see if the scores for each experiment seem close. If each experiment gives the same results, a train-test split is probably sufficient.
  • Using cross-validation gave us much better measures of model quality, with the added benefit of cleaning up our code.
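The "check whether the fold scores agree" heuristic from the bullets above can be sketched as follows; the dataset, model, and spread threshold are illustrative assumptions. A small spread across folds suggests each experiment gives nearly the same answer, so a single train-test split may be enough.

```python
# Sketch: inspect the spread of per-fold scores to judge whether a
# simple train-test split would suffice. Dataset/model are assumptions.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=8, noise=5.0, random_state=0)

scores = cross_val_score(LinearRegression(), X, y, cv=5)
spread = scores.max() - scores.min()

print(f"fold scores: {scores.round(3)}")
# If the spread is small, the experiments agree and a single
# train-test split would likely give a similar quality estimate.
print(f"spread:      {spread:.3f}")
```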