8-Model evaluation 1 Flashcards
What is holdout evaluation strategy?
Partition the data into training and test data. Train only on the training data, evaluate only on the test data. Common splits are 80-20 or 90-10 (training-test)
What are the advantages of holdout strategy?
Simple to work with and implement
High reproducibility
What are the disadvantages of holdout strategy?
The size of the split impacts model behaviour: too many test instances leaves too little data for learning; too many training instances leaves too little data for reliable evaluation
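The holdout split on the card above can be sketched in plain Python (the function name and fixed seed are illustrative, not from the card):

```python
import random

def holdout_split(data, test_fraction=0.2, seed=42):
    """Shuffle a copy of the data, then partition it into train and test sets."""
    rng = random.Random(seed)
    shuffled = list(data)          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

train, test = holdout_split(list(range(100)))     # 80-20 split
# len(train) == 80, len(test) == 20
```

Fixing the seed is what gives holdout its high reproducibility: the same seed always yields the same split.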
What is repeated random subsampling?
Run holdout strategy multiple times on random set of training and test elements. Evaluate by averaging across the iterations.
What is the advantage of repeated random subsampling?
Produces more reliable results than holdout strategy alone
What are the disadvantages of repeated random subsampling?
Difficult to reproduce
Slower than holdout strategy
A poor choice of training and test set sizes can lead to highly misleading results
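Repeated random subsampling, as described above, is just holdout in a loop with the metric averaged at the end. A minimal sketch (the `evaluate` callback and the toy metric below are illustrative assumptions):

```python
import random

def repeated_subsampling(data, evaluate, n_iterations=10, test_fraction=0.2, seed=0):
    """Average an evaluation metric over repeated random holdout splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_iterations):
        shuffled = list(data)
        rng.shuffle(shuffled)                      # a fresh random split each iteration
        n_test = int(len(shuffled) * test_fraction)
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)               # aggregate across iterations

# Toy metric: fraction of test items below 50 (stand-in for a real model evaluation)
score = repeated_subsampling(list(range(100)),
                             lambda train, test: sum(x < 50 for x in test) / len(test))
```

Note the reproducibility caveat from the card: unless the seed and iteration count are recorded, the random splits differ between runs.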
What is cross validation?
Data is split into m partitions (m >= 2); iteratively, one partition is used as test data while the other m-1 partitions are used as training data. The evaluation metric is aggregated across the m iterations.
What are the pros of cross validation?
Very reproducible
Takes roughly the same time as repeated random subsampling
Every instance is a test instance for some partition
Minimises bias and variance of estimates
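The m-fold partitioning scheme above can be sketched as a generator of (train, test) pairs (the striped `data[i::m]` partitioning is one simple illustrative choice):

```python
def cross_validation_splits(data, m=10):
    """Yield (train, test) pairs; each of the m partitions is the test set exactly once."""
    folds = [data[i::m] for i in range(m)]         # m roughly equal partitions
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Every instance ends up in exactly one test set across the m iterations:
tested = [x for _, test in cross_validation_splits(list(range(20)), m=4) for x in test]
# sorted(tested) == list(range(20))
```

This determinism (no random shuffling in the sketch) is why cross-validation is highly reproducible, and it guarantees the property on the card: every instance is a test instance for some partition.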
What is the inductive learning hypothesis?
Any hypothesis found to approximate the target function well over a training data set will also approximate the target function well over unseen data
What is error rate?
Fraction of incorrect predictions
Why is error rate not ideal?
Some problems require us to penalise false negatives more heavily, e.g. medical diagnosis; others require us to penalise false positives more heavily, e.g. spam classification
What is accuracy?
Accuracy is 1 - error rate
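Both definitions above fit in a couple of lines (a minimal sketch; the label lists are made-up example data):

```python
def error_rate(y_true, y_pred):
    """Fraction of predictions that disagree with the true labels."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(y_true, y_pred):
    """Accuracy is 1 - error rate."""
    return 1 - error_rate(y_true, y_pred)

# 3 of 4 predictions correct:
err = error_rate([1, 0, 1, 1], [1, 0, 1, 0])   # 0.25
acc = accuracy([1, 0, 1, 1], [1, 0, 1, 0])     # 0.75
```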
What is precision?
Precision is calculated on a specific label. It is calculated as TP / (TP + FP)
Intuitively it is “how often are we correct when we predict an instance is interesting?”
What is recall?
Recall is calculated on a specific label. It is calculated as TP / (TP + FN)
Intuitively it is “how often do we correctly classify an interesting instance as interesting?”
What relationship holds for precision and recall?
Usually precision and recall are in an inverse relationship: tuning a classifier to increase one tends to decrease the other
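The two per-label formulas from the cards above, TP / (TP + FP) and TP / (TP + FN), can be computed together (a sketch; the example labels are made up):

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN), for one chosen label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(p == positive and t == positive for t, p in pairs)  # true positives
    fp = sum(p == positive and t != positive for t, p in pairs)  # false positives
    fn = sum(p != positive and t == positive for t, p in pairs)  # false negatives
    return tp / (tp + fp), tp / (tp + fn)

# 2 true positives, 1 false positive, 1 false negative:
prec, rec = precision_recall([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
# prec == 2/3, rec == 2/3
```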