8-Model evaluation 1 Flashcards

1
Q

What is holdout evaluation strategy?

A

Partition the data into training and test sets. Train only on the training data; evaluate only on the test data. Common splits are 80/20 or 90/10.
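
As a minimal sketch, assuming scikit-learn and a toy dataset, an 80/20 holdout split looks like:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy data standing in for a real dataset
X, y = make_classification(n_samples=1000, random_state=0)

# 80/20 holdout: train only on the training portion,
# evaluate only on the held-out test portion
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```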

2
Q

What are the advantages of holdout strategy?

A

Simple to work with and implement
High reproducibility

3
Q

What are the disadvantages of holdout strategy?

A

The size of the split affects model behaviour: too many test instances leaves too little data for learning, while too many training instances leaves too little data for reliable validation.

4
Q

What is repeated random subsampling?

A

Run the holdout strategy multiple times, each time on a new random split into training and test sets. Evaluate by averaging the metric across the iterations.
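
A rough sketch, assuming scikit-learn (ShuffleSplit and the decision tree are illustrative choices, not prescribed by the card):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
clf = DecisionTreeClassifier(random_state=0)

# 10 independent random 80/20 splits
splitter = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
scores = []
for train_idx, test_idx in splitter.split(X):
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))

# Evaluate by averaging across the iterations
print(np.mean(scores))
```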

5
Q

What is the advantage of repeated random subsampling?

A

Produces more reliable results than the holdout strategy alone

6
Q

What are the disadvantages of repeated random subsampling?

A

Difficult to reproduce, since the splits are random
Slower than the holdout strategy
A poor choice of training and test set sizes can still lead to highly misleading results

7
Q

What is cross validation?

A

Data is split into m partitions (m >= 2) and, iteratively, one partition is used as test data while the other m - 1 partitions are used as training data. The evaluation metric is aggregated across the m iterations; m = 10 is a common choice.
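
A minimal sketch of 10-fold cross-validation, assuming scikit-learn (the classifier is an arbitrary stand-in):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
clf = DecisionTreeClassifier(random_state=0)

# m = 10: each fold is used as test data exactly once,
# the other 9 folds as training data
scores = cross_val_score(clf, X, y, cv=10)
print(scores.mean())
```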

8
Q

What are the pros of cross validation?

A

Very reproducible
Takes roughly the same time as repeated random subsampling
Every instance is used as a test instance in exactly one partition
Minimises the bias and variance of the performance estimate

9
Q

What is the inductive learning hypothesis?

A

Any hypothesis found to approximate the target function well over a training data set will also approximate the target function well over unseen data

10
Q

What is error rate?

A

Fraction of incorrect predictions

11
Q

Why is error rate not ideal?

A

Error rate weights all mistakes equally, but some problems require us to penalise false negative errors more heavily, e.g. medical diagnosis, while others require penalising false positive errors more heavily, e.g. spam classification.

12
Q

What is accuracy?

A

Accuracy is 1 - error rate, i.e. the fraction of correct predictions
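
A tiny worked example with made-up predictions, assuming NumPy:

```python
import numpy as np

# Hypothetical labels and predictions for illustration
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])

error_rate = np.mean(y_pred != y_true)  # 2/6, fraction of incorrect predictions
accuracy = 1 - error_rate               # 4/6, fraction of correct predictions
```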

13
Q

What is precision?

A

Precision is calculated on a specific label. It is calculated as TP / (TP + FP)

Intuitively it is “how often are we correct when we predict an instance is interesting?”

14
Q

What is recall?

A

Recall is calculated on a specific label. It is calculated as TP / (TP + FN)

Intuitively it is “how often do we correctly classify an interesting instance as interesting?”
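
Both formulas in one small sketch over made-up binary labels (1 marks the "interesting" class), assuming NumPy:

```python
import numpy as np

# Hypothetical labels and predictions for illustration
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 0, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # 2
fp = np.sum((y_pred == 1) & (y_true == 0))  # 1
fn = np.sum((y_pred == 0) & (y_true == 1))  # 2

precision = tp / (tp + fp)  # 2/3: when we predict interesting, how often are we right?
recall = tp / (tp + fn)     # 2/4: how many interesting instances did we catch?
```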

15
Q

What relationship holds for precision and recall?

A

Usually precision and recall are in an inverse relationship: tuning a model to improve one tends to degrade the other

16
Q

What is the F score?

A

The F-score is used when we want both precision (P) and recall (R) to be high.

It is calculated as F_b = (1 + b^2)PR / (b^2 * P + R). Setting b = 1 gives the F1 score, 2PR / (P + R), which weights precision and recall equally.
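
Continuing the precision/recall example above, scikit-learn exposes both the balanced and the general form (the beta value here is illustrative):

```python
from sklearn.metrics import f1_score, fbeta_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

print(f1_score(y_true, y_pred))             # b = 1: 2PR / (P + R)
print(fbeta_score(y_true, y_pred, beta=2))  # b > 1 favours recall
```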

17
Q

What is ROC (Receiver Operating Characteristics) curve?

A

A graph plotting the true positive rate against the false positive rate as the classification threshold varies. We want the curve in the upper left triangle of the chart, above the diagonal achieved by random guessing.

TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
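
A minimal sketch, assuming scikit-learn and made-up classifier scores:

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # hypothetical prediction scores

# One (FPR, TPR) point per decision threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
```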

18
Q

What is a contingency table?

A

A 2 by 2 table capturing the counts of TP, FP, TN and FN for a single class of interest

19
Q

What is a confusion matrix?

A

An n by n table capturing the results of classification over n classes relative to an interesting class. Only the interesting class's correct predictions count as TP; all other classes are treated as negatives, so their instances count as TN even when they are confused with another non-interesting class.
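
A small sketch with made-up labels, assuming scikit-learn:

```python
from sklearn.metrics import confusion_matrix

y_true = ["cat", "dog", "cat", "bird", "dog", "cat"]
y_pred = ["cat", "cat", "cat", "bird", "dog", "bird"]

# 3x3 matrix: rows are actual classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred, labels=["bird", "cat", "dog"]))
```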

20
Q

What is macro-averaging?

A

The sum of the per-class precision (recall) values divided by the number of classes, i.e. the unweighted average across classes

21
Q

What is micro-averaging?

A

For micro-precision, the sum of TP across classes divided by the sum of TP and FP across classes; analogous for micro-recall, using FN in place of FP

22
Q

What is weighted averaging?

A

The average of the per-class precision (recall) values, weighted by the proportion of each class in the test data
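
One sketch covering macro-, micro- and weighted averaging (cards 20-22), assuming scikit-learn and made-up multiclass predictions:

```python
from sklearn.metrics import precision_score

y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 0, 1, 2, 2, 2, 0, 2]

print(precision_score(y_true, y_pred, average="macro"))     # unweighted mean over classes
print(precision_score(y_true, y_pred, average="micro"))     # pooled TP / (TP + FP)
print(precision_score(y_true, y_pred, average="weighted"))  # mean weighted by class proportion
```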

23
Q

What is a baseline?

A

The baseline model is a naive method or model used as a minimum point of comparison

24
Q

What is a benchmark?

A

A benchmark is an established rival technique that our model is pitched against

25

Q

What are types of baselines?

A

Random baseline: assign a random class
Weighted random: assign a random class based on the proportions of classes in the training data set
Zero-R: assign the most likely (most frequent) class in the training dataset
One-R: select one attribute and use it to predict an instance's class; test each attribute and select the one with the lowest error rate on the training data set
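
scikit-learn's DummyClassifier implements the first three of these; a sketch with a toy dataset:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.7], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baselines = {
    "random": DummyClassifier(strategy="uniform", random_state=0),
    "weighted random": DummyClassifier(strategy="stratified", random_state=0),
    "Zero-R": DummyClassifier(strategy="most_frequent"),
}
for name, clf in baselines.items():
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))
```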

26

Q

What are the advantages of One-R?

A

Simple to understand
Simple to comprehend the results

27

Q

What are the disadvantages of One-R?

A

Unable to capture attribute interactions
Biased towards high-arity attributes
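
To make cards 25-27 concrete, a rough One-R sketch over categorical attributes (the data and attribute names are made up for illustration):

```python
from collections import Counter, defaultdict

# Hypothetical training rows: (attribute values, class label)
rows = [
    ({"outlook": "sunny", "windy": "no"},  "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"},  "stay"),
    ({"outlook": "sunny", "windy": "no"},  "play"),
]

def one_r(rows):
    best = None  # (errors, attribute, rule)
    for attr in rows[0][0]:
        # For each value of this attribute, predict the majority class
        by_value = defaultdict(Counter)
        for features, label in rows:
            by_value[features[attr]][label] += 1
        rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        errors = sum(label != rule[features[attr]] for features, label in rows)
        # Keep the attribute with the lowest training error
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    return best

print(one_r(rows))
```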