8-Model evaluation 1 Flashcards

1
Q

What is holdout evaluation strategy?

A

Partition the data into training and test sets. Train only on the training data; evaluate only on the test data. Common splits are 80/20 or 90/10.
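
As a minimal sketch, assuming scikit-learn and a toy dataset, an 80/20 holdout split looks like:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy data standing in for a real dataset
X, y = make_classification(n_samples=1000, random_state=0)

# 80/20 holdout: train only on the training portion,
# evaluate only on the held-out test portion
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```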

2
Q

What are the advantages of holdout strategy?

A

Simple to work with and implement
High reproducibility

3
Q

What are the disadvantages of holdout strategy?

A

The size of the split affects model behaviour: too many test instances leaves too little data for learning, while too many training instances leaves too little data for reliable validation.

4
Q

What is repeated random subsampling?

A

Run the holdout strategy multiple times, each time on a new random split into training and test sets. Evaluate by averaging the metric across the iterations.
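
A rough sketch, assuming scikit-learn (ShuffleSplit and the decision tree are illustrative choices, not prescribed by the card):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
clf = DecisionTreeClassifier(random_state=0)

# 10 independent random 80/20 splits
splitter = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
scores = []
for train_idx, test_idx in splitter.split(X):
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))

# Evaluate by averaging across the iterations
print(np.mean(scores))
```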

5
Q

What is the advantage of repeated random subsampling?

A

Produces more reliable results than the holdout strategy alone

6
Q

What are the disadvantages of repeated random subsampling?

A

Difficult to reproduce, since the splits are random
Slower than the holdout strategy
A poor choice of training and test set sizes can still lead to highly misleading results

7
Q

What is cross validation?

A

Data is split into m partitions (m >= 2) and, iteratively, one partition is used as test data while the other m - 1 partitions are used as training data. The evaluation metric is aggregated across the m iterations; m = 10 is a common choice.
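
A minimal sketch of 10-fold cross-validation, assuming scikit-learn (the classifier is an arbitrary stand-in):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
clf = DecisionTreeClassifier(random_state=0)

# m = 10: each fold is used as test data exactly once,
# the other 9 folds as training data
scores = cross_val_score(clf, X, y, cv=10)
print(scores.mean())
```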

8
Q

What are the pros of cross validation?

A

Very reproducible
Takes roughly the same time as repeated random subsampling
Every instance is used as a test instance in exactly one partition
Minimises the bias and variance of the performance estimate

9
Q

What is the inductive learning hypothesis?

A

Any hypothesis found to approximate the target function well over a training data set will also approximate the target function well over unseen data

10
Q

What is error rate?

A

Fraction of incorrect predictions

11
Q

Why is error rate not ideal?

A

Error rate weights all mistakes equally, but some problems require us to penalise false negative errors more heavily, e.g. medical diagnosis, while others require penalising false positive errors more heavily, e.g. spam classification.

12
Q

What is accuracy?

A

Accuracy is 1 - error rate, i.e. the fraction of correct predictions
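
A tiny worked example with made-up predictions, assuming NumPy:

```python
import numpy as np

# Hypothetical labels and predictions for illustration
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])

error_rate = np.mean(y_pred != y_true)  # 2/6, fraction of incorrect predictions
accuracy = 1 - error_rate               # 4/6, fraction of correct predictions
```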

13
Q

What is precision?

A

Precision is calculated on a specific label. It is calculated as TP / (TP + FP)

Intuitively it is “how often are we correct when we predict an instance is interesting?”

14
Q

What is recall?

A

Recall is calculated on a specific label. It is calculated as TP / (TP + FN)

Intuitively it is “how often do we correctly classify an interesting instance as interesting?”
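
Both formulas in one small sketch over made-up binary labels (1 marks the "interesting" class), assuming NumPy:

```python
import numpy as np

# Hypothetical labels and predictions for illustration
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 0, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # 2
fp = np.sum((y_pred == 1) & (y_true == 0))  # 1
fn = np.sum((y_pred == 0) & (y_true == 1))  # 2

precision = tp / (tp + fp)  # 2/3: when we predict interesting, how often are we right?
recall = tp / (tp + fn)     # 2/4: how many interesting instances did we catch?
```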

15
Q

What relationship holds for precision and recall?

A

Usually precision and recall are in an inverse relationship: tuning a model to improve one tends to degrade the other

16
Q

What is the F score?

A

The F-score is used when we want both precision (P) and recall (R) to be high.

It is calculated as F_b = (1 + b^2)PR / (b^2 * P + R). Setting b = 1 gives the F1 score, 2PR / (P + R), which weights precision and recall equally.
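
Continuing the precision/recall example above, scikit-learn exposes both the balanced and the general form (the beta value here is illustrative):

```python
from sklearn.metrics import f1_score, fbeta_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

print(f1_score(y_true, y_pred))             # b = 1: 2PR / (P + R)
print(fbeta_score(y_true, y_pred, beta=2))  # b > 1 favours recall
```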

17
Q

What is ROC (Receiver Operating Characteristics) curve?

A

A graph plotting the true positive rate against the false positive rate as the classification threshold varies. We want the curve in the upper left triangle of the chart, above the diagonal achieved by random guessing.

TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
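
A minimal sketch, assuming scikit-learn and made-up classifier scores:

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # hypothetical prediction scores

# One (FPR, TPR) point per decision threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
```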

18
Q

What is a contingency table?

A

A 2 by 2 table capturing the counts of TP, FP, TN and FN for a single class of interest

19
Q

What is a confusion matrix?

A

An n by n table capturing the results of classification over n classes relative to an interesting class. Only the interesting class's correct predictions count as TP; all other classes are treated as negatives, so their instances count as TN even when they are confused with another non-interesting class.
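
A small sketch with made-up labels, assuming scikit-learn:

```python
from sklearn.metrics import confusion_matrix

y_true = ["cat", "dog", "cat", "bird", "dog", "cat"]
y_pred = ["cat", "cat", "cat", "bird", "dog", "bird"]

# 3x3 matrix: rows are actual classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred, labels=["bird", "cat", "dog"]))
```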

20
Q

What is macro-averaging?

A

The sum of the per-class precision (recall) values divided by the number of classes, i.e. the unweighted average across classes

21
Q

What is micro-averaging?

A

For micro-precision, the sum of TP across classes divided by the sum of TP and FP across classes; analogous for micro-recall, using FN in place of FP

22
Q

What is weighted averaging?

A

The average of the per-class precision (recall) values, weighted by the proportion of each class in the test data
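
One sketch covering macro-, micro- and weighted averaging (cards 20-22), assuming scikit-learn and made-up multiclass predictions:

```python
from sklearn.metrics import precision_score

y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 0, 1, 2, 2, 2, 0, 2]

print(precision_score(y_true, y_pred, average="macro"))     # unweighted mean over classes
print(precision_score(y_true, y_pred, average="micro"))     # pooled TP / (TP + FP)
print(precision_score(y_true, y_pred, average="weighted"))  # mean weighted by class proportion
```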

23
Q

What is a baseline?

A

The baseline model is a naive method or model used as a minimum point of comparison

24
Q

What is a benchmark?

A

A benchmark is an established rival technique that our model is pitched against

25

Q

What are types of baselines?

A

Random baseline: assign a random class
Weighted random: assign a random class based on the proportions of classes in the training data set
Zero-R: assign the most likely (most frequent) class in the training dataset
One-R: select one attribute and use it to predict an instance's class; test each attribute and select the one with the lowest error rate on the training data set
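
scikit-learn's DummyClassifier implements the first three of these; a sketch with a toy dataset:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.7], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baselines = {
    "random": DummyClassifier(strategy="uniform", random_state=0),
    "weighted random": DummyClassifier(strategy="stratified", random_state=0),
    "Zero-R": DummyClassifier(strategy="most_frequent"),
}
for name, clf in baselines.items():
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))
```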

26

Q

What are the advantages of One-R?

A

Simple to understand
Simple to comprehend the results

27

Q

What are the disadvantages of One-R?

A

Unable to capture attribute interactions
Biased towards high-arity attributes
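
To make cards 25-27 concrete, a rough One-R sketch over categorical attributes (the data and attribute names are made up for illustration):

```python
from collections import Counter, defaultdict

# Hypothetical training rows: (attribute values, class label)
rows = [
    ({"outlook": "sunny", "windy": "no"},  "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"},  "stay"),
    ({"outlook": "sunny", "windy": "no"},  "play"),
]

def one_r(rows):
    best = None  # (errors, attribute, rule)
    for attr in rows[0][0]:
        # For each value of this attribute, predict the majority class
        by_value = defaultdict(Counter)
        for features, label in rows:
            by_value[features[attr]][label] += 1
        rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        errors = sum(label != rule[features[attr]] for features, label in rows)
        # Keep the attribute with the lowest training error
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    return best

print(one_r(rows))
```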