8-Model evaluation 1 Flashcards

1
Q

What is holdout evaluation strategy?

A

Partition the data into training and test sets. Train only on the training data, evaluate only on the test data. Common splits are 80-20 or 90-10 (training-test).
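
A minimal sketch of a holdout split in Python using scikit-learn's train_test_split (the toy arrays and the 80-20 ratio are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for a real dataset
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# 80-20 holdout: train only on the first part, evaluate only on the second
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```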

2
Q

What are the advantages of holdout strategy?

A

Simple to work with and implement
High reproducibility

3
Q

What are the disadvantages of holdout strategy?

A

The size of the split affects the evaluation: too many test instances leaves too little data for learning; too many training instances leaves too little data for validation.

4
Q

What is repeated random subsampling?

A

Run the holdout strategy multiple times on random splits into training and test sets. Evaluate by averaging the metric across the iterations.
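
A sketch under illustrative assumptions (a k-NN classifier on the iris dataset stands in for whatever model is being evaluated):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Repeat the holdout split with a different random split each time,
# then average the test metric across iterations
scores = []
for seed in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = KNeighborsClassifier().fit(X_tr, y_tr)
    scores.append(model.score(X_te, y_te))

print(f"mean accuracy over 10 runs: {np.mean(scores):.3f}")
```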

5
Q

What is the advantage of repeated random subsampling?

A

Produces more reliable results than holdout strategy alone

6
Q

What are the disadvantages of repeated random subsampling?

A

Difficult to reproduce
Slower than holdout strategy
A wrong choice of training and test set sizes can lead to highly misleading results

7
Q

What is cross validation?

A

Data is split into m partitions (m >= 2). Iteratively, one partition is used as test data while the other m-1 partitions are used as training data. The evaluation metric is aggregated across the m iterations.
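
A hand-rolled sketch of m-fold cross validation (the dataset, model, and m = 5 are illustrative; in practice a library helper such as scikit-learn's KFold does the partitioning):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
m = 5  # number of partitions

rng = np.random.default_rng(0)
folds = np.array_split(rng.permutation(len(X)), m)

scores = []
for i in range(m):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(m) if j != i])
    model = KNeighborsClassifier().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

# Aggregate the evaluation metric across the m iterations
print(f"{m}-fold CV accuracy: {np.mean(scores):.3f}")
```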

8
Q

What are the pros of cross validation?

A

Very reproducible
Takes roughly the same time as repeated random subsampling
Every instance is a test instance for some partition
Minimises bias and variance of estimates

9
Q

What is the inductive learning hypothesis?

A

Any hypothesis found to approximate the target function well over a training data set will also approximate the target function well over unseen data

10
Q

What is error rate?

A

Fraction of incorrect predictions

11
Q

Why is error rate not ideal?

A

Some problems require us to penalise false negative errors, e.g. medical diagnosis; others require us to penalise false positive errors, e.g. spam classification.

12
Q

What is accuracy?

A

Accuracy is 1 - error rate
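
A tiny worked example of both quantities (the label arrays are made up):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])

error_rate = np.mean(y_pred != y_true)  # fraction incorrect: 2/6
accuracy = 1 - error_rate               # 4/6
```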

13
Q

What is precision?

A

Precision is calculated on a specific label. It is calculated as TP / (TP + FP)

Intuitively it is “how often are we correct when we predict an instance is interesting?”

14
Q

What is recall?

A

Recall is calculated on a specific label. It is calculated as TP / (TP + FN)

Intuitively it is “how often do we correctly classify an interesting instance as interesting?”
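
A minimal sketch computing both precision and recall from raw counts (the label arrays are made up, with 1 marking the "interesting" class):

```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # 3
fp = np.sum((y_pred == 1) & (y_true == 0))  # 1
fn = np.sum((y_pred == 0) & (y_true == 1))  # 1

precision = tp / (tp + fp)  # 0.75: when we predict interesting, how often right?
recall = tp / (tp + fn)     # 0.75: how many interesting instances did we find?
```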

15
Q

What relationship holds for precision and recall?

A

Usually precision and recall are in an inverse relationship: improving one tends to come at the cost of the other

16
Q

What is the F score?

A

The F score is used when we want both precision and recall to be high.

It is calculated as F_b = (1 + b^2) * P * R / (b^2 * P + R), where P is precision and R is recall. F1 is the special case b = 1: F1 = 2PR / (P + R).
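
A small sketch of the general formula (the helper name f_beta and the example values are my own):

```python
def f_beta(p, r, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    return (1 + beta**2) * p * r / (beta**2 * p + r)

print(f_beta(0.75, 0.75))          # F1 = 0.75 when P = R
print(f_beta(0.9, 0.5, beta=2.0))  # beta > 1 weights recall more heavily
```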

17
Q

What is ROC (Receiver Operating Characteristics) curve?

A

A graph plotting the true positive rate (y-axis) against the false positive rate (x-axis). We want the curve to lie in the upper left triangle of the chart, above the random-guess diagonal.

TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
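
A sketch that traces ROC points by sweeping a decision threshold over classifier scores (the scores and labels are made up):

```python
import numpy as np

y_true = np.array([1, 1, 0, 1, 0, 0])
score = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])  # model confidence in class 1

# Each threshold yields one (FPR, TPR) point on the ROC curve
for t in np.unique(score):
    y_pred = (score >= t).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    print(f"t={t:.1f}  TPR={tp / (tp + fn):.2f}  FPR={fp / (fp + tn):.2f}")
```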

18
Q

What is a contingency table?

A

A 2-by-2 table capturing the counts of TP, FP, FN and TN

19
Q

What is a confusion matrix?

A

An n-by-n table capturing the results of a series of tests relative to an interesting class. Only correct predictions of the interesting class count as TP; all other classes are treated as negative, so instances of the uninteresting classes count as TN even when they are misclassified among themselves.
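
A minimal sketch that builds an n-by-n confusion matrix and reads off the one-vs-rest counts for an interesting class (the labels and class choice are illustrative):

```python
import numpy as np

n = 3
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_pred = np.array([0, 2, 2, 2, 1, 0, 1, 1])

# Rows = actual class, columns = predicted class
cm = np.zeros((n, n), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1

c = 2  # the "interesting" class
tp = cm[c, c]
fp = cm[:, c].sum() - tp
fn = cm[c, :].sum() - tp
tn = cm.sum() - tp - fp - fn  # includes errors among the other classes
```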

20
Q

What is macro-averaging?

A

The sum of the per-class precision (recall) values divided by the number of classes, i.e. the unweighted average across classes

21
Q

What is micro-averaging?

A

The sum of TP across classes divided by the sum of TP and FP across classes gives micro-precision; micro-recall is analogous, using FN in place of FP

22
Q

What is weighted averaging?

A

The average of the per-class precision (recall) values, weighted by each class's proportion of the test data
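
A sketch contrasting the three averages on hypothetical per-class counts (all numbers are made up):

```python
import numpy as np

# Per-class counts for a 3-class problem
tp = np.array([50, 10, 5])
fp = np.array([10, 5, 5])
support = np.array([60, 20, 20])  # class frequencies in the test data

per_class_precision = tp / (tp + fp)

macro = per_class_precision.mean()   # unweighted average over classes
micro = tp.sum() / (tp + fp).sum()   # pool the counts first
weighted = np.average(per_class_precision, weights=support)
```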

23
Q

What is a baseline?

A

A baseline is a naive method or model that establishes the minimum performance our model should exceed

24
Q

What is a benchmark?

A

A benchmark is an established rival technique that our model is compared against

25
Q

What are types of baselines?

A

Random baseline: Assign a random class
Weighted random: Assign a random class based on proportions of classes in training data set
Zero-R: Assign the most frequent (majority) class in the training dataset
One-R: Select one attribute and use it to predict an instance’s class. Test each attribute and select the one with the lowest error rate on the training data set
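
Minimal sketches of Zero-R and One-R, assuming categorical attributes in a NumPy array (the function names are my own):

```python
import numpy as np
from collections import Counter

def zero_r(y_train):
    # Always predict the most frequent training class
    return Counter(y_train).most_common(1)[0][0]

def one_r(X_train, y_train):
    # For each attribute, map each value to its majority class;
    # keep the attribute whose rule has the lowest training error
    best = None
    for a in range(X_train.shape[1]):
        rule = {v: Counter(y_train[X_train[:, a] == v]).most_common(1)[0][0]
                for v in np.unique(X_train[:, a])}
        error = np.mean([rule[x] != y for x, y in zip(X_train[:, a], y_train)])
        if best is None or error < best[0]:
            best = (error, a, rule)
    return best  # (training error, attribute index, value -> class rule)

X = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])
y = np.array([0, 0, 1, 1])
print(zero_r(y))    # 0 (tie broken by first occurrence)
print(one_r(X, y))  # attribute 0 predicts y with zero training error
```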

26
Q

What are the advantages of One-R?

A

Simple to understand
Results are easy to interpret

27
Q

What are the disadvantages of One-R?

A

Unable to capture attribute interactions
Biased towards high-arity attributes