Assessing and Visualising Model Performance Flashcards

1
Q

What is:

A Confusion Matrix?

A

A Confusion Matrix is a sort of contingency table that separates out the decisions made by the chosen classifier and shows how one class is confused for another. It is more informative than accuracy alone, since it reveals how the classifier's decisions and errors are distributed across the target classes.
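
As a minimal sketch (with made-up labels), a confusion matrix can be computed with scikit-learn:

    from sklearn.metrics import confusion_matrix

    # Hypothetical true labels and model predictions
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    # Rows are true classes, columns are predicted classes
    cm = confusion_matrix(y_true, y_pred)
    print(cm)
    # [[3 1]   row 0: true negatives, false positives
    #  [1 3]]  row 1: false negatives, true positives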

2
Q

What is:

Accuracy?

A

Accuracy is a model performance metric equal to the ratio of correct decisions to the total number of decisions made.

Accuracy = (TP + TN) / (TP + TN + FP + FN)

3
Q

What are the issues with using accuracy as a metric for model performance?

A

Firstly, accuracy is not a good measure for models dealing with imbalanced classes: a majority classifier applied to a severely skewed data set (labels) already yields a very high accuracy. Such a classifier should therefore only serve as a baseline against which other models are compared.

Secondly, accuracy does not take into account unequal costs of misclassification: wrongly classifying an instance of one class can be far more expensive than wrongly classifying an instance of another.
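
A small sketch (with made-up, imbalanced labels) of why accuracy misleads here: a classifier that always predicts the majority class scores 95% without learning anything:

    # 95 negatives, 5 positives: a severely skewed label set
    y_true = [0] * 95 + [1] * 5

    # A majority classifier always predicts the majority class (0)
    y_pred = [0] * 100

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(accuracy)  # 0.95, yet the model never finds a single positive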

4
Q

What is:

Expected Value?

A

Expected Value is a framework in which every possible outcome of a scenario is given a value, and each value is weighted by the probability of that outcome occurring. The expected value is the sum of these weighted values.
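
For instance, a toy expected-value calculation (the probabilities and values are assumed for illustration):

    # Each outcome is a (probability, value) pair; probabilities sum to 1
    outcomes = [(0.05, 100.0),   # e.g. customer responds: profit 100
                (0.95, -1.0)]    # e.g. customer ignores the offer: cost 1

    expected_value = sum(p * v for p, v in outcomes)
    print(expected_value)  # 0.05 * 100 + 0.95 * -1 = 4.05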

5
Q

What are:

Class priors?

A

Class priors are the probabilities of seeing each class in a dataset: p(p) for positives and p(n) for negatives. They are factored out of the expected profit equation so that the effect of imbalanced classes is separated from the classifier's performance. What is then left between brackets is, for each class, the expected profit weighted by the posterior probabilities:

Expected profit = p(p) * [p(Y|p) * b(Y,p) + p(N|p) * c(N,p)] + p(n) * [p(N|n) * b(N,n) + p(Y|n) * c(Y,n)]

where p(Y|p) is the probability of predicting positive given a positive instance, and b and c are the benefits and costs of each outcome.

6
Q

What is:

True positive rate?

A

The true positive rate, also known as recall, sensitivity, or hit rate, is the share of positive instances that are predicted positive, out of the total number of positive instances.

TPR = TP / (TP + FN)

7
Q

What is:

True negative rate?

A

The true negative rate, or specificity, is the share of negative instances that are predicted negative, out of the total number of negative instances.

TNR = TN / (TN + FP)

8
Q

What is:

The F-measure?

A
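
The F-measure (or F1 score) is the harmonic mean of precision and recall, combining both into a single metric that is only high when precision and recall are both high.

F1 = 2 * (Precision * Recall) / (Precision + Recall)
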
9
Q

What is:

Precision?

A

Precision is the accuracy over the cases predicted to be positive by a model: the share of predicted positives that are actually positive.

Precision = TP / (TP + FP)
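
A minimal sketch computing precision, recall, and F1 directly from confusion-matrix counts (the counts are hypothetical):

    # Hypothetical confusion-matrix counts
    tp, fp, fn = 30, 10, 20

    precision = tp / (tp + fp)   # 0.75
    recall = tp / (tp + fn)      # 0.60 (the true positive rate)
    f1 = 2 * precision * recall / (precision + recall)
    print(precision, recall, round(f1, 3))  # 0.75 0.6 0.667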

10
Q

What is:

A Majority Classifier?

A

A majority classifier is a naive classifier that always predicts the majority class of the training dataset. Its accuracy is therefore equal to the share of the majority class in the dataset.

11
Q

What is:

A Profit Curve?

A

A Profit Curve is a graph that plots the expected profit on the y-axis against progressively larger proportions of the consumer base targeted (in percent, ranked by decreasing model score) on the x-axis.
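
A rough sketch of how such a curve can be computed, assuming instances are already ranked by decreasing model score and using a made-up benefit/cost per targeted customer:

    # Hypothetical labels, ordered by decreasing model score
    y_ranked = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
    benefit, cost = 99.0, 1.0   # profit per responder, cost per contact

    profit, curve = 0.0, [0.0]  # profit when 0% are targeted
    for y in y_ranked:          # target one more customer at a time
        profit += y * benefit - cost
        curve.append(profit)

    for k, p in enumerate(curve):
        print(f"{100 * k / len(y_ranked):5.1f}% targeted -> profit {p:6.1f}")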

12
Q

Why do all Profit Curves for one population start and end at the same points?

A

They start at the same point because, at first, no customers are targeted, so the profit is zero. They end at the same point because, at the end, 100% of the customers are targeted: since all classifiers then face the same class priors, the same set of predicted positives (everyone) and the same costs and benefits, the expected profit is identical for all of them.

13
Q

What is:

An ROC Graph

A

An ROC Graph (Receiver Operating Characteristic) is a two-dimensional plot of a classifier that shows the false positive rate on the x-axis and the true positive rate on the y-axis. As such, it depicts the trade-off between benefits (correctly classified positives) and costs (misclassified negatives, i.e. false positives).
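
A minimal sketch plotting an ROC curve from classifier scores, using scikit-learn's roc_curve (the labels and scores are made up):

    from sklearn.metrics import roc_curve
    import matplotlib.pyplot as plt

    # Hypothetical true labels and classifier scores
    y_true = [0, 0, 1, 0, 1, 1, 0, 1]
    y_score = [0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    plt.plot(fpr, tpr)              # the classifier's ROC curve
    plt.plot([0, 1], [0, 1], "--")  # diagonal = random guessing
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.show()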

14
Q

What is:

A Discrete Classifier?

A

A Discrete Classifier is a classifier that outputs only a label for every instance, not a score. It therefore yields only a single point in the ROC graph, since exactly one confusion matrix, with its unique set of true/false positives and true/false negatives, can be derived from its output.

15
Q

What is:

The effect of a Budgetary Constraint on the Profit Curve?

A

A budgetary constraint in an expected profit calculation causes the optimal point (maximal profit) on the Profit Curve to shift when the budget allows fewer customers to be targeted than the unconstrained maximal-profit point would require. It can also change which classifier is preferred, since some classifiers rank the highest-scoring instances better, while others do better further down the list, and vice versa.

16
Q

Where are “Conservative Classifiers” placed on the ROC curve?

A

Conservative classifiers sit near the lower left of the ROC graph: they produce very few false positives (false alarms), but they also identify only a small share of the true positives.

17
Q

Where are “Permissive Classifiers” placed on the ROC curve?

A

Permissive classifiers sit near the upper right of the ROC graph: they identify a large share of the true positives (hits), but produce many false positives along the way.