Assessing and Visualising Model Performance Flashcards
What is:
A Confusion Matrix?
A Confusion Matrix is a sort of contingency table that separates out the decisions made by the chosen classifier and shows how instances of one class are confused with another. It is more informative than plain accuracy, since it reveals how the model performs on each class separately, which matters when instances are unequally distributed over the target classes.
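A minimal sketch of building such a table from labels (the spam/ham data below is an illustrative assumption, not from the source):

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Count how often each actual label (row) is predicted as each label (column)."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

# Toy example: a binary spam classifier's decisions.
y_true = ["spam", "spam", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "ham",  "ham", "spam", "ham", "spam"]
cm = confusion_matrix(y_true, y_pred, labels=["spam", "ham"])
# Rows = actual class, columns = predicted class:
# cm == [[2, 1],   # actual spam: 2 predicted spam, 1 predicted ham
#        [1, 2]]   # actual ham : 1 predicted spam, 2 predicted ham
```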
What is:
Accuracy?
Accuracy is a model performance metric equal to the ratio of correct decisions to the total number of decisions made.
What are the issues with using accuracy as a metric for model performance?
Firstly, accuracy is not a good measure for models dealing with imbalanced classes: on a severely skewed dataset (labels), even a majority classifier yields a very high accuracy. Such a classifier should, however, only serve as a baseline for other models.
Secondly, accuracy does not take into account unequal costs of misclassification: wrongly classifying an instance of one class can be more costly than wrongly classifying an instance of another.
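The first issue can be made concrete in a few lines; the 95/5 split below is an illustrative assumption:

```python
# 95% negatives, 5% positives: a severely skewed label distribution.
y_true = [0] * 95 + [1] * 5

# A majority classifier that always predicts the majority class (0).
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# accuracy == 0.95, yet the model never detects a single positive instance.
```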
What is:
Expected Value?
Expected Value is a framework in which every possible outcome of a scenario is assigned a value and then weighted by its probability of occurring.
What are:
Class priors?
Class priors are the probabilities of seeing each class in a dataset. In the expected profit equation they are factored out, so that the effect of imbalanced classes is separated from the rest of the calculation. This way, only the outcome values, weighted by the class-conditional probabilities, are left between brackets for the positive and negative examples respectively.
What is:
True positive rate?
The true positive rate, also called recall, sensitivity, or hit rate, is the share of positive instances that are predicted positive, out of the total number of positive instances.
TPR = TP / (TP + FN)
What is:
True negative rate?
The true negative rate, or specificity, is the share of negative instances that are predicted negative, out of the total number of negative instances.
TNR = TN / (TN + FP)
What is:
The F-measure?
The F-measure (F1 score) is the harmonic mean of precision and recall, combining both into a single metric.
F1 = 2 * (Precision * Recall) / (Precision + Recall)
What is:
Precision?
Precision is the accuracy over the cases predicted to be positive by a model.
Precision = TP / (TP + FP)
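The rates defined on the cards above can all be computed from the four confusion matrix counts; the counts below are an illustrative assumption:

```python
# Counts from a hypothetical binary confusion matrix.
tp, fp, tn, fn = 30, 10, 50, 10

tpr = tp / (tp + fn)         # recall / hit rate: 30 / 40 = 0.75
tnr = tn / (tn + fp)         # specificity: 50 / 60 ≈ 0.833
precision = tp / (tp + fp)   # accuracy over predicted positives: 30 / 40 = 0.75
f1 = 2 * precision * tpr / (precision + tpr)  # harmonic mean of the two
```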
What is:
A Majority Classifier?
A majority classifier is a naive classifier that always predicts the majority class of the training dataset. Its accuracy is therefore equal to the share of the majority class in the dataset.
What is:
A Profit Curve?
A Profit Curve is a graph that plots the expected profit on the y-axis against progressively larger proportions of the consumer base (in percent) targeted on the x-axis.
Why do all the Profit Curves for one population start and end at the same point?
They start at the same point because, at the start, no customers are targeted, so the profit is zero. They end at the same point because, at the end, 100% of the customers are targeted: since all classifiers then face the same class priors, the same number of predicted positives (everyone) and the same costs, they all reach the same expected profit.
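A sketch of one such curve: customers are sorted by model score (best first), and the profit at each targeting depth is accumulated. The labels and per-customer values are illustrative assumptions.

```python
# Labels of the customer base in descending score order;
# each targeted responder yields b, each targeted non-responder costs c.
ranked_labels = [1, 1, 0, 1, 0, 0, 0, 0]
b, c = 10.0, -2.0

profits = [0.0]  # targeting 0% of the base: zero profit, the shared start point
for label in ranked_labels:
    profits.append(profits[-1] + (b if label == 1 else c))
# profits[k] is the profit of targeting the top k customers; every classifier's
# curve over this population shares profits[0] and profits[-1] (100% targeted).
```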
What is:
An ROC Graph
An ROC Graph (Receiver Operating Characteristic) is a two-dimensional plot of a classifier that shows the false positive rate on the x-axis and the true positive rate on the y-axis. As such, it depicts the trade-off between benefits (correctly classified positives) and costs (misclassified negatives, i.e. false positives).
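For a classifier that outputs scores, each decision threshold yields one (FPR, TPR) point; sweeping the threshold traces the curve. The scores and labels below are illustrative assumptions:

```python
# Instances sorted by descending score; lowering the threshold past each
# instance adds one more predicted positive and yields a new ROC point.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   1,   0,   0]
P = sum(labels)          # total positives
N = len(labels) - P      # total negatives

points = [(0.0, 0.0)]    # strictest threshold: nothing predicted positive
tp = fp = 0
for score, label in zip(scores, labels):
    if label == 1:
        tp += 1
    else:
        fp += 1
    points.append((fp / N, tp / P))
# points traces the ROC curve from (0, 0) to (1, 1).
```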
What is:
A Discrete Classifier?
A discrete classifier is a classifier that outputs only a label for every instance, not a score. It therefore yields only a single point in the ROC graph, since only one confusion matrix, and hence one unique set of true/false positives and true/false negatives, can be produced.
What is:
The effect of a Budgetary Constraint on the Profit Curve?
A budgetary constraint in an expected profit calculation for a Profit Curve causes the optimal point (maximal profit) to shift if the proportion it allows to be targeted is lower than the proportion at the maximal profit point. It can also change which classifier is preferred, since some classifiers rank the highest-scoring instances better than the lower-scoring ones and vice versa.