Chapter 21 Probability Threshold Moving Flashcards

1
Q

A simple and straightforward approach to improving the performance of a classifier that predicts probabilities on an imbalanced classification problem is to tune the threshold used to map probabilities to class labels. True/False

P 262

A

True

There are many techniques that may be used to address an imbalanced classification problem, such as sampling the training dataset and developing customized versions of machine learning algorithms. Nevertheless, perhaps the simplest approach to handle a severe class imbalance is to change the decision threshold.

2
Q

The default threshold may not represent an optimal interpretation of the predicted probabilities. This might be the case for a number of reasons, give 4 reasons why.

P 263

A

* The predicted probabilities are not calibrated, e.g. those predicted by an SVM or a decision tree.

* The metric used to train the model is different from the metric used to evaluate the final model.
* For example, you may use ROC curves to analyze the predicted probabilities and ROC-AUC scores to compare and select a model, yet still require crisp class labels from that model. In that case, an optimal threshold is required.

* The class distribution is severely skewed.

* The cost of one type of misclassification is more important than another type of misclassification.

Worse still, some or all of these reasons may occur at the same time, such as the use of a neural network model with uncalibrated predicted probabilities on an imbalanced classification problem.

3
Q

How is thresholding (threshold-tuning) done?

P 265

A

1- Fit the model on the training dataset.
2- Predict probabilities on the test dataset.
3- For each threshold in the candidate thresholds:
* (a) Convert probabilities to class labels using the threshold.
* (b) Evaluate the class labels.
* (c) If the score is better than the best score so far, adopt the threshold.

4- Use the adopted threshold when making class predictions on new data.
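
The procedure above can be sketched in a few lines of Python. This is a minimal, illustrative example: the synthetic dataset, the logistic regression model, the 0.01 threshold grid, and the choice of F-measure as the evaluation metric are all assumptions, not prescriptions from the text.

```python
# Sketch of the threshold-tuning procedure on a synthetic imbalanced dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic dataset with roughly a 95:5 class imbalance (illustrative).
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=1)

# 1- Fit the model on the training dataset.
model = LogisticRegression().fit(X_train, y_train)
# 2- Predict probabilities on the test dataset.
probs = model.predict_proba(X_test)[:, 1]

# 3- Evaluate each candidate threshold, adopting the best-scoring one.
best_threshold, best_score = 0.5, -1.0
for threshold in np.arange(0.0, 1.0, 0.01):
    labels = (probs >= threshold).astype(int)  # (a) probabilities -> labels
    score = f1_score(y_test, labels)           # (b) evaluate the labels
    if score > best_score:                     # (c) adopt if better
        best_threshold, best_score = threshold, score

# 4- Use the adopted threshold when predicting on new data.
print('Threshold=%.2f, F-measure=%.3f' % (best_threshold, best_score))
```

Any metric could replace `f1_score` in step (b); the loop structure stays the same.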

4
Q

There are many ways we could locate the threshold with the optimal balance between the false positive and true positive rates using a ROC curve. Knowing that the true positive rate is called the Sensitivity and that (1 − false positive rate) is called the Specificity, what is a suggested metric for finding the optimal balance?

P 268

A

The Geometric Mean or G-mean is a metric for imbalanced classification that, if optimized (maximized), will seek a balance between the sensitivity and the specificity.
G-mean = √(Sensitivity × Specificity)
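
A hedged sketch of locating the G-mean-optimal threshold from the points on a ROC curve, using scikit-learn's `roc_curve`; the synthetic dataset and logistic regression model are assumptions for illustration only.

```python
# Find the threshold that maximizes the G-mean on a ROC curve.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

# Illustrative imbalanced dataset and model (assumptions).
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=1)
probs = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# roc_curve returns one (fpr, tpr) pair per candidate threshold.
fpr, tpr, thresholds = roc_curve(y, probs)

# Sensitivity = tpr, Specificity = 1 - fpr.
gmeans = np.sqrt(tpr * (1 - fpr))
ix = np.argmax(gmeans)
print('Threshold=%.3f, G-mean=%.3f' % (thresholds[ix], gmeans[ix]))
```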

5
Q

What’s J statistic in threshold optimization using ROC curve?

P 271

A

J = TruePositiveRate − FalsePositiveRate
This is Youden's J statistic, a much faster way (than the G-mean) to arrive at the same result.
Optimizing the J statistic means maximizing it.
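
The J statistic can be computed directly from the arrays returned by `roc_curve`, with no square root required. As before, the dataset and model here are illustrative assumptions.

```python
# Find the threshold that maximizes Youden's J statistic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

# Illustrative imbalanced dataset and model (assumptions).
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=1)
probs = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

fpr, tpr, thresholds = roc_curve(y, probs)

# Youden's J statistic: one subtraction per threshold.
J = tpr - fpr
ix = np.argmax(J)
print('Threshold=%.3f, J=%.3f' % (thresholds[ix], J[ix]))
```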

6
Q

What metric do we use if we want to optimize the threshold, that results in the best balance of precision and recall?

P 273

A

If we are interested in a threshold that results in the best balance of precision and recall, then this is the same as optimizing the F-measure that summarizes the harmonic mean of both measures.
F-measure = (2 × Precision × Recall) / (Precision + Recall)
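
A sketch of this approach using scikit-learn's `precision_recall_curve`, which yields one (precision, recall) pair per candidate threshold; the dataset, model, and the small epsilon guard are illustrative assumptions.

```python
# Find the threshold that maximizes the F-measure on a precision-recall curve.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

# Illustrative imbalanced dataset and model (assumptions).
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=1)
probs = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

precision, recall, thresholds = precision_recall_curve(y, probs)

# F-measure per threshold; the epsilon guards against a 0/0 division.
fscore = (2 * precision * recall) / (precision + recall + 1e-12)
# precision_recall_curve returns one more (precision, recall) pair
# than thresholds, so drop the final point before indexing.
ix = np.argmax(fscore[:-1])
print('Threshold=%.3f, F-measure=%.3f' % (thresholds[ix], fscore[ix]))
```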
