Chapter 26 Probability Scoring Metrics Flashcards
Log loss, also called ____, ____, or ____, can be used as a measure for evaluating predicted probabilities. Each predicted probability is compared to the actual class output value (0 or 1), and a score is calculated that penalizes the probability based on the distance from the expected value. The penalty is logarithmic.
P 261
logistic loss, logarithmic loss, cross-entropy
The log loss can be implemented in Python using the ____ function in scikit-learn.
P 262
log_loss()
In the binary classification case, the function takes a list of true outcome values and a list of
probabilities as arguments and calculates the average log loss for the predictions.
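For illustration, a minimal sketch; the toy labels and probabilities are invented:

# minimal log_loss sketch with invented example values
from sklearn.metrics import log_loss

y_true = [0, 0, 1, 1]            # actual class labels
y_prob = [0.1, 0.3, 0.8, 0.9]    # predicted probabilities for class 1

# average log loss across the predictions; smaller is better
print(log_loss(y_true, y_prob))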
Why is log loss not suitable for imbalanced data?
P 263
As an average, we can expect that the score will be suitable with a balanced dataset and misleading when there is a large imbalance between the two classes in the test set.
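A small invented example makes the caveat concrete: on a heavily imbalanced test set, a constant low probability for the positive class earns a deceptively small average log loss despite having no skill.

# hedged illustration with invented data: a 99:1 test set
from sklearn.metrics import log_loss

y_true = [0] * 99 + [1]          # severe class imbalance
y_prob = [0.01] * 100            # naive constant prediction, no skill

print(log_loss(y_true, y_prob))  # small score (about 0.056) despite no skill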
In Brier scoring, predictions that are farther away from the expected probability are penalized, but more severely than in the case of log loss. True/False
P 265
False. Predictions that are farther away from the expected probability are penalized, but less severely than in the case of log loss.
The Brier score can be calculated in Python using the ____ function in scikit-learn.
P 265
brier_score_loss()
The skill of a model can be summarized as the average Brier score across all probabilities predicted for a test dataset. This function takes the true class values (0, 1) and the predicted probabilities for all examples in a test dataset as arguments and returns the average Brier score.
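A minimal sketch with invented example values:

# minimal brier_score_loss sketch with invented data
from sklearn.metrics import brier_score_loss

y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.3, 0.8, 0.9]

# mean squared error between probabilities and outcomes; smaller is better
print(brier_score_loss(y_true, y_prob))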
Why Brier score can be misleading when there’s a large imbalance between the classes?
P 266
Model skill is reported as the average Brier score across the predictions in a test dataset. As with log loss, we can expect that the score will be suitable with a balanced dataset and misleading when there is a large imbalance between the two classes in the test set.
The Brier error score is always between ____ and ____, where a model with perfect skill has a score of ____.
P 265
0.0, 1.0, 0.0
What’s Brier Skill Score (BSS)?
P 268
The Brier Skill Score reports the relative skill of the probability prediction over the naive forecast.
BSS = 1 - (BS / BSref)
where BS is the Brier score of the model, and BSref is the Brier score of the naive prediction.
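A sketch of the calculation with invented data, taking the naive forecast to be a constant base-rate prediction (an assumption; other baselines are possible):

# Brier Skill Score sketch with invented data
from sklearn.metrics import brier_score_loss

y_true = [0, 0, 0, 1, 1]
y_prob = [0.1, 0.2, 0.1, 0.8, 0.7]       # model's predicted probabilities

# reference: predict the base rate of the positive class for every example
base_rate = sum(y_true) / len(y_true)
bs_ref = brier_score_loss(y_true, [base_rate] * len(y_true))
bs = brier_score_loss(y_true, y_prob)

bss = 1.0 - bs / bs_ref                  # > 0 means skill over the baseline
print(bss)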
When does tuning the threshold become important?
P 269
Tuning the threshold by the operator is particularly important on problems where one type of error is more or less important than another or when a model makes disproportionately more or less of a specific type of error.
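For example, a hedged sketch of applying a custom decision threshold to predicted probabilities; the 0.3 cutoff and the probabilities are invented:

# map probabilities to class labels with a non-default threshold
import numpy as np

y_prob = np.array([0.1, 0.25, 0.4, 0.8])

threshold = 0.3                          # lowered to catch more positives
y_pred = (y_prob >= threshold).astype(int)
print(y_pred)                            # [0 0 1 1]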
The Receiver Operating Characteristic, or ROC, curve is a plot of ____ versus ____ for the predictions of a model for multiple thresholds between 0.0 and 1.0.
P 269
the true positive rate, the false positive rate
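A minimal plotting sketch with invented data, using scikit-learn's roc_curve to obtain the (FPR, TPR) pairs at each candidate threshold:

# ROC curve sketch with invented data
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = roc_curve(y_true, y_prob)
plt.plot(fpr, tpr, marker='.')
plt.plot([0, 1], [0, 1], linestyle='--')  # no-skill diagonal
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.show()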
What does ROC-AUC show?
P 271
The integrated area under the ROC curve, called AUC or ROC AUC, provides a measure of the skill of the model across all evaluated thresholds.
The ROC-AUC score can be calculated in Python using the ____ function in scikit-learn.
P 271
roc_auc_score()
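A minimal sketch with invented example values:

# minimal roc_auc_score sketch with invented data
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

print(roc_auc_score(y_true, y_prob))     # 1.0 is perfect, 0.5 is no skill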
An ROC-AUC score is a measure of the likelihood that the model that produced the predictions will rank a randomly chosen positive example above a randomly chosen negative example. Specifically, that the probability will be higher for a real event (class = 1) than a real non-event (class = 0). This is an instructive definition that offers two important intuitions:
Naive Prediction.(under ROC AUC)
Insensitivity to Class Imbalance.
Explain what each of these intuitions mean.
P 271
- A naive prediction under ROC AUC is any constant probability. If the same probability is predicted for every example, there is no discrimination between positive and negative cases, and therefore the model has no skill (AUC = 0.5); see the sketch below.
- ROC AUC is a summary of the model's ability to correctly discriminate a single example across different thresholds. As such, it is unconcerned with the base likelihood of each class.
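A quick demonstration of the naive-prediction intuition, with invented data:

# a constant probability gives AUC = 0.5, i.e. no discrimination
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
y_prob = [0.5, 0.5, 0.5, 0.5]            # same probability everywhere

print(roc_auc_score(y_true, y_prob))     # 0.5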
Why is ROC AUC a better tool for model selection rather than for quantifying the practical skill of a model’s predicted probabilities?
P 272
An important consideration in choosing the ROC AUC is that it does not summarize the specific discriminative power of the model, but rather the general discriminative power across all thresholds. It might be a better tool for model selection rather than for quantifying the practical skill of a model’s predicted probabilities.