Chapter 22 Probability Calibration Flashcards
“Unfortunately, the probabilities or probability-like scores predicted by many models are not calibrated.” What does this mean?
P 280
This means that they may be over-confident in some cases and under-confident in other cases.
Why is there more need for probability calibration when the data is imbalanced?
P 280
The severely skewed class distribution present in imbalanced classification tasks may result in even more bias in the predicted probabilities as they over-favor predicting the majority class.
Using machine learning models that predict probabilities is generally preferred when working on imbalanced classification tasks. True/False
P 281
True, but the problem is that few machine learning models produce calibrated probabilities.
It is good practice to calibrate probabilities in general when working with imbalanced datasets, even for models like logistic regression that predict well-calibrated probabilities when the class labels are balanced.
Define:
Calibrated Probabilities.
Uncalibrated Probabilities.
P 281
Calibrated probabilities: probabilities that match the true likelihood of events.
Uncalibrated probabilities: probabilities that are over-confident and/or under-confident.
There are two main causes for uncalibrated probabilities; What are they?
P 281
Algorithms not trained using a probabilistic framework.
Biases in the training data.
Few machine learning algorithms produce calibrated probabilities. This is because for a model to predict calibrated probabilities, it must explicitly be trained under a ____ framework, such as ____. Some examples of algorithms that provide calibrated probabilities include: ____ (4)
P 282
probabilistic
maximum likelihood estimation
Logistic Regression.
Linear Discriminant Analysis.
Naive Bayes.
Artificial Neural Networks.
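A minimal sketch (assumes scikit-learn is available) of checking calibration for one of these algorithms: logistic regression is trained via maximum likelihood estimation, so on roughly balanced data its predicted probabilities tend to track the observed event frequencies. The dataset here is synthetic and the parameter values are illustrative.

```python
# Sketch: diagnose calibration of logistic regression on balanced data.
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Balanced synthetic binary classification problem.
X, y = make_classification(n_samples=2000, n_features=10, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)
probs = model.predict_proba(X)[:, 1]

# A reliability curve compares mean predicted probability (x) to the observed
# fraction of positives (y) per bin; a calibrated model hugs the diagonal.
frac_pos, mean_pred = calibration_curve(y, probs, n_bins=10)
print(list(zip(mean_pred, frac_pos)))
```

Plotting `mean_pred` against `frac_pos` gives the reliability diagram commonly used to visualize calibration.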
Many algorithms either predict a probability-like score or a class label and must be coerced in order to produce a probability-like score. As such, these algorithms often require their probabilities to be calibrated prior to use. Examples include: ____ (4)
P 282
Support Vector Machines.
Decision Trees.
Ensembles of Decision Trees (bagging, random forest, gradient boosting).
k-Nearest Neighbors.
Why do we need to calibrate the probabilities of models with well-calibrated probabilities, such as logistic regression, when data is imbalanced?
P 282
A bias in the training dataset, such as a skew in the class distribution, means that the model will naturally predict a higher probability for the majority class than the minority class on average. The problem is, models may overcompensate and give too much focus to the majority class. This even applies to models that typically produce calibrated probabilities like logistic regression.
Class probability estimates attained via supervised learning in imbalanced scenarios systematically underestimate the probabilities for minority class instances, despite ostensibly (supposedly) good overall calibration.
Probability calibration often involves splitting a training dataset and using one portion to train the model and another portion as a validation set to scale the probabilities. There are two main techniques for scaling predicted probabilities; name them and briefly explain how they work.
P 283
They are Platt scaling and isotonic regression.
Platt scaling: uses a logistic regression model to transform probabilities.
Isotonic regression: uses a weighted least-squares regression model to transform probabilities.
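A toy sketch (assumes scikit-learn and NumPy) of the two transforms applied in isolation to synthetic uncalibrated scores; in practice a wrapper such as CalibratedClassifierCV wires these up on a held-out split for you. The score distribution and sample size here are made up for illustration.

```python
# Sketch: Platt scaling vs isotonic regression on synthetic raw scores.
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
scores = rng.uniform(-3.0, 3.0, size=500)                    # raw model scores
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-scores))).astype(int)

# Platt scaling: a one-dimensional logistic regression mapping scores
# to probabilities (a sigmoid-shaped transform).
platt = LogisticRegression().fit(scores.reshape(-1, 1), y)
p_platt = platt.predict_proba(scores.reshape(-1, 1))[:, 1]

# Isotonic regression: a nondecreasing step function fit by weighted
# least squares; can correct any monotonic distortion.
iso = IsotonicRegression(out_of_bounds="clip").fit(scores, y)
p_iso = iso.predict(scores)
```

Because the isotonic fit is a flexible step function, it needs more validation data than Platt scaling to avoid overfitting.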
Platt Scaling is most effective when the distortion in the predicted probabilities is ____. Isotonic Regression is a more powerful calibration method that can correct any monotonic distortion. It also requires ____ .
P 283
A monotonic function is a function which is either entirely nonincreasing or nondecreasing. A function is monotonic if its first derivative (which need not be continuous) does not change sign.
sigmoid-shaped
more training data
The scikit-learn library provides access to both Platt scaling and isotonic regression methods for calibrating probabilities via the ____ class. This is a wrapper for a model (like an SVM). The preferred scaling technique is defined via the ____ argument, which can be ____ or ____.
P 284
CalibratedClassifierCV, method, ‘sigmoid’ (Platt scaling), ‘isotonic’ (isotonic regression)
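A minimal sketch (assumes scikit-learn) of wrapping an SVM in CalibratedClassifierCV: the wrapper internally splits the training data, fits the SVM, and fits the chosen calibrator (Platt scaling here, via method='sigmoid') on held-out predictions. The synthetic dataset and cv setting are illustrative.

```python
# Sketch: calibrating SVM probabilities with CalibratedClassifierCV.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Imbalanced synthetic problem (about 10% minority class).
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=4)

# Wrap the SVM; method='sigmoid' is Platt scaling, 'isotonic' is the alternative.
calibrated = CalibratedClassifierCV(SVC(), method="sigmoid", cv=3)
calibrated.fit(X, y)

# The wrapper now exposes predict_proba even though SVC alone does not by default.
probs = calibrated.predict_proba(X)
print(probs.shape)  # (1000, 2)
```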
Probability calibration can be evaluated in conjunction with other modifications to the algorithm or dataset to address the skewed class distribution. True/False. Give an example
P 286
True
For example, SVM provides the class_weight argument that can be set to ‘balanced’ to adjust the margin to favor the minority class. We can combine this change with probability calibration, and we might expect to see a further lift in model skill.
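A hedged sketch (assumes scikit-learn) of combining both modifications: a cost-sensitive SVM via class_weight='balanced' wrapped in probability calibration, evaluated with cross-validated ROC AUC. The synthetic dataset, scoring metric, and cv values are illustrative choices, not prescriptions.

```python
# Sketch: cost-sensitive SVM plus probability calibration, evaluated together.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Imbalanced synthetic problem (about 10% minority class).
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=4)

# class_weight='balanced' shifts the margin toward the minority class;
# the wrapper then calibrates the resulting scores with isotonic regression.
svm = SVC(gamma="scale", class_weight="balanced")
calibrated = CalibratedClassifierCV(svm, method="isotonic", cv=3)

scores = cross_val_score(calibrated, X, y, scoring="roc_auc", cv=3)
print(scores.mean())
```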