Chapter 22 Probability Calibration Flashcards

1
Q

“Unfortunately, the probabilities or probability-like scores predicted by many models are not calibrated.” What does this mean?

P 280

A

This means that they may be over-confident in some cases and under-confident in other cases.

2
Q

Why is there a greater need for probability calibration when the data is imbalanced?

P 280

A

The severely skewed class distribution present in imbalanced classification tasks may result in even more bias in the predicted probabilities, as models over-favor predicting the majority class.

3
Q

Using machine learning models that predict probabilities is generally preferred when working on imbalanced classification tasks. True/False

P 281

A

True. The problem, though, is that few machine learning models have calibrated probabilities.

It is good practice to calibrate probabilities in general when working with imbalanced datasets, even for models like logistic regression that predict well-calibrated probabilities when the class labels are balanced.

4
Q

Define:
- Calibrated Probabilities.
- Uncalibrated Probabilities.

P 281

A

- Calibrated Probabilities: probabilities match the true likelihood of events.
- Uncalibrated Probabilities: probabilities are over-confident and/or under-confident.
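To make the definition concrete, scikit-learn's calibration_curve bins predicted probabilities and compares each bin's average prediction to the observed positive rate (the data behind a reliability diagram). A minimal sketch on a synthetic dataset; the sample size and bin count are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import calibration_curve

# Synthetic binary classification problem (values assumed for illustration).
X, y = make_classification(n_samples=2000, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)
probs = model.predict_proba(X)[:, 1]

# frac_pos[i] is the observed positive rate in bin i; mean_pred[i] is the
# average predicted probability in that bin. For calibrated probabilities
# the two values match closely in every bin.
frac_pos, mean_pred = calibration_curve(y, probs, n_bins=10)
for observed, predicted in zip(frac_pos, mean_pred):
    print('predicted %.2f -> observed %.2f' % (predicted, observed))
```

An over-confident model would show predicted values more extreme than the observed rates; an under-confident model, the reverse.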

5
Q

There are two main causes for uncalibrated probabilities; What are they?

P 281

A

- Algorithms not trained using a probabilistic framework.
- Biases in the training data.

6
Q

Few machine learning algorithms produce calibrated probabilities. This is because for a model to predict calibrated probabilities, it must explicitly be trained under a ____ framework, such as ____. Some examples of algorithms that provide calibrated probabilities include: ____ (4)

P 282

A

probabilistic;
maximum likelihood estimation.
- Logistic Regression.
- Linear Discriminant Analysis.
- Naive Bayes.
- Artificial Neural Networks.

Using machine learning models that predict probabilities is generally preferred when working on imbalanced classification tasks; the problem is that few machine learning models have calibrated probabilities.

7
Q

Many algorithms either predict a probability-like score or a class label and must be coerced in order to produce a probability-like score. As such, these algorithms often require their probabilities to be calibrated prior to use. Examples include: ____ (4)

P 282

A

- Support Vector Machines.
- Decision Trees.
- Ensembles of Decision Trees (bagging, random forest, gradient boosting).
- k-Nearest Neighbors.

8
Q

Why do we need to calibrate the probabilities of models with well-calibrated probabilities, such as logistic regression, when the data is imbalanced?

P 282

A

A bias in the training dataset, such as a skew in the class distribution, means that the model will naturally predict a higher probability for the majority class than the minority class on average. The problem is, models may overcompensate and give too much focus to the majority class. This even applies to models that typically produce calibrated probabilities like logistic regression.

Class probability estimates attained via supervised learning in imbalanced scenarios systematically underestimate the probabilities for minority class instances, despite ostensibly (i.e., supposedly) good overall calibration.

9
Q

Probability calibration often involves splitting a training dataset and using one portion to train the model and another portion as a validation set to scale the probabilities. There are two main techniques for scaling predicted probabilities; name them and briefly explain how they work.

P 283

A

They are Platt scaling and isotonic regression.
- Platt Scaling: uses a logistic regression model to transform probabilities.
- Isotonic Regression: uses a weighted least-squares regression model to transform probabilities.
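A minimal sketch of both techniques applied by hand: an SVM is trained on one half of a synthetic imbalanced dataset, and its decision scores on the held-out half are mapped to probabilities by a logistic regression (Platt scaling) and by an isotonic regression. The dataset, model, and split choices are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.isotonic import IsotonicRegression

# Synthetic imbalanced dataset; half is held out for calibration.
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=1)
X_fit, X_cal, y_fit, y_cal = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=1)

svm = SVC().fit(X_fit, y_fit)
scores = svm.decision_function(X_cal)  # uncalibrated margin scores

# Platt scaling: fit a logistic regression that maps scores to probabilities.
platt = LogisticRegression().fit(scores.reshape(-1, 1), y_cal)
platt_probs = platt.predict_proba(scores.reshape(-1, 1))[:, 1]

# Isotonic regression: fit a monotonic, piecewise-constant mapping.
iso = IsotonicRegression(out_of_bounds='clip').fit(scores, y_cal)
iso_probs = iso.predict(scores)
```

In practice the CalibratedClassifierCV wrapper handles this splitting automatically rather than doing it by hand.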

10
Q

Platt Scaling is most effective when the distortion in the predicted probabilities is ____. Isotonic Regression is a more powerful calibration method that can correct any monotonic distortion. It also requires ____ .

P 283

A monotonic function is a function which is either entirely nonincreasing or nondecreasing. A function is monotonic if its first derivative (which need not be continuous) does not change sign.

A

sigmoid-shaped
more training data

11
Q

The scikit-learn library provides access to both Platt scaling and isotonic regression methods for calibrating probabilities via the ____ class. This is a wrapper for a model (like an SVM). The preferred scaling technique is defined via the ____ argument, which can be ____ or ____.

P 284

A

CalibratedClassifierCV, method, ‘sigmoid’ (Platt scaling), ‘isotonic’ (isotonic regression)
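A minimal usage sketch of the wrapper; the synthetic dataset and parameter values are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.calibration import CalibratedClassifierCV

# Synthetic imbalanced dataset (sizes and weights assumed).
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=1)

# Wrap an SVM; method='sigmoid' selects Platt scaling, while
# method='isotonic' would select isotonic regression. The cv argument
# controls the internal train/calibration splits.
calibrated = CalibratedClassifierCV(SVC(), method='sigmoid', cv=3)
calibrated.fit(X, y)
probs = calibrated.predict_proba(X)[:, 1]  # calibrated probabilities
```

The wrapped model need not expose predict_proba itself; the wrapper uses decision scores where necessary.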

12
Q

Probability calibration can be evaluated in conjunction with other modifications to the algorithm or dataset to address the skewed class distribution. True/False? Give an example.

P 286

A

True
For example, SVM provides the class_weight argument, which can be set to ‘balanced’ to adjust the margin to favor the minority class. We can combine this change to the SVM with probability calibration, and we might expect to see a further lift in model skill.
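A sketch combining the two changes and evaluating the result; the synthetic dataset, cross-validation setup, and metric are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.calibration import CalibratedClassifierCV

# Synthetic imbalanced dataset (sizes and weights assumed).
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=1)

# class_weight='balanced' shifts the margin toward the minority class;
# the wrapper then calibrates the resulting scores with isotonic regression.
svm = SVC(gamma='scale', class_weight='balanced')
model = CalibratedClassifierCV(svm, method='isotonic', cv=3)

scores = cross_val_score(model, X, y, scoring='roc_auc', cv=3)
print('Mean ROC AUC: %.3f' % scores.mean())
```

Comparing this score against the uncalibrated, unweighted SVM on the same splits would show whether the combination yields the expected lift.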
