Chapter 15 Cost-Sensitive Learning Flashcards

Question 1

Q

What’s cost-sensitive learning?

P 195

Answer

A

Cost-sensitive learning is a subfield of machine learning that takes the costs of prediction errors (and potentially other costs) into account when training a machine learning model.

There is a subfield of machine learning that is focused on learning and using models on data that have uneven penalties or costs when making predictions and more (uneven cost for making errors, FP & FN). This field is generally referred to as Cost-Sensitive Machine Learning, or more simply Cost-Sensitive Learning.

Question 2

Q

Most machine learning algorithms designed for classification assume that there is an equal number of examples for each observed class. True/False

P 196

Question 3

Q

In cost-sensitive learning instead of each instance being either correctly or incorrectly classified, each class (or instance) is given a ____.

P 196

Answer

A

misclassification cost

Instead of trying to optimize the accuracy, the problem is then to minimize the total misclassification cost.

Question 4

Q

Most classifiers assume that the misclassification costs (false negative and false positive cost) are the same. In most real-world applications, this assumption is not true. True/False

P 196

Answer

A

True

Real-world imbalanced binary classification problems typically have a different interpretation for each of the classification errors that can be made. For example, classifying a negative case as a positive (FP) case is typically far less of a problem than classifying a positive case as a negative (FN) case.

Question 5

Q

Define: Cost and Cost minimization.

P 198

Answer

A

Cost: The penalty associated with an incorrect prediction.
Cost Minimization: The goal of cost-sensitive learning is to minimize the cost of a model on a training dataset

Cost-Sensitive Learning is a type of learning that takes the misclassification costs(and possibly other types of cost) into consideration. The goal of this type of learning is to minimize the total cost.

Question 6

Q

Although some methods from cost-sensitive learning can be helpful on imbalanced classification, not all cost-sensitive learning techniques are imbalanced-learning techniques, and conversely, not all methods used to address imbalanced learning are appropriate for cost-sensitive learning. True/False

P 198

Answer

A

True

To make this concrete, we can consider a wide range of other ways we might wish to consider or measure cost when training a model on a dataset:
Cost of misclassification errors (or prediction errors more generally).
Cost of tests or evaluation.
Cost of labeling.
Cost of intervention or changing the system from which observations are drawn.
etc.
The above list highlights that the cost we are interested in for imbalanced-classification is just one type of the range of costs that the broader field of cost-sensitive learning might consider.

Question 7

Q

What’s a cost matrix?

P 200

Answer

A

Cost Matrix: A matrix that assigns a cost to each cell in the confusion matrix.

we use the notation C() to indicate the cost

Question 8

Q

How is the cost-weighted sum of the False Negatives and False Positives calculated?

P 200

Answer

A

TotalCost = C(0, 1) × FalseNegatives + C(1, 0) × FalsePositives
This is the value that we seek to minimize in cost-sensitive learning, at least conceptually.

Question 9

Q

The effectiveness of cost-sensitive learning relies strongly on the supplied____.

P 201

Answer

A

cost matrix (C(0, 1), C(1, 0), C(1, 1), C(0, 0)), Parameters provided there will be of crucial importance to both training and predictions steps.

Question 10

Q

In some problem domains, defining the cost matrix might be obvious. In other domains, defining the cost matrix might be challenging. Give an example for each.

P 201

Answer

A

In an insurance claim example, the costs for a false positive might be the monetary cost of follow-up with the customer to the company and the cost of a false negative might be the cost of the insurance claim. But for example, in a cancer diagnostic test example, the cost of a false positive might be the monetary cost of performing subsequent tests, whereas what is the equivalent dollar cost for letting a sick patient go home and get sicker?

Further, the cost might be a complex multi-dimensional function, including monetary costs, reputation costs, and more.

Question 11

Q

A good starting point for imbalanced classification tasks is to assign costs based on the ____. Give an example of this type of cost assignment.

P 201

Answer

A

Inverse class distribution.

For example, we may have a dataset with a 1 to 100 (1:100) ratio of examples in the minority class to examples in the majority class. This ratio can be inverted and used as the cost of misclassification errors, where the cost of a False Negative is 100 and the cost of a False Positive is 1.

Question 12

Q

There are perhaps three main groups of cost-sensitive methods that are most relevant for imbalanced learning; what are they?

P 202

Answer

A

Cost-Sensitive Resampling( cost-proportionate sampling) (Me: over/under sampling in a misclassification cost sensitive manner)
Cost-Sensitive Algorithms
Cost-Sensitive Ensembles

For imbalanced classification where the cost matrix is defined using the class distribution, there is no difference in the data sampling technique for Cost-Sensitive Resampling.

Question 13

Q

Machine learning algorithms are rarely developed specifically for cost-sensitive learning. What can we do to use them for cost-sensitive problems?

P 202

Answer

A

The wealth of existing machine learning algorithms can be modified to make use of the cost matrix.

Many such algorithm-specific augmentations have
been proposed for popular algorithms, like decision trees and support vector machines.

Question 14

Q

The scikit-learn Python machine learning library provides examples of these cost-sensitive extensions via the ____ argument on the following classifiers: SVC DecisionTreeClassifier

P 202

Answer

A

class_weight

Question 15

Q

Using the costs as a penalty for misclassification can be used for iteratively trained algorithms, such as ____ and ____.

P 203

Answer

A

artificial neural networks
logistic regression

Question 16

Q

The Scikit-learn library provides examples of cost-sensitive extensions to models via the ____ argument on the following classifiers:
____
____.

P 203

Answer

Study These Flashcards

A

class_weight
LogisticRegression
RidgeClassifier

Question 17

Q

What are cost-sensitive ensembles?

P 203

Answer

Study These Flashcards

A

Techniques designed to filter or combine the predictions from traditional machine learning models in order to take misclassification costs into account. (Me:e.g. BalancedRandomForest, BalancedBaggingClassifier, EasyEnsemble)

Making an arbitrary classifier cost-sensitive
by wrapping a cost-minimizing procedure around it.

Question 18

Q

Why are cost-sensitive ensembles called wrappers and meta-learners as well?

P 203

Answer

Study These Flashcards

A

These methods are referred to as wrapper methods as they wrap, a standard machine learning classifier. They are also referred to as meta-learners or ensembles as they learn how to use or combine predictions from other models.

Cost-sensitive meta-learning converts existing cost-insensitive classifiers into costsensitive ones without modifying them. Thus, it can be regarded as a middleware component that pre-processes the training data, or post-processes the output, from the cost-insensitive learning algorithms.

Question 19

Q

What’s the difference between loss function and cost function?

External

Answer

Study These Flashcards

A

There is no major difference. In other words, the loss function is to capture the difference between the actual and predicted values for a single record whereas cost functions aggregate the difference for the entire training dataset.

Ref

Chapter 15 Cost-Sensitive Learning Flashcards

(19 cards)