Chapter 16-20 Cost-Sensitive Modeling Flashcards

1
Q

How can we choose the class_weight values for cost-sensitive models?

P 210

A

The class weighting can be defined multiple ways; for example:
- Domain expertise, determined by talking to subject matter experts.
- Tuning, determined by a hyperparameter search such as a grid search (see the sketch after this list).
- Heuristic, specified using a general best practice.
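
A minimal sketch of the tuning approach, assuming scikit-learn's GridSearchCV over a few candidate class_weight values; the candidates, the synthetic dataset, and the choice of LogisticRegression are assumptions for illustration:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    # roughly 1:100 imbalanced synthetic data (assumption for the demo)
    X, y = make_classification(n_samples=1000, weights=[0.99], random_state=1)
    # candidate weightings are arbitrary illustrations
    grid = {'class_weight': [{0: 1, 1: 1}, {0: 1, 1: 10}, {0: 1, 1: 100}, 'balanced']}
    search = GridSearchCV(LogisticRegression(), grid, scoring='f1', cv=3)
    search.fit(X, y)
    print(search.best_params_)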

2
Q

The scikit-learn library provides an implementation of the best-practice heuristic for class weighting. It is implemented via the ____ function.

P 212

A

compute_class_weight()
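
A short usage sketch, assuming a skewed label array built for illustration:

    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    y = np.array([0] * 100 + [1] * 1)  # skewed labels, roughly 100:1
    weights = compute_class_weight(class_weight='balanced', classes=np.unique(y), y=y)
    print(dict(zip(np.unique(y), weights)))  # approx {0: 0.505, 1: 50.5}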

3
Q

The split points of the tree are chosen to best separate examples into two groups with minimum mixing. When both groups are dominated by examples from one class, the criterion used to select a split point will see good separation, when in fact the examples from the minority class are being ignored. How can we overcome this problem?

P 218

A

This problem can be overcome by modifying the criterion used to evaluate split points to take the importance of each class into account, referred to generally as the weighted split-point or weighted decision tree.
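
As an illustration only (not the book's implementation), a class-weighted Gini impurity can be sketched as follows; the binary labels and the 1:19 weighting are assumptions for the demo:

    import numpy as np

    def weighted_gini(labels, class_weight):
        # weighted count of each class present in the node
        counts = np.array([class_weight[c] * np.sum(labels == c) for c in (0, 1)])
        p = counts / counts.sum()  # weighted class proportions
        return 1.0 - np.sum(p ** 2)

    node = np.array([0] * 95 + [1] * 5)  # node dominated by the majority class
    print(weighted_gini(node, {0: 1.0, 1: 1.0}))   # ~0.095, looks nearly pure
    print(weighted_gini(node, {0: 1.0, 1: 19.0}))  # 0.5, minority class now counts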

4
Q

How do we make a decision tree sensitive to the misclassification cost?

P 222

A

Our intuition for cost-sensitive tree induction is to modify the weight of an instance in proportion to the cost of misclassifying the class to which the instance belongs: higher weights are assigned to instances coming from the class with the higher misclassification cost.

As such, this modification of the decision tree algorithm is referred to as a weighted decision tree, a class-weighted decision tree, or a cost-sensitive decision tree.

5
Q

The scikit-learn Python machine learning library provides an implementation of the decision tree algorithm that supports class weighting. The DecisionTreeClassifier class provides the ____ argument that can be specified as a model hyperparameter.

P 223

A

class_weight
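
A minimal sketch, assuming a roughly 1:100 synthetic dataset; the explicit dict mirrors the inverse-ratio heuristic, and class_weight='balanced' would compute such a weighting automatically:

    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, weights=[0.99], random_state=1)
    # weigh minority-class (1) errors 100x when evaluating split points
    model = DecisionTreeClassifier(class_weight={0: 1, 1: 100})
    model.fit(X, y)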

6
Q

What is the C hyperparameter in an SVM classifier?

P 232

A

C determines the number and severity of the violations to the margin (and to the hyperplane) that we will tolerate. We can think of C as a budget for the amount that the margin can be violated by the n observations. A value of C = 0 indicates a hard margin and no tolerance for violations of the margin. Small positive values allow some violation, whereas larger values, such as 1, 10, and 100, allow for a much softer margin.
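
Note that scikit-learn's SVC parameterizes C as a penalty rather than a budget, so the direction is reversed there: larger C penalizes violations more heavily and yields a harder margin. A minimal sketch on an assumed synthetic dataset:

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, random_state=1)
    for C in (0.01, 1.0, 100.0):
        model = SVC(C=C).fit(X, y)
        # smaller C (softer margin in scikit-learn) admits more violations,
        # so more examples end up as support vectors
        print(C, model.n_support_)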

7
Q

Although SVMs often produce effective solutions for balanced datasets, they are sensitive to the imbalance in the datasets and produce suboptimal models. True/False

P 232

A

True

8
Q

“In SVM, a larger weighting can be used for the minority class, allowing the margin to be softer, whereas a smaller weighting can be used for the majority class, forcing the margin to be harder and preventing misclassified examples.” Explain what this statement means.

P 233

A

This has the effect of encouraging the margin to contain the majority class with less flexibility, but allowing the minority class to be flexible, with misclassification of majority class examples onto the minority class side if needed. That is, the modified SVM algorithm will not tend to skew the separating hyperplane toward the minority class examples to reduce the total misclassifications, because the minority class examples are now assigned a higher misclassification cost.

The C parameter is used as a penalty during the fit of the model, specifically during the finding of the decision boundary.
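
A minimal sketch of this class-weighted SVM in scikit-learn, assuming a roughly 1:100 synthetic dataset; per the library's documentation, class_weight[i] scales C for class i, so the minority class receives the larger margin-violation penalty:

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1000, weights=[0.99], random_state=1)
    # the minority class (1) receives 100x the margin-violation penalty
    model = SVC(class_weight={0: 1, 1: 100}).fit(X, y)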

9
Q

How can we make NN models pay more attention to examples from the minority class than the majority class in datasets with a severely skewed class distribution?

P 240

A

The backpropagation algorithm can be updated to weigh misclassification errors in proportion to the importance of the class, referred to as weighted neural networks or cost-sensitive neural networks.

10
Q

The Keras Python deep learning library provides support for class weighting. The ____ function that is used to train Keras neural network models takes an argument called ____.

P 245

A

fit(), class_weight
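
A minimal sketch of the call, assuming a small arbitrary architecture and a roughly 1:100 synthetic dataset:

    from sklearn.datasets import make_classification
    from tensorflow import keras

    X, y = make_classification(n_samples=1000, n_features=20, weights=[0.99], random_state=1)
    model = keras.Sequential([
        keras.Input(shape=(20,)),
        keras.layers.Dense(10, activation='relu'),
        keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')
    # class_weight maps class index -> weight applied to that class's loss terms
    model.fit(X, y, epochs=5, class_weight={0: 1.0, 1: 100.0}, verbose=0)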

11
Q

Although XGBoost performs well in general, even on imbalanced classification datasets, it offers a way to tune the training algorithm to pay more attention to misclassification of the minority class for datasets with a skewed class distribution. True/False

P 250

A

True

12
Q

XGBoost provides a hyperparameter designed to tune the behavior of the algorithm for imbalanced classification problems; this is the ____ hyperparameter.

P 254

A

scale_pos_weight
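
A minimal sketch, assuming a roughly 1:100 synthetic dataset; the value of 100 follows the inverse-ratio heuristic covered in the next cards:

    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=10000, weights=[0.99], random_state=1)
    # errors on the positive (minority) class are scaled 100x during training
    model = XGBClassifier(scale_pos_weight=100)
    model.fit(X, y)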

13
Q

What is the effect of the scale_pos_weight hyperparameter in XGBoost?

P 255

A

This has the effect of scaling errors made by the model during training on the positive class, encouraging the model to over-correct them.

14
Q

The scale_pos_weight hyperparameter of XGBoost can help the model achieve better performance when making predictions on the positive class. What will happen if it is pushed too far?

P 255

A

It may result in the model overfitting the positive class at the cost of worse performance on the negative class, or on both classes.

15
Q

What’s a sensible default value to set for the scale_pos_weight hyperparameter?

P 255

A

The inverse of the class distribution.

For example, for a dataset with a 1:100 ratio of examples in the minority class to the majority class, scale_pos_weight can be set to 100. This will give classification errors made by the model on the minority (positive) class 100 times more impact, and in turn 100 times more correction, than errors made on the majority class.
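
A short sketch of computing this heuristic from the labels, assuming the negative (majority) class is coded 0:

    import numpy as np

    y = np.array([0] * 10000 + [1] * 100)  # assumed 100:1 skew for illustration
    scale_pos_weight = np.sum(y == 0) / np.sum(y == 1)
    print(scale_pos_weight)  # 100.0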
