Chapters 16-20 Cost-Sensitive Modeling Flashcards
How can we choose the class_weight values for cost-sensitive models?
P 210
The class weighting can be defined in multiple ways; for example:
Domain expertise, determined by talking to subject matter experts.
Tuning, determined by a hyperparameter search such as a grid search.
Heuristic, specified using a general best practice.
The scikit-learn library provides an implementation of the best-practice heuristic for class weighting. It is implemented via the ____ function.
P 212
compute_class_weight()
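A minimal sketch of this heuristic in use, assuming a synthetic 1:100 dataset created with make_classification (the dataset parameters are illustrative):

from numpy import unique
from sklearn.datasets import make_classification
from sklearn.utils.class_weight import compute_class_weight

# illustrative imbalanced dataset, roughly 1:100 minority:majority
X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=1)

# 'balanced' heuristic: n_samples / (n_classes * count_of_each_class)
weights = compute_class_weight(class_weight='balanced', classes=unique(y), y=y)
print(weights)  # small weight for the majority class, large for the minority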
The split points of the tree are chosen to best separate examples into two groups with minimum mixing. When both groups are dominated by examples from one class, the criterion used to select a split point will see good separation when, in fact, the examples from the minority class are being ignored. How can we overcome this problem?
P 218
This problem can be overcome by modifying the criterion used to evaluate split points to take the importance of each class into account, referred to generally as the weighted split-point or weighted decision tree.
How do we make a decision tree sensitive to the misclassification cost?
P 222
Our intuition for cost-sensitive tree induction is to modify the weight of an instance proportional to the cost of misclassifying the class to which the instance belonged. Higher weights [are] assigned to instances coming from the class with a higher value of misclassification cost.
As such, this modification of the decision tree algorithm is referred to as a weighted decision tree, a class-weighted decision tree, or a cost-sensitive decision tree.
The scikit-learn Python machine learning library provides an implementation of the decision tree algorithm that supports class weighting. The DecisionTreeClassifier class provides the ____ argument that can be specified as a model hyperparameter.
P 223
class_weight
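For example, a class-weighted decision tree can be defined as follows; the 1:100 weighting is an assumed example matching a 1:100 class distribution:

from sklearn.tree import DecisionTreeClassifier

# errors on the minority class (1) are weighted 100x more than the majority (0)
model = DecisionTreeClassifier(class_weight={0: 1, 1: 100})
# or apply the best-practice heuristic automatically:
# model = DecisionTreeClassifier(class_weight='balanced')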
What is the hyperparameter "C" in the SVM classifier?
P 232
[C] determines the number and severity of the violations to the margin (and to the hyperplane) that we will tolerate. We can think of C as a budget for the amount that the margin can be violated by the n observations. A value of C = 0 indicates a hard margin and no tolerance for violations of the margin. Small positive values allow some violation, whereas large values, such as 1, 10, and 100, allow for a much softer margin.
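As a sketch of how this maps onto code: note that in scikit-learn's SVC, the C argument is the inverse of the budget described above, so a larger C penalizes margin violations more strongly (a harder margin); the value below is illustrative:

from sklearn.svm import SVC

# in scikit-learn's formulation, smaller C -> softer margin (more violations
# tolerated), the opposite of the budget framing quoted above
model = SVC(kernel='rbf', gamma='scale', C=1.0)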
Although SVMs often produce effective solutions for balanced datasets, they are sensitive to the imbalance in the datasets and produce suboptimal models. True/False
P 232
True
“In SVM, a larger weighting can be used for the minority class, allowing the margin to be softer, whereas a smaller weighting can be used for the majority class, forcing the margin to be harder and preventing misclassified examples.” Explain what this statement means.
P 233
This has the effect of encouraging the margin to contain the majority class with less flexibility, while allowing the minority class to be flexible, with misclassification of majority class examples onto the minority class side if needed. That is, the modified SVM algorithm would not tend to skew the separating hyperplane toward the minority class examples to reduce the total misclassifications, as the minority class examples are now assigned a higher misclassification cost.
The C parameter is used as a penalty during the fit of the model, specifically the finding of the decision boundary.
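A minimal sketch of a cost-sensitive SVM in scikit-learn, where the class_weight argument scales the C penalty per class so that errors on the heavily weighted minority class cost more during fitting (the 1:100 weighting is illustrative):

from sklearn.svm import SVC

# errors on class 1 (minority) are penalized 100x more (class_weight scales C)
model = SVC(kernel='rbf', gamma='scale', class_weight={0: 1, 1: 100})
# or derive the weights from the training data:
# model = SVC(class_weight='balanced')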
How can we make NN models pay more attention to examples from the minority class than the majority class in datasets with a severely skewed class distribution?
P 240
The backpropagation algorithm can be updated to weigh misclassification errors in proportion to the importance of the class, referred to as weighted neural networks or cost-sensitive neural networks.
The Keras Python deep learning library provides support for class weighting. The ____ function that is used to train Keras neural network models takes an argument called ____.
P 245
fit(), class_weight
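A minimal sketch in Keras, assuming a tiny binary classifier with two input features and an illustrative 1:100 weighting (X and y stand in for training data):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# small illustrative network for binary classification
model = Sequential()
model.add(Dense(10, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')

# weigh errors on the minority class (1) 100x during backpropagation
# model.fit(X, y, class_weight={0: 1, 1: 100}, epochs=10)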
Although XGBoost performs well in general, even on imbalanced classification datasets, it offers a way to tune the training algorithm to pay more attention to misclassification of the minority class for datasets with a skewed class distribution. True/False
P 250
True
XGBoost provides a hyperparameter designed to tune the behavior of the algorithm for imbalanced classification problems; this is the ____ hyperparameter.
P 254
scale_pos_weight
What is the effect of the scale_pos_weight hyperparameter in XGBoost?
P 255
This has the effect of scaling errors made by the model during training on the positive class, encouraging the model to over-correct them.
The scale_pos_weight hyperparameter of XGBoost can help the model achieve better performance when making predictions on the positive class. What will happen if it's pushed too far?
P 255
It may result in the model overfitting the positive class at the cost of worse performance on the negative class or both classes.
What’s a sensible default value to set for the scale_pos_weight hyperparameter?
P 255
The inverse of the class distribution.
For example, for a dataset with a 1 to 100 ratio of examples in the minority to majority classes, scale_pos_weight can be set to 100. This will give classification errors made by the model on the minority class (positive class) 100 times more impact and, in turn, 100 times more correction than errors made on the majority class.
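A sketch of setting this heuristic from the training labels, assuming y holds the labels with 0 as the majority (negative) class and 1 as the minority (positive) class:

from collections import Counter
from xgboost import XGBClassifier

# counter = Counter(y)
# estimate = counter[0] / counter[1]  # negative/positive ratio, ~100 here
model = XGBClassifier(scale_pos_weight=100)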