Chapter 25 Framework for Imbalanced Classification Projects Flashcards
Describe the steps of a systematic framework for approaching an ML problem.
P 322
(1) selecting a metric by which to evaluate candidate models, (2) testing a suite of algorithms, and (3) tuning the best performing models.
What’s spot-checking in applied ML?
P 326
Spot-checking machine learning algorithms means evaluating a suite of different types of algorithms with minimal hyperparameter tuning.
What’s the point of doing spot-checking?
P 326
The objective is to quickly test a range of standard machine learning algorithms and provide a performance baseline that techniques specialized for imbalanced classification must outperform in order to be considered skillful.
There are perhaps four levels of algorithms to spot-check. What are they?
P 326
- Naive Algorithms
- Linear Algorithms
- Nonlinear Algorithms
- Ensemble Algorithms
The choice of naive algorithm is based on the choice of ____.
P 326
performance metric
A suggested mapping of performance metrics to naive algorithms is as follows:
Accuracy: Predict the majority class (class 0).
G-mean: Predict a uniformly random class.
F-measure: Predict the minority class (class 1).
ROC AUC: Predict a stratified random class.
PR AUC: Predict a stratified random class.
Brier Score: Predict majority class prior.
What should you do if you are unsure of the best naive algorithm for your metric?
P 327
Perhaps test a few and discover which results in the best performance to use as your rock-bottom baseline. Some options include:
Predict the majority class in all cases.
Predict the minority class in all cases.
Predict a uniformly random class.
Predict a class randomly selected with the prior probabilities of each class.
Predict the class prior probabilities.
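These naive baselines map directly onto scikit-learn's DummyClassifier strategies. A minimal sketch (the imbalanced test dataset and settings are illustrative, not from the book):

```python
# Evaluate naive baseline "models" with scikit-learn's DummyClassifier.
# Each strategy corresponds to one of the naive predictions listed above:
# most_frequent=majority class, uniform=uniformly random,
# stratified=random with class priors, prior=class prior probabilities.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# a small synthetic imbalanced dataset (about 90% class 0, 10% class 1)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=1)

for strategy in ['most_frequent', 'uniform', 'stratified', 'prior']:
    model = DummyClassifier(strategy=strategy, random_state=1)
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=3)
    print('%s: mean accuracy %.3f' % (strategy, scores.mean()))
```

Whichever strategy scores best for your chosen metric becomes the rock-bottom baseline.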
What are linear algorithms? Give 3 examples.
P 327
Linear algorithms are those that are often drawn from the field of statistics and make strong assumptions about the functional form of the problem. Examples of linear algorithms you should consider trying include:
Logistic Regression
Linear Discriminant Analysis
Naive Bayes
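Spot-checking the three linear algorithms might look like the following sketch; the dataset, metric, and model settings are illustrative choices, not prescriptions:

```python
# Spot-check linear algorithms with cross-validation and minimal tuning.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=1)

models = {
    'LR': LogisticRegression(solver='liblinear'),
    'LDA': LinearDiscriminantAnalysis(),
    'NB': GaussianNB(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, scoring='roc_auc', cv=3)
    print('%s: mean ROC AUC %.3f' % (name, scores.mean()))
```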
You might also refer to linear algorithms as ____ algorithms. Why?
P 327
Probabilistic, because they are often fit under a probabilistic framework.
Nonlinear models often need more data to train than linear models. True/False
P 327
True
What are non-linear algorithms? Give 4 examples.
P 327
Nonlinear algorithms are drawn from the field of machine learning and make few assumptions about the functional form of the problem.
Decision Tree
k-Nearest Neighbors
Artificial Neural Networks
Support Vector Machine
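A sketch of spot-checking three of the nonlinear algorithms (a neural network is omitted here only for brevity); the dataset and settings are illustrative:

```python
# Spot-check nonlinear algorithms with default hyperparameters.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=1)

models = {
    'CART': DecisionTreeClassifier(),
    'KNN': KNeighborsClassifier(),
    'SVM': SVC(gamma='scale'),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, scoring='roc_auc', cv=3)
    print('%s: mean ROC AUC %.3f' % (name, scores.mean()))
```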
What are ensemble algorithms? Give 4 examples.
P 328
Ensemble algorithms are also drawn from the field of machine learning and combine the predictions from two or more models.
Bagged Decision Trees
Random Forest
Extra Trees
Stochastic Gradient Boosting
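The four ensembles can be spot-checked the same way; small `n_estimators` values here keep the sketch fast and are illustrative only:

```python
# Spot-check tree-based ensemble algorithms.
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=1)

models = {
    'Bagging': BaggingClassifier(n_estimators=10),
    'RF': RandomForestClassifier(n_estimators=10),
    'ET': ExtraTreesClassifier(n_estimators=10),
    'GBM': GradientBoostingClassifier(n_estimators=10),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, scoring='roc_auc', cv=3)
    print('%s: mean ROC AUC %.3f' % (name, scores.mean()))
```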
There are many ensemble algorithms to choose from, but when spot-checking algorithms, it is a good idea to focus on ensembles of ____ algorithms. Why?
P 328
Decision tree, given that they are known to perform so well in practice on a wide range of problems.
What are 4 types of imbalanced classification techniques to spot-check?
P 329
- Data Sampling Algorithms
- Cost-Sensitive Algorithms
- One-Class Algorithms
- Probability Tuning Algorithms
What are examples of data oversampling methods? (6)
P 330
Examples of data oversampling methods include:
Random Oversampling
SMOTE
Borderline SMOTE
SVM SMOTE
k-Means SMOTE
ADASYN
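The SMOTE variants and ADASYN are provided by the imbalanced-learn library; the simplest method above, Random Oversampling, can be sketched by hand in a few lines of NumPy. This hand-rolled version assumes binary 0/1 labels and is illustrative only:

```python
# Random Oversampling: duplicate randomly chosen minority-class rows
# (with replacement) until the classes are balanced.
import numpy as np

def random_oversample(X, y, minority_label=1, seed=1):
    rng = np.random.default_rng(seed)
    X_min, X_maj = X[y == minority_label], X[y != minority_label]
    # sample minority rows with replacement up to the majority count
    idx = rng.integers(0, len(X_min), size=len(X_maj))
    X_new = np.vstack([X_maj, X_min[idx]])
    y_new = np.hstack([np.full(len(X_maj), 1 - minority_label),
                       np.full(len(X_maj), minority_label)])
    return X_new, y_new

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)  # 8 majority, 2 minority rows
X_bal, y_bal = random_oversample(X, y)
print(np.bincount(y_bal))  # classes are now balanced: [8 8]
```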
What are examples of undersampling methods? (6)
P 330
Random Undersampling
Condensed Nearest Neighbor
Tomek Links
Edited Nearest Neighbors
Neighborhood Cleaning Rule
One-Sided Selection
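As with oversampling, the edited/cleaning methods above come from imbalanced-learn, while Random Undersampling is simple enough to sketch directly in NumPy. This version assumes binary 0/1 labels and is illustrative only:

```python
# Random Undersampling: keep all minority rows and a random,
# equal-sized subset of majority rows.
import numpy as np

def random_undersample(X, y, minority_label=1, seed=1):
    rng = np.random.default_rng(seed)
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)
    # draw a majority subset (without replacement) the size of the minority
    keep = rng.choice(maj_idx, size=len(min_idx), replace=False)
    idx = np.concatenate([keep, min_idx])
    return X[idx], y[idx]

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)  # 8 majority, 2 minority rows
X_bal, y_bal = random_undersample(X, y)
print(np.bincount(y_bal))  # classes are now balanced: [2 2]
```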
Almost any oversampling method can be combined with almost any undersampling technique. True/False
P 330
True
Why might it be important to scale input variables before testing data sampling methods?
Most data sampling algorithms make use of the k-nearest neighbor algorithm internally. This algorithm is very sensitive to the data types and scale of input variables. As such, it may be important to at least normalize input variables that have differing scales prior to testing the methods, and perhaps to use specialized methods if some input variables are categorical instead of numerical.
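A sketch of normalizing inputs ahead of a k-nearest-neighbor-based step. With imbalanced-learn samplers the scaler would sit before the sampler in imblearn's own Pipeline; here a plain scikit-learn pipeline with kNN illustrates the same idea:

```python
# Put scaling inside the pipeline so it is fit only on each training fold.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=1)

# normalize all inputs to [0, 1] before the distance-based kNN step
pipeline = Pipeline([('scale', MinMaxScaler()),
                     ('knn', KNeighborsClassifier())])
scores = cross_val_score(pipeline, X, y, scoring='roc_auc', cv=3)
print('mean ROC AUC: %.3f' % scores.mean())
```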
What are the examples of combinations of over and undersampling methods? (3)
P 330
Examples of popular ones include:
SMOTE and Random Undersampling
SMOTE and Tomek Links
SMOTE and Edited Nearest Neighbors
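The combinations above are available directly in imbalanced-learn; the general pattern can be sketched in NumPy, with simple duplication standing in for SMOTE. The ratios and binary 0/1 labels are illustrative assumptions:

```python
# Combine oversampling and undersampling: first grow the minority class,
# then shrink the majority class, meeting in the middle.
import numpy as np

rng = np.random.default_rng(1)
y = np.array([0] * 90 + [1] * 10)        # 90/10 imbalance
X = rng.normal(size=(100, 2))

maj = np.flatnonzero(y == 0)
mini = np.flatnonzero(y == 1)
# oversample the minority (stand-in for SMOTE) to half the majority size
mini = rng.choice(mini, size=len(maj) // 2, replace=True)
# undersample the majority down to the new minority size
maj = rng.choice(maj, size=len(mini), replace=False)

idx = np.concatenate([maj, mini])
X_bal, y_bal = X[idx], y[idx]
print(np.bincount(y_bal))  # balanced: [45 45]
```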
What are cost-sensitive algorithms? Give examples.
P 331
Cost-sensitive algorithms are modified versions of machine learning algorithms designed to take the differing costs of misclassification into account when fitting the model on the training dataset.
Examples of machine learning algorithms that can be configured using cost-sensitive training include:
Logistic Regression
Decision Trees
Support Vector Machines
Artificial Neural Networks
Bagged Decision Trees
Random Forest
Stochastic Gradient Boosting
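In scikit-learn, cost-sensitive training for most of these models is exposed through the `class_weight` argument; `'balanced'` weights errors inversely to class frequency. A sketch comparing standard and cost-sensitive logistic regression (dataset and metric are illustrative):

```python
# Compare a standard model against its cost-sensitive counterpart.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=1)

for weight in [None, 'balanced']:
    # class_weight='balanced' penalizes minority-class errors more heavily
    model = LogisticRegression(solver='liblinear', class_weight=weight)
    scores = cross_val_score(model, X, y, scoring='f1', cv=3)
    print('class_weight=%s: mean F1 %.3f' % (weight, scores.mean()))
```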
What is One-Class Classification? Give examples.
P 331
Algorithms used for outlier detection and anomaly detection can be used for classification tasks. Although unusual, when used in this way, they are often referred to as one-class classification algorithms.
Examples of one-class classification algorithms to try include:
One-Class Support Vector Machines
Isolation Forests
Minimum Covariance Determinant
Local Outlier Factor
In some cases, one-class classification algorithms can be very effective, such as when there is a severe class imbalance with very few examples of the positive class.
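A minimal sketch of the one-class idea: fit only on majority-class rows, then treat detected outliers as the positive class. The `nu` value and severe 99/1 imbalance are illustrative assumptions:

```python
# One-class classification: train an outlier detector on the majority
# class, then map flagged outliers to the positive (minority) class.
from sklearn.datasets import make_classification
from sklearn.svm import OneClassSVM

X, y = make_classification(n_samples=1000, weights=[0.99, 0.01],
                           random_state=1)

# fit on majority-class examples only
model = OneClassSVM(gamma='scale', nu=0.01)
model.fit(X[y == 0])

# OneClassSVM predicts +1 for inliers and -1 for outliers;
# map outliers (-1) to the positive class label (1)
yhat = (model.predict(X) == -1).astype(int)
print('flagged %d of %d rows as positive' % (yhat.sum(), len(y)))
```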
What are 2 ways of improving predicted probabilities?
P 332
Predicted probabilities can be improved in two ways; they are:
Calibrating Probabilities.
Tuning the Classification Threshold.
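Calibration is available in scikit-learn via CalibratedClassifierCV, which wraps a model whose probabilities are not natively calibrated (an SVM here); the dataset and settings are illustrative:

```python
# Calibrate predicted probabilities with CalibratedClassifierCV and
# evaluate with the Brier score (lower is better).
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=1)

# 'sigmoid' (Platt scaling) or 'isotonic' calibration, fit on held-out folds
model = CalibratedClassifierCV(SVC(gamma='scale'), method='sigmoid', cv=3)
scores = cross_val_score(model, X, y, scoring='neg_brier_score', cv=3)
print('mean Brier score: %.3f' % -scores.mean())
```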
For what models do we use threshold tuning? Give 4 examples of this type of model.
P 332
If probabilistic algorithms are used that natively predict a probability and class labels are required as output or used to evaluate models, it is a good idea to try tuning the classification threshold.
Logistic Regression
Linear Discriminant Analysis
Naive Bayes
Artificial Neural Networks
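Threshold tuning can be sketched as a simple sweep: score predicted probabilities against a grid of thresholds and keep the one that maximizes the chosen metric (F-measure here; the dataset and grid are illustrative):

```python
# Tune the classification threshold instead of using the default 0.5.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=1)

model = LogisticRegression(solver='liblinear').fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # positive-class probabilities

# evaluate F-measure at each candidate threshold
thresholds = np.arange(0.05, 1.0, 0.05)
scores = [f1_score(y_test, (probs >= t).astype(int)) for t in thresholds]
best = thresholds[int(np.argmax(scores))]
print('best threshold %.2f, F1 %.3f' % (best, max(scores)))
```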
Schema of spot-checking imbalanced ML algorithms
The order of the steps is flexible, the order of algorithms within each step is also flexible, and the list of algorithms is not complete.
The simplest approach to hyperparameter tuning is to select the top five or 10 algorithms or algorithm combinations that performed well and tune the hyperparameters for each. There are three popular hyperparameter tuning algorithms that you may choose from: ____
P 335
Random Search
Grid Search
Bayesian Optimization
A good default for hyperparameter tuning is ____ if you know what hyperparameter values to try, otherwise, ____ should be used. ____ should be used if possible but can be more challenging to set up and run.
P 335
grid search, random search, Bayesian optimization
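Grid search and random search are both built into scikit-learn; a sketch tuning `C` for logistic regression (the parameter grid and dataset are illustrative, and Bayesian optimization would need a separate library):

```python
# Grid search vs. random search for the final hyperparameter-tuning step.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=1)
param_grid = {'C': [0.01, 0.1, 1.0, 10.0, 100.0]}

# grid search: try every value; use when you know which values to try
grid = GridSearchCV(LogisticRegression(solver='liblinear'),
                    param_grid, scoring='roc_auc', cv=3)
grid.fit(X, y)
print('grid search best:', grid.best_params_)

# random search: sample a fixed number of candidates from the same space
rand = RandomizedSearchCV(LogisticRegression(solver='liblinear'),
                          param_grid, n_iter=3, scoring='roc_auc',
                          cv=3, random_state=1)
rand.fit(X, y)
print('random search best:', rand.best_params_)
```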