Chapter 25 Framework for Imbalanced Classification Projects Flashcards

1
Q

Describe the steps of a systematic framework for approaching an ML problem.

P 322

A

(1) selecting a metric by which to evaluate candidate models, (2) testing a suite of algorithms, and (3) tuning the best-performing models.

2
Q

What’s spot-checking in applied ML?

P 326

A

Spot-checking machine learning algorithms means evaluating a suite of different types of algorithms with minimal hyperparameter tuning.
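As an illustrative sketch (not from the book), spot-checking reduces to a short loop: several algorithm types with default hyperparameters, all evaluated under one shared cross-validation scheme and metric. The model choices and the synthetic dataset below are assumptions for the example.

```python
# Illustrative spot-check: same data, same CV, same metric, default models.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced dataset (roughly 90:10).
X, y = make_classification(n_samples=500, weights=[0.9], random_state=1)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=1)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "CART": DecisionTreeClassifier(random_state=1),
    "RF": RandomForestClassifier(n_estimators=50, random_state=1),
}
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, scoring="f1", cv=cv)
    results[name] = scores.mean()

for name, score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: F1 = {score:.3f}")
```

The point is the shared harness, not the particular models: any algorithm dropped into the dictionary gets evaluated under identical conditions.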

3
Q

Schema for choosing a metric

P 325

A
4
Q

What’s the point of doing spot-checking?

P 326

A

The objective is to quickly test a range of standard machine learning algorithms and establish a performance baseline that techniques specialized for imbalanced classification must outperform in order to be considered skillful.

5
Q

There are perhaps four levels of algorithms to spot-check; what are they?

P 326

A
  1. Naive Algorithms
  2. Linear Algorithms
  3. Nonlinear Algorithms
  4. Ensemble Algorithms
6
Q

The choice of naive algorithm is based on the choice of ____.

P 326

A

performance metric

A suggested mapping of performance metrics to naive algorithms is as follows:
- Accuracy: Predict the majority class (class 0).
- G-mean: Predict a uniformly random class.
- F-measure: Predict the minority class (class 1).
- ROC AUC: Predict a stratified random class.
- PR AUC: Predict a stratified random class.
- Brier Score: Predict the majority class prior.
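This mapping corresponds directly to the strategies of scikit-learn's DummyClassifier; a minimal sketch (the dataset is synthetic, and class 0 is assumed to be the majority class):

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier

# Synthetic dataset with a roughly 99:1 class ratio.
X, y = make_classification(n_samples=1000, weights=[0.99], random_state=1)

baselines = {
    "Accuracy": DummyClassifier(strategy="most_frequent"),          # majority class
    "G-mean": DummyClassifier(strategy="uniform", random_state=1),  # uniformly random
    "F-measure": DummyClassifier(strategy="constant", constant=1),  # minority class
    "ROC AUC": DummyClassifier(strategy="stratified", random_state=1),
    "PR AUC": DummyClassifier(strategy="stratified", random_state=1),
    "Brier Score": DummyClassifier(strategy="prior"),               # class priors
}
for model in baselines.values():
    model.fit(X, y)

# The majority-class baseline scores high on accuracy despite learning nothing.
acc = baselines["Accuracy"].score(X, y)
print(f"majority-class accuracy: {acc:.3f}")
```

This is exactly why accuracy is a misleading metric here: the naive baseline is already near-perfect on it.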

7
Q

What should you do if you are unsure of the best naive algorithm for your metric?

P 327

A

Test a few and discover which results in the best performance to use as your rock-bottom baseline. Some options include:

- Predict the majority class in all cases.
- Predict the minority class in all cases.
- Predict a uniformly random class.
- Predict a class selected at random with the prior probabilities of each class.
- Predict the class prior probabilities.

8
Q

What are linear algorithms? Give 3 examples.

P 327

A

Linear algorithms are those that are often drawn from the field of statistics and make strong assumptions about the functional form of the problem. Examples of linear algorithms you should consider trying include:

- Logistic Regression
- Linear Discriminant Analysis
- Naive Bayes
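All three are available in scikit-learn, and because they are fit under a probabilistic framework, each exposes per-class probabilities via predict_proba. A small sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, random_state=1)

for model in (LogisticRegression(max_iter=1000),
              LinearDiscriminantAnalysis(),
              GaussianNB()):
    model.fit(X, y)
    proba = model.predict_proba(X[:5])  # one probability per class, per row
    print(type(model).__name__, proba[0].round(3))
```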

9
Q

You might also refer to linear algorithms as ____ algorithms. Why?

P 327

A

Probabilistic, because they are often fit under a probabilistic framework.

10
Q

Nonlinear models often need more data to train than linear models. True/False

P 327

A

True

11
Q

What are nonlinear algorithms? Give 4 examples.

P 327

A

Nonlinear algorithms are drawn from the field of machine learning and make few assumptions about the functional form of the problem.
- Decision Tree
- k-Nearest Neighbors
- Artificial Neural Networks
- Support Vector Machine
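A quick sketch fitting all four with near-default settings on a synthetic problem (illustrative only, not a tuned benchmark):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=2)

accs = {}
for model in (DecisionTreeClassifier(random_state=2),
              KNeighborsClassifier(),
              MLPClassifier(max_iter=500, random_state=2),
              SVC()):
    model.fit(X, y)
    accs[type(model).__name__] = model.score(X, y)  # training accuracy
print(accs)
```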

12
Q

What are ensemble algorithms? Give 4 examples.

P 328

A

Ensemble algorithms are also drawn from the field of machine learning and combine the predictions from two or more models.
- Bagged Decision Trees
- Random Forest
- Extra Trees
- Stochastic Gradient Boosting
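All four have scikit-learn implementations; as a sketch, stochastic gradient boosting corresponds to GradientBoostingClassifier with subsample below 1.0 (dataset and settings here are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, weights=[0.9], random_state=3)

scores = {}
for model in (BaggingClassifier(n_estimators=20, random_state=3),
              RandomForestClassifier(n_estimators=20, random_state=3),
              ExtraTreesClassifier(n_estimators=20, random_state=3),
              # subsample < 1.0 makes the boosting "stochastic"
              GradientBoostingClassifier(n_estimators=20, subsample=0.7,
                                         random_state=3)):
    scores[type(model).__name__] = cross_val_score(model, X, y, cv=3).mean()
print(scores)
```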

13
Q

There are many ensemble algorithms to choose from, but when spot-checking algorithms, it is a good idea to focus on ensembles of ____ algorithms, why?

P 328

A

Decision tree, given that tree ensembles are known to perform well in practice on a wide range of problems.

14
Q

Schema for ordered spot-checking for balanced ML models

A
15
Q

What are 4 types of imbalanced classification techniques to spot-check?

P 329

A
  1. Data Sampling Algorithms
  2. Cost-Sensitive Algorithms
  3. One-Class Algorithms
  4. Probability Tuning Algorithms
16
Q

What are some examples of oversampling methods? (6)

P 330

A

Examples of data oversampling methods include:
- Random Oversampling
- SMOTE
- Borderline SMOTE
- SVM SMOTE
- k-Means SMOTE
- ADASYN
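These methods are implemented in the imbalanced-learn library (e.g. imblearn.over_sampling.SMOTE). The simplest of them, random oversampling, can be sketched in plain NumPy: duplicate randomly chosen minority rows until the classes are balanced. The helper below is an illustration, not the library's implementation.

```python
import numpy as np

def random_oversample(X, y, minority=1, seed=1):
    """Duplicate random minority-class rows until the classes are balanced."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == minority)
    n_needed = (y != minority).sum() - minority_idx.size
    extra = rng.choice(minority_idx, size=n_needed, replace=True)
    idx = np.concatenate([np.arange(y.size), extra])
    return X[idx], y[idx]

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)   # 8 majority, 2 minority
X_res, y_res = random_oversample(X, y)
print(np.bincount(y_res))         # -> [8 8]
```

SMOTE and its variants go further by synthesizing new minority examples between neighbors rather than duplicating rows.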

17
Q

What are some examples of undersampling methods? (6)

P 330

A

- Random Undersampling
- Condensed Nearest Neighbor
- Tomek Links
- Edited Nearest Neighbors
- Neighborhood Cleaning Rule
- One-Sided Selection
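These too are available in imbalanced-learn (imblearn.under_sampling). Random undersampling, the simplest, can be sketched with NumPy: keep every minority row and a random subset of majority rows of the same size. As above, this helper is illustrative only.

```python
import numpy as np

def random_undersample(X, y, minority=1, seed=1):
    """Keep all minority rows plus an equal-sized random majority subset."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == minority)
    majority_idx = np.flatnonzero(y != minority)
    keep = rng.choice(majority_idx, size=minority_idx.size, replace=False)
    idx = np.sort(np.concatenate([minority_idx, keep]))
    return X[idx], y[idx]

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)   # 8 majority, 2 minority
X_res, y_res = random_undersample(X, y)
print(np.bincount(y_res))         # -> [2 2]
```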

18
Q

Almost any oversampling method can be combined with almost any undersampling technique. True/False

P 330

A

True

Most data sampling algorithms make use of the k-nearest neighbor algorithm internally. This algorithm is very sensitive to the data types and scale of the input variables. As such, it may be important to at least normalize input variables that have differing scales before testing the methods, and perhaps to use specialized methods if some input variables are categorical instead of numerical.
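The scale sensitivity is easy to demonstrate with a kNN classifier directly (the same distance computations underlie kNN-based sampling methods). The noise feature and scores below are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(4)
X, y = make_classification(n_samples=300, n_features=5, random_state=4)
# Append a pure-noise feature on a much larger scale than the others.
X = np.hstack([X, rng.normal(scale=1000.0, size=(300, 1))])

# Without scaling, distances are dominated by the large-scale noise feature.
raw_score = cross_val_score(KNeighborsClassifier(), X, y, cv=5).mean()
scaled_score = cross_val_score(
    make_pipeline(MinMaxScaler(), KNeighborsClassifier()), X, y, cv=5).mean()
print(f"raw: {raw_score:.3f}  scaled: {scaled_score:.3f}")
```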

19
Q

What are some examples of combinations of over- and undersampling methods? (3)

P 330

A

Examples of popular ones include:

- SMOTE and Random Undersampling
- SMOTE and Tomek Links
- SMOTE and Edited Nearest Neighbors

20
Q

What are cost-sensitive algorithms? Give examples.

P 331

A

Cost-sensitive algorithms are modified versions of machine learning algorithms designed to take the differing costs of misclassification into account when fitting the model on the training dataset.

Examples of machine learning algorithms that can be configured using cost-sensitive training include:

- Logistic Regression
- Decision Trees
- Support Vector Machines
- Artificial Neural Networks
- Bagged Decision Trees
- Random Forest
- Stochastic Gradient Boosting
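In scikit-learn, cost-sensitive training for most of these models is exposed through the class_weight argument; "balanced" weights classes inversely to their frequency. A sketch on synthetic data (the exact recall values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.95], random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=5)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000,
                              class_weight="balanced").fit(X_tr, y_tr)

r_plain = recall_score(y_te, plain.predict(X_te))       # minority recall
r_weighted = recall_score(y_te, weighted.predict(X_te))
print(f"minority recall  default={r_plain:.3f}  balanced={r_weighted:.3f}")
```

Weighting misclassification of the rare class more heavily typically trades some precision for higher minority-class recall.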

21
Q

What is one-class classification? Give examples.

P 331

A

Algorithms used for outlier detection and anomaly detection can be used for classification tasks. Although unusual, when used in this way, they are often referred to as one-class classification algorithms.
Examples of one-class classification algorithms to try include:

- One-Class Support Vector Machines
- Isolation Forests
- Minimum Covariance Determinant
- Local Outlier Factor

In some cases, one-class classification algorithms can be very effective, such as when there is a severe class imbalance with very few examples of the positive class.
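A sketch of the idea with scikit-learn's IsolationForest: fit on majority-class examples only, then treat predicted outliers as the positive (minority) class. The mapping of -1 to class 1 is the usual convention, but the dataset and settings are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.99], random_state=6)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=6)

model = IsolationForest(random_state=6)
model.fit(X_tr[y_tr == 0])            # fit on the majority class only
raw = model.predict(X_te)             # +1 = inlier, -1 = outlier
y_pred = np.where(raw == -1, 1, 0)    # map outliers to the positive class
print("predicted positives:", int(y_pred.sum()), "of", y_pred.size)
```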

22
Q

What are 2 ways of improving predicted probabilities?

P 332

A

Predicted probabilities can be improved in two ways; they are:
- Calibrating Probabilities.
- Tuning the Classification Threshold.
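Calibration is available in scikit-learn via CalibratedClassifierCV; a sketch wrapping an SVM, whose raw decision scores are not calibrated probabilities (dataset and settings are illustrative):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, weights=[0.9], random_state=7)

# Wrap a model whose raw scores are not calibrated probabilities.
calibrated = CalibratedClassifierCV(SVC(), method="sigmoid", cv=3)
calibrated.fit(X, y)
proba = calibrated.predict_proba(X[:5])
print(proba.round(3))
```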

23
Q

For what models do we use threshold tuning? Give 4 examples of this type of model.

P 332

A

If probabilistic algorithms are used that natively predict a probability, and class labels are required as output or used to evaluate models, it is a good idea to try tuning the classification threshold.

- Logistic Regression
- Linear Discriminant Analysis
- Naive Bayes
- Artificial Neural Networks
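Threshold tuning itself is a simple sweep: score predicted probabilities against candidate thresholds and keep the one maximizing the chosen metric. A sketch using the F-measure (grid and dataset are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.95], random_state=8)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=8)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = model.predict_proba(X_te)[:, 1]   # probability of the minority class

# Sweep thresholds and keep the one with the best F-measure.
thresholds = np.arange(0.05, 0.95, 0.01)
scores = [f1_score(y_te, (probs >= t).astype(int), zero_division=0)
          for t in thresholds]
best = thresholds[int(np.argmax(scores))]
print(f"best threshold={best:.2f}  F1={max(scores):.3f}")
```

On imbalanced data the F-optimal threshold usually sits well below the default 0.5.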

24
Q

Schema of spot-checking imbalanced ML algorithms

A

(Figure: schema of spot-checking imbalanced machine learning algorithms; not reproduced here.)

The order of the steps is flexible, the order of algorithms within each step is flexible, and the list of algorithms is not complete.

25
Q

The simplest approach to hyperparameter tuning is to select the top five or 10 algorithms or algorithm combinations that performed well and tune the hyperparameters for each. There are three popular hyperparameter tuning algorithms that you may choose from: ____

P 335

A

- Random Search
- Grid Search
- Bayesian Optimization
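A sketch of the first two with scikit-learn; Bayesian optimization needs a third-party library such as scikit-optimize or Optuna. The model, parameter ranges, and dataset below are illustrative assumptions.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=300, weights=[0.9], random_state=9)
model = LogisticRegression(max_iter=1000)

# Grid search: enumerate values you already suspect are good.
grid = GridSearchCV(model, {"C": [0.01, 0.1, 1, 10]}, scoring="f1", cv=3)
grid.fit(X, y)

# Random search: sample hyperparameter values from a distribution instead.
rand = RandomizedSearchCV(model, {"C": loguniform(1e-3, 1e2)},
                          n_iter=10, scoring="f1", cv=3, random_state=9)
rand.fit(X, y)
print(grid.best_params_, rand.best_params_)
```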

26
Q

A good default for hyperparameter tuning is ____ if you know what hyperparameter values to try, otherwise, ____ should be used. ____ should be used if possible but can be more challenging to set up and run.

P 335

A

Grid search, random search, Bayesian optimization.