Chapter 25 Framework for Imbalanced Classification Projects Flashcards
Describe a systematic framework steps for approaching a ML problem.
P 322
(1) selecting a metric by which to evaluate candidate models, (2) testing a suite of algorithms, and (3) tuning the best performing models.
What’s spot-checking in applied ML?
P 326
Spot-checking machine learning algorithms means evaluating a suite of different types of algorithms with minimal hyperparameter tuning.
What’s the point of doing spot-checking?
P 326
The objective is to quickly test a range of standard machine learning algorithms and provide a baseline in performance to which techniques specialized for imbalanced classification must be compared and outperform in order to be considered skillful.
There are perhaps four levels of algorithms to spot-check; What are they?
P 326
- Naive Algorithms
- Linear Algorithms
- Nonlinear Algorithms
- Ensemble Algorithms
The choice of naive algorithm is based on the choice of ____.
P 326
performance metric
A suggested mapping of performance metrics to naive algorithms is as follows:
Accuracy: Predict the majority class (class 0).
G-mean: Predict a uniformly random class.
F-measure: Predict the minority class (class 1).
ROC AUC: Predict a stratified random class.
PR AUC: Predict a stratified random class.
Brier Score: Predict majority class prior.
What should you do if you are unsure of the best naive algorithm for your metric?
P 327
perhaps test a few and discover which results in the better performance that you can use as your rock-bottom baseline. Some options include:
Predict the majority class in all cases.
Predict the minority class in all cases.
Predict a uniform randomly selected class.
Predict a randomly selected class selected with the prior probabilities of each class.
Predict the class prior probabilities.
What are linear algorithms, give 3 examples.
P 327
Linear algorithms are those that are often drawn from the field of statistics and make strong assumptions about the functional form of the problem. Examples of linear algorithms you should consider trying include:
Logistic Regression
Linear Discriminant Analysis
Naive Bayes
You might also refer to linear algorithms as ____ algorithms. Why?
P 327
Probabilistic, because they are often fit under a probabilistic framework.
Non-linear Models often need more data than linear models to train. True/False
P 327
True
What are non-linear algorithms? Give 4 examples
P 327
Nonlinear algorithms are drawn from the field of machine learning and make few assumptions about the functional form of the problem.
Decision Tree
k-Nearest Neighbors
Artificial Neural Networks
Support Vector Machine
What are ensemble algorithms, give 4 examples.
P 328
Ensemble algorithms are also drawn from the field of machine learning and combine the predictions from two or more models.
Bagged Decision Trees
Random Forest
Extra Trees
Stochastic Gradient Boosting
There are many ensemble algorithms to choose from, but when spot-checking algorithms, it is a good idea to focus on ensembles of ____ algorithms, why?
P 328
Decision tree, given that they are known to perform so well in practice on a wide range of problems.
What are 4 types of imbalanced classification techniques to spot-check?
P 329
- Data Sampling Algorithms
- Cost-Sensitive Algorithms
- One-Class Algorithms
- Probability Tuning Algorithms