Chapter 14 Oversampling and Undersampling Flashcards
What’s a good starting point for combining sampling techniques? Why?
P 184
A good starting point is random or naive sampling methods. Although they are simple and often ineffective when applied in isolation, they can be effective when combined.
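As a rough illustration of the answer above, the following is a minimal sketch of naive random oversampling followed by naive random undersampling, written with the standard library only. The helper names (random_oversample, random_undersample), the toy data, and the target counts of 100 and 200 are assumptions made for this example, not values from the book.

```python
import random
from collections import Counter


def random_oversample(X, y, minority, target_count, rng):
    """Duplicate randomly chosen minority rows until that class has target_count rows."""
    idx = [i for i, label in enumerate(y) if label == minority]
    extra = [rng.choice(idx) for _ in range(max(0, target_count - len(idx)))]
    keep = list(range(len(y))) + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def random_undersample(X, y, majority, target_count, rng):
    """Keep a random subset of target_count majority rows plus every other row."""
    maj = [i for i, label in enumerate(y) if label == majority]
    others = [i for i, label in enumerate(y) if label != majority]
    kept_maj = rng.sample(maj, min(target_count, len(maj)))
    keep = sorted(kept_maj + others)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 990 majority rows (label 0) and 10 minority rows (label 1).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
y = [1 if i < 10 else 0 for i in range(1000)]

# Step 1: oversample the minority class; step 2: undersample the majority class.
X_over, y_over = random_oversample(X, y, minority=1, target_count=100, rng=rng)
X_res, y_res = random_undersample(X_over, y_over, majority=0, target_count=200, rng=rng)

counts = Counter(y_res)
print(counts)
```

Oversampling first and undersampling second means neither step has to close the whole gap on its own, which is one plausible reason the combination can work where each method alone does not.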
There are combinations of oversampling and undersampling methods that have proven effective and together may be considered sampling techniques. Two examples are ____ and ____. The imbalanced-learn Python library provides implementations for both of these combinations directly.
P 187
SMOTE with Tomek Links undersampling
SMOTE with Edited Nearest Neighbors undersampling
____ may be the most popular oversampling technique and can be combined with many different undersampling techniques.
P 189
SMOTE
ENN is more aggressive at downsampling the majority class than Tomek Links, providing more in-depth cleaning. True/False
P 190
True
Note: When combined with SMOTE, ENN is used to remove examples from both classes.
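To make the "more aggressive" claim concrete, here is a small standard-library sketch that flags majority rows for removal under each rule. It is a simplified illustration, not the imbalanced-learn implementation: the function names and toy data are hypothetical, only majority rows are examined for the comparison, and ENN is assumed to use the strict rule where all k = 3 neighbors must share the row's label (a majority-vote rule is the other common variant).

```python
import math
import random


def nearest(i, X, k):
    """Indices of the k nearest neighbors of row i (Euclidean), excluding i itself."""
    order = sorted(
        (j for j in range(len(X)) if j != i),
        key=lambda j: math.dist(X[i], X[j]),
    )
    return order[:k]


def tomek_majority(X, y, majority):
    """Majority rows in a Tomek link: mutual nearest neighbors with different labels."""
    nn = [nearest(i, X, 1)[0] for i in range(len(X))]
    return {
        i
        for i in range(len(X))
        if y[i] == majority and y[nn[i]] != y[i] and nn[nn[i]] == i
    }


def enn_majority(X, y, majority, k=3):
    """Majority rows with at least one of their k nearest neighbors in another class."""
    return {
        i
        for i in range(len(X))
        if y[i] == majority and any(y[j] != y[i] for j in nearest(i, X, k))
    }


# Hypothetical toy data: two overlapping Gaussian blobs, 200 majority and 20 minority rows.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
X += [[rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)] for _ in range(20)]
y = [0] * 200 + [1] * 20

tomek_removed = tomek_majority(X, y, majority=0)
enn_removed = enn_majority(X, y, majority=0)
print("Tomek Links flags", len(tomek_removed), "majority rows")
print("ENN flags", len(enn_removed), "majority rows")
```

Under these assumptions, any majority row in a Tomek link has a minority row as its single nearest neighbor, so it also fails the ENN check; ENN therefore flags at least as many majority rows as Tomek Links does.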
The over-sampling methods in general, and ____ and ____ in particular for data sets with few positive (minority) examples, provided very good results in practice.
P 191
SMOTE + Tomek Links
SMOTE + ENN