Chapter 14 Oversampling and Undersampling Flashcards

1
Q

What’s a good starting point for combining sampling techniques? Why?

P 184

A

A good starting point is random (naive) oversampling and undersampling. Although these methods are simple and often ineffective when applied in isolation, they can be effective when combined.
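A minimal NumPy sketch of the combination: randomly duplicate minority examples first, then randomly drop majority examples. The function name `random_over_then_under` is hypothetical, and the target ratios (1:10 after oversampling, 1:2 after undersampling) are arbitrary choices for illustration. Each ratio is expressed as minority:majority after its step, which we believe mirrors imbalanced-learn's float `sampling_strategy` convention.

```python
import numpy as np


def random_over_then_under(X, y, minority=1, over_ratio=0.1, under_ratio=0.5, seed=0):
    """Hypothetical helper: random oversampling of the minority class followed by
    random undersampling of the majority class. Ratios are minority:majority."""
    rng = np.random.default_rng(seed)
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)

    # Step 1: duplicate randomly chosen minority rows (sampling with replacement)
    # until minority / majority reaches over_ratio.
    n_min = max(len(min_idx), int(over_ratio * len(maj_idx)))
    extra = rng.choice(min_idx, size=n_min - len(min_idx), replace=True)
    min_idx = np.concatenate([min_idx, extra])

    # Step 2: keep a random subset of majority rows (without replacement)
    # so that minority / majority reaches under_ratio.
    n_maj = min(len(maj_idx), int(len(min_idx) / under_ratio))
    maj_idx = rng.choice(maj_idx, size=n_maj, replace=False)

    keep = np.concatenate([min_idx, maj_idx])
    return X[keep], y[keep]


# Toy 1:99 dataset: 990 majority (0) and 10 minority (1) examples.
y = np.array([0] * 990 + [1] * 10)
X = np.arange(len(y), dtype=float).reshape(-1, 1)
X_res, y_res = random_over_then_under(X, y)
print(np.bincount(y_res))  # 198 majority, 99 minority
```

Oversampling runs first so that the later undersampling step discards fewer majority examples than undersampling alone would need to reach the same final ratio.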

2
Q

Some combinations of oversampling and undersampling methods have proven effective and may together be considered sampling techniques in their own right. Two examples are the combination of ____ and ____. The imbalanced-learn Python library provides implementations of both combinations directly.

P 187

A

SMOTE with Tomek Links undersampling
SMOTE with Edited Nearest Neighbors undersampling

3
Q

____ may be the most popular oversampling technique and can be combined with many different undersampling techniques.

P 189

A

SMOTE
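A minimal NumPy sketch of the core SMOTE step: pick a minority example, pick one of its k nearest minority-class neighbours, and create a new point at a random position on the line segment between them. The function name `smote` and the toy data are hypothetical. Library implementations such as imbalanced-learn's `SMOTE` add options (for example `k_neighbors` and `sampling_strategy`) that this sketch omits.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic minority examples by
    interpolating between a minority example and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(seed)
    # Euclidean distances between all pairs of minority examples.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # never pick a point as its own neighbour
    knn = np.argsort(d, axis=1)[:, :k]          # k nearest minority neighbours per row

    base = rng.integers(0, len(X_min), size=n_new)      # random starting examples
    nbr = knn[base, rng.integers(0, k, size=n_new)]     # one random neighbour for each
    gap = rng.random((n_new, 1))                         # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


rng = np.random.default_rng(1)
X_min = rng.normal(size=(20, 2))        # 20 toy minority examples in 2-D
X_new = smote(X_min, n_new=50)
print(X_new.shape)
```

Because every synthetic point lies between two existing minority examples, SMOTE fills in the minority region rather than duplicating rows, which is why it pairs naturally with an undersampling or cleaning step afterwards.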

4
Q

ENN is more aggressive at downsampling the majority class than Tomek Links, providing more in-depth cleaning. True or false?

P 190

A

True

Note: in the SMOTE + ENN combination, ENN is used to remove examples from both classes, not just the majority class.
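A small NumPy sketch contrasting the two cleaning rules on a made-up 1-D dataset. Tomek Links only removes a majority example whose nearest neighbour is a minority example that has it as its own nearest neighbour (a mutual pair). ENN removes a majority example whenever most of its k = 3 nearest neighbours belong to another class. The helper names and the majority-vote rule are assumptions for illustration, and implementations differ in detail. Both helpers only target the majority class; dropping the `y == majority` filter would clean both classes, as in the note above.

```python
import numpy as np


def _knn(X, k):
    """Indices of the k nearest neighbours of every row (excluding itself)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]


def tomek_majority(X, y, majority=0):
    """Hypothetical helper: majority examples in a Tomek link
    (mutual nearest neighbours with different labels)."""
    nn = _knn(X, 1)[:, 0]
    linked = (nn[nn] == np.arange(len(y))) & (y != y[nn])
    return np.flatnonzero(linked & (y == majority))


def enn_majority(X, y, majority=0, k=3):
    """Hypothetical helper: majority examples whose k nearest neighbours
    mostly carry a different label (Wilson-style editing)."""
    nn = _knn(X, k)
    disagree = (y[nn] != y[:, None]).sum(axis=1) > k // 2
    return np.flatnonzero(disagree & (y == majority))


# Toy data: majority (0) on the left, a minority (1) cluster with two
# majority points mixed into it on the right.
X = np.array([0.0, 1.1, 2.3, 6.0, 6.4, 7.0, 7.1, 7.4, 8.0]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1])
print("Tomek removes:", tomek_majority(X, y))  # -> [4]
print("ENN removes:  ", enn_majority(X, y))    # -> [4 7]
```

The majority point at 7.4 survives Tomek Links because its nearest minority neighbour (7.1) has a closer minority partner (7.0), so no mutual pair forms; ENN still removes it because all three of its neighbours are minority examples.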

5
Q

The over-sampling methods in general, and ____ and ____ in particular for data sets with few positive (minority) examples, provided very good results in practice.

P 191

A

SMOTE + Tomek Links
SMOTE + ENN
