Chapter 12 Oversampling Methods Flashcards

1
Q

How does SMOTE oversampling work?

P 139

A

A random example from the minority class is first chosen. Then its k nearest neighbors are found (typically k = 5). One of those neighbors is selected at random, and a synthetic example is created at a randomly selected point between the two examples in feature space.
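
The four steps in this answer can be sketched in plain Python. This is a minimal illustration, not the book's or any library's implementation: the function name `smote_sample`, the use of Euclidean distance, and the restriction of the neighbor search to minority examples are assumptions made for the example.

```python
import math
import random

def smote_sample(minority, k=5, rng=None):
    """Create one synthetic point from a list of minority-class feature vectors."""
    rng = rng or random.Random()
    # 1. Pick a random example from the minority class.
    base = rng.choice(minority)
    # 2. Find its k nearest minority neighbors (Euclidean distance, excluding itself).
    others = [p for p in minority if p is not base]
    neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
    # 3. Select one of those neighbors at random.
    neighbor = rng.choice(neighbors)
    # 4. Interpolate at a random point on the segment between the two examples.
    gap = rng.random()
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]
```

Calling `smote_sample(minority)` repeatedly yields as many synthetic minority examples as needed; every point lies on a line segment between two existing minority examples.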

2
Q

The combination of SMOTE and undersampling performs better than plain undersampling. True/False

P 139

A

True
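
To make the combination concrete, here is a minimal sketch that first adds SMOTE-style synthetic minority points and then randomly undersamples the majority class. It only shows what the combined resampling does; it does not demonstrate the performance claim. The function name `smote_then_undersample` and its parameters are hypothetical.

```python
import math
import random

def smote_then_undersample(minority, majority, n_synthetic, n_majority_keep, k=5, seed=0):
    """Oversample the minority class by interpolation, then randomly undersample the majority."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        # SMOTE-style step: interpolate between a minority point and a near minority neighbor.
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbor = rng.choice(sorted(others, key=lambda p: math.dist(base, p))[:k])
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    # Random undersampling: keep a subset of the majority class, chosen without replacement.
    kept_majority = rng.sample(majority, n_majority_keep)
    return minority + synthetic, kept_majority
```

In practice the two counts would be chosen to reach a target class ratio, for example growing the minority class partway and shrinking the majority class the rest of the way.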

3
Q

What’s a downside of SMOTE?

P 139

A

A general downside of the approach is that synthetic examples are created without considering the majority class, possibly resulting in ambiguous examples if there is a strong overlap between the classes.

4
Q

How does Borderline-SMOTE work?

P 147

A

It selects those instances of the minority class that are misclassified, such as with a k-nearest neighbor classification model. We can then oversample just those difficult instances, providing more resolution only where it may be required.

The examples on the borderline and the ones nearby […] are more apt to be misclassified than the ones far from the borderline, and thus more important for classification.
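
One way to find the borderline instances is to look at the class mix among each minority example's nearest neighbors. The sketch below is an illustration under assumptions: the function name `danger_set` is hypothetical, and the thresholds (at least half, but not all, of the m neighbors belong to the majority class) follow the rule as recalled from the original Borderline-SMOTE paper rather than anything stated on this card. It also assumes the dataset has more than m examples.

```python
import math

def danger_set(minority, majority, m=5):
    """Return the minority examples whose neighborhood is mostly, but not entirely, majority."""
    labeled = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    danger = []
    for p in minority:
        # m nearest neighbors of p among all other training examples, both classes.
        others = [(q, y) for q, y in labeled if q is not p]
        nearest = sorted(others, key=lambda qy: math.dist(p, qy[0]))[:m]
        n_major = sum(y for _, y in nearest)
        # All neighbors majority: treated as noise. Fewer than half: treated as safe.
        if m / 2 <= n_major < m:
            danger.append(p)
    return danger
```

Only the examples returned here would then be used as starting points for SMOTE-style interpolation.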

5
Q

What are Borderline-SMOTE2 and Borderline-SMOTE1?

P 147

A

Borderline-SMOTE2 generates synthetic examples from each example in DANGER (the minority examples at risk of misclassification) not only with its positive nearest neighbors in P, but also with its nearest negative neighbor in N.

Borderline-SMOTE1 oversamples just the borderline cases in the minority class.
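
The difference between the two variants can be sketched for a single DANGER example. This is a hedged illustration: the function name `borderline_synthetic` is hypothetical, and capping the interpolation gap at 0.5 when moving toward the negative neighbor is an assumption based on the original Borderline-SMOTE paper, so that the synthetic point stays closer to the minority example.

```python
import math
import random

def borderline_synthetic(point, minority, majority, k=5, variant=1, rng=None):
    """Generate synthetic points from one DANGER example (variant 1 or 2)."""
    rng = rng or random.Random()
    # Both variants: interpolate toward one of the k nearest positive neighbors in P.
    pos = sorted((q for q in minority if q is not point),
                 key=lambda q: math.dist(point, q))[:k]
    targets = [(rng.choice(pos), 1.0)]
    if variant == 2:
        # Variant 2 only: also interpolate toward the nearest negative neighbor in N,
        # with the gap capped at 0.5 (assumption, see lead-in).
        nearest_neg = min(majority, key=lambda q: math.dist(point, q))
        targets.append((nearest_neg, 0.5))
    out = []
    for target, max_gap in targets:
        gap = rng.random() * max_gap
        out.append([a + gap * (b - a) for a, b in zip(point, target)])
    return out
```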

6
Q

How does Borderline-SMOTE SVM work?

P 149

A

An SVM is used to locate the decision boundary defined by the support vectors, and the minority class examples that are close to the support vectors become the focus for generating synthetic examples.

7
Q

In addition to using an SVM, the Borderline-SMOTE SVM attempts to select regions where there are fewer examples of the majority class and tries to extrapolate away from the class boundary. True/False

P 149

A

False, it tries to extrapolate toward the class boundary.

8
Q

How does ADASYN synthetic sampling work?

P 151

A

It involves generating synthetic samples inversely proportional to the density of the examples in the minority class. That is, generate more synthetic examples in regions of the feature space where the density of minority examples is low, and fewer or none where the density is high. This modification to SMOTE is referred to as the Adaptive Synthetic Sampling Method, or ADASYN.

ADASYN is based on the idea of adaptively generating minority data samples according to their distributions: more synthetic data is generated for minority class samples that are harder to learn compared to those minority samples that are easier to learn.
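
The adaptive part of ADASYN can be sketched as an allocation step: decide how many synthetic examples each minority example should receive. This is a simplified illustration, not a full implementation. The function name `adasyn_allocation` is hypothetical, and measuring "harder to learn" as the share of majority examples among the k nearest neighbors is an assumption based on the original ADASYN paper.

```python
import math

def adasyn_allocation(minority, majority, n_synthetic, k=5):
    """Split n_synthetic across minority examples in proportion to local majority share."""
    labeled = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    ratios = []
    for p in minority:
        # k nearest neighbors of p among all other examples, both classes.
        others = [(q, y) for q, y in labeled if q is not p]
        nearest = sorted(others, key=lambda qy: math.dist(p, qy[0]))[:k]
        # Share of majority neighbors: high where minority density is low.
        ratios.append(sum(y for _, y in nearest) / k)
    total = sum(ratios)
    if total == 0:
        # No minority example has majority neighbors, so nothing is allocated.
        return [0] * len(minority)
    # Normalize the ratios and scale them to the requested number of synthetic examples.
    return [round(n_synthetic * r / total) for r in ratios]
```

Each minority example would then generate its allocated number of synthetic points by SMOTE-style interpolation toward its minority neighbors. Because of rounding, the allocations may not sum exactly to `n_synthetic`.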
