Chapter 12 Oversampling Methods Flashcards
How does SMOTE oversampling work?
P 139
Specifically, a random example from the minority class is first chosen. Then k of the nearest neighbors for that example are found (typically k = 5). A randomly selected neighbor is chosen, and a synthetic example is created at a randomly selected point between the two examples in feature space.
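The steps in this answer can be sketched in a few lines of plain Python. This is a minimal illustration, not the library implementation: the function name `smote_sample`, the toy `minority` points, and the fixed seed are all assumptions made for the example.

```python
import math
import random

def smote_sample(minority, k=5, rng=random):
    """Create one synthetic point from a list of minority-class feature vectors."""
    # 1. Choose a random example from the minority class.
    base = rng.choice(minority)
    # 2. Find its k nearest minority-class neighbors (excluding itself).
    others = [p for p in minority if p is not base]
    neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
    # 3. Choose one of those neighbors at random.
    neighbor = rng.choice(neighbors)
    # 4. Place the synthetic example at a random point on the line
    #    segment between the two examples in feature space.
    gap = rng.random()
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]

# Hypothetical toy data: six minority examples with two features each.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.4, 1.0]]
rng = random.Random(0)
synthetic = [smote_sample(minority, k=5, rng=rng) for _ in range(10)]
```

Because every synthetic point is an interpolation between two minority examples, it always falls inside the region already spanned by the minority class.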
The combination of SMOTE and under-sampling performs better than plain under-sampling. True/False
P 139
True
What’s a downside of SMOTE?
P 139
A general downside of the approach is that synthetic examples are created without considering the majority class, possibly resulting in ambiguous examples if there is a strong overlap for the classes.
How does Borderline-SMOTE work?
P 147
It selects those instances of the minority class that are misclassified, such as with a k-nearest neighbor classification model. We can then oversample just those difficult instances, providing more resolution only where it may be required.
The examples on the borderline and the ones nearby […] are more apt to be misclassified than the ones far from the borderline, and thus more important for classification.
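One way to pick out the borderline cases is sketched below. It assumes the neighbor-counting rule from the original Borderline-SMOTE paper: a minority example counts as borderline when at least half, but not all, of its m nearest neighbors belong to the majority class. The function name `danger_set` and the toy data are hypothetical.

```python
import math

def danger_set(minority, majority, m=5):
    """Return minority points whose m nearest neighbors are mostly, but not all, majority."""
    labeled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    danger = []
    for p in minority:
        # Nearest neighbors are searched in the whole training set, excluding p itself.
        others = [(q, y) for q, y in labeled if q is not p]
        nearest = sorted(others, key=lambda item: math.dist(p, item[0]))[:m]
        n_majority = sum(1 for _, y in nearest if y == 0)
        # All-majority neighborhoods are treated as noise and skipped;
        # mostly-minority neighborhoods are considered safe.
        if m / 2 <= n_majority < m:
            danger.append(p)
    return danger

# Hypothetical toy data laid out along one axis.
minority = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0],
            [1.0, 0.0], [1.05, 0.0], [5.0, 0.0]]
majority = [[0.9, 0.0], [1.1, 0.0], [1.2, 0.0],
            [4.9, 0.0], [5.1, 0.0], [5.2, 0.0]]
danger = danger_set(minority, majority, m=3)
```

In this toy set, only the two minority points sitting among majority points near 1.0 are selected. The isolated point at 5.0 is surrounded entirely by the majority class and is skipped as noise.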
What are Borderline-SMOTE2 and Borderline-SMOTE1?
P 147
Borderline-SMOTE2 not only generates synthetic examples from each example in DANGER (the minority examples at risk of misclassification) and its positive nearest neighbors in P, but also does so from its nearest negative neighbor in N.
Borderline-SMOTE1 oversamples only the borderline cases in the minority class.
How does Borderline-SMOTE SVM work?
P 149
An SVM is used to locate the decision boundary defined by the support vectors, and the minority-class examples that are close to the support vectors become the focus for generating synthetic examples.
In addition to using an SVM, the Borderline-SMOTE SVM attempts to select regions where there are fewer examples of the majority class and tries to extrapolate away from the class boundary. True/False
P 149
False. It tries to extrapolate toward the class boundary.
How does ADASYN synthetic sampling work?
P 151
It involves generating synthetic samples inversely proportional to the density of the examples in the minority class. That is, generate more synthetic examples in regions of the feature space where the density of minority examples is low, and fewer or none where the density is high. This modification to SMOTE is referred to as the Adaptive Synthetic Sampling Method, or ADASYN.
ADASYN is based on the idea of adaptively generating minority data samples according to their distributions: more synthetic data is generated for minority class samples that are harder to learn compared to those minority samples that are easier to learn.
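The adaptive allocation can be sketched as follows. It assumes the weighting from the original ADASYN paper, where each minority example's share of the synthetic budget is proportional to the fraction of majority-class points among its k nearest neighbors. The function name `adasyn_counts`, the `beta` balance parameter, and the toy data are illustrative assumptions.

```python
import math

def adasyn_counts(minority, majority, k=5, beta=1.0):
    """Return how many synthetic examples to generate around each minority point."""
    labeled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    # Total budget: beta = 1.0 aims for a fully balanced dataset.
    total = round((len(majority) - len(minority)) * beta)
    ratios = []
    for p in minority:
        others = [(q, y) for q, y in labeled if q is not p]
        nearest = sorted(others, key=lambda item: math.dist(p, item[0]))[:k]
        # Fraction of majority neighbors: high where minority density is low.
        ratios.append(sum(1 for _, y in nearest if y == 0) / k)
    norm = sum(ratios)
    if norm == 0:
        return [0] * len(minority)
    # Normalize the ratios into a distribution and split the budget accordingly.
    return [round(total * r / norm) for r in ratios]

# Hypothetical toy data: three clustered minority points and one isolated one.
minority = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [1.0, 0.0]]
majority = [[0.9, 0.0], [1.1, 0.0], [1.2, 0.0], [1.3, 0.0], [2.0, 0.0],
            [2.1, 0.0], [2.2, 0.0], [2.3, 0.0], [2.4, 0.0], [2.5, 0.0]]
counts = adasyn_counts(minority, majority, k=3)
```

The isolated minority point at 1.0, whose neighbors are all majority examples, receives the largest share of the budget. The three clustered points, which are easier to learn, receive less.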