Ensemble Methods - Other Modeling Aspects Flashcards
Binary Target
When there is a disproportionate number of observations in one of the classes, a classification model will likely struggle to make proper predictions for the minority class. Thus, in light of sensitivity and specificity, such models tend to have a high value for one of the metrics and a low value for the other one.
Oversampling
Sampling technique for duplicating observations in the minority class to reduce the imbalance.
Only apply to training set
possible to overfit the minority class since duplicating observations
Undersampling
Sampling technique for detecting observations in the majority class to reduce the imbalance.
Only apply to training set
possible to underfit the majority class since observations are removed -> can lose useful info with a smaller dataset
Oversampling and Undersampling for cv
k-fold cv -> divide the training ste into k groups.
perform over/under on all k training sets before fitting the model