Chapter 1-2 What is Imbalanced Classification Flashcards
What’s the difference between unbalanced and imbalanced data? P 21
Unbalanced refers to a class distribution that was balanced and is now no longer balanced, whereas imbalanced refers to a class distribution that is inherently not balanced.
How can we generate an artificial dataset for classification problems? P 27
The make_blobs() function from sklearn.datasets generates a specified number of examples for a test classification problem with a specified number of classes. For example: X, y = make_blobs(n_samples=1000, centers=2, n_features=2, random_state=1, cluster_std=3)
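As a minimal sketch of the call above, the snippet below generates the dataset and summarizes its class distribution. It assumes scikit-learn is installed; the `counts` variable is introduced here for illustration only. When `n_samples` is an integer, make_blobs() divides the examples equally among the centers, so this dataset is balanced.

```python
from collections import Counter

from sklearn.datasets import make_blobs

# Generate 1,000 two-feature examples spread over two clusters (classes).
X, y = make_blobs(n_samples=1000, centers=2, n_features=2,
                  random_state=1, cluster_std=3)

# Summarize the shape of the inputs and the number of examples per class.
counts = Counter(y)
print(X.shape)
print(counts)
```

Checking the class counts like this is a quick way to confirm whether a dataset is balanced before choosing evaluation metrics.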
Why should we make sure the minority class is denoted as 1 and the majority is denoted as 0? P 33
In binary classification, especially with imbalanced data, the majority class should be assigned to class 0 and the minority class to class 1. This is because many evaluation metrics assume this relationship and treat class 1 as the positive class.
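The effect of the labeling convention can be illustrated with a small sketch, assuming scikit-learn's recall_score(), which treats label 1 as the positive class by default (`pos_label=1`). The labels and predictions below are made up for illustration and do not come from the book.

```python
from sklearn.metrics import recall_score

# Hypothetical data: 90 majority examples (class 0), 10 minority examples (class 1).
y_true = [0] * 90 + [1] * 10

# Hypothetical predictions: every majority example is correct,
# but only 5 of the 10 minority examples are detected.
y_pred = [0] * 90 + [1] * 5 + [0] * 5

# By default the metric is computed for class 1, the minority class.
recall_minority = recall_score(y_true, y_pred)

# Computing it for class 0 instead describes the majority class.
recall_majority = recall_score(y_true, y_pred, pos_label=0)

print(recall_minority)
print(recall_majority)
```

If the labels were swapped so that the majority class was 1, the default score would describe the majority class and hide the poor performance on the minority class.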