Chapter 3: Challenge of Imbalanced Classification Flashcards

1
Q

Class imbalance is widely acknowledged as a complicating factor for classification. However, some studies argue that the imbalance ratio is not the only cause of performance degradation when learning from imbalanced data. What are three other causes? P 42

A

There are many such characteristics, but perhaps three of the most common include:
- Dataset Size.
- Label Noise.
- Data Distribution.
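As a rough illustration (not taken from the source), scikit-learn's `make_classification` has a parameter for each of these factors: `n_samples` sets the dataset size, `flip_y` randomly reassigns the class of a fraction of examples (label noise), and `n_clusters_per_class` splits each class into several clusters (data distribution). The specific values below are arbitrary assumptions chosen for the sketch.

```python
from collections import Counter

from sklearn.datasets import make_classification

# Arbitrary illustrative settings; each one maps to a factor from the answer.
X, y = make_classification(
    n_samples=1000,          # dataset size
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=2,  # data distribution: each class is split into 2 clusters
    weights=[0.99],          # roughly a 99:1 class imbalance
    flip_y=0.01,             # label noise: ~1% of examples get a randomly assigned class
    random_state=1,
)

# The minority class ends up with only a handful of examples.
print(Counter(y))
```

Because `flip_y` assigns a random class to the selected examples, some of them keep their original label, so the effective noise rate is somewhat below the nominal value.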

2
Q

What is Label Noise? P 45

A

Label noise refers to examples that belong to one class but are labeled as another class.
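A minimal NumPy sketch of injecting label noise, assuming binary 0/1 labels. `flip_labels` is a hypothetical helper written for this card, not a library function; it flips exactly `round(noise_rate * len(y))` randomly chosen labels.

```python
import numpy as np


def flip_labels(y, noise_rate, rng):
    """Hypothetical helper: return a copy of binary labels y with a fraction flipped.

    Picks round(noise_rate * len(y)) distinct examples at random and swaps 0 <-> 1.
    """
    y_noisy = y.copy()
    n_flip = int(round(noise_rate * len(y)))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]
    return y_noisy, idx


rng = np.random.default_rng(0)
y = np.array([0] * 95 + [1] * 5)  # 95:5 imbalanced labels
y_noisy, flipped = flip_labels(y, noise_rate=0.05, rng=rng)
print("flipped:", len(flipped), "mismatches:", int((y != y_noisy).sum()))
```

With only five minority examples, even one majority example flipped to class 1 changes the minority class by 20%, which hints at why label noise can be especially damaging in imbalanced data.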

3
Q

Class noise (errors in the labels) is generally assumed to be more harmful than attribute noise (errors in the input feature values). True/False?

A

True
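To make the two kinds of noise concrete, here is a small NumPy sketch on a toy dataset whose true rule is assumed to be "label 1 iff the first feature is positive". Attribute noise jitters the features and keeps the labels; class noise keeps the features and flips labels. The noise scale and the number of flipped labels are arbitrary assumptions, and the intuition in the comments is offered as one plausible reading, not as the source's explanation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data with an assumed true rule: label 1 iff the first feature is positive.
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)

# Attribute noise: jitter the feature values, keep the labels.
X_noisy = X + rng.normal(scale=0.1, size=X.shape)
# Examples whose *observed* features now sit on the wrong side of the boundary.
attr_conflicts = np.flatnonzero((X_noisy[:, 0] > 0).astype(int) != y)

# Class noise: keep the features, flip 10 labels at random.
y_noisy = y.copy()
class_conflicts = rng.choice(len(y), size=10, replace=False)
y_noisy[class_conflicts] = 1 - y_noisy[class_conflicts]

# Attribute-noise conflicts can only occur close to the boundary (small |x0|),
# whereas flipped labels can land anywhere, even deep inside a class region.
print("attribute noise:", len(attr_conflicts), "conflicts, max |x0| =",
      np.max(np.abs(X[attr_conflicts, 0]), initial=0.0))
print("class noise:", int((y_noisy != y).sum()), "conflicts, max |x0| =",
      np.max(np.abs(X[class_conflicts, 0])))
```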

4
Q

Why do we rarely have well-separated classes, in both balanced and imbalanced problems? P 48

A

Because the “concept” underlying a class is commonly split into several sub-concepts spread across the input space.
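A small NumPy sketch of a class whose concept is split into two sub-concepts; the locations, spreads, and sample counts are arbitrary assumptions. Class 1 occupies two separate regions with class 0 between them, so its centroid falls inside class 0's region and a single linear boundary cannot cleanly separate the classes.

```python
import numpy as np

rng = np.random.default_rng(7)

# Class 1 is made of two sub-concepts on opposite sides of the input space.
sub_a = rng.normal(loc=(-3.0, 0.0), scale=0.5, size=(50, 2))
sub_b = rng.normal(loc=(3.0, 0.0), scale=0.5, size=(50, 2))
class1 = np.vstack([sub_a, sub_b])

# Class 0 occupies the region between the two sub-concepts.
class0 = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(100, 2))

centroid1 = class1.mean(axis=0)
centroid0 = class0.mean(axis=0)

# The "average" class-1 example lies in class 0's territory
# rather than near either of the class-1 sub-concepts.
print("class 1 centroid:", centroid1.round(2))
print("distance between centroids:", round(float(np.linalg.norm(centroid1 - centroid0)), 2))
```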

5
Q

How does the data distribution hinder performance on imbalanced datasets with regard to sub-classes? P 49

A

Each class can contain several sub-classes (sub-concepts), each with its own characteristics. Because the minority class has so little data, its sub-classes are hard to detect: they may look like one large, sparse grouping, which makes it hard to expose the underlying density or distribution of the minority examples.
For example, we might consider data that describes whether a patient is healthy (majority class) or sick (minority class). The data may capture many different types of illnesses, and there may be groups of similar illnesses, but with so few cases, any grouping or concepts within the class may not be apparent and may instead look like a diffuse set mixed in with the healthy cases. Code P 49
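The sparsity problem can be sketched with a little NumPy arithmetic. The 1% minority rate and the four equally common illness types are assumptions for illustration, and this is not the code referenced on p. 49: with 1,000 patients there are only about ten sick cases to spread across four sub-concepts, whereas 100,000 patients give each sub-concept roughly 250 examples.

```python
import numpy as np

rng = np.random.default_rng(3)
minority_fraction = 0.01  # assumed: 1% of patients are sick
n_illness_types = 4       # assumed: 4 hypothetical, equally common illness types
type_probs = np.full(n_illness_types, 1.0 / n_illness_types)

sick_per_type = {}
for n_patients in (1_000, 100_000):
    n_sick = round(n_patients * minority_fraction)
    # Randomly assign each sick patient to one illness type (sub-concept).
    sick_per_type[n_patients] = rng.multinomial(n_sick, type_probs)
    print(f"{n_patients} patients -> {n_sick} sick, per illness type: {sick_per_type[n_patients]}")
```

With two or three examples per illness type, the sub-concepts are hard to tell apart from scattered noise; the same structure only becomes visible with far more minority data.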
