Class Imbalance Flashcards

1
Q

What is class imbalance in machine learning?

A

A situation where a dataset’s target (outcome) variable contains many more instances of one class than another.
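
A quick way to see imbalance is to compute each class's share of the target column. The sketch below uses only the Python standard library; the function name `class_proportions` and the toy labels are illustrative assumptions, not part of the original cards.

```python
from collections import Counter

def class_proportions(labels):
    """Return each class's share of the labels as a fraction of the total."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical target column: 9 "no" outcomes and 1 "yes" outcome.
labels = ["no"] * 9 + ["yes"]
proportions = class_proportions(labels)
print(proportions)
```

Here "no" is the majority class and "yes" is the minority class.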

2
Q

What are the majority and minority classes?

A

The majority class is the class with more instances; the minority class is the class with fewer instances.

3
Q

What are the two techniques to fix potential issues with class imbalance?

A

Upsampling and downsampling.

4
Q

What does downsampling do?

A

It reduces the majority class by keeping only part of its original observations, which produces a more even split.

5
Q

What is upsampling?

A

It artificially increases the frequency of the minority class.

6
Q

How common is it for a dataset to have a perfectly balanced split of classes?

A

It’s extremely rare. Most datasets tend to have some degree of imbalance, where one class has significantly more examples than others.

7
Q

Can a dataset with some imbalance still be useful for training a machine learning model?

A

Absolutely! In many cases, a moderate imbalance like 70/30 or 80/20 is perfectly acceptable for training. The model can still learn effectively from the data.

8
Q

When does class imbalance become a major concern for machine learning models?

A

Major issues arise when the majority class makes up 90% or more of the dataset. This can cause the model to become biased toward the majority class and perform poorly on the minority class.
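
To illustrate why heavy imbalance is a concern, the sketch below scores a hypothetical "model" that always predicts the majority class on a made-up 95/5 dataset. The helper names `accuracy` and `recall` are written here for illustration only. Accuracy looks strong while the minority class is never detected.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of actual `positive` cases that were predicted as `positive`."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)

# Made-up labels: 95 majority (0) and 5 minority (1) observations.
y_true = [0] * 95 + [1] * 5
# A degenerate model that always predicts the majority class.
y_pred = [0] * 100

baseline_accuracy = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, positive=1)
print(baseline_accuracy, minority_recall)
```

This is why accuracy alone can be misleading on imbalanced data, and why per-class metrics such as recall are worth checking.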

9
Q

For which variable should class imbalance be considered?

A

For categorical variables, specifically the categorical target that the model predicts.

10
Q

Which models is class balancing applicable to?

A

Classification models.

11
Q

What is class balancing in machine learning?

A

Class balancing refers to techniques that adjust the number of samples in a dataset to make the proportions of different classes more even.

12
Q

Why is class balancing important?

A

Imbalanced datasets can lead models to be biased towards the majority class and perform poorly on the minority class.

13
Q

Which datasets is downsampling suitable for?

A

Downsampling is suitable for large datasets (tens of thousands of observations or more).

It’s important to ensure model performance doesn’t suffer due to reduced data.

14
Q

How is downsampling done?

A

By randomly selecting and removing observations from the majority class.
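
Random downsampling can be sketched with the standard library alone. The function `downsample`, its parameters, and the toy rows are assumptions made for illustration; in practice a data-frame library would usually handle this step.

```python
import random

def downsample(rows, label_of, majority, n_keep, seed=0):
    """Keep a random subset of n_keep majority rows plus all other rows."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    majority_rows = [r for r in rows if label_of(r) == majority]
    other_rows = [r for r in rows if label_of(r) != majority]
    kept = rng.sample(majority_rows, n_keep)  # sampling without replacement
    balanced = kept + other_rows
    rng.shuffle(balanced)  # avoid leaving the classes grouped together
    return balanced

# Hypothetical dataset: 90 "no" rows and 10 "yes" rows, stored as (id, label).
rows = [(i, "no") for i in range(90)] + [(i, "yes") for i in range(90, 100)]
balanced = downsample(rows, label_of=lambda r: r[1], majority="no", n_keep=10)
print(len(balanced))
```

All minority rows are kept; only majority rows are discarded.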

15
Q

When should upsampling be used?

A

Upsampling is used for smaller datasets where removing data from the majority class is not feasible.

16
Q

Name two ways upsampling can be done.

A

There are two main methods:
1. Duplication: copying existing minority class observations (random oversampling).
2. Synthetic Minority Oversampling Technique (SMOTE): creating new, synthetic data points for the minority class by interpolating between existing minority observations and their nearest neighbors.
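
Both methods can be sketched in plain Python. The second function is a simplified, SMOTE-style interpolation written for illustration; it is not the reference SMOTE implementation, and the names `upsample_by_duplication`, `smote_like`, and the toy points are assumptions.

```python
import random

def upsample_by_duplication(minority_rows, n_total, seed=0):
    """Sample existing rows with replacement until there are n_total rows."""
    rng = random.Random(seed)
    extra = rng.choices(minority_rows, k=n_total - len(minority_rows))
    return minority_rows + extra

def smote_like(minority_points, n_new, k=2, seed=0):
    """Create n_new synthetic points between a point and one of its k nearest neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_points)
        # Rank the other minority points by squared Euclidean distance to base.
        neighbors = sorted(
            (p for p in minority_points if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic

# Hypothetical minority class with two numeric features per observation.
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 3.0), (3.0, 2.0)]
duplicated = upsample_by_duplication(minority, n_total=10)
synthetic = smote_like(minority, n_new=6)
print(len(duplicated), len(synthetic))
```

Duplication only repeats rows that already exist, while the interpolation step produces points that lie between existing minority observations.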

17
Q

Why is it important to keep a separate test set unaltered when applying downsampling or upsampling techniques?

A

1. Real-world performance: We want to understand how well the model performs on unseen data that reflects the actual class distribution in the real world.
2. Avoiding misleading results: If resampling is applied to the test data, or before the train/test split, duplicated or synthetic observations can end up in both sets. The model is then scored on points it has effectively already seen, which hides overfitting (memorizing the specific patterns in the training data and performing poorly on new, unseen data).
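
One way to respect this rule is to split first and resample only the training portion. The sketch below assumes a simple random split and upsampling by duplication; `split_then_upsample` and the toy rows are hypothetical names for illustration.

```python
import random

def split_then_upsample(rows, label_of, minority, test_fraction=0.2, seed=0):
    """Hold out a test set first, then balance only the training rows."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]

    minority_rows = [r for r in train if label_of(r) == minority]
    majority_rows = [r for r in train if label_of(r) != minority]
    # Duplicate minority rows (with replacement) until the classes match.
    n_extra = max(0, len(majority_rows) - len(minority_rows))
    extra = rng.choices(minority_rows, k=n_extra)
    return train + extra, test  # the test set keeps its original distribution

# Hypothetical dataset: 90 "no" rows and 10 "yes" rows, stored as (id, label).
rows = [(i, "no") for i in range(90)] + [(i, "yes") for i in range(90, 100)]
train_balanced, test = split_then_upsample(rows, lambda r: r[1], minority="yes")
print(len(train_balanced), len(test))
```

Because the split happens before any duplication, no copy of a test row can leak into the training data.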

18
Q

What are the consequences of class rebalancing?

A
  1. Over-predicting minority class: Class balancing can lead a model to over-recognize the minority class during prediction. This happens because the model learns from a data distribution that’s different from the real world.
  2. Impact on class probabilities: Techniques like downsampling and upsampling can affect the underlying probabilities the model learns for each class. This can be especially impactful for algorithms like Naive Bayes that rely on these probabilities.
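
The shift in class probabilities can be seen with simple arithmetic. The counts below are made up for illustration, and `class_priors` is a hypothetical helper: it computes the share of each class, which is the kind of prior an algorithm such as Naive Bayes would estimate from the training data.

```python
def class_priors(counts):
    """Turn per-class counts into per-class proportions (empirical priors)."""
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

# Made-up counts: the real-world data versus a downsampled training set.
original_counts = {"no": 990, "yes": 10}
downsampled_counts = {"no": 10, "yes": 10}

original_priors = class_priors(original_counts)
downsampled_priors = class_priors(downsampled_counts)
print(original_priors["yes"], downsampled_priors["yes"])
```

A model trained on the downsampled set starts from a 50% prior for "yes" even though the class makes up 1% of the original data, which is one reason it may over-predict the minority class.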
19
Q

At what point should you consider doing a class rebalance?

A

Class rebalancing should be reserved for situations where other alternatives have been exhausted and you still are not achieving satisfactory model results.

20
Q

At what percentage is class imbalance severe enough to consider rebalancing?

A
  • A moderate imbalance (minority class under 20% of the data) may not require any rebalancing.
  • An extreme imbalance (minority class under 1% of the data) is a more likely candidate.
21
Q

Why is it important to check class balance?

A

If there is a lot more representation of one class than another, then the model may be biased toward the majority class. When this happens, the predictions may be inaccurate.