Techniques for handling class imbalance Flashcards

1
Q

What is class imbalance?

A

Class imbalance occurs when one class significantly outnumbers another in a dataset, leading to biased model predictions.

2
Q

Why does class imbalance cause issues in classification?

A

It leads to models being biased toward the majority class, reducing the ability to correctly classify the minority class.

3
Q

What is an example of extreme class imbalance?

A

A dataset with 998 negative samples and only 2 positive samples; predicting only negatives yields 99.8% accuracy but is not useful.
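This accuracy trap is easy to verify numerically. A minimal sketch using the hypothetical 998/2 dataset from the card:

```python
# Hypothetical dataset from the card: 998 negatives, 2 positives.
labels = [0] * 998 + [1] * 2

# A "classifier" that always predicts the negative class.
predictions = [0] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)
print(accuracy)  # 0.998 -- yet not a single positive is ever detected
```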

4
Q

What is binary cross-entropy loss?

A

A loss function used in binary classification that measures the difference between predicted probabilities and actual class labels.
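A minimal sketch of the formula, written from the standard definition (the probabilities and labels below are illustrative):

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over a batch of predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # approximately 0.105
```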

5
Q

How does cross-entropy loss behave in imbalanced datasets?

A

It favors the majority class, making it difficult for the model to learn the minority class boundaries.

6
Q

What is weighted cross-entropy?

A

A modification of cross-entropy loss where the minority class is assigned a higher weight to improve model performance.
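A minimal sketch of the idea, assuming a single positive-class weight (the value 5.0 is illustrative):

```python
import math

def weighted_bce(y_true, y_prob, pos_weight=5.0, eps=1e-7):
    # Positive (minority) terms are scaled by pos_weight, so misclassifying
    # a minority sample costs more than misclassifying a majority sample.
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# An equally confident mistake on the minority class is penalised 5x harder:
ratio = weighted_bce([1], [0.1]) / weighted_bce([0], [0.9])
print(ratio)  # 5.0
```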

7
Q

How do you apply class weights in TensorFlow for binary classification?

A

Use class_weight={0: 1.0, 1: 5.0} when calling model.fit().
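A sketch of how the weight values might be chosen, using the common heuristic n_samples / (n_classes * n_class_samples) (this is the formula behind scikit-learn's "balanced" mode); the label counts and the `model` name in the comment are illustrative:

```python
from collections import Counter

y_train = [0] * 998 + [1] * 2  # illustrative imbalanced labels

counts = Counter(y_train)
n_classes = len(counts)
# Heuristic: weight each class inversely to its frequency.
class_weight = {c: len(y_train) / (n_classes * n) for c, n in counts.items()}
print(class_weight)  # {0: ~0.5, 1: 250.0}

# With a compiled Keras model (hypothetical name `model`), pass the dict to fit():
# model.fit(x_train, y_train, epochs=10, class_weight=class_weight)
```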

8
Q

What is categorical cross-entropy?

A

A loss function used for multi-class classification that compares predicted probabilities with actual class labels.
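A minimal sketch from the standard definition, assuming one-hot labels and a probability vector that sums to 1 (the values are illustrative):

```python
import math

def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    """Cross-entropy for one sample: y_true is one-hot, y_prob is a distribution."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_prob))

# Only the true class's predicted probability contributes to the loss:
print(categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))  # approximately 0.357
```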

9
Q

What is weighted categorical cross-entropy?

A

A variant of categorical cross-entropy where different class weights are assigned to balance learning across classes.
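A minimal sketch, extending the plain categorical cross-entropy with per-class weights (the weight vector below is illustrative, with the middle class up-weighted 3x):

```python
import math

def weighted_cce(y_true, y_prob, weights, eps=1e-7):
    # Each class contributes with its own weight, so rare classes can be
    # emphasised by assigning them larger weights.
    return -sum(w * t * math.log(max(p, eps))
                for w, t, p in zip(weights, y_true, y_prob))

print(weighted_cce([0, 1, 0], [0.2, 0.7, 0.1], [1.0, 3.0, 1.0]))  # approximately 1.07
```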

10
Q

Name three potential solutions for handling class imbalance.

A

Collecting more data, oversampling the minority class, and undersampling the majority class.

11
Q

What is random oversampling?

A

Duplicating minority class samples to balance the dataset.
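A minimal sketch of the idea, using `random.choices` to duplicate (sample with replacement) minority examples until the classes match; the toy samples are illustrative:

```python
import random

random.seed(0)
majority = [("a", 0)] * 10           # 10 majority samples (illustrative)
minority = [("b", 1), ("c", 1)]      # 2 minority samples

# Duplicate minority samples (sampling with replacement) until balanced.
oversampled = minority + random.choices(minority, k=len(majority) - len(minority))
balanced = majority + oversampled
print(sum(1 for s in balanced if s[1] == 1))  # 10
```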

12
Q

What is random undersampling?

A

Removing samples from the majority class to create a balanced dataset.
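A minimal sketch of the idea, using `random.sample` to keep a random subset of the majority class sized to match the minority; the toy data are illustrative:

```python
import random

random.seed(0)
majority = list(range(100))        # 100 majority samples (illustrative)
minority = list(range(100, 110))   # 10 minority samples

# Keep only a random subset of the majority class, sized to the minority.
kept_majority = random.sample(majority, k=len(minority))
balanced = kept_majority + minority
print(len(balanced))  # 20
```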

13
Q

What is SMOTE?

A

Synthetic Minority Over-sampling Technique, a method that generates synthetic minority samples by interpolating between existing instances.
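The core interpolation step can be sketched as follows. Real SMOTE first finds each minority sample's k nearest minority neighbours; this sketch assumes the neighbour is already given:

```python
import random

random.seed(0)

def smote_sample(x_i, x_neighbor):
    """Generate one synthetic point on the segment between a minority
    sample and one of its minority-class neighbours (the core SMOTE step)."""
    lam = random.random()  # interpolation factor in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x_i, x_neighbor)]

synthetic = smote_sample([1.0, 2.0], [3.0, 4.0])
print(synthetic)  # a new point lying between the two inputs
```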

14
Q

What are the drawbacks of random oversampling and undersampling?

A

Oversampling can lead to overfitting, and undersampling may cause loss of valuable information.

15
Q

What is data augmentation?

A

A technique that generates new training examples by applying transformations like rotation, flipping, and zooming.

16
Q

How does data augmentation help with class imbalance?

A

It generates additional, varied minority-class samples, increasing their diversity without simply duplicating existing examples.

17
Q

What is an example of an image data augmentation parameter?

A

In Keras, rotation_range=40 randomly rotates images by up to 40 degrees during augmentation.

18
Q

How can generative models be used to handle class imbalance?

A

Autoencoders and GANs can generate synthetic samples to improve balance.

19
Q

Why should synthetic data generation be done only on the training set?

A

To avoid data leakage and ensure the test set remains an unbiased evaluation benchmark.
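A minimal sketch of the correct ordering, with an illustrative 90/10 dataset: split first, then resample only the training portion, leaving the test set untouched:

```python
import random

random.seed(0)
# Illustrative dataset: 90 majority (label 0) and 10 minority (label 1) samples.
majority = [(i, 0) for i in range(90)]
minority = [(i, 1) for i in range(90, 100)]

# Split FIRST, holding out 20% of each class, so the test set never
# contains duplicated or synthetic points (no data leakage).
train = majority[:72] + minority[:8]
test = majority[72:] + minority[8:]

# Oversample the minority class in the training set only.
train_minority = [s for s in train if s[1] == 1]
train_majority = [s for s in train if s[1] == 0]
train_balanced = train_majority + random.choices(train_minority, k=len(train_majority))
print(len(test))  # 20 held-out samples, untouched by resampling
```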

20
Q

What are three common performance metrics besides accuracy?

A

Precision, recall, and F1-score.

21
Q

Why is accuracy misleading in imbalanced datasets?

A

A model can achieve high accuracy by predicting only the majority class while failing to classify the minority class correctly.

22
Q

What is recall?

A

The fraction of actual positives correctly identified by the model, calculated as TP / (TP + FN).

23
Q

What is precision?

A

The fraction of predicted positives that are actually correct, calculated as TP / (TP + FP).
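Both formulas (recall from the previous card, precision from this one), plus the F1-score that combines them, can be checked numerically; the confusion counts below are illustrative:

```python
# Hypothetical confusion counts for a classifier on an imbalanced set.
tp, fp, fn = 8, 2, 4

precision = tp / (tp + fp)   # fraction of predicted positives that are correct
recall = tp / (tp + fn)      # fraction of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(precision, recall)  # 0.8 and 2/3
```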

24
Q

What paper introduced SMOTE?

A

“SMOTE: Synthetic Minority Over-sampling Technique” by Chawla et al., 2002.