Techniques for handling class imbalance Flashcards
What is class imbalance?
Class imbalance occurs when one class significantly outnumbers another in a dataset, leading to biased model predictions.
Why does class imbalance cause issues in classification?
It leads to models being biased toward the majority class, reducing the ability to correctly classify the minority class.
What is an example of extreme class imbalance?
A dataset with 998 negative samples and only 2 positive samples; predicting only negatives yields 99.8% accuracy but is not useful.
What is binary cross-entropy loss?
A loss function used in binary classification that measures the difference between predicted probabilities and actual class labels.
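A minimal NumPy sketch (not part of the original card) of how binary cross-entropy is computed for a batch of predictions:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    # Clip predictions so log(0) is never evaluated
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # BCE = -[y*log(p) + (1 - y)*log(1 - p)], averaged over the batch
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 0, 1])
y_pred = np.array([0.9, 0.2, 0.1, 0.6])
print(binary_cross_entropy(y_true, y_pred))  # ~0.236
```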
How does cross-entropy loss behave in imbalanced datasets?
It favors the majority class, making it difficult for the model to learn the minority class boundaries.
What is weighted cross-entropy?
A modification of cross-entropy loss in which errors on the minority class are given a higher weight, so the model is penalized more for misclassifying minority samples.
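A hedged sketch of the same loss with a weight applied to the positive (minority) term; the weight of 5.0 is illustrative, not prescribed by the card:

```python
import numpy as np

def weighted_binary_cross_entropy(y_true, y_pred, pos_weight=5.0, eps=1e-7):
    # Errors on positive (minority) samples are multiplied by pos_weight,
    # so missing a minority sample costs more than missing a majority one
    y_pred = np.clip(y_pred, eps, 1 - eps)
    loss = -(pos_weight * y_true * np.log(y_pred)
             + (1 - y_true) * np.log(1 - y_pred))
    return loss.mean()
```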
How do you apply class weights in TensorFlow for binary classification?
Use class_weight={0: 1.0, 1: 5.0} when calling model.fit().
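A minimal Keras sketch of this call; the model architecture and toy data below are illustrative placeholders, and the 5.0 weight simply makes each minority-class error count five times as much:

```python
import numpy as np
import tensorflow as tf

# Toy imbalanced data: 950 negatives, 50 positives (illustrative)
X = np.random.rand(1000, 20).astype("float32")
y = np.concatenate([np.zeros(950), np.ones(50)]).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Errors on class 1 (the minority) are weighted 5x compared to class 0
model.fit(X, y, epochs=5, batch_size=32, class_weight={0: 1.0, 1: 5.0})
```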
What is categorical cross-entropy?
A loss function used for multi-class classification that compares predicted probabilities with actual class labels.
What is weighted categorical cross-entropy?
A variant of categorical cross-entropy where different class weights are assigned to balance learning across classes.
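A small NumPy sketch of weighted categorical cross-entropy over one-hot labels; the per-class weights are illustrative:

```python
import numpy as np

def weighted_categorical_cross_entropy(y_true, y_pred, class_weights, eps=1e-7):
    # y_true: one-hot labels, y_pred: predicted probabilities, both (n_samples, n_classes)
    y_pred = np.clip(y_pred, eps, 1.0)
    # Each sample's log-loss is scaled by the weight of its true class
    per_sample = -np.sum(class_weights * y_true * np.log(y_pred), axis=1)
    return per_sample.mean()

y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.2, 0.2, 0.6]])
print(weighted_categorical_cross_entropy(y_true, y_pred, np.array([1.0, 1.0, 3.0])))
```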
Name three potential solutions for handling class imbalance.
Collecting more data, oversampling the minority class, and undersampling the majority class.
What is random oversampling?
Duplicating minority class samples to balance the dataset.
What is random undersampling?
Removing samples from the majority class to create a balanced dataset.
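A sketch of both techniques using the imbalanced-learn library (assumed installed as imbalanced-learn); the data are illustrative:

```python
from collections import Counter

import numpy as np
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler

X = np.random.rand(1000, 5)
y = np.array([0] * 950 + [1] * 50)  # imbalanced labels (illustrative)

# Duplicate minority samples until the classes are balanced
X_over, y_over = RandomOverSampler(random_state=42).fit_resample(X, y)
# Drop majority samples until the classes are balanced
X_under, y_under = RandomUnderSampler(random_state=42).fit_resample(X, y)

print(Counter(y_over))   # {0: 950, 1: 950}
print(Counter(y_under))  # {0: 50, 1: 50}
```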
What is SMOTE?
Synthetic Minority Over-sampling Technique, a method that generates synthetic minority samples by interpolating between existing instances.
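A sketch of SMOTE with imbalanced-learn; k_neighbors=5 is the library's default and is shown only for illustration:

```python
from collections import Counter

import numpy as np
from imblearn.over_sampling import SMOTE

X = np.random.rand(1000, 5)
y = np.array([0] * 950 + [1] * 50)  # imbalanced labels (illustrative)

# New minority points are synthesized by interpolating between a minority
# sample and one of its k nearest minority-class neighbors
X_res, y_res = SMOTE(k_neighbors=5, random_state=42).fit_resample(X, y)
print(Counter(y_res))  # {0: 950, 1: 950}
```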
What are the drawbacks of random oversampling and undersampling?
Oversampling can lead to overfitting, and undersampling may cause loss of valuable information.
What is data augmentation?
A technique that generates new training examples by applying transformations like rotation, flipping, and zooming.
How does data augmentation help with class imbalance?
It creates additional, varied minority-class samples from existing ones, improving balance without the exact duplication that makes random oversampling prone to overfitting.
What is an example of an image data augmentation parameter?
In Keras' ImageDataGenerator, rotation_range=40 randomly rotates images by up to 40 degrees.
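A minimal Keras sketch using rotation_range=40; the other transform values and array shapes are illustrative:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,     # rotate randomly by up to 40 degrees
    horizontal_flip=True,  # random left-right flips
    zoom_range=0.2,        # random zoom by up to 20%
)

# Generate augmented batches from minority-class images (illustrative shapes)
minority_images = np.random.rand(50, 64, 64, 3)
augmented_batch = next(datagen.flow(minority_images, batch_size=32))
```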
What are generative models used for handling class imbalance?
Autoencoders and GANs (generative adversarial networks) can generate synthetic minority-class samples to improve balance.
Why should synthetic data generation be done only on the training set?
To avoid data leakage and ensure the test set remains an unbiased evaluation benchmark.
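A sketch of that workflow with scikit-learn and imbalanced-learn (both assumed available): split first, then resample only the training portion:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 5)
y = np.array([0] * 950 + [1] * 50)  # imbalanced labels (illustrative)

# Hold out the test set first; it keeps the original, imbalanced distribution
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
# Synthetic samples are generated from the training split only
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
```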
What are three common performance metrics besides accuracy?
Precision, recall, and F1-score.
Why is accuracy misleading in imbalanced datasets?
A model can achieve high accuracy by predicting only the majority class while failing to classify the minority class correctly.
What is recall?
The fraction of actual positives correctly identified by the model, calculated as TP / (TP + FN).
What is precision?
The fraction of predicted positives that are actually correct, calculated as TP / (TP + FP).
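A short scikit-learn sketch computing these metrics on illustrative predictions:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]

# This toy example has 3 TP, 1 FP, and 1 FN
print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 0.75
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 0.75
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall = 0.75
```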
What paper introduced SMOTE?
“SMOTE: Synthetic Minority Over-sampling Technique” by Chawla et al., 2002.