Lecture 10: Imbalanced Data Flashcards
random undersampling
Drop samples from the majority class
Fast training (smaller dataset)
Loses data, possibly discarding informative samples
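A minimal sketch of random undersampling, assuming the imbalanced-learn (imblearn) package is available (the lecture does not name a library):

```python
from collections import Counter

from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

# Toy imbalanced dataset: roughly 90% class 0, 10% class 1.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print(Counter(y))

# Drop majority-class samples at random until the classes match.
X_res, y_res = RandomUnderSampler(random_state=0).fit_resample(X, y)
print(Counter(y_res))
```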
random oversampling
Repeat (duplicate) samples from the minority class
Much slower training (larger dataset)
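The mirror-image sketch with imbalanced-learn's RandomOverSampler (again assuming imblearn):

```python
from collections import Counter

from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Duplicate minority-class samples (sampling with replacement)
# until both classes reach the majority count.
X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X, y)
print(Counter(y_res))
```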
class weight
Reweight the loss function
Same effect as oversampling, but not as expensive
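A sketch of loss reweighting via scikit-learn's class_weight parameter (the weight values below are illustrative, not from the lecture):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# 'balanced' weights each class by n_samples / (n_classes * count(class)),
# so errors on the rare class cost proportionally more in the loss.
clf = LogisticRegression(class_weight="balanced").fit(X, y)

# Explicit weights work too, e.g. a 10x penalty on minority-class errors:
clf = LogisticRegression(class_weight={0: 1.0, 1: 10.0}).fit(X, y)
```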
ensemble resampling
Random resampling done separately for each estimator in an ensemble (see the sketch below)
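One way to realize this, assuming imbalanced-learn's BalancedBaggingClassifier (an assumption; the lecture names no implementation):

```python
from imblearn.ensemble import BalancedBaggingClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Each of the 10 base estimators (decision trees by default) is trained
# on its own independently drawn, class-balanced bootstrap sample.
clf = BalancedBaggingClassifier(n_estimators=10, random_state=0).fit(X, y)
print(clf.predict(X[:5]))
```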
edited nearest neighbors
Reduces the dataset for kNN
Removes all samples that are misclassified by kNN from the training set
Cleans up outliers and noisy class boundaries
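A sketch with imbalanced-learn's EditedNearestNeighbours (assumed available; n_neighbors=3 is its default, not a lecture-specified value):

```python
from collections import Counter

from imblearn.under_sampling import EditedNearestNeighbours
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Remove samples whose own class disagrees with the majority vote of
# their 3 nearest neighbours; only misclassified points are dropped.
X_res, y_res = EditedNearestNeighbours(n_neighbors=3).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))
```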
condensed nearest neighbors
Iteratively adds points that are misclassified by kNN to a kept subset
Focuses on the boundaries
Removes many samples
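A sketch with imbalanced-learn's CondensedNearestNeighbour (assumed available):

```python
from collections import Counter

from imblearn.under_sampling import CondensedNearestNeighbour
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Grow a subset by adding each sample that a 1-NN rule on the current
# subset misclassifies; interior majority points are never added.
X_res, y_res = CondensedNearestNeighbour(random_state=0).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))
```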
synthetic sample generation (SMOTE)
Adds synthetic interpolated samples to the minority class
For each sample in the minority class:
Pick a random neighbor from its k nearest minority neighbors
Pick a point uniformly at random on the line connecting the two
Results in a large dataset (slower training)
Often combined with undersampling strategies (see the sketch below)
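A sketch of SMOTE with imbalanced-learn (assumed available); the SMOTEENN combination at the end illustrates one standard pairing with undersampling, not necessarily the one from the lecture:

```python
from collections import Counter

from imblearn.combine import SMOTEENN
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# For each minority sample: pick one of its 5 nearest minority neighbours,
# then create a synthetic point uniformly along the segment between them.
X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print(Counter(y_res))

# Combined with undersampling: SMOTE followed by
# Edited Nearest Neighbours cleaning (SMOTEENN).
X_c, y_c = SMOTEENN(random_state=0).fit_resample(X, y)
print(Counter(y_c))
```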