Feature Imputation Flashcards

1
Q

Feature Imputation Techniques

A
  • Nearest neighbor imputation (kNN)
  • Drop the rows or columns that contain missing values
  • Substitute the mean or median
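The three techniques above can be compared side by side on a toy table. This is a minimal sketch in plain Python, not a reference implementation: the data, the function names (drop_incomplete, impute_median, impute_knn), and the choice of k = 2 are all assumptions made for illustration, and the kNN version assumes only the target column has gaps.

```python
import math
from statistics import median

# Toy rows: [age, income]; None marks a missing value (made-up data).
rows = [
    [25.0, 40.0],
    [30.0, None],
    [35.0, 60.0],
    [40.0, 70.0],
]


def drop_incomplete(data):
    """Keep only the rows that have no missing values."""
    return [list(row) for row in data if None not in row]


def impute_median(data, col):
    """Fill gaps in one column with that column's observed median."""
    fill = median(row[col] for row in data if row[col] is not None)
    return [
        row[:col] + [fill] + row[col + 1:] if row[col] is None else list(row)
        for row in data
    ]


def impute_knn(data, col, k=2):
    """Fill gaps in `col` with the mean of that column over the k nearest
    complete rows (Euclidean distance on the remaining columns).
    Assumes the other columns of an incomplete row are all observed."""
    complete = drop_incomplete(data)
    others = [c for c in range(len(data[0])) if c != col]
    result = []
    for row in data:
        if row[col] is not None:
            result.append(list(row))
            continue
        nearest = sorted(
            complete,
            key=lambda cand: math.dist(
                [row[c] for c in others], [cand[c] for c in others]
            ),
        )[:k]
        fill = sum(cand[col] for cand in nearest) / len(nearest)
        result.append(row[:col] + [fill] + row[col + 1:])
    return result


dropped = drop_incomplete(rows)
median_filled = impute_median(rows, col=1)
knn_filled = impute_knn(rows, col=1, k=2)
print(len(dropped), median_filled[1], knn_filled[1])
```

On this toy table the median fill uses the middle observed income, while the kNN fill averages the incomes of the two rows closest in age, so the two methods can disagree even on a single gap.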
2
Q

Data Rows with Null Values Tradeoffs

A
  • If you are confident the data is missing at random and you will have a large dataset remaining after the missing data is dropped, dropping data can be an effective solution
  • If the data is not missing at random, then dropping the affected rows can introduce significant bias into your model
  • Dropping too much data is dangerous: the rows that remain may be too few, or too unrepresentative, to train on
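The difference between the two missingness cases can be seen in a small simulation. This is a sketch on synthetic data: the distribution, the 30% loss rate, and the cutoff of 60 are arbitrary assumptions, chosen only to contrast random loss with value-dependent loss.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic incomes; the distribution and its parameters are arbitrary.
incomes = [rng.gauss(50, 10) for _ in range(10_000)]

# Missing at random: each record is lost with probability 0.3,
# regardless of its value.
kept_random = [x for x in incomes if rng.random() > 0.3]

# Not missing at random: records above 60 are never reported.
kept_biased = [x for x in incomes if x <= 60]

# How far the mean of the surviving rows drifts from the full-data mean.
error_random = abs(mean(kept_random) - mean(incomes))
error_biased = abs(mean(kept_biased) - mean(incomes))

print(f"shift in mean after random loss:          {error_random:.2f}")
print(f"shift in mean after value-dependent loss: {error_biased:.2f}")
```

Dropping rows in the first case mostly costs sample size; in the second case it also shifts the estimate, because the rows that survive are systematically different from the ones that were lost.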
3
Q

Dropping Columns with High Nullability

A
  • Before dropping features, consider running a feature importance analysis on the data you have
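One way to act on this is to look at each column's null fraction next to a rough signal of how useful it is. The sketch below uses absolute Pearson correlation with the target, computed on the observed rows, as a crude stand-in for a proper feature importance analysis. The column names and values are hypothetical, and with so few observed points the numbers are only illustrative.

```python
import math

# Hypothetical feature columns and target; None marks a missing value.
columns = {
    "sparse_feature": [1.0, None, None, 4.0, None, 6.0],
    "dense_feature": [3.0, 1.0, 2.0, 1.0, 3.0, 2.0],
}
target = [2.0, 3.0, 5.0, 8.0, 9.0, 12.0]


def null_fraction(values):
    """Share of entries in a column that are missing."""
    return sum(v is None for v in values) / len(values)


def abs_correlation(values, y):
    """|Pearson r| between a feature and the target, using only the rows
    where the feature is observed. A crude proxy for importance."""
    pairs = [(v, t) for v, t in zip(values, y) if v is not None]
    n = len(pairs)
    mean_x = sum(v for v, _ in pairs) / n
    mean_y = sum(t for _, t in pairs) / n
    cov = sum((v - mean_x) * (t - mean_y) for v, t in pairs)
    spread_x = math.sqrt(sum((v - mean_x) ** 2 for v, _ in pairs))
    spread_y = math.sqrt(sum((t - mean_y) ** 2 for _, t in pairs))
    return abs(cov / (spread_x * spread_y))


report = {
    name: (null_fraction(vals), abs_correlation(vals, target))
    for name, vals in columns.items()
}
for name, (nulls, signal) in report.items():
    print(f"{name}: null fraction {nulls:.2f}, |r| with target {signal:.2f}")
```

In this made-up table the column with half its values missing tracks the target closely, while the complete column does not, which is the situation where dropping on nullability alone would discard the more useful feature.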
4
Q

Mean, Median, or Summary Statistic Substitution

A
  • Generally not a good approach: the more data is missing, the more harmful it becomes
  • Can introduce inconsistent bias
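One concrete way this harm shows up is in the spread of the column: every gap filled with the mean adds a value with zero deviation, so the variance shrinks as the number of gaps grows. A minimal sketch with made-up numbers (the helper name variance_after_mean_fill is an assumption for illustration):

```python
from statistics import mean, pvariance

observed = [2.0, 4.0, 6.0, 8.0]  # made-up observed values


def variance_after_mean_fill(observed_values, n_missing):
    """Population variance of a column after n_missing gaps are
    filled with the mean of the observed values."""
    fill = mean(observed_values)
    filled = list(observed_values) + [fill] * n_missing
    return pvariance(filled)


baseline = pvariance(observed)                       # no gaps filled
few_missing = variance_after_mean_fill(observed, 1)  # 1 of 5 values imputed
many_missing = variance_after_mean_fill(observed, 4) # 4 of 8 values imputed

print(baseline, few_missing, many_missing)
```

The observed values are identical in all three cases; only the number of mean-filled gaps changes, and the variance falls with each one.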