Feature Imputation Flashcards
1
Q
Feature Imputation Techniques
A
- Nearest neighbor imputation (kNN)
- Drop the rows or columns that contain missing values
- Use mean or median
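
A minimal sketch of the three techniques above, using only the Python standard library. The toy rows, the column meanings, and the helper names (`drop_missing`, `impute_median`, `impute_knn`) are hypothetical and chosen for illustration. The kNN helper assumes each incomplete row is missing only the column being imputed. Libraries such as scikit-learn offer a production-grade equivalent (`sklearn.impute.KNNImputer`).

```python
import math
import statistics

# Hypothetical toy rows of (height_cm, weight_kg); None marks a missing value.
rows = [
    (150.0, 50.0),
    (160.0, 60.0),
    (170.0, None),
    (180.0, 80.0),
]


def drop_missing(rows):
    """Keep only the rows that have no missing values."""
    return [r for r in rows if None not in r]


def _fill(row, col, value):
    """Return a copy of `row` with column `col` set to `value`."""
    return tuple(value if i == col else v for i, v in enumerate(row))


def impute_median(rows, col):
    """Replace missing values in column `col` with the median of the observed values."""
    observed = [r[col] for r in rows if r[col] is not None]
    fill = statistics.median(observed)
    return [_fill(r, col, fill) if r[col] is None else r for r in rows]


def _distance(a, b, skip):
    """Euclidean distance between two rows, ignoring column `skip`."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(len(a)) if i != skip))


def impute_knn(rows, col, k=2):
    """Replace missing values in `col` with the mean of that column over the
    k complete rows that are closest on the remaining columns."""
    complete = drop_missing(rows)
    result = []
    for r in rows:
        if r[col] is None:
            nearest = sorted(complete, key=lambda c: _distance(c, r, col))[:k]
            r = _fill(r, col, statistics.mean(c[col] for c in nearest))
        result.append(r)
    return result


dropped = drop_missing(rows)        # 3 rows remain
by_median = impute_median(rows, 1)  # missing weight becomes 60.0
by_knn = impute_knn(rows, 1, k=2)   # missing weight becomes 70.0
```

On this toy data the median fill ignores height entirely, while the kNN fill uses the two rows with the closest heights (160 and 180 cm).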
2
Q
Tradeoffs of Dropping Rows with Null Values
A
- If you are confident the data is missing at random and a large dataset will remain after the rows with missing values are dropped, dropping them can be an effective solution
- If the data is not missing at random, then dropping those rows can introduce significant bias into your model
- Dropping too many rows is dangerous: it shrinks the training set and can discard useful signal
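
The tradeoff above can be illustrated with a small, deterministic sketch. The income figures are hypothetical, and the "every third value" pattern is only a stand-in for values missing at random (the pattern does not depend on the value itself). The second pattern hides only the high values, which is an example of data that is not missing at random.

```python
import statistics

# Hypothetical incomes (in thousands).
incomes = [20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
true_mean = statistics.mean(incomes)  # 65


def drop_missing(values):
    """Drop the entries marked as missing (None)."""
    return [v for v in values if v is not None]


# Missingness unrelated to the value: hide every third entry.
unrelated = [None if i % 3 == 0 else v for i, v in enumerate(incomes)]

# Missingness related to the value: incomes above 80 are never reported.
related = [None if v > 80 else v for v in incomes]

unrelated_mean = statistics.mean(drop_missing(unrelated))  # 65, matches the true mean
related_mean = statistics.mean(drop_missing(related))      # 50, biased downward

# Fraction of rows that survive the drop in the second case.
retained = len(drop_missing(related)) / len(incomes)       # 0.7
```

In the second case the estimate is biased and 30% of the rows are lost as well, so both risks from the card show up at once.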
3
Q
Dropping Columns with High Nullability
A
- Before dropping features, consider running a feature importance analysis on the data you have
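
One way to sketch this check: report each column's null fraction next to a rough importance score computed on the rows where the column is observed. The dataset and column names are hypothetical, and the absolute Pearson correlation with the target is only a crude stand-in for a real feature importance analysis (it assumes numeric, non-constant columns and captures only linear relationships).

```python
import math

# Hypothetical dataset: column name -> values, where None marks a missing entry.
features = {
    "sparse_signal": [1.0, None, 3.0, None, None, 6.0],
    "dense_noise": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
target = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]


def null_fraction(values):
    """Share of entries in a column that are missing."""
    return sum(v is None for v in values) / len(values)


def abs_correlation(values, target):
    """Absolute Pearson correlation with the target, using only observed rows."""
    pairs = [(x, y) for x, y in zip(values, target) if x is not None]
    xs, ys = zip(*pairs)
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return abs(cov) / math.sqrt(var_x * var_y)


# Map each column to (null fraction, rough importance score).
report = {
    name: (null_fraction(col), abs_correlation(col, target))
    for name, col in features.items()
}
```

Here `sparse_signal` is half empty yet tracks the target closely on the rows where it is present, while `dense_noise` is complete but nearly uncorrelated. Dropping columns by null fraction alone would discard the more informative one.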
4
Q
Mean, Median, or Summary Statistic Substitution
A
- Generally not a good approach. The more data is missing, the more harmful it is, because every imputed value sits at the same point and the feature's variance shrinks.
- Can introduce bias whose size and direction vary with how much data is missing and why it is missing
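
A small sketch of why the harm grows with the amount of missing data, assuming hypothetical values and a hypothetical helper `mean_impute`. Filling with the mean leaves the mean unchanged but adds points with zero deviation, so the population variance after imputation equals the observed variance scaled by the fraction of values that were actually observed.

```python
import statistics


def mean_impute(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


def variance_ratio(values):
    """Variance after mean imputation divided by the variance of the observed values."""
    observed = [v for v in values if v is not None]
    return statistics.pvariance(mean_impute(values)) / statistics.pvariance(observed)


# Hypothetical feature with 20% and then 60% of its values missing.
few_missing = [2.0, None, 6.0, 8.0, 10.0]
many_missing = [2.0, None, None, None, 10.0]

ratio_few = variance_ratio(few_missing)    # about 0.8: variance shrinks by 20%
ratio_many = variance_ratio(many_missing)  # about 0.4: variance shrinks by 60%
```

The understated variance carries into anything computed from the feature, such as correlations with other features or distance-based similarity.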