Lecture 7 Imputation Flashcards
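Model-driven imputation
Train a regression model to predict the missing values of a feature from the other features
Retrain it after filling in the missing values (iterate)
Flexible: any regression model can be used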
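A minimal sketch of model-driven imputation. It assumes numeric data with NaN marking missing entries and uses ordinary least squares as the regression model, though any regressor could be swapped in. Missing entries start at the column mean. Each incomplete column is then regressed on the others and refilled, and the loop repeats so that later fits see the filled-in values. `regression_impute` and `n_rounds` are hypothetical names, not a library API.

```python
import numpy as np


def regression_impute(X, n_rounds=5):
    """Hypothetical helper: fill NaNs by repeatedly regressing each incomplete
    column on all other columns (OLS stands in for "a regression model")."""
    X = np.array(X, dtype=float)          # work on a float copy
    missing = np.isnan(X)
    # Initial guess: column means, so every column can serve as a predictor.
    X[missing] = np.take(np.nanmean(X, axis=0), np.nonzero(missing)[1])
    for _ in range(n_rounds):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue                  # nothing to impute in this column
            # Design matrix: intercept plus every other column (with current fills).
            A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            # Train only on rows where column j was actually observed.
            coef, *_ = np.linalg.lstsq(A[~rows], X[~rows, j], rcond=None)
            # Refill; later columns and later rounds use these updated values.
            X[rows, j] = A[rows] @ coef
    return X
```

Observed entries are never changed. Only the NaN positions are overwritten on each round.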
KNN imputation
An efficient implementation finds the nearest neighbors only once
Naive implementation: requires the first 2 features to always be present, since neighbors are found using only those features
Tricky if there is no feature that is always non-missing
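A sketch of the naive variant, assuming NaN marks missing values. Neighbors are found using only the columns that are never missing, so the pairwise distance matrix is computed a single time. Each missing entry is then filled with the mean of that column over the k nearest rows that observe it. `knn_impute_naive` and `k` are hypothetical names. The function raises an error when no column is complete, which is the tricky case in the notes.

```python
import numpy as np


def knn_impute_naive(X, k=2):
    """Hypothetical helper: naive KNN imputation using only never-missing columns
    for distances. Each NaN becomes the mean of its column over the k nearest
    rows that observe that column."""
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    complete = ~missing.any(axis=0)
    if not complete.any():
        raise ValueError("no feature is always present; naive KNN cannot compute distances")
    Z = X[:, complete]
    # Pairwise Euclidean distances between rows, computed once (O(n^2) memory).
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    filled = X.copy()
    for i, j in zip(*np.nonzero(missing)):
        donors = np.flatnonzero(~missing[:, j])          # rows that observe column j
        nearest = donors[np.argsort(dist[i, donors])[:k]]
        filled[i, j] = X[nearest, j].mean()
    return filled
```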
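fancyimpute
No fit/transform paradigm, so an imputer fitted on training data can't be applied to test data
MICE (Multiple Imputation by Chained Equations) is iterative and works well
MICE might be best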
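The fit/transform split can be illustrated with a hypothetical `ChainedImputer` class. This is not fancyimpute's API or any other library's. `fit` cycles through the incomplete columns as chained equations do, learning one least-squares model per column per round on the training data. `transform` replays those stored models on new data, which a tool without a separate fit step cannot do. Full MICE also draws several stochastic imputations and pools them. This deterministic sketch omits that step.

```python
import numpy as np


class ChainedImputer:
    """Hypothetical round-robin regression imputer with separate fit and transform."""

    def __init__(self, n_rounds=5):
        self.n_rounds = n_rounds

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.means_ = np.nanmean(X, axis=0)                        # initial fill values
        self.columns_ = np.flatnonzero(np.isnan(X).any(axis=0))   # columns that get a model
        self.coefs_ = {}                                           # (round, column) -> coefficients
        self._run(X, learn=True)
        return self

    def transform(self, X):
        # Reuses the models learned in fit(); columns that were complete
        # during fit have no model, so they only receive the training mean.
        return self._run(X, learn=False)

    def _run(self, X, learn):
        X = np.array(X, dtype=float)
        missing = np.isnan(X)
        X[missing] = np.take(self.means_, np.nonzero(missing)[1])
        for r in range(self.n_rounds):
            for j in self.columns_:
                A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
                if learn:
                    obs = ~missing[:, j]
                    coef, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
                    self.coefs_[(r, j)] = coef
                X[missing[:, j], j] = A[missing[:, j]] @ self.coefs_[(r, j)]
        return X
```

Usage: `ChainedImputer().fit(X_train).transform(X_test)` imputes test rows with models learned only from the training rows.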
Feature selection
Unsupervised
May discard important information
Variance-based: remove features with zero variance or few unique values
Covariance-based: remove correlated features
PCA: remove linear subspaces
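The first two unsupervised filters can be sketched as follows, assuming a numeric NumPy array. Neither filter looks at the target, which is why they can discard useful information. The function names and the 0.95 correlation cutoff are illustrative assumptions. The greedy correlation filter keeps the first column of each highly correlated group.

```python
import numpy as np


def variance_filter(X, threshold=0.0):
    """Hypothetical helper: indices of columns whose variance is above `threshold`
    (0.0 drops constant columns)."""
    X = np.asarray(X, dtype=float)
    return np.flatnonzero(X.var(axis=0) > threshold)


def correlation_filter(X, max_abs_corr=0.95):
    """Hypothetical helper: keep a column only if its |Pearson r| with every
    already-kept column is <= max_abs_corr."""
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= max_abs_corr for k in kept):
            kept.append(j)
    return kept
```

Running the variance filter first avoids the NaN correlations that constant columns would otherwise produce in `np.corrcoef`.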
Univariate selection
Examine each feature individually to determine the strength of its relationship with the response variable, giving a score for each feature
F-score, chi2
Mutual information
Univariate, and unlike the F-score doesn’t assume a linear relationship
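Two univariate scores can be sketched as follows, assuming a continuous target. `f_scores` computes the F statistic of a one-feature linear fit, F = r²/(1 - r²) · (n - 2), where r is the Pearson correlation between the feature and y. `mutual_info_binned` is a simple histogram plug-in estimate of mutual information. Both names and the 10-bin default are assumptions, and library estimators are more refined. chi2 applies to non-negative features with a categorical target and is not shown here.

```python
import numpy as np


def f_scores(X, y):
    """Hypothetical helper: per-feature F statistic of the fit y ~ a + b * x_j."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Pearson correlation of each column with y.
    r = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return r ** 2 / (1 - r ** 2) * (len(y) - 2)


def mutual_info_binned(x, y, bins=10):
    """Hypothetical helper: plug-in mutual information estimate (nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = p.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = p > 0                          # skip empty cells (0 * log 0 = 0)
    return float((p[nz] * np.log(p[nz] / (px * py)[nz])).sum())
```

For x uniform on [-1, 1] and y = x², the correlation is near zero, so the F-score is near zero. The mutual information is large, because y is fully determined by x.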
Multivariate: model-based
Find the feature subset that gives the best fit for a model
Exhaustive search: infeasible (2^n subsets for n features)
Linear models assume a linear relation
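A sketch of model-based selection with a single linear model. It assumes the features are standardized so that coefficient magnitudes are comparable. `select_by_linear_coef` and `n_keep` are hypothetical names.

```python
import numpy as np


def select_by_linear_coef(X, y, n_keep):
    """Hypothetical helper: fit one least-squares model on standardized features
    and keep the n_keep features with the largest |coefficient|."""
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)     # comparable coefficient scales
    A = np.column_stack([np.ones(len(Xs)), Xs])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    importance = np.abs(coef[1:])                 # drop the intercept
    return sorted(np.argsort(importance)[::-1][:n_keep].tolist())
```

The linearity limitation shows up with standard-normal features and y = x0² + x1. Here x0 contributes twice the variance of x1. Asking for one feature still returns x1, because the linear fit finds almost no linear trend in x0.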
Iterative model-based
Fit model, find least important feature, remove it, iterate: recursive feature elimination
Or start with a single feature, find the most important feature to add, add it, iterate (forward selection)
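A sketch of both iterative strategies, using least squares as the model. In this version, recursive elimination ranks features by absolute standardized coefficient. Forward selection adds whichever feature most lowers the training residual sum of squares. It starts from no features, so its first step picks the best single feature. The function names are hypothetical. In practice, candidates would typically be scored on held-out data rather than on training error.

```python
import numpy as np


def _ols(X, y):
    """Least-squares fit with intercept; returns (coefficients, design matrix)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A


def recursive_feature_elimination(X, y, n_keep):
    """Hypothetical helper: fit, drop the feature with the smallest
    |standardized coefficient|, refit, repeat until n_keep remain."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        coef, _ = _ols(Xs[:, remaining], y)
        remaining.pop(int(np.argmin(np.abs(coef[1:]))))   # least important feature
    return remaining


def forward_selection(X, y, n_keep):
    """Hypothetical helper: start from no features and repeatedly add the one
    that most lowers the training residual sum of squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    chosen = []
    while len(chosen) < n_keep:
        best_j, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            coef, A = _ols(X[:, chosen + [j]], y)
            rss = float(((y - A @ coef) ** 2).sum())
            if rss < best_rss:
                best_j, best_rss = j, rss
        chosen.append(best_j)
    return sorted(chosen)
```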