Feature Selection Flashcards
The process of identifying a small but informative feature set.
Feature selection
Three categories of feature selection methods.
Filters, embedded methods and wrapper methods.
Feature selection method in which each feature is scored by its impact on the target variable, and the ‘best’ ones are then selected.
Filter method
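A minimal sketch of the filter idea, assuming absolute Pearson correlation as the score (one of many possible filter scores). The helper names `pearson` and `filter_select` and the toy data are hypothetical and exist only for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no defined correlation; score it as 0.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def filter_select(features, target, k):
    """Score every feature independently, then keep the k best."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Toy data (assumption): x1 is linear in the target, x2 is noisy, x3 is unrelated.
features = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 5.0],
    "x3": [1.0, -1.0, 1.0, -1.0, 1.0],
}
target = [2.0, 4.0, 6.0, 8.0, 10.0]

selected, scores = filter_select(features, target, k=2)
print(selected, scores)
```

Here `k` plays the role of a filter hyperparameter: the scoring never fits a model, so it is cheap, but each feature is judged on its own.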
Feature selection method in which the selection happens as part of an ML model’s internal fitting procedure.
Embedded method
Feature selection method that searches over feature subsets by repeatedly fitting and evaluating the ML model in a loop (similar to hyperparameter optimization, HPO).
Wrapper method
(True or false) Some filter methods also possess hyperparameters.
True
Three learners that can also compute feature scores internally.
ranger, rpart and xgboost
Likely the most flexible, though also the most expensive, feature selection approach.
Wrapper approach
(True or false) The automated feature selector can be used like any other learner.
True
(True or false) Feature selection can improve a model’s performance, but there is no guarantee.
True
Desirable properties of a feature for feature selection
Relevant and non-redundant
A filter method based on the mutual information I(X; Y) between two (discrete) random variables X and Y with marginal distributions p_X and p_Y and joint distribution p_{X,Y}.
mRMR (minimum redundancy, maximal relevancy)
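A rough sketch of how this could look, assuming a plug-in estimate of I(X; Y) from counts and the common greedy "relevance minus mean redundancy" criterion. The function names `mutual_information` and `mrmr` and the toy data are hypothetical, not taken from any particular library.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi

def mrmr(features, target, k):
    """Greedily pick k features: high I(feature; target), low mean I(feature; selected)."""
    relevance = {f: mutual_information(col, target) for f, col in features.items()}
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(
                mutual_information(features[f], features[s]) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (assumption): the target encodes both a and b; a_copy duplicates a.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
y = [2 * ai + bi for ai, bi in zip(a, b)]
features = {"a": a, "a_copy": list(a), "b": b}

picked = mrmr(features, y, k=2)
print(picked)
```

In this toy setup `a_copy` is just as relevant as `a`, but once `a` is selected its redundancy cancels its relevance, so the second pick is `b`.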
The two types of sequential feature selection
Forward and backward
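Forward selection can be sketched as a wrapper loop, assuming a 1-nearest-neighbour classifier scored by leave-one-out accuracy as a stand-in for the ML model. Any learner and resampling scheme could take its place; `loo_accuracy`, `forward_selection` and the toy data are hypothetical.

```python
def loo_accuracy(rows, labels, cols):
    """Leave-one-out accuracy of 1-nearest-neighbour using only the columns in cols."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue
            d = sum((row[c] - other[c]) ** 2 for c in cols)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)

def forward_selection(rows, labels, n_features):
    """Start empty; add the feature that helps most until nothing improves."""
    selected, best_score = [], 0.0
    while len(selected) < n_features:
        candidates = [c for c in range(n_features) if c not in selected]
        scored = [(loo_accuracy(rows, labels, selected + [c]), c) for c in candidates]
        score, col = max(scored, key=lambda t: t[0])
        if score <= best_score:  # no candidate improves the current subset
            break
        selected.append(col)
        best_score = score
    return selected, best_score

# Toy data (assumption): column 0 separates the classes, column 1 is noise.
rows = [(0.0, 5.0), (0.1, 1.0), (0.2, 9.0), (0.9, 5.2), (1.0, 1.1), (1.1, 9.1)]
labels = [0, 0, 0, 1, 1, 1]

selected, best_score = forward_selection(rows, labels, n_features=2)
print(selected, best_score)
```

Backward selection mirrors this: start with all features and repeatedly drop the one whose removal hurts the score least. Every candidate subset costs a full model evaluation, which is why the wrapper approach is expensive.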
Permuted versions of the original variables/features (across observations)
Shadow variables
(True or false) Shadow variables are, by construction, uncorrelated with the target.
True
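A small sketch of how shadow variables might be built, assuming each feature column is shuffled independently across observations. The name `make_shadows`, the `shadow_` prefix and the toy data are hypothetical.

```python
import random

def make_shadows(features, seed=0):
    """Return a permuted copy of every feature column; originals stay untouched."""
    rng = random.Random(seed)
    shadows = {}
    for name, col in features.items():
        permuted = list(col)   # copy, so the original column is not modified
        rng.shuffle(permuted)  # breaks the pairing between feature and target
        shadows["shadow_" + name] = permuted
    return shadows

# Toy data (assumption)
features = {"x1": [1, 2, 3, 4, 5, 6], "x2": [0, 1, 0, 1, 0, 1]}
shadows = make_shadows(features)
print(shadows)
```

Each shadow keeps the values of its original feature but loses any systematic link to the target, so a real feature that scores no better than the shadows is a plausible candidate for removal.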