Chapter 6 - Feature Selection Flashcards
what is the aim of feature selection?
automatically identify meaningful smaller subsets of feature variables
why do different types of models have different best feature sets?
different models draw different types of boundaries and allow different degrees of flexibility when you change their parameters
for d features, how many possible feature sets are there?
2^d
what is combinatorial optimisation?
searching a discrete space for the best solution; for feature selection, each of the 2^d subsets is a point in a binary search space
give the steps of wrapper method for feature selection
1. start with an initial guess for a good set of features
2. train and test a model (possibly with cross-validation)
3. if the test error is deemed good enough, stop
4. otherwise, choose a new set of features and go to step 2
name some wrapper methods
greedy search
genetic algorithm
simulated annealing
branch and bound
what is forward selection?
add features greedily and sequentially. Find which of the remaining ones improves our model the most and add it permanently to our set
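A minimal forward-selection sketch; the `score` function is a hypothetical stand-in for model performance (in a real wrapper it would be cross-validated accuracy):

```python
# Hypothetical usefulness of each feature index; a real wrapper would train
# and evaluate a model on each candidate feature set instead.
def score(features):
    weights = {0: 0.5, 1: 0.1, 2: 0.3}
    return sum(weights.get(f, 0.0) for f in features)

d = 4
selected = []
remaining = list(range(d))
for _ in range(2):                        # stop after choosing 2 features
    # find which remaining feature improves the model the most
    best = max(remaining, key=lambda f: score(selected + [f]))
    selected.append(best)                 # add it permanently to our set
    remaining.remove(best)
print(selected)  # → [0, 2]
```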
what is backward elimination?
sequentially evaluate removing features and discard the one that damages performance the least.
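The mirror image of forward selection, again with a toy scoring function standing in for a trained model (my assumption, not from the source):

```python
# Hypothetical usefulness of each feature index (toy stand-in for a model).
def score(features):
    weights = {0: 0.5, 1: 0.1, 2: 0.3}
    return sum(weights.get(f, 0.0) for f in features)

selected = list(range(4))                 # start with all features
while len(selected) > 2:
    # discard the feature whose removal damages performance the least,
    # i.e. the one whose removal leaves the highest score
    least_harmful = max(
        selected, key=lambda f: score([g for g in selected if g != f])
    )
    selected.remove(least_harmful)
print(sorted(selected))  # → [0, 2]
```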
what is stepwise, or floating selection?
wrapper method that combines forward and backward selection
two steps forward and one step back
what are filter methods?
find out how useful a feature is without training any models
Describe the pearsons correlation coefficient equation in words
covariance of the two variables divided by the product of their standard deviations
pearsons correlation coefficient, r = ?
sum of (x - xmean)(y - ymean), divided by the square root of ( sum of (x - xmean)^2 × sum of (y - ymean)^2 )
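The formula computed directly on a small made-up sample (data values are illustrative only); since y grows roughly linearly with x, r should come out close to 1:

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]   # roughly y = 2x

xm = sum(x) / len(x)
ym = sum(y) / len(y)
# numerator: sum of (x - xmean)(y - ymean)
num = sum((a - xm) * (b - ym) for a, b in zip(x, y))
# denominator: sqrt( sum of (x - xmean)^2 * sum of (y - ymean)^2 )
den = math.sqrt(sum((a - xm) ** 2 for a in x)
                * sum((b - ym) ** 2 for b in y))
r = num / den
print(round(r, 3))  # close to 1: strong positive linear correlation
```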
how do we rank features?
In order of the absolute value of the correlation coefficient
what type of correlation does pearsons measure?
linear
what is entropy?
a measure of the uncertainty in a random variable
what is information gain?
it quantifies the reduction in uncertainty (entropy) of adding a new feature
information gain I(X;Y) = ?
H(Y) - H(Y|X)
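A worked example of I(X;Y) = H(Y) - H(Y|X) on a tiny discrete dataset (values chosen by me so that the feature perfectly predicts the label, giving a gain of 1 bit):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of discrete values."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

# Toy data: feature X perfectly predicts label Y
X = [0, 0, 1, 1]
Y = ["a", "a", "b", "b"]

h_y = entropy(Y)                               # H(Y) = 1 bit
# H(Y|X): entropy of Y within each value of X, weighted by P(X = v)
h_y_given_x = sum(
    (sum(1 for x in X if x == v) / len(X))
    * entropy([y for x, y in zip(X, Y) if x == v])
    for v in set(X)
)
info_gain = h_y - h_y_given_x                  # I(X;Y)
print(info_gain)  # → 1.0 (knowing X removes all uncertainty about Y)
```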
another name for information gain is?
mutual information
when we are measuring the information gain of each feature, I(X;Y), we are really calculating? (hint: relation to Y)
the mutual information between the feature and the target label
what is the major advantage of information gain?
detects nonlinearities
what is the major disadvantage of information gain?
may choose redundant features, including very similar features that may be detrimental to the learning algorithm
if i use a wrapper method, with N samples and d features and use a greedy search, how many models will I have to create
(d)(d+1) / 2
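The count follows because round 1 evaluates d candidate features, round 2 evaluates d-1, and so on down to 1; a quick check of the arithmetic:

```python
# Greedy forward search over d features: round k evaluates d - k + 1
# candidate models, so the total is d + (d-1) + ... + 1 = d(d+1)/2.
d = 10
trained = sum(d - k for k in range(d))   # simulate the per-round counts
print(trained, d * (d + 1) // 2)         # → 55 55
```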
why do we perform feature selection (3)
logistical - there may be too much data to process
interpretability - we may collect more data than is useful
overfitting - inclusion of too many features could mean we overfit
pros (3) and cons (2) of forward and backward selection (wrapper method)
- Impact of a feature on the ML classifier is explicit
- For each subset of features, we know exactly how well the model performs.
- Better than exhaustive search.
- No guarantee of best solution
- Need to train and evaluate lots of models
what two types of filter are there
univariate
multivariate
what is a univariate filter
Evaluate each feature independently
what is a multivariate filter
Evaluate features in context of others
what filtering metric can we use when the data is divided into 2 classes, rather than a regression
fisher score
give the equation for fisher score, F=
(m1 - m2)^2 / (v1 + v2)
in words, the fisher score puts ….. over …
between class scatter (m1-m2)^2 over within class scatter (v1 + v2)
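The Fisher score computed on two made-up, well-separated classes (data values are illustrative assumptions): the class means differ a lot relative to the spread within each class, so F is large.

```python
# Two classes of a single feature (toy data): far-apart means, small variances
class1 = [1.0, 1.2, 0.8, 1.0]
class2 = [3.0, 3.1, 2.9, 3.0]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

m1, m2 = mean(class1), mean(class2)
v1, v2 = var(class1), var(class2)
# between-class scatter over within-class scatter
F = (m1 - m2) ** 2 / (v1 + v2)
print(round(F))  # large score: this feature separates the classes well
```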
what is the main disadvantage of fisher score
only works on single features
J(X) is the mutual information of X, what are the possibilities of what X can be?
a feature
a set of two or more features considered jointly
what are embedded methods for feature selection
In Embedded Methods, the feature selection algorithm is integrated as part of the learning algorithm. Embedded methods combine the qualities of filter and wrapper methods.
which wrapper method combines forward and backward selection in two steps forward and one step back
stepwise / floating selection