Chapter 6 - Feature Selection Flashcards

1
Q

what is the aim of feature selection?

A

to automatically identify meaningful, smaller subsets of the feature variables

2
Q

why do different types of models have different best feature sets?

A

different models draw different types of boundaries and allow different degrees of flexibility when you change their parameters

3
Q

for d features, how many possible feature sets are there?

A

2^d
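
A tiny illustration of the 2^d count (the feature names are hypothetical):

```python
# Every subset of d features is one point in a binary space of size 2^d.
from itertools import combinations

features = ["age", "height", "weight"]  # d = 3, hypothetical names
subsets = [set(c) for r in range(len(features) + 1)
           for c in combinations(features, r)]
print(len(subsets))  # 8 = 2^3, including the empty set
```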

4
Q

what is combinatorial optimisation?

A

finding the best point in a discrete search space; in feature selection, each of the 2^d feature subsets is one point in a binary space

5
Q

give the steps of the wrapper method for feature selection

A

1. start with an initial guess for a good set of features

2. train and test a model (possibly with cross-validation)

3. if the test error is deemed good enough, stop

4. otherwise, choose a new set of features and go to step 2 (see the sketch below)
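
A minimal sketch of this loop, assuming a scikit-learn-style estimator; `propose_next_subset` is a hypothetical stand-in for whatever search strategy is used:

```python
# Minimal sketch of the generic wrapper loop. `propose_next_subset` is a
# hypothetical placeholder for the search strategy (greedy, genetic, ...).
from sklearn.model_selection import cross_val_score

def wrapper_search(model, X, y, initial_subset, propose_next_subset,
                   good_enough=0.95, max_iters=100):
    subset = list(initial_subset)                   # step 1: initial guess
    for _ in range(max_iters):
        score = cross_val_score(model, X[:, subset], y, cv=5).mean()  # step 2
        if score >= good_enough:                    # step 3: good enough? stop
            return subset, score
        subset = propose_next_subset(subset)        # step 4: new set, go to 2
    return subset, score
```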

6
Q

name some wrapper methods

A

greedy search
genetic algorithm
simulated annealing
branch and bound

7
Q

what is forward selection?

A

add features greedily and sequentially: at each step, find which of the remaining features improves the model the most and add it permanently to the set
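
A minimal sketch, assuming numpy arrays and a scikit-learn-style model; the 5-fold CV and the `n_features` stopping rule are illustrative choices, not from the cards:

```python
# Greedy forward selection: each round, trial-add every remaining feature
# and permanently keep the one with the best cross-validated score.
from sklearn.model_selection import cross_val_score

def forward_selection(model, X, y, n_features):
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_features:
        scores = {f: cross_val_score(model, X[:, selected + [f]], y, cv=5).mean()
                  for f in remaining}
        best = max(scores, key=scores.get)  # biggest improvement
        selected.append(best)               # add it permanently
        remaining.remove(best)
    return selected
```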

8
Q

what is backward elimination?

A

starting from the full feature set, sequentially evaluate removing each feature and discard the one whose removal damages performance the least.
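
One way to try this in practice (an illustrative choice, not from the cards): scikit-learn's SequentialFeatureSelector supports a backward direction:

```python
# direction="backward" starts from all features and greedily removes the
# one whose removal hurts the cross-validated score the least.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=2, direction="backward", cv=5)
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the kept features
```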

9
Q

what is stepwise, or floating selection?

A

wrapper method that combines forward and backward selection

two steps forward and one step back

10
Q

what are filter methods?

A

find out how useful a feature is without training any models

11
Q

describe Pearson's correlation coefficient equation in words

A

covariance of the two variables divided by the product of their standard deviations

12
Q

Pearson's correlation coefficient, r = ?

A

r = Σ(x − x̄)(y − ȳ) / √( Σ(x − x̄)² × Σ(y − ȳ)² )
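
A quick numerical check of the formula against numpy's built-in, on toy data:

```python
# Pearson's r from the formula, checked against numpy's corrcoef.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))
print(r, np.corrcoef(x, y)[0, 1])  # both values agree
```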

13
Q

how do we rank features?

A

In order of the absolute value of the correlation coefficient
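
A toy sketch of the ranking step; the synthetic data and feature count are illustrative:

```python
# Rank features by |r| against the target y (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                  # 4 candidate features
y = 3 * X[:, 2] - X[:, 0] + rng.normal(size=100)

r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
ranking = np.argsort(-np.abs(r))               # best feature first
print(ranking)                                 # feature 2 should come first
```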

14
Q

what type of correlation does Pearson's measure?

A

linear

15
Q

what is entropy?

A

a measure of the uncertainty of a random variable; for a discrete Y, H(Y) = −Σ p(y) log p(y). (The reduction in uncertainty is information gain, not entropy itself.)

16
Q

what is information gain?

A

it quantifies the reduction in uncertainty (entropy) about the target that we get from observing a feature

17
Q

information gain I(X;Y) = ?

A

H(Y) - H(Y|X)
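
A minimal sketch computing I(X;Y) = H(Y) − H(Y|X) from counts, for discrete x and y (toy data):

```python
# I(X;Y) = H(Y) - H(Y|X) for discrete variables, computed from counts.
import numpy as np
from collections import Counter

def entropy(labels):
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(x, y):
    # H(Y|X) is the entropy of y within each value of x, weighted by p(x)
    h_y_given_x = sum(np.mean(x == v) * entropy(y[x == v])
                      for v in np.unique(x))
    return entropy(y) - h_y_given_x

x = np.array([0, 0, 1, 1, 1, 0])
y = np.array([0, 0, 1, 1, 0, 0])
print(information_gain(x, y))
```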

18
Q

another name for information gain is?

A

mutual information

19
Q

when we are measuring the information gain of each feature, I(X;Y), what are we really calculating? (hint: relation to Y)

A

the mutual information between the feature and the target label

20
Q

what is the major advantage of information gain?

A

detects nonlinearities

21
Q

what is the major disadvantage of information gain?

A

may choose redundant features, including very similar features that may be detrimental to the learning algorithm

22
Q

if I use a wrapper method with N samples and d features and use a greedy search, how many models will I have to create?

A

d(d+1)/2 in the worst case: the first round trains d candidate models, the next d - 1, and so on, so d + (d - 1) + ... + 1 = d(d+1)/2
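
A two-line sanity check of the count:

```python
# First round trains d candidate models, the next d - 1, ..., the last 1.
d = 10
print(sum(d - k for k in range(d)), d * (d + 1) // 2)  # both print 55
```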

23
Q

why do we perform feature selection (3)

A

logistical - there may be too much data to process

interpretability - we may collect more data than is useful

overfitting - inclusion of too many features could mean we overfit

24
Q

pros (3) and cons (2) of forward and backward selection (wrapper method)

A
Pros:
  • the impact of a feature on the ML classifier is explicit
  • for each subset of features, we know exactly how well the model performs
  • better than exhaustive search

Cons:
  • no guarantee of finding the best solution
  • need to train and evaluate lots of models

25
Q

what two types of filter are there

A

univariate

multivariate

26
Q

what is a univariate filter

A

Evaluate each feature independently

27
Q

what is a multivariate filter

A

Evaluate features in context of others

28
Q

what filtering metric can we use when the data is divided into 2 classes, rather than a regression

A

Fisher score

29
Q

give the equation for the Fisher score, F = ?

A

(m1 − m2)² / (v1 + v2), where m1, m2 are the class means of the feature and v1, v2 its class variances

30
Q

in words, the Fisher score puts ….. over …

A
between class scatter (m1-m2)^2
over
within class scatter (v1 + v2)
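
A minimal sketch of the Fisher score per feature for two classes, using per-class means and variances on synthetic data:

```python
# Fisher score per feature: F = (m1 - m2)^2 / (v1 + v2).
import numpy as np

def fisher_scores(X, y):
    X1, X2 = X[y == 0], X[y == 1]
    return (X1.mean(axis=0) - X2.mean(axis=0)) ** 2 / (
        X1.var(axis=0) + X2.var(axis=0))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)),           # class 0
               rng.normal([2, 0, 0], 1, (50, 3))])  # class 1, shifted on feature 0
y = np.array([0] * 50 + [1] * 50)
print(fisher_scores(X, y))  # feature 0 scores highest
```
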
31
Q

what is the main disadvantage of the Fisher score

A

only works on single features

32
Q

if J(X) denotes the mutual information score of X, what can X be?

A

a feature

the joint probability of two or more features

33
Q

what are embedded methods for feature selection

A

in embedded methods, the feature selection algorithm is integrated as part of the learning algorithm itself; embedded methods combine the qualities of filter and wrapper methods.
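
One standard example of an embedded method (the cards don't name a specific one, so this is an illustrative choice): L1-regularised logistic regression, where selection happens during training itself:

```python
# L1-regularised logistic regression: the penalty drives some coefficients
# exactly to zero, so feature selection happens inside model training.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
kept = model.coef_[0] != 0
print(kept.sum(), "of", X.shape[1], "features kept")
```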

34
Q

which wrapper method combines forward and backward selection in two steps forward and one step back

A

stepwise / floating selection