9-featureselection Flashcards
What are the two main ways to do feature selection?
Wrapper methods and filtering
What are wrapper methods?
Wrapper methods are feature selection methods that choose the subset of attributes that gives the best performance on the development data
What are the advantages of wrapper methods?
Build feature set with optimal performance on development data
What are the disadvantages of wrapper methods?
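They take a long time: an exhaustive search over m attributes must train and evaluate a model on each of the 2^m - 1 non-empty subsets, which is only feasible for very small attribute sets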
What are more practical wrapper methods?
Greedy wrapper method
Ablation wrapper method
What is the greedy wrapper approach?
Train and evaluate a model on each single attribute and choose the best one. Then train and evaluate the best attribute(s) so far combined with each remaining attribute, and keep the best combination. Repeat, and stop when accuracy no longer increases
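The greedy loop above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: `greedy_forward_selection`, `toy_evaluate`, the attribute names, and the scores are all hypothetical, and `evaluate` stands in for "train a model on this subset and score it on the development data".

```python
from typing import Callable, List, Sequence, Tuple


def greedy_forward_selection(
    attributes: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Add one attribute per round until the score stops increasing."""
    selected: List[str] = []
    remaining = list(attributes)
    best_score = float("-inf")
    while remaining:
        # Score every candidate set formed by adding one more attribute.
        scored = [(evaluate(selected + [a]), a) for a in remaining]
        score, attr = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no candidate improves on the current set
        selected.append(attr)
        remaining.remove(attr)
        best_score = score
    return selected, best_score


def toy_evaluate(subset: List[str]) -> float:
    # Hypothetical stand-in for training and evaluating a real model:
    # two attributes help, and every attribute carries a small cost.
    useful = {"outlook": 0.3, "humidity": 0.2}
    return 0.5 + sum(useful.get(a, 0.0) for a in subset) - 0.01 * len(subset)


selected, score = greedy_forward_selection(
    ["outlook", "temperature", "humidity", "windy"], toy_evaluate
)
print(selected, round(score, 2))
```

Each round evaluates one candidate per remaining attribute, which is where the roughly quadratic number of train-and-evaluate cycles comes from.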
What are the disadvantages of greedy wrapper approach?
Still takes about m^2/2 train-and-evaluate cycles for m attributes, and usually converges to a suboptimal outcome
What is the ablation approach?
Start with the entire feature set. Remove each attribute in turn and evaluate on the remaining set, then permanently drop the attribute whose removal hurts performance least. Repeat, and stop when performance degrades significantly
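A minimal Python sketch of the ablation loop, under stated assumptions: `ablation`, `toy_evaluate`, the attribute names, and the threshold `epsilon` (what counts as a "significant" drop) are hypothetical choices for illustration, and `evaluate` stands in for training and scoring a model on the development data.

```python
from typing import Callable, List, Sequence, Tuple


def ablation(
    attributes: Sequence[str],
    evaluate: Callable[[List[str]], float],
    epsilon: float = 0.01,
) -> Tuple[List[str], float]:
    """Drop the least useful attribute per round until the score degrades."""
    current = list(attributes)
    current_score = evaluate(current)
    while len(current) > 1:
        # Evaluate the set with each attribute removed in turn.
        scored = [(evaluate([b for b in current if b != a]), a) for a in current]
        score, attr = max(scored, key=lambda pair: pair[0])
        if current_score - score > epsilon:
            break  # even the cheapest removal degrades performance too much
        current.remove(attr)
        current_score = score
    return current, current_score


def toy_evaluate(subset: List[str]) -> float:
    # Hypothetical stand-in for training and evaluating a real model.
    useful = {"outlook": 0.3, "humidity": 0.2}
    return 0.5 + sum(useful.get(a, 0.0) for a in subset) - 0.01 * len(subset)


kept, kept_score = ablation(
    ["outlook", "temperature", "humidity", "windy"], toy_evaluate
)
print(sorted(kept), round(kept_score, 2))
```

In this toy run the two attributes that contribute nothing are removed first, and the loop stops once every remaining removal costs more than `epsilon`.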
What are the advantages of ablation method?
Quickly remove irrelevant attributes
What is pointwise mutual information?
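PMI(A, C) = log2( P(A, C) / (P(A) P(C)) ). We want attribute values with high PMI with the class: PMI is 0 when A and C are independent, and positive when they occur together more often than chance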
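As a small worked example of the PMI formula, here is a sketch in Python. The function name `pmi` and the probabilities are made up for illustration, and returning negative infinity for a zero joint probability is one possible convention, not something the cards specify.

```python
import math


def pmi(p_joint: float, p_a: float, p_c: float) -> float:
    """Pointwise mutual information: log2( P(A, C) / (P(A) P(C)) )."""
    if p_joint == 0:
        # log2(0) is undefined; treating it as -inf is an assumed convention.
        return float("-inf")
    return math.log2(p_joint / (p_a * p_c))


# Independent events: P(A, C) equals P(A) P(C), so PMI is 0.
independent = pmi(0.25, 0.5, 0.5)

# A and C always occur together: P(A, C) is twice P(A) P(C), so PMI is 1 bit.
associated = pmi(0.5, 0.5, 0.5)

print(independent, associated)
```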
What are the disadvantages of ablation method?
Still takes O(m^2) train-and-evaluate cycles for m attributes. Assumes features are independent.
What are feature filtering methods?
Methods that evaluate the goodness of each feature separately, keeping the features that best predict the class
What makes a single feature good?
Well correlated with the class, negatively correlated with the class, or well correlated with the absence of the class
What is mutual information?
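The weighted average of the PMI over every combination of attribute value and class value, each weighted by its joint probability. For a binary attribute and class (¬a = attribute absent, ¬c = not the class):
MI(A, C) = P(a, c) PMI(a, c) + P(¬a, c) PMI(¬a, c) + P(a, ¬c) PMI(a, ¬c) + P(¬a, ¬c) PMI(¬a, ¬c)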
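The four-term sum can be computed directly from a 2x2 table of counts. The sketch below is illustrative only: the function name, argument names, and counts are hypothetical, and cells with a zero count are skipped under the usual convention that 0 * log 0 = 0.

```python
import math


def mutual_information(n_a_c: int, n_a_notc: int, n_nota_c: int, n_nota_notc: int) -> float:
    """MI of a binary attribute and a binary class from co-occurrence counts."""
    total = n_a_c + n_a_notc + n_nota_c + n_nota_notc
    n_a = n_a_c + n_a_notc          # instances with the attribute
    n_nota = n_nota_c + n_nota_notc
    n_c = n_a_c + n_nota_c          # instances in the class
    n_notc = n_a_notc + n_nota_notc
    cells = [
        (n_a_c, n_a, n_c),
        (n_nota_c, n_nota, n_c),
        (n_a_notc, n_a, n_notc),
        (n_nota_notc, n_nota, n_notc),
    ]
    mi = 0.0
    for joint, attr_count, class_count in cells:
        if joint == 0:
            continue  # assumed convention: 0 * log 0 contributes nothing
        p_joint = joint / total
        p_attr = attr_count / total
        p_class = class_count / total
        # Joint probability times the PMI of this cell.
        mi += p_joint * math.log2(p_joint / (p_attr * p_class))
    return mi


perfect = mutual_information(50, 0, 0, 50)       # attribute determines the class
unrelated = mutual_information(25, 25, 25, 25)   # attribute independent of the class
print(perfect, unrelated)
```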
What are alternatives to mutual information?
Chi-square
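The cards name chi-square without giving a formula. As a hedged sketch, the code below uses the standard Pearson statistic, the sum over the four cells of (observed - expected)^2 / expected, where the expected count is row total times column total divided by N. The function name, argument names, and counts are hypothetical.

```python
def chi_square(n_a_c: int, n_a_notc: int, n_nota_c: int, n_nota_notc: int) -> float:
    """Pearson chi-square statistic for a 2x2 attribute-by-class table."""
    total = n_a_c + n_a_notc + n_nota_c + n_nota_notc
    row_totals = [n_a_c + n_a_notc, n_nota_c + n_nota_notc]   # a, not a
    col_totals = [n_a_c + n_nota_c, n_a_notc + n_nota_notc]   # c, not c
    observed = [[n_a_c, n_a_notc], [n_nota_c, n_nota_notc]]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if attribute and class were independent.
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                continue  # an empty row or column contributes nothing
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


strong = chi_square(50, 0, 0, 50)    # attribute determines the class
none = chi_square(25, 25, 25, 25)    # attribute independent of the class
print(strong, none)
```

A larger statistic means the observed counts are further from what independence would predict, so the feature is more informative about the class.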