Dimensionality Reduction Flashcards
Curse of Dimensionality
Increasing the number of features does not always improve classification accuracy; in fact, it may make it worse
Two main routes to reduce dimensionality
Feature extraction
Feature Selection
Application of dimensionality reduction
Customer relationship management
Text Mining
Image retrieval
Microarray data analysis
Protein classification
Face recognition
Handwritten digit recognition
Intrusion detection
Feature Selection
A process that chooses an optimal subset of features according to an objective function
Objectives: reduce dimensionality and remove noise. Improve speed of learning, predictive accuracy, and simplicity
Think stepwise / forward / backward regressions
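A minimal sketch of forward (stepwise-style) selection, assuming scikit-learn; the dataset, estimator, and number of features are illustrative choices:

    # Forward selection sketch: greedily add the feature that most improves
    # cross-validated accuracy, one at a time (a wrapper-style stepwise search).
    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)

    selector = SequentialFeatureSelector(
        LogisticRegression(max_iter=1000),
        n_features_to_select=2,
        direction="forward",   # "backward" would start with all features and remove
        cv=5,
    )
    selector.fit(X, y)
    print(selector.get_support())   # boolean mask over the 4 original features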
Feature Extraction
The mapping of the original high-dimensional data to a lower-dimensional space
Goals can change based on end usage:
Unsupervised learning - minimize information loss (PCA)
Supervised learning - maximize class discrimination (LDA)
Think PCA
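A minimal PCA sketch, assuming scikit-learn (LinearDiscriminantAnalysis would be the supervised LDA counterpart); keeping two components is only for illustration:

    # PCA sketch: project the data onto the directions of maximum variance
    # (unsupervised, aims to minimize information loss).
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, y = load_iris(return_X_y=True)

    pca = PCA(n_components=2)        # keep 2 linear combinations of the 4 features
    X_reduced = pca.fit_transform(X)

    print(X_reduced.shape)                   # (150, 2)
    print(pca.explained_variance_ratio_)     # variance captured by each component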
Pros of feature reduction
All original features are used, although not in their original form; they are combined linearly
In feature selection, by contrast, only a subset of the original features is kept
Feature selection methods
Remove features with missing values
Remove features with low variance
Remove highly correlated features
Univariate feature selection
Feature selection using SelectFromModel
Filter methods
Wrapper methods
Embedded methods
Hybrid methods
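A hedged sketch of the first two checks (missing values and low variance), assuming pandas and scikit-learn; the toy data and thresholds are made up for illustration:

    # Sketch of the two simplest filters; the thresholds are arbitrary examples.
    import numpy as np
    import pandas as pd
    from sklearn.feature_selection import VarianceThreshold

    df = pd.DataFrame(np.random.rand(100, 4), columns=list("abcd"))
    df["c"] = 1.0                # constant column (zero variance)
    df.loc[::2, "d"] = np.nan    # column with 50% missing values

    # Drop features with too many missing values
    df = df.loc[:, df.isna().mean() < 0.5]

    # Drop features with (near-)zero variance
    vt = VarianceThreshold(threshold=1e-4)
    df = df.loc[:, vt.fit(df).get_support()]
    print(df.columns.tolist())   # 'c' and 'd' are gone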
Univariate feature selection
Selecting the best features based on univariate statistical tests, e.g. scikit-learn's SelectKBest
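A minimal SelectKBest sketch, assuming scikit-learn; chi2 is just one possible scoring function (it requires non-negative features) and k=2 is arbitrary:

    # Univariate selection sketch: score each feature independently against
    # the target and keep the k best.
    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest, chi2

    X, y = load_iris(return_X_y=True)
    X_new = SelectKBest(chi2, k=2).fit_transform(X, y)
    print(X.shape, "->", X_new.shape)   # (150, 4) -> (150, 2)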
Filter Methods for Feature Selection
Filter based on:
Information Gain
Chi-Squared Test
Fisher's Score
Correlation coefficient
Information gain
Measures the reduction in entropy (uncertainty about the target) obtained by splitting the dataset on a feature
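A sketch using scikit-learn's mutual_info_classif, which estimates the mutual information that information gain is based on; the dataset is illustrative:

    # Information gain is closely related to the mutual information between a
    # feature and the target; higher score = larger reduction in uncertainty.
    from sklearn.datasets import load_iris
    from sklearn.feature_selection import mutual_info_classif

    X, y = load_iris(return_X_y=True)
    scores = mutual_info_classif(X, y, random_state=0)
    print(scores)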
Fisher Score
Fisher's score is one of the most widely used supervised feature selection methods.
The algorithm returns a ranking of the variables based on their Fisher's scores
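A small NumPy sketch of one common form of the per-feature Fisher score (between-class scatter over within-class scatter); the helper function here is made up for illustration:

    # Per-feature Fisher score: between-class scatter divided by
    # within-class scatter (one common variant of the definition).
    import numpy as np
    from sklearn.datasets import load_iris

    X, y = load_iris(return_X_y=True)

    def fisher_score(X, y):
        overall_mean = X.mean(axis=0)
        numerator = np.zeros(X.shape[1])
        denominator = np.zeros(X.shape[1])
        for c in np.unique(y):
            Xc = X[y == c]
            numerator += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
            denominator += len(Xc) * Xc.var(axis=0)
        return numerator / denominator

    scores = fisher_score(X, y)
    print(np.argsort(scores)[::-1])   # feature indices ranked by Fisher score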
Correlation Coefficient
Variables should be correlated with the target but uncorrelated with each other (think of the correlation heatmap)
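A pandas sketch of the "correlated with the target, uncorrelated with each other" idea; the thresholds (0.2 and 0.7) are arbitrary illustrative values:

    # Keep features strongly correlated with the target, then drop features
    # that are strongly correlated with one another.
    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_diabetes

    data = load_diabetes(as_frame=True)
    df, target = data.data, data.target

    corr_with_target = df.corrwith(target).abs()
    keep = corr_with_target[corr_with_target > 0.2].index.tolist()

    corr = df[keep].corr().abs()   # the "grid map": pairwise correlations
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [c for c in upper.columns if (upper[c] > 0.7).any()]
    selected = [c for c in keep if c not in redundant]
    print(selected)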
Wrapper Methods
Generally yields better results than filter methods because it can account for feature interactions. Wrapper methods use the predictive model itself to evaluate candidate feature subsets against an evaluation criterion, typically with a greedy search rather than trying every possible combination
Forward selection: start with the best single predictor and keep adding features
Backward selection: start with all features and remove the weakest ones
Exhaustive selection: tries all possible feature combinations
Recursive feature elimination (RFE): selects features by recursively considering smaller and smaller sets of features
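A minimal wrapper-style sketch using scikit-learn's RFE; the estimator and number of features to keep are illustrative:

    # Recursive feature elimination: repeatedly fit the model and discard
    # the weakest features until the requested number remains.
    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    X = StandardScaler().fit_transform(X)   # helps the logistic regression converge

    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10, step=1)
    rfe.fit(X, y)
    print(rfe.support_)    # boolean mask of kept features
    print(rfe.ranking_)    # 1 = selected, larger numbers were eliminated earlier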
Embedded methods
These methods encompass the benefits of both wrapper and filter methods, by including interactions of features but also maintaining a reasonable computational cost.
LASSO, Random Forest Importance
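A minimal embedded-method sketch combining Random Forest importances with scikit-learn's SelectFromModel; the threshold choice is illustrative:

    # Embedded selection sketch: the model's own feature importances drive
    # the selection via the SelectFromModel meta-transformer.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel

    X, y = load_breast_cancer(return_X_y=True)

    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=0),
        threshold="median",          # keep features above the median importance
    )
    X_reduced = selector.fit_transform(X, y)
    print(X.shape, "->", X_reduced.shape)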
LASSO
Often more accurate than unregularized regression, especially when many features are irrelevant or correlated
Uses shrinkage, where estimates are shrunk toward a central point, such as the mean (for Lasso, coefficients are shrunk toward zero)
Encourages simple, sparse models
Well suited for models showing high levels of multicollinearity
Regularization consists of adding a penalty to the parameters of the machine learning model to reduce the freedom of the model, i.e. to avoid overfitting. In linear model regularization, the penalty is applied over the coefficients. Lasso (L1) is able to shrink some of the coefficients to exactly zero, so those features can be removed from the model.
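A minimal Lasso sketch, assuming scikit-learn; the toy dataset and alpha=2.0 are illustrative (in practice alpha would be tuned, e.g. with LassoCV):

    # L1 (Lasso) sketch: the penalty shrinks coefficients toward zero and sets
    # some of them exactly to zero, which effectively removes those features.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso

    # Toy data: only 3 of the 10 features actually influence the target.
    X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                           noise=10.0, random_state=0)

    lasso = Lasso(alpha=2.0).fit(X, y)
    print(lasso.coef_)                                     # several entries are exactly 0
    print("kept features:", np.flatnonzero(lasso.coef_))   # indices with nonzero coefficients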