Dimensionality Reduction (Brainscape) Flashcards
What are the main motivations for reducing a dataset’s dimensionality (3)? What are the main drawbacks (4)?
1) To speed up subsequent training algorithms
2) To visualize the data and gain insight into the most important features
3) To save space (data compression)
The main drawbacks are:
1) Some information is lost
2) Can be computationally intensive
3) It adds some complexity to your machine learning pipelines.
4) Transformed features are often hard to interpret.
What is the curse of dimensionality?
The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces (often with hundreds or thousands of dimensions) that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience.
For example, randomly sampled high-dimensional vectors are generally very sparse, which increases the risk of overfitting.
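A quick way to see this sparsity is to compare average distances between random points as the dimensionality grows. The NumPy sketch below (the sample sizes and dimensions are arbitrary illustrative choices) shows the mean distance between random pairs increasing with the number of dimensions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Average distance between random points grows with the number of dimensions,
# so a fixed-size training set becomes increasingly sparse.
for d in (2, 100, 10_000):
    points = rng.random((1_000, d))              # 1,000 points in the unit hypercube
    a, b = points[:500], points[500:]
    mean_dist = np.linalg.norm(a - b, axis=1).mean()
    print(f"dim={d:>6}  mean distance between random pairs = {mean_dist:.2f}")
```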
Once a dataset’s dimensionality has been reduced, is it possible to reverse the operation? If so, how? If not, why not?
It is almost always impossible to perfectly reverse the operation because some information gets lost during dimensionality reduction. But it is possible to estimate with good accuracy what the original dataset looked like.
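As a sketch of such an estimate, scikit-learn's PCA exposes inverse_transform, which maps the reduced data back to the original space; the digits dataset and the 95% variance threshold below are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                            # 64-dimensional digit images

pca = PCA(n_components=0.95)                      # keep ~95% of the variance
X_reduced = pca.fit_transform(X)
X_recovered = pca.inverse_transform(X_reduced)    # approximate reconstruction

# The mean squared reconstruction error measures the information that was lost.
print(X_reduced.shape, np.mean((X - X_recovered) ** 2))
```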
Does it make any sense to chain two different dimensionality reduction algorithms?
It can absolutely make sense. A common example is using PCA to quickly get rid of a large number of useless dimensions, then applying another, much slower dimensionality reduction algorithm, such as LLE.
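A minimal sketch of such a chain, assuming scikit-learn's PCA and LocallyLinearEmbedding (the component counts and neighbor count are arbitrary):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.pipeline import Pipeline

X = load_digits().data

# PCA quickly drops most of the near-useless dimensions; LLE then performs the
# slower non-linear reduction on the smaller intermediate representation.
chain = Pipeline([
    ("pca", PCA(n_components=30)),
    ("lle", LocallyLinearEmbedding(n_components=2, n_neighbors=10)),
])
X_2d = chain.fit_transform(X)
print(X_2d.shape)                                 # (1797, 2)
```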
What is the main problem of dimensionality reduction?
You lose some information.
What is the main idea of manifold learning?
Manifold learning is an approach to non-linear dimensionality reduction. Its algorithms are based on the idea that the dimensionality of many datasets is only artificially high.
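For instance, the classic Swiss roll is a 2-D surface rolled up in 3-D space; a manifold learning algorithm such as LLE can unroll it. A sketch using scikit-learn (the sample size and n_neighbors value are illustrative):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# The Swiss roll is a 2-D surface rolled up in 3-D space, so its 3 dimensions
# are "artificially" high; LLE unrolls it back down to 2 dimensions.
X, t = make_swiss_roll(n_samples=1000, noise=0.1, random_state=0)
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=12, random_state=0)
X_unrolled = lle.fit_transform(X)
print(X.shape, "->", X_unrolled.shape)            # (1000, 3) -> (1000, 2)
```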
Applications of dimensionality reduction
Customer relationship management
Text mining
Image retrieval
Microarray data analysis
Protein classification
Face recognition
Handwritten digit recognition
Intrusion detection
Feature Selection
A process that chooses an optimal subset of features according to an objective function
Objectives: reduce dimensionality and remove noise; improve the speed of learning, predictive accuracy, and simplicity
Think stepwise / forward / backward regressions
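A sketch of forward selection using scikit-learn's SequentialFeatureSelector (the estimator, dataset, and number of features to keep are illustrative choices, not prescriptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Forward selection: greedily add the feature that most improves
# cross-validated accuracy; direction="backward" removes features instead.
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=5,
    direction="forward",
)
selector.fit(X, y)
print(selector.get_support())                     # boolean mask of chosen features
```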
Feature Extraction
The mapping of the original high-dimensional data to a lower-dimensional space
Goals can change based on end usage:
Unsupervised learning - minimize information loss (PCA)
Supervised learning - maximize class discrimination (LDA)
Think PCA
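A short sketch contrasting the two on the same data, assuming scikit-learn's PCA and LinearDiscriminantAnalysis (the iris dataset and two components are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Unsupervised: PCA keeps the directions of maximum variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Supervised: LDA keeps the directions that best separate the classes,
# so it needs the labels y.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)                   # (150, 2) (150, 2)
```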
Pros of feature reduction
All original features are used, although not necessarily in their original form: they are combined linearly.
In feature selection, only a subset of the original features is selected.
Feature selection methods
Remove features with missing values
Remove features with low variance
Remove highly correlated features
Univariate feature selection
Feature selection using SelectFromModel
Filter methods
Wrapper methods
Embedded methods
Hybrid methods
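A sketch combining three of these methods with scikit-learn and pandas (the dataset, the 0.95 correlation cutoff, and the random forest settings are illustrative, not prescriptive):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import VarianceThreshold, SelectFromModel

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# 1) Remove (near-)constant features.
X_var = VarianceThreshold(threshold=0.0).fit_transform(X)
print("after variance filter:", X_var.shape)

# 2) Drop one feature from each highly correlated pair.
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
X_decorr = X.drop(columns=to_drop)

# 3) Keep the features a fitted model considers important (SelectFromModel).
sfm = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
sfm.fit(X_decorr, y)
print("kept by SelectFromModel:", list(X_decorr.columns[sfm.get_support()]))
```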
Filter Methods for Feature Selection
Filter based on:
Information gain
Chi-squared test
Fisher's score
Correlation coefficient
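As one example, a chi-squared filter can be written with scikit-learn's SelectKBest; the dataset and k=10 below are illustrative (chi2 requires non-negative feature values):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_breast_cancer(return_X_y=True)

# Chi-squared filter: score each (non-negative) feature against the target
# independently of any model, then keep the k best-scoring features.
selector = SelectKBest(score_func=chi2, k=10)
X_new = selector.fit_transform(X, y)
print(X_new.shape)                                # (569, 10)
```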
Information gain
Calculates the reduction in entropy (uncertainty) that results from transforming the dataset, e.g. splitting it on a given feature.
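A related, model-free filter score is mutual information, often used as an information-gain-style criterion; a sketch with scikit-learn's mutual_info_classif (the dataset choice is illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

data = load_breast_cancer()
X, y = data.data, data.target

# Mutual information estimates how much knowing a feature reduces the
# entropy (uncertainty) of the target, independently of any model.
scores = mutual_info_classif(X, y, random_state=0)
for name, score in sorted(zip(data.feature_names, scores), key=lambda p: -p[1])[:5]:
    print(f"{name:<25} {score:.3f}")
```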
Fisher Score
Fisher's score is one of the most widely used supervised feature selection methods.
The algorithm returns the ranks of the variables based on their Fisher's score (the ratio of between-class variance to within-class variance per feature).
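A minimal NumPy sketch of that ratio; the iris dataset and the helper name fisher_score are my own illustrative choices, not a standard library function:

```python
import numpy as np
from sklearn.datasets import load_iris

def fisher_score(X, y):
    """Per-feature Fisher score: between-class variance / within-class variance."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    return between / within

X, y = load_iris(return_X_y=True)
scores = fisher_score(X, y)
print(np.argsort(scores)[::-1])                   # feature indices, best to worst
```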