Feature extraction Flashcards
Feature extraction
Feature extraction transforms the original data into a smaller set of new, composite features that are more interpretable and more useful for training machine learning models. These techniques simplify learning problems by reducing the dimensionality of the data; the right method depends on the specific requirements and constraints of the problem at hand.
- Definition
Feature extraction is a dimensionality reduction process, where an initial set of raw data is reduced to more manageable groups (or features) for processing, while still accurately and comprehensively describing the original data set.
- Goal
The primary goal of feature extraction is to extract a set of features from the raw data that are most relevant for the task at hand. This helps in reducing computational complexity and addresses the ‘curse of dimensionality’ problem in machine learning.
- Principal Component Analysis (PCA)
PCA is a statistical procedure that uses orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables called principal components. The first principal component has the largest possible variance, and each succeeding component has the highest possible variance under the constraint that it is orthogonal to the preceding components.
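A minimal sketch of PCA on synthetic data, assuming scikit-learn (the card does not name a library). Two correlated columns are compressed into principal components, and `explained_variance_ratio_` shows that the first component carries the most variance, as the definition above states:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                    # 100 samples, 5 features
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)  # make two columns correlated

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)  # orthogonal projection onto top 2 components

print(X_reduced.shape)                  # (100, 2)
print(pca.explained_variance_ratio_)    # sorted: first component has the most variance
```

Because columns 0 and 1 are nearly collinear, the first principal component absorbs their shared variance, which is exactly the decorrelation property the definition describes.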
- Linear Discriminant Analysis (LDA)
LDA is a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. It is primarily used for dimensionality reduction in the pre-processing step for pattern-classification.
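A short supervised example, again assuming scikit-learn. Unlike PCA, LDA uses the class labels `y`; with 3 classes it can produce at most 2 discriminant components (`n_classes - 1`):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# n_components is capped at n_classes - 1 = 2 for this dataset
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)  # note: labels are required, unlike PCA

print(X_lda.shape)  # (150, 2)
```

The fitted projection maximizes between-class separation rather than total variance, which is why LDA is typically used as a pre-processing step before classification.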
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
t-SNE is a machine learning algorithm for visualization. It is a non-linear dimensionality reduction technique that is particularly well suited for the visualization of high-dimensional datasets.
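A sketch of the typical visualization workflow, assuming scikit-learn's `TSNE`. A subset of the 64-dimensional digits dataset is embedded into 2-D; the coordinates are meant for plotting, not for downstream model training:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)
X = X[:200]  # small subset; t-SNE is relatively expensive on large datasets

# perplexity must be smaller than the number of samples
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_embedded = tsne.fit_transform(X)

print(X_embedded.shape)  # (200, 2)
```

Note that t-SNE has no `transform` for new points in scikit-learn; it embeds only the data it was fitted on, which is one reason it is used for visualization rather than as a general feature extractor.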
- Autoencoders
Autoencoders are a type of artificial neural network used for learning efficient codings of input data. They have an input layer, an output layer with the same number of nodes as the input layer, and one or more hidden layers connecting them. The network is trained to reproduce its input at the output; when a hidden layer is narrower than the input (a bottleneck), its activations form a compressed representation of the data, which is how autoencoders perform dimensionality reduction for feature extraction.
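A minimal NumPy sketch of the idea, under simplifying assumptions: a linear autoencoder (no activation functions) with an 8-to-3-to-8 architecture, trained by plain gradient descent on the reconstruction error. Real autoencoders would use a deep-learning framework and nonlinearities:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))  # toy data: 200 samples, 8 features

n_in, n_hidden = X.shape[1], 3
W_enc = rng.normal(scale=0.1, size=(n_in, n_hidden))  # encoder weights (8 -> 3)
W_dec = rng.normal(scale=0.1, size=(n_hidden, n_in))  # decoder weights (3 -> 8)

mse_before = np.mean((X @ W_enc @ W_dec - X) ** 2)

lr = 0.05
for _ in range(2000):
    H = X @ W_enc        # encode: project into the 3-d bottleneck
    X_hat = H @ W_dec    # decode: reconstruct the 8-d input
    err = X_hat - X
    # gradients of the squared reconstruction error (constants folded into lr)
    grad_dec = H.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

mse_after = np.mean((X @ W_enc @ W_dec - X) ** 2)
codes = X @ W_enc  # the extracted 3-dimensional features
print(codes.shape, mse_before, mse_after)
```

Training drives the reconstruction error down, and the bottleneck activations `codes` are the extracted features; a purely linear autoencoder like this one learns the same subspace PCA would find.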
- Non-Negative Matrix Factorization (NMF)
NMF is a group of algorithms where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to interpret.
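A small sketch of the factorization V ≈ W·H, assuming scikit-learn's `NMF`. All three matrices are element-wise non-negative, which is the interpretability property the definition highlights:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = rng.random((50, 10))  # non-negative data matrix, 50 samples x 10 features

nmf = NMF(n_components=4, init="random", random_state=0, max_iter=500)
W = nmf.fit_transform(V)  # (50, 4): sample weights over 4 latent parts
H = nmf.components_       # (4, 10): the 4 non-negative basis components

print(W.shape, H.shape)
```

Because no entry of W or H can be negative, each sample is expressed as an additive combination of parts, which is why NMF factors are often easier to interpret than, say, PCA loadings.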
- Independent Component Analysis (ICA)
ICA is a computational method for separating a multivariate signal into additive, statistically independent subcomponents. It is a special case of blind source separation; the classic illustration is the "cocktail party problem" of recovering individual voices from mixed microphone recordings.
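A sketch of blind source separation with scikit-learn's `FastICA` (an assumed choice of implementation). Two known sources, a sine and a square wave, are linearly mixed, and ICA recovers them from the mixtures alone, up to scale and ordering:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)             # source 1: sinusoid
s2 = np.sign(np.cos(3 * t))    # source 2: square wave
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5],
              [0.5, 1.0]])     # mixing matrix
X = S @ A.T                    # observed mixed signals (what the microphones record)

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)   # estimated independent sources

print(S_est.shape)  # (2000, 2)
```

ICA cannot recover the sources' original amplitudes or order; only the independent waveforms themselves, which is an inherent ambiguity of blind source separation.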
- Benefits of Feature Extraction
The main benefits of feature extraction include reduced computational cost, a simpler learning problem (fewer features for the model to fit), and, because the transformed features no longer expose the raw values directly, some mitigation of data-privacy concerns.
- Limitations of Feature Extraction
The limitations of feature extraction include loss of interpretability, since the new composite features may have no clear real-world meaning, and the risk of information loss inherent in reducing dimensionality.