Course session 4 (Dimensionality reduction) Flashcards
What is the curse of dimensionality?
The volume of a space grows exponentially with its dimensionality, so a fixed number of samples covers the space increasingly sparsely.
Classification accuracy depends on the dimensionality (and the amount of training data), and the computational complexity also grows with the dimensionality.
Why do we want to reduce the dimensionality?
Reduce time complexity: Less computation.
Reduce space complexity: Fewer parameters.
Simpler models are more robust on small datasets.
More interpretable & simpler explanation.
Data visualization (structure, groups, outliers, etc.) when the data can be plotted in 2 or 3 dimensions.
What are the two general methods of doing dimensionality reduction?
Feature selection, which tries to find a subset of the original variables. This is done by choosing M < D important features and ignoring the remaining D − M features.
Feature extraction, which applies a mapping of the multidimensional space into a space of fewer dimensions. This is done by projecting the original dimensions x_i, i = 1, …, D onto new dimensions z_j, j = 1, …, M, with M < D.
Name the methods of feature selection
Forward search
Hill-climbing algorithm
Backward search
Floating search
How does forward search work?
The method adds the best feature at each step.
Let the set of selected features F initially be empty (F = Ø). Then, at each iteration, find the best new feature
j = argmin_i E(F ∪ x_i),
where E is the error of a model trained on the given feature subset. Add x_j to F if E(F ∪ x_j) < E(F).
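A minimal sketch of forward search, assuming an error function E over feature subsets (e.g., the cross-validated error of a classifier trained on that subset); the names here are illustrative, not from the course material:

```python
def forward_search(all_features, E):
    """Sequential forward selection (a minimal sketch).

    all_features: iterable of candidate features.
    E: assumed error function over feature subsets; E(set()) should
       return a baseline error (e.g., majority-class error).
    """
    selected = set()
    while True:
        remaining = [f for f in all_features if f not in selected]
        if not remaining:
            break
        # Forward step: the feature whose addition yields the lowest error.
        best = min(remaining, key=lambda f: E(selected | {f}))
        # Add it only if it actually improves on the current subset.
        if E(selected | {best}) >= E(selected):
            break
        selected.add(best)
    return selected
```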
How does the hill-climbing algorithm work?
Start with an arbitrary or random solution. Identify neighboring states of the current solution by making small adjustments. If one of the neighboring states offers a better solution (according to some evaluation function), move to this new state. Repeat this process until no neighboring state is better than the current one; at this point, you have reached a local maximum.
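A generic sketch of hill climbing, assuming a neighbor-generating function and a score to maximize (all names are illustrative placeholders):

```python
def hill_climb(initial, neighbors, score, max_iters=1000):
    """Generic hill climbing.

    initial: starting solution.
    neighbors: function returning the neighboring states of a solution.
    score: evaluation function to maximize.
    """
    current = initial
    for _ in range(max_iters):
        candidates = neighbors(current)
        best = max(candidates, key=score, default=current)
        # Stop once no neighbor improves on the current solution.
        if score(best) <= score(current):
            return current  # local maximum reached
        current = best
    return current
```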
How does backward search work?
It starts from the full feature set and iteratively removes one feature at a time, at each step discarding the feature with the least predictive power (the one whose removal hurts the error the least).
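A minimal sketch, mirroring the forward-search sketch above and assuming the same illustrative error function E over feature subsets:

```python
def backward_search(all_features, E):
    """Sequential backward elimination (mirror image of forward search)."""
    selected = set(all_features)
    while len(selected) > 1:
        # The feature whose removal increases the error the least.
        worst = min(selected, key=lambda f: E(selected - {f}))
        # Stop once every removal would make the error strictly worse.
        if E(selected - {worst}) > E(selected):
            break
        selected.remove(worst)
    return selected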
How does floating search work?
It adds k features and then removes l features. Both k and l are varied, thereby searching for the best feature subset of a given dimension.
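A rough sketch of one floating variant, sequential floating forward selection (SFFS), again assuming the illustrative error function E; full SFFS implementations also track the best subset found at each size to rule out cycling:

```python
def floating_forward_search(all_features, E, target_dim, max_rounds=100):
    """Sketch of sequential floating forward selection (SFFS).

    After each forward step, backward steps are taken as long as
    dropping a feature improves the error, so the subset size floats.
    """
    selected = set()
    for _ in range(max_rounds):  # cap the rounds; see the cycling caveat above
        if len(selected) >= target_dim:
            break
        remaining = [f for f in all_features if f not in selected]
        if not remaining:
            break
        # Forward step: add the single best new feature.
        best = min(remaining, key=lambda f: E(selected | {f}))
        selected.add(best)
        # Conditional backward steps: drop features while that helps.
        while len(selected) > 2:
            worst = min(selected, key=lambda f: E(selected - {f}))
            # Never drop the feature just added, and only drop if it helps.
            if worst == best or E(selected - {worst}) >= E(selected):
                break
            selected.remove(worst)
    return selected
```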
What are the two main feature extraction methods?
Principal component analysis (PCA)
Linear discriminant analysis (LDA)
What is principal component analysis?
It is an unsupervised linear method for dimensionality reduction.
It can be defined in two ways:
* The linear projection that minimizes the average projection cost, defined as the mean squared distance between the data points and their projections.
* The direction in the data space that has the highest variance. The eigenvector corresponding to the largest eigenvalue of the data covariance matrix is the first principal component.
The principal components must be mutually orthogonal.
What does the eigenvalue in principal component analysis mean?
The magnitude of an eigenvalue indicates how much of the data's variance the corresponding principal component explains.
How is principal component analysis performed in practice?
Center the data and construct its covariance matrix. Compute the eigenvectors and eigenvalues of this matrix. Project the data onto the M eigenvectors with the largest eigenvalues.
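A minimal NumPy sketch of these steps (function name and return values are illustrative):

```python
import numpy as np

def pca(X, M):
    """PCA via eigendecomposition of the covariance matrix.

    X: (N, D) data matrix; M: number of components to keep.
    Returns the projected data (N, M) and the sorted eigenvalues.
    """
    X_centered = X - X.mean(axis=0)            # center the data
    cov = np.cov(X_centered, rowvar=False)     # (D, D) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # re-sort in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    Z = X_centered @ eigvecs[:, :M]            # project onto top M components
    return Z, eigvals
```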
How should the number of dimensions M < D be chosen in principal component analysis?
The proportion of variance is defined as the fraction of the sum of the M largest eigenvalues over the sum of all D eigenvalues: PoV = (λ_1 + … + λ_M) / (λ_1 + … + λ_D), with the eigenvalues sorted in decreasing order. Choose the smallest M for which the proportion of variance exceeds a threshold, typically 0.9.
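A small snippet for this choice, assuming eigenvalues sorted in decreasing order (e.g., from the pca sketch above):

```python
import numpy as np

# eigvals: eigenvalues in decreasing order
pov = np.cumsum(eigvals) / np.sum(eigvals)
M = np.argmax(pov >= 0.9) + 1  # smallest M with proportion of variance >= 0.9
```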
Define the Fisher criterion.
It is the projection of the data that maximizes the ratio of the between-class variance to the within-class variance.
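In the standard two-class form (not spelled out in the card above), writing m_1, m_2 for the projected class means and s_1², s_2² for the within-class scatters, the projection w maximizes
J(w) = (m_2 − m_1)² / (s_1² + s_2²).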
Define linear discriminant analysis.
Find a low-dimensional space such that, when x is projected onto it, the classes are well separated.
For two classes, this reduces to the Fisher criterion.
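As an illustration (not part of the original card), scikit-learn's LinearDiscriminantAnalysis computes such a projection; the toy data here are made up:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy labeled data: two Gaussian classes in 2D (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# For C classes, LDA can project to at most C - 1 dimensions.
lda = LinearDiscriminantAnalysis(n_components=1)
Z = lda.fit_transform(X, y)  # (100, 1): the Fisher projection of the data
```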