Chapter 3. Dimensionality Reduction Flashcards
Two major branches of dimensionality reduction? P 120
Linear projection
Manifold learning, which is also referred to as nonlinear dimensionality reduction
What techniques does linear projection include? P 120
Principal component analysis
Singular value decomposition
Random projection
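A minimal sketch of these three linear projection techniques in scikit-learn; the digits dataset and n_components=10 are illustrative choices, not from the book:

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.random_projection import GaussianRandomProjection

X, _ = load_digits(return_X_y=True)  # 1797 samples, 64 features

# Each technique projects the 64 original features onto 10 new ones.
X_pca = PCA(n_components=10).fit_transform(X)
X_svd = TruncatedSVD(n_components=10).fit_transform(X)
X_rp = GaussianRandomProjection(n_components=10, random_state=42).fit_transform(X)
print(X_pca.shape, X_svd.shape, X_rp.shape)  # each (1797, 10)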
Which techniques does manifold learning include? P 120
Isomap
Multidimensional scaling (MDS)
Locally linear embedding (LLE)
T-distributed stochastic neighbor embedding (t-SNE)
Dictionary learning
Random trees embedding
Independent component analysis
What kind of distance measure does Isomap learn? P 120
It learns the curved distance (also called the geodesic distance) between points rather than the Euclidean distance.
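A minimal sketch of Isomap in scikit-learn on the Swiss roll, a surface where the geodesic distance along the manifold differs sharply from the straight-line Euclidean distance; n_neighbors and n_components are illustrative choices:

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, random_state=42)

# Isomap builds a nearest-neighbor graph and approximates geodesic
# distances as shortest paths through that graph before embedding.
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)  # (1000, 2)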
What are the four versions of PCA? P 120
Standard PCA
Incremental PCA
Sparse PCA
Kernel PCA
Is the matrix regenerated from standard PCA features exactly the same as the original matrix? P 121
With these components, it is possible to reconstruct the original features, not exactly but generally close enough.
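A minimal sketch of approximate reconstruction with scikit-learn's PCA and inverse_transform; the dataset and component count are illustrative choices:

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)               # 64 features down to 10
X_restored = pca.inverse_transform(X_reduced)  # back to 64 features

# The reconstruction is close but not exact: the discarded components
# carry the residual variance.
print(np.mean((X - X_restored) ** 2))  # small but nonzero error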
What is one essential thing to do before running PCA? P 121
It is essential to perform feature scaling before running PCA.
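A minimal sketch of scaling before PCA with a scikit-learn Pipeline; the wine dataset is an illustrative choice (its features have very different scales, so unscaled PCA would be dominated by the largest one):

from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)

# StandardScaler gives each feature zero mean and unit variance before
# PCA computes the components.
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
X_reduced = pipeline.fit_transform(X)
print(X_reduced.shape)  # (178, 2)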
What is the sklearn PCA attribute for finding the percentage of variance explained? P 123
explained_variance_ratio_
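A minimal sketch of reading the attribute after fitting; the dataset and component count are illustrative:

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

pca = PCA(n_components=10).fit(X)
print(pca.explained_variance_ratio_)             # variance fraction per component
print(np.cumsum(pca.explained_variance_ratio_))  # cumulative fraction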
What is the trade-off of using PCA? P 128
A model trained on a PCA-reduced feature set may not perform quite as well in terms of accuracy as one trained on the full feature set, but both training and prediction times will be much faster. This is one of the important trade-offs to consider when choosing whether to use dimensionality reduction in your machine learning product.
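A minimal sketch of measuring this trade-off, comparing accuracy and fit time on the full versus PCA-reduced features; the dataset, model, and component count are illustrative assumptions:

import time
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pca = PCA(n_components=10)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

# Fit the same model on both feature sets and report accuracy and fit time.
for name, train, test in [("full", X_train, X_test),
                          ("pca", X_train_pca, X_test_pca)]:
    model = LogisticRegression(max_iter=5000)
    start = time.perf_counter()
    model.fit(train, y_train)
    elapsed = time.perf_counter() - start
    print(name, round(model.score(test, y_test), 3), round(elapsed, 3))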
When do we use incremental PCA? P 128
For datasets that are very large and cannot fit in memory, we can perform PCA incrementally in small batches, where each batch is able to fit in memory.
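A minimal sketch of incremental PCA in scikit-learn, fitting in memory-sized batches with partial_fit; the batch count and component count are illustrative, and the in-memory array here stands in for batches streamed from disk:

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA

X, _ = load_digits(return_X_y=True)

ipca = IncrementalPCA(n_components=10)
for batch in np.array_split(X, 20):  # each batch must fit in memory
    ipca.partial_fit(batch)

X_reduced = ipca.transform(X)
print(X_reduced.shape)  # (1797, 10)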
What is sparse PCA? P 130
For some machine learning problems, a degree of sparsity may be preferred. A version of PCA that retains some degree of sparsity, controlled by a hyperparameter called alpha, is known as sparse PCA.
What is the difference between standard PCA and sparse PCA? P 130
The normal PCA algorithm searches for linear combinations in all the input variables, reducing the original feature space as densely as possible. The sparse PCA algorithm searches for linear combinations in just some of the input variables, reducing the original feature space to some degree but not as compactly as normal PCA.
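A minimal sketch contrasting the dense loadings of standard PCA with the sparse loadings of SparsePCA in scikit-learn; alpha=1 and the other values are illustrative, not from the book:

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, SparsePCA

X, _ = load_digits(return_X_y=True)

pca = PCA(n_components=5).fit(X)
spca = SparsePCA(n_components=5, alpha=1, random_state=42).fit(X)

# Fraction of exactly-zero loadings in each set of components: near zero
# for PCA, much higher for sparse PCA (larger alpha means more zeros).
print(np.mean(pca.components_ == 0))
print(np.mean(spca.components_ == 0))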
What is kernel PCA? P 132
Normal PCA, incremental PCA, and sparse PCA linearly project the original data onto a lower dimensional space, but there is also a nonlinear form of PCA known as kernel PCA, which runs a similarity function over pairs of original data points in order to perform nonlinear dimensionality reduction.
When is kernel PCA especially effective? P 132
This method is especially effective when the original feature set is not linearly separable.
What is the gamma hyperparameter in kernel PCA? P 133
The kernel coefficient. For the RBF kernel, it controls how quickly the similarity between two points falls off with the distance between them.
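A minimal sketch of kernel PCA with the RBF kernel on concentric circles, a dataset that is not linearly separable; gamma=10 is an illustrative value:

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=42)

# gamma is the kernel coefficient: it sets how quickly the RBF similarity
# between two points decays with their squared distance.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)
print(X_kpca.shape)  # (400, 2)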