Dimensionality Reduction Flashcards

1
Q

Why would you want to use dimensionality reduction techniques to transform your data before training?

A

Dimensionality reduction can allow you to:

  • remove collinearity from the feature space
  • speed up training by reducing the number of features
  • reduce memory usage by reducing the number of features
  • identify underlying, latent features that affect multiple features in the original space (see the sketch below)
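
A minimal scikit-learn sketch of the idea; the synthetic data, the injected collinear column, and the choice of 10 components are all illustrative assumptions:

```python
# Sketch: shrink a correlated 50-feature space to 10 components before training.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))                    # 1,000 samples, 50 features
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=1000)   # inject collinearity

pca = PCA(n_components=10)                         # keep 10 orthogonal components
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)              # (1000, 50) -> (1000, 10)
```

Training on X_reduced instead of X uses far fewer columns, which is where the speed and memory savings come from.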
2
Q

Why would you want to avoid dimensionality reduction techniques to transform your data before training?

A

Dimensionality reduction can:

  • Add unnecessary computation
  • Make the model difficult to interpret, since the latent features are not easy to understand
  • Add complexity to the model pipeline
  • Reduce the predictive power of the model if too much signal is lost
3
Q

Name four popular dimensionality reduction algorithms and briefly describe them.

A
  1. PCA: uses an eigendecomposition to transform the original feature data into linearly independent eigenvectors. The most important vectors (those with the highest eigenvalues) are then selected to represent the features in the transformed space.
  2. Non-negative matrix factorization (NMF): can be used to reduce dimensionality for certain problem types while preserving more information than PCA.
  3. Embedding techniques: various embedding techniques, e.g. finding local neighbors as done in Locally Linear Embedding, can be used to reduce dimensionality.
  4. Clustering or centroid techniques: each value can be described as a member of a cluster, or as a linear combination of cluster centroids.

By far the most popular is PCA and similar eigendecomposition-based variations; a brief sketch of all four follows.
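
A hedged scikit-learn sketch of one possible way to apply each of the four families; the data shape and hyperparameters are illustrative assumptions, not recommendations:

```python
# Sketch: four dimensionality reduction families, each reducing 20 features to 5.
import numpy as np
from sklearn.decomposition import PCA, NMF
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.cluster import KMeans

# Non-negative data so that NMF is applicable.
X = np.abs(np.random.default_rng(0).normal(size=(200, 20)))

X_pca = PCA(n_components=5).fit_transform(X)                     # eigendecomposition
X_nmf = NMF(n_components=5, init="nndsvda", max_iter=500,
            random_state=0).fit_transform(X)                     # matrix factorization
X_lle = LocallyLinearEmbedding(n_neighbors=10,
                               n_components=5).fit_transform(X)  # local-neighbor embedding
X_km = KMeans(n_clusters=5, n_init=10,
              random_state=0).fit_transform(X)                   # distances to 5 centroids
```

Note that KMeans.fit_transform represents each row by its distances to the cluster centroids, which is one way to use a clustering model for dimensionality reduction.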

4
Q

After doing dimensionality reduction, can you transform the data back to the original feature space? If so, how?

A

Yes and no.

Most dimensionality reduction methods have inverse transformations, but signal is often lost when reducing dimensions, so the inverse transformation is usually only an approximation of the original data.
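
For example, with scikit-learn's PCA the round trip looks like this (a sketch on synthetic data; the shapes and component count are arbitrary assumptions):

```python
# Sketch: inverse_transform returns the original shape, but only approximately.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(500, 30))

pca = PCA(n_components=10).fit(X)
X_reduced = pca.transform(X)                    # (500, 10)
X_restored = pca.inverse_transform(X_reduced)   # (500, 30) again, but lossy

reconstruction_error = np.mean((X - X_restored) ** 2)
print(X_restored.shape, reconstruction_error)   # same shape, non-zero error
```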

5
Q

How do you select the number of principal components needed for PCA?

A

Selecting the number of latent features to retain is typically done by inspecting the eigenvalue of each eigenvector (where each eigenvalue is proportional to the percentage of variance explained). As the eigenvalues decrease, the impact of the corresponding latent feature on the target variable also decreases.

This means that principal components with small eigenvalues explain little variance and can be removed with little impact on the model.

There are various rules of thumb, but one general rule is to include the most significant principal components that account for at least 95% of the variation of the features.
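
A short sketch of both approaches with scikit-learn's PCA; the data and the 95% threshold here are purely illustrative:

```python
# Sketch: pick the number of principal components that explains >= 95% of variance.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(1000, 40))

# Option 1: inspect the cumulative explained-variance ratio directly.
pca_full = PCA().fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
k = int(np.searchsorted(cumulative, 0.95)) + 1   # smallest k reaching 95%

# Option 2: pass a float to n_components and let PCA choose the count.
pca_95 = PCA(n_components=0.95).fit(X)
print(k, pca_95.n_components_)                   # typically the same count
```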
