Chapter 3. Dimensionality Reduction Flashcards

1
Q

Two major branches of dimensionality reduction? P 120

A

Linear projection

Manifold learning, which is also referred to as nonlinear dimensionality reduction

2
Q

What techniques does linear projection include? P 120

A

Principal component analysis
Singular value decomposition
Random projection

3
Q

Which techniques does manifold learning include? P 120

A

Isomap
Multidimensional scaling (MDS)
Locally linear embedding (LLE)
T-distributed stochastic neighbor embedding (t-SNE)
Dictionary learning
Random trees embedding
Independent component analysis

4
Q

What kind of distance measure does Isomap learn? P 120

A

It learns the curved distance (also called the geodesic distance) between points rather than the Euclidean distance.

5
Q

What are some versions of PCA called? (4 versions) P 120

A

Standard PCA
Incremental PCA
Sparse PCA
Kernel PCA

6
Q

Is the matrix regenerated from standard PCA features exactly the same as the original matrix? P 121

A

With these components, it is possible to reconstruct the original features, not exactly but generally close enough.
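A minimal scikit-learn sketch of this (not the book's code; the dataset and number of components are illustrative):

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    X, _ = load_breast_cancer(return_X_y=True)
    X_scaled = StandardScaler().fit_transform(X)

    pca = PCA(n_components=10)                     # keep 10 of the 30 original dimensions
    X_reduced = pca.fit_transform(X_scaled)
    X_restored = pca.inverse_transform(X_reduced)  # map back to the original feature space

    print(np.allclose(X_scaled, X_restored))       # False: not an exact reconstruction
    print(np.mean((X_scaled - X_restored) ** 2))   # but the mean squared error is small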

7
Q

What is one essential thing to do before running PCA? P 121

A

It is essential to perform feature scaling before running PCA.
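A minimal sketch of scaling before PCA (the dataset and pipeline are illustrative, not the book's code):

    from sklearn.datasets import load_breast_cancer
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    X, _ = load_breast_cancer(return_X_y=True)

    # Standardize every feature to zero mean and unit variance first, so that
    # features measured on large scales do not dominate the principal components.
    pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
    X_reduced = pipeline.fit_transform(X)
    print(X_reduced.shape)   # (569, 2)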

8
Q

What is the sklearn PCA attribute for finding the explained variance percentage? P 123

A

explained_variance_ratio_
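For example (the dataset and component count are illustrative):

    from sklearn.datasets import load_breast_cancer
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    X, _ = load_breast_cancer(return_X_y=True)
    X_scaled = StandardScaler().fit_transform(X)

    pca = PCA(n_components=10).fit(X_scaled)
    print(pca.explained_variance_ratio_)         # variance explained by each component
    print(pca.explained_variance_ratio_.sum())   # total share captured by the 10 components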

9
Q

What is the trade-off of using PCA? P 128

A

PCA-reduced feature set may not perform quite as well in terms of accuracy as a model that is trained on the full feature set, but both the training and prediction times will be much faster. This is one of the important trade-offs you must consider when choosing whether to use dimensionality reduction in your machine learning product.

10
Q

When do we use incremental PCA? P 128

A

For datasets that are very large and cannot fit in memory, we can perform PCA incrementally in small batches, where each batch is able to fit in memory.
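A minimal sketch with scikit-learn's IncrementalPCA (the batches here are synthetic; in practice each chunk would be read from disk):

    import numpy as np
    from sklearn.decomposition import IncrementalPCA

    rng = np.random.RandomState(0)
    ipca = IncrementalPCA(n_components=2)

    # Feed the model one memory-sized batch at a time.
    for _ in range(10):
        X_batch = rng.rand(500, 20)
        ipca.partial_fit(X_batch)

    X_new = rng.rand(100, 20)
    print(ipca.transform(X_new).shape)   # (100, 2)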

11
Q

What is sparse PCA? P 130

A

For some machine learning problems, some degree of sparsity may be preferred. A version of PCA that retains some degree of sparsity—controlled by a hyperparameter called alpha—is known as sparse PCA.
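A minimal sketch (the dataset and alpha value are illustrative; a larger alpha forces more loadings to zero):

    from sklearn.datasets import load_breast_cancer
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import SparsePCA

    X, _ = load_breast_cancer(return_X_y=True)
    X_scaled = StandardScaler().fit_transform(X)

    spca = SparsePCA(n_components=2, alpha=1.0, random_state=0)
    X_spca = spca.fit_transform(X_scaled)

    # The sparsity penalty drives many component loadings to exactly zero.
    print((spca.components_ == 0).sum(), "zero loadings out of", spca.components_.size)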

12
Q

What is the difference between standard PCA and Sparse PCA? P 130

A

The normal PCA algorithm searches for linear combinations in all the input variables, reducing the original feature space as densely as possible. The sparse PCA algorithm searches for linear combinations in just some of the input variables, reducing the original feature space to some degree but not as compactly as normal PCA.

13
Q

What is kernel PCA? P 132

A

Normal PCA, incremental PCA, and sparse PCA linearly project the original data onto a lower-dimensional space, but there is also a nonlinear form of PCA known as kernel PCA, which runs a similarity function over pairs of original data points in order to perform nonlinear dimensionality reduction.
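A minimal sketch with an RBF kernel (the toy dataset, kernel choice, and gamma are illustrative):

    from sklearn.datasets import make_circles
    from sklearn.decomposition import KernelPCA

    # Two concentric circles are not linearly separable in the original 2-D space.
    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

    kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)   # gamma is the kernel coefficient
    X_kpca = kpca.fit_transform(X)   # in the new space the two circles separate linearly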

14
Q

When is kernel PCA especially effective? P 132

A

This method is especially effective when the original feature set is not linearly separable.

15
Q

What is gamma hyperparameter in kernel PCA? P 133

A

kernel coefficient

16
Q

What is alpha in sparse PCA? P 130

A

degree of sparsity

17
Q

What is Singular Value Decomposition? P 134

A

Another approach to learning the underlying structure of the data is to reduce the rank of the original matrix of features to a smaller rank, such that the original matrix can be recreated using a linear combination of some of the vectors in the smaller rank matrix. This is known as singular value decomposition (SVD).
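A minimal sketch with scikit-learn's TruncatedSVD (the dataset and rank are illustrative):

    from sklearn.datasets import load_digits
    from sklearn.decomposition import TruncatedSVD

    X, _ = load_digits(return_X_y=True)            # 1797 samples, 64 features

    # Unlike PCA, TruncatedSVD does not center the data, so it also works on sparse matrices.
    svd = TruncatedSVD(n_components=10, random_state=0)
    X_svd = svd.fit_transform(X)                   # rank-10 approximation of the feature matrix
    print(svd.explained_variance_ratio_.sum())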

18
Q

What is the rank of a matrix? External

A

The maximum number of linearly independent columns (or rows) of a matrix is called its rank. The rank of a matrix cannot exceed the number of its rows or columns.
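For example, with NumPy:

    import numpy as np

    A = np.array([[1, 2, 3],
                  [2, 4, 6],    # a multiple of the first row, so it adds nothing new
                  [0, 1, 1]])
    print(np.linalg.matrix_rank(A))   # 2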

19
Q

Does the relevant structure of the original feature set remain preserved after random projection? P 136

A

Yes

20
Q

What are the two versions of Random Projection? P 136

A

There are two versions of random projection—the standard version known as Gaussian random projection and a sparse version known as sparse random projection.

21
Q

What does eps hyperparameter control in Gaussian Random Projection? What do lower values of eps mean? P 137

A

The eps hyperparameter controls the quality of the embedding according to the Johnson–Lindenstrauss lemma; smaller values require a higher number of dimensions.
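A minimal sketch (the synthetic data sizes are illustrative):

    import numpy as np
    from sklearn.random_projection import GaussianRandomProjection, johnson_lindenstrauss_min_dim

    # Smaller eps demands more output dimensions to preserve pairwise distances.
    for eps in (0.5, 0.3, 0.1):
        print(eps, johnson_lindenstrauss_min_dim(n_samples=3000, eps=eps))

    rng = np.random.RandomState(0)
    X = rng.rand(3000, 2000)                                  # hypothetical high-dimensional data

    grp = GaussianRandomProjection(eps=0.5, random_state=0)   # n_components='auto' applies the lemma
    print(grp.fit_transform(X).shape)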

22
Q

Why is the scatter plot from standard PCA different from the one from sparse PCA? P 132

A

Normal and sparse PCA generate principal components differently, and the separation of points is somewhat different, too.

23
Q

Why does the Random Projection scatter plot look very different from the PCA family’s scatter plots? P 138

A

Although it is a form of linear projection like PCA, random projection is an entirely different family of dimensionality reduction. Thus the random projection scatter plot looks very different from the scatter plots of normal PCA, incremental PCA, sparse PCA, and kernel PCA.

24
Q

What are the advantages of using Sparse Random Projection instead of Gaussian Random Projection? P 138

A

It is generally much more efficient and faster than normal Gaussian random projection.
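A minimal sketch (synthetic data; eps is illustrative):

    import numpy as np
    from sklearn.random_projection import SparseRandomProjection

    rng = np.random.RandomState(0)
    X = rng.rand(3000, 2000)

    # The projection matrix is mostly zeros, so it is cheaper to store and to
    # multiply than the dense matrix used by Gaussian random projection.
    srp = SparseRandomProjection(eps=0.5, density="auto", random_state=0)
    print(srp.fit_transform(X).shape)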

25
Q

To which member of the PCA family is Isomap similar? How does it reduce dimensionality? P 140

A

Like kernel PCA, Isomap learns a new, low-dimensional embedding of the original feature set by calculating the pairwise distances of all the points, where distance is curved or geodesic distance rather than Euclidean distance.
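A minimal sketch (the toy dataset and n_neighbors are illustrative):

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import Isomap

    # The Swiss roll is a 2-D surface curled up in 3-D space.
    X, _ = make_swiss_roll(n_samples=1000, random_state=0)

    isomap = Isomap(n_neighbors=10, n_components=2)   # geodesic distances along the neighbor graph
    X_iso = isomap.fit_transform(X)
    print(X_iso.shape)   # (1000, 2)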

26
Q

What is Multidimensional scaling? What is it based on? P 141

A

Multidimensional scaling (MDS) is a form of nonlinear dimensionality reduction that learns the similarity of points in the original dataset and, using this similarity learning, models this in a lower-dimensional space.
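A minimal sketch (the dataset and hyperparameters are illustrative; MDS scales poorly with sample count, so a subsample keeps it fast):

    from sklearn.datasets import load_digits
    from sklearn.manifold import MDS

    X, _ = load_digits(return_X_y=True)

    mds = MDS(n_components=2, n_init=4, random_state=0)
    X_mds = mds.fit_transform(X[:500])   # embed a 500-sample subset into 2-D
    print(X_mds.shape)                   # (500, 2)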

27
Q

Does Locally Linear Embedding preserve distances within local neighborhoods as it projects the data from the original feature space to a reduced space? P 142

A

Yes

28
Q

Is Locally Linear Embedding a linear method of dimensionality reduction or a non-linear one? P 142

A

Non-Linear
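A minimal sketch (the toy dataset and n_neighbors are illustrative):

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import LocallyLinearEmbedding

    X, _ = make_swiss_roll(n_samples=1000, random_state=0)

    # Each point is reconstructed from its nearest neighbors, and those local
    # linear relationships are preserved in the 2-D embedding.
    lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, random_state=0)
    print(lle.fit_transform(X).shape)   # (1000, 2)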

29
Q

What is the main use of t-SNE dimensionality reduction? P 144

A

T-distributed stochastic neighbor embedding (t-SNE) is a nonlinear dimensionality reduction technique for visualizing high-dimensional data.

30
Q

In real-world applications of t-SNE, why is it best to use another dimensionality reduction technique (such as PCA, as the book does) to reduce the number of dimensions before applying t-SNE? P 144

A

By applying another form of dimensionality reduction first, we reduce the noise in the features that are fed into t-SNE and speed up the computation of the algorithm.
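A minimal sketch of the PCA-then-t-SNE recipe (the dataset and hyperparameters are illustrative, not the book's exact code):

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    X, y = load_digits(return_X_y=True)                # 1797 samples, 64 features

    X_pca = PCA(n_components=30).fit_transform(X)      # denoise and shrink the input first
    X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_pca)
    print(X_tsne.shape)                                # (1797, 2), ready for a scatter plot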

31
Q

Why aren’t the results of t-SNE stable? P 145

A

t-SNE has a nonconvex cost function, which means that different initializations of the algorithm will generate different results. There is no stable solution.

32
Q

What dimensionality reduction methods don’t rely on geometry or distance metrics? P 146

A

Dictionary learning
Independent component analysis

33
Q

Dictionary learning learns a sparse representation of the original data. True or False? P 146

A

True

34
Q

What are dictionaries and atoms in dictionary learning? P 146

A

The resulting matrix is known as the dictionary
The vectors in the dictionary are known as atoms.
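A minimal sketch with scikit-learn's MiniBatchDictionaryLearning (the dataset and hyperparameters are illustrative):

    from sklearn.datasets import load_digits
    from sklearn.decomposition import MiniBatchDictionaryLearning

    X, _ = load_digits(return_X_y=True)

    dict_learner = MiniBatchDictionaryLearning(n_components=50, alpha=1, random_state=0)
    X_codes = dict_learner.fit_transform(X)   # sparse codes: each sample uses only a few atoms

    print(dict_learner.components_.shape)     # (50, 64): the dictionary, one atom per row
    print((X_codes != 0).mean())              # fraction of nonzero entries in the new representation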

35
Q

Atoms are simple, binary vectors. True or False? P 146

A

True

36
Q

What problem does Independent Component Analysis address? P 148

A

One common problem with unlabeled data is that there are many independent signals embedded together into the features we are given. Using independent component analysis (ICA), we can separate these blended signals into their individual components.
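A minimal sketch of blind source separation with FastICA (the two sources and the mixing matrix are synthetic, illustrative stand-ins for, say, two voices recorded by two microphones):

    import numpy as np
    from sklearn.decomposition import FastICA

    t = np.linspace(0, 8, 2000)
    s1 = np.sin(2 * t)                        # hypothetical source 1
    s2 = np.sign(np.sin(3 * t))               # hypothetical source 2
    S = np.c_[s1, s2]

    A = np.array([[1.0, 0.5],                 # mixing matrix: each observation blends both sources
                  [0.5, 1.0]])
    X_mixed = S @ A.T                         # the blended signals we actually observe

    ica = FastICA(n_components=2, random_state=0)
    S_estimated = ica.fit_transform(X_mixed)  # recovered independent components
    print(S_estimated.shape)                  # (2000, 2)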

37
Q

When is ICA used? P 149

A

ICA is commonly used in signal processing tasks (for example, to identify the individual voices in an audio clip of a busy coffeehouse).

38
Q

Can PCA work with categorical data? External

A

While it is technically possible to use PCA on discrete variables, or on categorical variables that have been one-hot encoded, you should not. Simply put, if your variables don’t belong on a coordinate plane, then do not apply PCA to them.