Chapter 3. Dimensionality Reduction Flashcards

1
Q

Two major branches of dimensionality reduction? P 120

A

Linear projection

Manifold learning, which is also referred to as nonlinear dimensionality reduction

2
Q

What techniques does linear projection include? P 120

A

Principal component analysis
Singular value decomposition
Random projection

3
Q

Which techniques does manifold learning include? P 120

A

Isomap
Multidimensional scaling (MDS)
Locally linear embedding (LLE)
T-distributed stochastic neighbor embedding (t-SNE)
Dictionary learning
Random trees embedding
Independent component analysis

4
Q

What kind of distance measure does Isomap learn? P 120

A

It learns the curved distance (also called the geodesic distance) between points rather than the Euclidean distance.

5
Q

What are some versions of PCA called? (4 versions) P 120

A

Standard PCA
Incremental PCA
Sparse PCA
Kernel PCA

6
Q

Is the matrix regenerated from standard PCA features exactly the same as the original matrix? P 121

A

With these components, it is possible to reconstruct the original features, not exactly but generally close enough.
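A minimal scikit-learn sketch of this (not the book's code; the dataset and number of components are illustrative):

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    X, _ = load_breast_cancer(return_X_y=True)
    X_scaled = StandardScaler().fit_transform(X)

    pca = PCA(n_components=10)                     # keep 10 of the 30 original dimensions
    X_reduced = pca.fit_transform(X_scaled)
    X_restored = pca.inverse_transform(X_reduced)  # map back to the original feature space

    print(np.allclose(X_scaled, X_restored))       # False: not an exact reconstruction
    print(np.mean((X_scaled - X_restored) ** 2))   # but the mean squared error is small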

7
Q

What is one essential thing to do before running PCA? P 121

A

It is essential to perform feature scaling before running PCA.
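A minimal sketch of scaling before PCA (the dataset and pipeline are illustrative, not the book's code):

    from sklearn.datasets import load_breast_cancer
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    X, _ = load_breast_cancer(return_X_y=True)

    # Standardize every feature to zero mean and unit variance first, so that
    # features measured on large scales do not dominate the principal components.
    pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
    X_reduced = pipeline.fit_transform(X)
    print(X_reduced.shape)   # (569, 2)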

8
Q

What is the sklearn PCA attribute for finding the explained variance percentage? P 123

A

explained_variance_ratio_
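For example (the dataset and component count are illustrative):

    from sklearn.datasets import load_breast_cancer
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    X, _ = load_breast_cancer(return_X_y=True)
    X_scaled = StandardScaler().fit_transform(X)

    pca = PCA(n_components=10).fit(X_scaled)
    print(pca.explained_variance_ratio_)         # variance explained by each component
    print(pca.explained_variance_ratio_.sum())   # total share captured by the 10 components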

9
Q

What is the trade-off of using PCA? P 128

A

PCA-reduced feature set may not perform quite as well in terms of accuracy as a model that is trained on the full feature set, but both the training and prediction times will be much faster. This is one of the important trade-offs you must consider when choosing whether to use dimensionality reduction in your machine learning product.

10
Q

When do we use incremental PCA? P 128

A

For datasets that are very large and cannot fit in memory, we can perform PCA incrementally in small batches, where each batch is able to fit in memory.
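A minimal sketch with scikit-learn's IncrementalPCA (the batches here are synthetic; in practice each chunk would be read from disk):

    import numpy as np
    from sklearn.decomposition import IncrementalPCA

    rng = np.random.RandomState(0)
    ipca = IncrementalPCA(n_components=2)

    # Feed the model one memory-sized batch at a time.
    for _ in range(10):
        X_batch = rng.rand(500, 20)
        ipca.partial_fit(X_batch)

    X_new = rng.rand(100, 20)
    print(ipca.transform(X_new).shape)   # (100, 2)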

11
Q

What is sparse PCA? P 130

A

For some machine learning problems, some degree of sparsity may be preferred. A version of PCA that retains some degree of sparsity—controlled by a hyperparameter called alpha—is known as sparse PCA.
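A minimal sketch (the dataset and alpha value are illustrative; a larger alpha forces more loadings to zero):

    from sklearn.datasets import load_breast_cancer
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import SparsePCA

    X, _ = load_breast_cancer(return_X_y=True)
    X_scaled = StandardScaler().fit_transform(X)

    spca = SparsePCA(n_components=2, alpha=1.0, random_state=0)
    X_spca = spca.fit_transform(X_scaled)

    # The sparsity penalty drives many component loadings to exactly zero.
    print((spca.components_ == 0).sum(), "zero loadings out of", spca.components_.size)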

12
Q

What is the difference between standard PCA and Sparse PCA? P 130

A

The normal PCA algorithm searches for linear combinations in all the input variables, reducing the original feature space as densely as possible. The sparse PCA algorithm searches for linear combinations in just some of the input variables, reducing the original feature space to some degree but not as compactly as normal PCA.

13
Q

What is kernel PCA? P 132

A

Normal PCA, incremental PCA, and sparse PCA linearly project the original data onto a lower-dimensional space, but there is also a nonlinear form of PCA known as kernel PCA, which runs a similarity function over pairs of original data points in order to perform nonlinear dimensionality reduction.
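A minimal sketch with an RBF kernel (the toy dataset, kernel choice, and gamma are illustrative):

    from sklearn.datasets import make_circles
    from sklearn.decomposition import KernelPCA

    # Two concentric circles are not linearly separable in the original 2-D space.
    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

    kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)   # gamma is the kernel coefficient
    X_kpca = kpca.fit_transform(X)   # in the new space the two circles separate linearly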

14
Q

When is kernel PCA especially effective? P 132

A

This method is especially effective when the original feature set is not linearly separable.

15
Q

What is gamma hyperparameter in kernel PCA? P 133

A

kernel coefficient

16
Q

What is alpha in sparse PCA? P 130

A

degree of sparsity

17
Q

What is Singular Value Decomposition? P 134

A

Another approach to learning the underlying structure of the data is to reduce the rank of the original matrix of features to a smaller rank, such that the original matrix can be recreated using a linear combination of some of the vectors in the smaller rank matrix. This is known as singular value decomposition (SVD).
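A minimal sketch with scikit-learn's TruncatedSVD (the dataset and rank are illustrative):

    from sklearn.datasets import load_digits
    from sklearn.decomposition import TruncatedSVD

    X, _ = load_digits(return_X_y=True)            # 1797 samples, 64 features

    # Unlike PCA, TruncatedSVD does not center the data, so it also works on sparse matrices.
    svd = TruncatedSVD(n_components=10, random_state=0)
    X_svd = svd.fit_transform(X)                   # rank-10 approximation of the feature matrix
    print(svd.explained_variance_ratio_.sum())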

18
Q

What is the rank of a matrix? External

A

The maximum number of linearly independent columns (or rows) of a matrix is called its rank. The rank of a matrix cannot exceed the number of its rows or columns.
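For example, with NumPy:

    import numpy as np

    A = np.array([[1, 2, 3],
                  [2, 4, 6],    # a multiple of the first row, so it adds nothing new
                  [0, 1, 1]])
    print(np.linalg.matrix_rank(A))   # 2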

19
Q

Does the relevant structure of the original feature set remain preserved after random projection? P 136

A

Yes

20
Q

What are the two versions of Random Projection? P 136

A

There are two versions of random projection—the standard version known as Gaussian random projection and a sparse version known as sparse random projection.

21
Q

What does eps hyperparameter control in Gaussian Random Projection? What do lower values of eps mean? P 137

A

The eps hyperparameter controls the quality of the embedding according to the Johnson–Lindenstrauss lemma; smaller values require a higher number of dimensions.
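A minimal sketch (the synthetic data sizes are illustrative):

    import numpy as np
    from sklearn.random_projection import GaussianRandomProjection, johnson_lindenstrauss_min_dim

    # Smaller eps demands more output dimensions to preserve pairwise distances.
    for eps in (0.5, 0.3, 0.1):
        print(eps, johnson_lindenstrauss_min_dim(n_samples=3000, eps=eps))

    rng = np.random.RandomState(0)
    X = rng.rand(3000, 2000)                                  # hypothetical high-dimensional data

    grp = GaussianRandomProjection(eps=0.5, random_state=0)   # n_components='auto' applies the lemma
    print(grp.fit_transform(X).shape)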

22
Q

Why is the scatter plot from standard PCA different from the one from sparse PCA? P 132

A

Normal and sparse PCA generate principal components differently, and the separation of points is somewhat different, too.

23
Q

Why does the Random Projection scatter plot look very different from the PCA family’s scatter plots? P 138

A

Although it is a form of linear projection like PCA, random projection is an entirely different family of dimensionality reduction. Thus the random projection scatter plot looks very different from the scatter plots of normal PCA, incremental PCA, sparse PCA, and kernel PCA.

24
Q

What are the advantages of using Sparse Random Projection instead of Gaussian Random Projection? P 138

A

It is generally much more efficient and faster than normal Gaussian random projection.
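A minimal sketch (synthetic data; eps is illustrative):

    import numpy as np
    from sklearn.random_projection import SparseRandomProjection

    rng = np.random.RandomState(0)
    X = rng.rand(3000, 2000)

    # The projection matrix is mostly zeros, so it is cheaper to store and to
    # multiply than the dense matrix used by Gaussian random projection.
    srp = SparseRandomProjection(eps=0.5, density="auto", random_state=0)
    print(srp.fit_transform(X).shape)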

25
Q

To which member of the PCA family is Isomap similar? How does it reduce dimensionality? P 140

A

Like kernel PCA, Isomap learns a new, low-dimensional embedding of the original feature set by calculating the pairwise distances of all the points, where distance is curved or geodesic distance rather than Euclidean distance.
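A minimal sketch (the toy dataset and n_neighbors are illustrative):

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import Isomap

    # The Swiss roll is a 2-D surface curled up in 3-D space.
    X, _ = make_swiss_roll(n_samples=1000, random_state=0)

    isomap = Isomap(n_neighbors=10, n_components=2)   # geodesic distances along the neighbor graph
    X_iso = isomap.fit_transform(X)
    print(X_iso.shape)   # (1000, 2)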

26
Q

What is Multidimensional scaling? What is it based on? P 141

A

Multidimensional scaling (MDS) is a form of nonlinear dimensionality reduction that learns the similarity of points in the original dataset and, using this similarity learning, models this in a lower-dimensional space.
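A minimal sketch (the dataset and hyperparameters are illustrative; MDS scales poorly with sample count, so a subsample keeps it fast):

    from sklearn.datasets import load_digits
    from sklearn.manifold import MDS

    X, _ = load_digits(return_X_y=True)

    mds = MDS(n_components=2, n_init=4, random_state=0)
    X_mds = mds.fit_transform(X[:500])   # embed a 500-sample subset into 2-D
    print(X_mds.shape)                   # (500, 2)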

27
Q

Does Locally Linear Embedding preserve distances within local neighborhoods as it projects the data from the original feature space to a reduced space? P 142

A

Yes

28
Q

Is Locally Linear Embedding a linear method of dimensionality reduction or a non-linear one? P 142

A

Non-Linear
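A minimal sketch (the toy dataset and n_neighbors are illustrative):

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import LocallyLinearEmbedding

    X, _ = make_swiss_roll(n_samples=1000, random_state=0)

    # Each point is reconstructed from its nearest neighbors, and those local
    # linear relationships are preserved in the 2-D embedding.
    lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, random_state=0)
    print(lle.fit_transform(X).shape)   # (1000, 2)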

29
Q

What is the main use of t-SNE dimensionality reduction? P 144

A

T-distributed stochastic neighbor embedding (t-SNE) is a nonlinear dimensionality reduction technique for visualizing high-dimensional data.

30
Q

In real-world applications of t-SNE, why is it best to use another dimensionality reduction technique (such as PCA, as the book does) to reduce the number of dimensions before applying t-SNE? P 144

A

By applying another form of dimensionality reduction first, we reduce the noise in the features that are fed into t-SNE and speed up the computation of the algorithm.
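A minimal sketch of the PCA-then-t-SNE recipe (the dataset and hyperparameters are illustrative, not the book's exact code):

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    X, y = load_digits(return_X_y=True)                # 1797 samples, 64 features

    X_pca = PCA(n_components=30).fit_transform(X)      # denoise and shrink the input first
    X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_pca)
    print(X_tsne.shape)                                # (1797, 2), ready for a scatter plot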

31
Q

Why aren’t the results of t-SNE stable? P 145

A

t-SNE has a nonconvex cost function, which means that different initializations of the algorithm will generate different results. There is no stable solution.

32
Q

What dimensionality reduction methods don’t rely on geometry or distance metrics? P 146

A

Dictionary learning
Independent component analysis

33
Q

Dictionary learning learns a sparse representation of the original data. True or False? P 146

A

True

34
Q

What are dictionaries and atoms in dictionary learning? P 146

A

The resulting matrix is known as the dictionary
The vectors in the dictionary are known as atoms.
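A minimal sketch with scikit-learn's MiniBatchDictionaryLearning (the dataset and hyperparameters are illustrative):

    from sklearn.datasets import load_digits
    from sklearn.decomposition import MiniBatchDictionaryLearning

    X, _ = load_digits(return_X_y=True)

    dict_learner = MiniBatchDictionaryLearning(n_components=50, alpha=1, random_state=0)
    X_codes = dict_learner.fit_transform(X)   # sparse codes: each sample uses only a few atoms

    print(dict_learner.components_.shape)     # (50, 64): the dictionary, one atom per row
    print((X_codes != 0).mean())              # fraction of nonzero entries in the new representation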

35
Q

Atoms are simple, binary vectors. True or False? P 146

A

True

36
Q

What problem does Independent Component Analysis address? P 148

A

One common problem with unlabeled data is that there are many independent signals embedded together into the features we are given. Using independent component analysis (ICA), we can separate these blended signals into their individual components.
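A minimal sketch of blind source separation with FastICA (the two sources and the mixing matrix are synthetic, illustrative stand-ins for, say, two voices recorded by two microphones):

    import numpy as np
    from sklearn.decomposition import FastICA

    t = np.linspace(0, 8, 2000)
    s1 = np.sin(2 * t)                        # hypothetical source 1
    s2 = np.sign(np.sin(3 * t))               # hypothetical source 2
    S = np.c_[s1, s2]

    A = np.array([[1.0, 0.5],                 # mixing matrix: each observation blends both sources
                  [0.5, 1.0]])
    X_mixed = S @ A.T                         # the blended signals we actually observe

    ica = FastICA(n_components=2, random_state=0)
    S_estimated = ica.fit_transform(X_mixed)  # recovered independent components
    print(S_estimated.shape)                  # (2000, 2)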

37
Q

When is ICA used? P 149

A

ICA is commonly used in signal processing tasks (for example, to identify the individual voices in an audio clip of a busy coffeehouse).

38
Q

Can PCA work with categorical data? External

A

While it is technically possible to use PCA on discrete variables, or on categorical variables that have been one-hot encoded, you should not. Simply put, if your variables don’t belong on a coordinate plane, then do not apply PCA to them.