PCA (Brainscape) Flashcards

1
Q

Can PCA be used to reduce the dimensionality of a highly nonlinear dataset?

A

PCA can be used to significantly reduce the dimensionality of most datasets, even if they are highly nonlinear, because it can at least get rid of useless dimensions. However, if there are no useless dimensions (for example, a Swiss roll dataset), then reducing dimensionality with PCA will lose too much information: you want to unroll the Swiss roll, not squash it.

2
Q

Suppose you perform PCA on a 1000-dimensional dataset, setting the explained variance ratio to 95%. How many dimensions will the resulting dataset have?

A

Let's look at the two extremes:

First, suppose the dataset is composed of points that are almost perfectly aligned. In this case, PCA can reduce the dataset down to just 1 dimension while still preserving 95% of the variance.

Second, suppose the dataset is composed of perfectly random points scattered across all 1000 dimensions. In this case roughly 950 dimensions (95% of 1000) are needed to preserve 95% of the variance.

So the answer depends on the dataset: it could be anywhere between 1 and about 950 dimensions.
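
A minimal sketch in scikit-learn (X here is a hypothetical stand-in for the 1000-dimensional training set; passing a float between 0 and 1 as n_components tells PCA to keep the smallest number of dimensions that preserves that fraction of the variance):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(500, 1000)    # hypothetical stand-in dataset

pca = PCA(n_components=0.95)     # preserve 95% of the variance
X_reduced = pca.fit_transform(X)
print(pca.n_components_)         # dimensions kept; depends entirely on the data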

3
Q

In what cases would you use vanilla PCA, Incremental PCA, or kernel PCA?

A

1) Regular PCA is the default, but it works only if the dataset fits in memory.
2) Incremental PCA is useful for large datasets that do not fit in memory, but it is slower than regular PCA.
3) Kernel PCA is useful for nonlinear datasets.
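
A minimal sketch of all three variants in scikit-learn (X is a hypothetical in-memory dataset; the batch count, kernel, and gamma values are illustrative, not tuned):

import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA, KernelPCA

X = np.random.rand(1000, 50)    # hypothetical dataset

# 1) Regular PCA: requires the whole dataset in memory.
X_pca = PCA(n_components=10).fit_transform(X)

# 2) Incremental PCA: feed the data in mini-batches (e.g. read from disk).
inc_pca = IncrementalPCA(n_components=10)
for batch in np.array_split(X, 10):
    inc_pca.partial_fit(batch)
X_inc = inc_pca.transform(X)

# 3) Kernel PCA: uses the kernel trick to handle nonlinear datasets.
X_kpca = KernelPCA(n_components=10, kernel="rbf", gamma=0.04).fit_transform(X)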

4
Q

What is the main idea of principal component analysis (PCA)?

A

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components.
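
A quick numerical check of both properties (uncorrelated components, decreasing variance); this is a sketch assuming nothing beyond NumPy and scikit-learn:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
# Deliberately correlated 3-D data: the second feature depends on the first.
X = np.column_stack([x, 2 * x + rng.normal(size=1000), rng.normal(size=1000)])

X_pca = PCA().fit_transform(X)

cov = np.cov(X_pca, rowvar=False)
print(np.round(cov, 6))   # off-diagonal entries ~ 0: components are uncorrelated
print(np.diag(cov))       # diagonal (variances) comes out in decreasing order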

5
Q

When using PCA, what are the principal components?

A

PCA identifies the axis that accounts for the largest amount of variance in the training set. This axis is the 1st principal component (PC). The algorithm then finds a second axis, orthogonal to the first, that accounts for the largest amount of remaining variance. That is the 2nd PC. It then finds a third axis, orthogonal to both, and so on.

6
Q

How do you find the principal components (PCs) when using PCA?

A

We need to apply a standard matrix factorization technique called singular value decomposition (SVD) that can decompose the training set matrix X into the matrix multiplication of three matrices:

X = UΣVᵀ

where the columns of V are the unit vectors that define the principal components we are looking for. (Note that X must be centered, i.e., have zero mean per feature, before the decomposition.)
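
A minimal NumPy sketch of this step (X is a hypothetical small training set):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # hypothetical training set

X_centered = X - X.mean(axis=0)    # PCA assumes zero-mean data
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# The rows of Vt (i.e., the columns of V) are the principal components:
c1, c2 = Vt[0], Vt[1]              # unit vectors of the 1st and 2nd PCs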

7
Q

When using PCA, once you have computed the singular value decomposition:

X = UΣVᵀ

How do you project the data down to d dimensions?

A

Xd-proj = XWd

where Xd-proj is the reduced dataset of dimensionality d, X is the original (centered) dataset, and Wd is the matrix containing the first d columns of V. In other words, Wd contains the d principal components that preserve the most variance.

Note that in scikit-learn, pca.explained_variance_ratio_ tells you the fraction of the dataset's variance that lies along each principal component.
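
A minimal sketch of the projection, continuing from the SVD above and checking it against scikit-learn (the choice d = 2 is arbitrary):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # hypothetical dataset

X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

d = 2
W_d = Vt.T[:, :d]                  # first d columns of V
X_d_proj = X_centered @ W_d        # project onto the top-d PCs

pca = PCA(n_components=d)
X_sklearn = pca.fit_transform(X)
print(pca.explained_variance_ratio_)                     # variance ratio per PC
print(np.allclose(np.abs(X_d_proj), np.abs(X_sklearn)))  # same result up to sign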
