Chapter 18 Principal Component Analysis Flashcards

Question 1

Q

What’s covariance?

P 164

Answer

A

Covariance is a generalized and unnormalized version of correlation across multiple columns.
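To make the "unnormalized correlation" idea concrete, here is a minimal NumPy sketch. The toy matrix `X` and the variable names are assumptions made for illustration and do not come from the book. It computes the covariance matrix of the columns, then divides by the column standard deviations to recover the correlation matrix.

```python
import numpy as np

# Hypothetical toy data: 5 rows (observations), 3 columns (features).
X = np.array([
    [1.0, 2.0, 10.0],
    [2.0, 4.1, 8.0],
    [3.0, 6.2, 6.5],
    [4.0, 7.9, 4.0],
    [5.0, 10.1, 2.0],
])

# rowvar=False tells NumPy that each column is a variable.
cov = np.cov(X, rowvar=False)        # 3x3 sample covariance matrix
corr = np.corrcoef(X, rowvar=False)  # 3x3 correlation matrix

# Normalizing covariance by the standard deviations recovers correlation.
std = np.sqrt(np.diag(cov))
corr_from_cov = cov / np.outer(std, std)

print(np.allclose(corr, corr_from_cov))
```

The diagonal of `cov` holds each column's variance, and the off-diagonal entries hold the pairwise covariances.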

Question 2

Q

There is no pca() function in NumPy. True/False

P 165

Answer

A

True. PCA can still be calculated step by step with NumPy functions.

Question 3

Q

We can calculate a Principal Component Analysis on a dataset using the ____ class in the scikit-learn library. Once fit, the eigenvalues and principal components can be accessed on the PCA class via the ____ and ____ attributes.

P 166

Answer

A

PCA; explained_variance_ (eigenvalues) and components_ (eigenvectors)
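A short sketch of how these attributes might be read after fitting. The 3x2 matrix `A` and the choice of `n_components=2` are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical toy data: 3 observations, 2 features.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

pca = PCA(n_components=2)
pca.fit(A)

# Principal axes (eigenvectors of the covariance matrix), one per row.
print(pca.components_)

# Variance explained by each component (eigenvalues of the covariance matrix).
print(pca.explained_variance_)

# Project the data onto the principal components.
B = pca.transform(A)
print(B.shape)
```

Note that scikit-learn also exposes a separate `singular_values_` attribute, which is not the same thing as `explained_variance_`.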

Question 4

Q

What are the steps of PCA?

External

Answer

A

1) Calculate the eigenvectors and eigenvalues of the covariance matrix.
1-1) The covariance matrix is symmetric, so its eigenvectors are orthogonal.

2) The eigenvector with the largest eigenvalue points in the direction of the largest variation in the data.

So PCA rotates the Cartesian coordinate system into a new coordinate system whose first few axes describe most of the variation in the data, which lets the data be represented with fewer dimensions.
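The steps above can be sketched with plain NumPy. This is one possible illustration, not a reference implementation: the toy data, the variable names, and the explicit mean-centering step are assumptions added here, and `np.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np

# Hypothetical toy data: 6 observations, 2 correlated features.
X = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
    [1.9, 2.2], [3.1, 3.0], [2.3, 2.7],
])

# Center each column so the projection has zero mean (assumed preprocessing).
X_centered = X - X.mean(axis=0)

# Covariance matrix of the features (columns are variables).
C = np.cov(X_centered, rowvar=False)

# eigh handles symmetric matrices; eigenvalues come back in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Reorder so the direction of largest variance comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Keep only the first component and project: 2 dimensions become 1.
k = 1
X_reduced = X_centered @ eigenvectors[:, :k]

print(eigenvalues / eigenvalues.sum())  # share of variance per component
print(X_reduced.shape)
```

The variance of the projected data along the first component equals the largest eigenvalue, which is why sorting by eigenvalue ranks the new axes by how much variation they capture.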

Question 5

Q

Where should you not use PCA?

External

Answer

A

PCA works best when the variables are strongly correlated. If the relationships between variables are weak, PCA does little to reduce the data. Check the correlation matrix to decide: as a rule of thumb, if most of the correlation coefficients are smaller than 0.3 in absolute value, PCA will not help.
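One way to apply this rule of thumb is to measure how many pairwise correlations fall below the threshold. The helper `share_of_weak_correlations` and the synthetic datasets below are hypothetical, written only to illustrate the check.

```python
import numpy as np

def share_of_weak_correlations(X, threshold=0.3):
    """Fraction of distinct column pairs whose |correlation| is below threshold."""
    corr = np.corrcoef(X, rowvar=False)
    # Upper triangle without the diagonal: each pair is counted once.
    upper = np.triu_indices_from(corr, k=1)
    return float(np.mean(np.abs(corr[upper]) < threshold))

rng = np.random.default_rng(0)

# Hypothetical data: three independent noise columns (weakly correlated)...
independent = rng.normal(size=(500, 3))

# ...versus three columns that are noisy copies of one shared signal.
signal = rng.normal(size=(500, 1))
correlated = signal + 0.1 * rng.normal(size=(500, 3))

print(share_of_weak_correlations(independent))
print(share_of_weak_correlations(correlated))
```

A share close to 1 suggests PCA is unlikely to compress the data much, while a share close to 0 suggests the columns carry overlapping information that a few components could capture.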
