Chapter 18 Principal Component Analysis Flashcards
What’s covariance?
P 164
Covariance is a generalized, unnormalized version of correlation: it measures how two variables vary together. A covariance matrix holds that score for every pair of columns in a dataset.
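A minimal sketch of the "unnormalized correlation" idea, assuming NumPy and a small made-up matrix `A` (the data is hypothetical, not from the book). Dividing each covariance by the product of the two column standard deviations should recover the correlation matrix.

```python
import numpy as np

# Hypothetical toy data: 5 rows (observations), 3 columns (variables).
A = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 6.2, 0.9],
    [4.0, 7.9, 0.1],
    [5.0, 10.1, 0.6],
])

# rowvar=False tells NumPy that each column is a variable.
cov = np.cov(A, rowvar=False)
corr = np.corrcoef(A, rowvar=False)

# Normalize the covariance by the column standard deviations.
std = np.sqrt(np.diag(cov))
corr_from_cov = cov / np.outer(std, std)

print(np.allclose(corr, corr_from_cov))
```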
There is no pca() function in NumPy. True/False
P 165
True
We can calculate a Principal Component Analysis on a dataset using the ____ class in the scikit-learn library. Once fit, the eigenvalues and principal components can be accessed on the PCA class via the ____ and ____ attributes.
P 166
PCA; explained_variance_ (eigenvalues) and components_ (eigenvectors)
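A short sketch of this card in code, assuming scikit-learn is installed. The 3x2 matrix `A` is an illustrative choice, not data taken from the card.

```python
import numpy as np
from sklearn.decomposition import PCA

# Small illustrative dataset: 3 observations, 2 features.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

pca = PCA(n_components=2)
pca.fit(A)

# Eigenvalues of the covariance matrix, largest first.
eigenvalues = pca.explained_variance_
# Principal components (eigenvectors), one per row.
eigenvectors = pca.components_

# Project the data onto the principal components.
B = pca.transform(A)

print(eigenvalues)
print(eigenvectors)
print(B)
```

Note that `components_` stores one component per row, so the first principal axis is `pca.components_[0]`.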
What are the steps of PCA?
External
1) Calculate the eigenvectors and eigenvalues of the covariance matrix of the data.
1-1) The covariance matrix is symmetric, so its eigenvectors are orthogonal.
2) Rank the eigenvectors by eigenvalue. The eigenvector with the largest eigenvalue points in the direction of greatest variation in the data.
So PCA rotates the Cartesian coordinate system into a new coordinate system whose first few axes describe most of the variation in the data, which lets the data be represented in fewer dimensions.
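The steps above can be sketched by hand in NumPy, which has no pca() function of its own. This is one possible implementation under stated assumptions: the matrix `A` and the choice `k = 1` are illustrative, and the column centering step is added here because the projection needs it even though the card does not list it.

```python
import numpy as np

# Illustrative data: 3 observations, 2 features.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Center each column on its mean (assumed preprocessing step).
C = A - A.mean(axis=0)

# Step 1: covariance matrix of the columns, then its eigendecomposition.
V = np.cov(C, rowvar=False)
# eigh is meant for symmetric matrices; it returns eigenvalues in
# ascending order and orthonormal eigenvectors as columns.
values, vectors = np.linalg.eigh(V)

# Step 2: rank eigenvectors by eigenvalue, largest first.
order = np.argsort(values)[::-1]
values = values[order]
vectors = vectors[:, order]

# Keep the top k directions and project the centered data onto them.
k = 1
P = C @ vectors[:, :k]

print(values)
print(P)
```

Using `np.linalg.eigh` rather than `np.linalg.eig` is a design choice that relies on the symmetry noted in step 1-1; it keeps the eigenvalues real and the eigenvectors orthonormal.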
Where should you not use PCA?
External
PCA should be used mainly for variables that are strongly correlated. If the relationships between variables are weak, PCA does not reduce the data well. Check the correlation matrix to decide: as a rule of thumb, if most of the correlation coefficients are smaller than 0.3, PCA is unlikely to help.
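A rough sketch of that check, assuming NumPy. The helper name `fraction_weak_correlations`, the synthetic datasets, and the reading of "smaller than 0.3" as an absolute value are all assumptions made for illustration.

```python
import numpy as np

def fraction_weak_correlations(X, threshold=0.3):
    """Share of distinct column pairs whose |correlation| is below threshold."""
    corr = np.corrcoef(X, rowvar=False)
    # Indices of the upper triangle, excluding the diagonal of ones.
    upper = np.triu_indices_from(corr, k=1)
    return float(np.mean(np.abs(corr[upper]) < threshold))

rng = np.random.default_rng(0)

# Hypothetical dataset 1: four independent columns.
independent = rng.normal(size=(500, 4))

# Hypothetical dataset 2: four noisy copies of one shared signal.
base = rng.normal(size=(500, 1))
correlated = base + 0.1 * rng.normal(size=(500, 4))

weak_independent = fraction_weak_correlations(independent)
weak_correlated = fraction_weak_correlations(correlated)

print(weak_independent, weak_correlated)
```

A fraction close to 1 suggests, by the rule of thumb above, that PCA is unlikely to compress the data much; a fraction close to 0 suggests it may be worth trying.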