Chapter 18 Principal Component Analysis Flashcards

1
Q

What’s covariance?

P 164

A

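Covariance measures how two variables vary together. It is a generalized, unnormalized version of correlation: the covariance matrix holds the covariance for every pair of columns, and dividing each entry by the two columns' standard deviations gives the correlation.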
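
A minimal sketch of the relationship between covariance and correlation, using NumPy. The data matrix below is made up for illustration; rows are observations and columns are variables.

```python
import numpy as np

# Made-up data: 5 observations (rows) of 3 variables (columns).
X = np.array([
    [1.0, 2.0, 10.0],
    [2.0, 4.1, 8.0],
    [3.0, 6.2, 6.5],
    [4.0, 7.9, 4.0],
    [5.0, 10.1, 2.0],
])

# rowvar=False tells NumPy that each column is a variable.
cov = np.cov(X, rowvar=False)        # sample covariance matrix (ddof=1 by default)
corr = np.corrcoef(X, rowvar=False)  # correlation matrix

# Normalizing each covariance by the two standard deviations gives the correlation.
std = X.std(axis=0, ddof=1)
corr_from_cov = cov / np.outer(std, std)
```

Here `corr_from_cov` should match `corr` up to floating-point error, which is the sense in which covariance is "unnormalized" correlation.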

2
Q

There is no pca() function in NumPy. True/False

P 165

A

True

3
Q

We can calculate a Principal Component Analysis on a dataset using the ____ class in the scikit-learn library. Once fit, the eigenvalues and principal components can be accessed on the PCA class via the ____ and ____ attributes.

P 166

A

PCA; explained_variance_ (eigenvalues) and components_ (eigenvectors)
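
A short sketch of how these attributes might be used, assuming scikit-learn is installed. The 3×2 matrix `A` is made-up example data, not taken from the book.

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up data: 3 observations (rows) of 2 variables (columns).
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

pca = PCA(n_components=2)
pca.fit(A)

# Eigenvalues of the sample covariance matrix, largest first.
eigenvalues = pca.explained_variance_
# Principal axes (eigenvectors), one per row: shape (n_components, n_features).
components = pca.components_
# Project the data onto the principal axes.
B = pca.transform(A)
```

Note that scikit-learn also exposes a separate `singular_values_` attribute; those are the singular values of the centered data, and squaring them and dividing by `n_samples - 1` gives `explained_variance_`.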

4
Q

What are the steps of PCA?

External

A

1) Compute the eigenvectors and eigenvalues of the data's covariance matrix.
1-1) The covariance matrix is symmetric, so its eigenvectors can be chosen to be orthogonal.

2) Rank the eigenvectors by eigenvalue: the eigenvector with the largest eigenvalue points in the direction of the greatest variation in the data.

PCA thus rotates the Cartesian coordinate system into a new coordinate system whose first few axes capture most of the variation in the data, so the data can be described with fewer dimensions.
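
The steps above can be sketched in plain NumPy. This is an illustrative sketch on synthetic data, not a definitive implementation; the data, the random seed, and the choice `k = 1` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data whose two columns are strongly correlated.
x = rng.normal(size=200)
X = np.column_stack([x, 2.0 * x + rng.normal(scale=0.3, size=200)])

# Center each column so the covariance describes variation around the mean.
Xc = X - X.mean(axis=0)

# Step 1: eigendecomposition of the covariance matrix.
C = np.cov(Xc, rowvar=False)
# eigh is meant for symmetric matrices and returns orthonormal eigenvectors
# (as columns) with eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(C)

# Step 2: sort so the direction of largest variance comes first.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Keep the first k axes of the new coordinate system and project onto them.
k = 1
projected = Xc @ eigvecs[:, :k]
explained_ratio = eigvals[:k].sum() / eigvals.sum()
```

`explained_ratio` is the share of total variance retained by the first `k` axes; for data this strongly correlated, a single axis keeps almost all of it.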

5
Q

Where should you not use PCA?

External

A

PCA is mainly useful when the variables are strongly correlated. If the relationships between variables are weak, PCA does not reduce the data well. Check the correlation matrix to decide: as a rule of thumb, if most of the correlation coefficients are smaller than 0.3 in absolute value, PCA will not help much.
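
One way this check might look in NumPy is sketched below. The helper name `fraction_weak_correlations` is hypothetical, the 0.3 threshold is the rule of thumb from the card, and both datasets are synthetic.

```python
import numpy as np

def fraction_weak_correlations(X, threshold=0.3):
    """Hypothetical helper: share of distinct column pairs with |r| < threshold."""
    corr = np.corrcoef(X, rowvar=False)
    # Look only at the upper triangle, skipping the diagonal of ones.
    upper = np.triu_indices_from(corr, k=1)
    return float(np.mean(np.abs(corr[upper]) < threshold))

rng = np.random.default_rng(1)

# Four independent columns: pairwise correlations are near zero.
independent = rng.normal(size=(500, 4))

# Four noisy copies of one shared signal: pairwise correlations are near one.
base = rng.normal(size=(500, 1))
correlated = base + 0.2 * rng.normal(size=(500, 4))

weak_share_independent = fraction_weak_correlations(independent)
weak_share_correlated = fraction_weak_correlations(correlated)
```

A share close to 1 suggests PCA is unlikely to compress the data much, while a share close to 0 suggests a few components may capture most of the variance.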
