PCA Final Flashcards
PCA
Principal Component Analysis
PCA is a
dimensionality reduction technique
Big idea 1: Take a dataset in high-dimensional space
and transform it so it can be represented in low-dimensional space, with minimal or no loss of information
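A minimal sketch of big idea 1, assuming scikit-learn is available; the toy dataset and all variable names below are illustrative, not part of the original cards:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Toy data: 100 points in 3-D that mostly vary along a single direction
X = rng.normal(size=(100, 1)) @ np.array([[2.0, 1.0, 0.5]]) \
    + 0.05 * rng.normal(size=(100, 3))

# Project from 3 dimensions down to 2
pca = PCA(n_components=2)
X_low = pca.fit_transform(X)

print(X.shape)      # (100, 3) -- high-dimensional space
print(X_low.shape)  # (100, 2) -- low-dimensional representation
```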
Big idea 2: Extract
latent information from the data
The PCA transformation results in
a smaller number of principal components that capture as much of the original dataset's variance as possible, in a lower-dimensional space
These principal components are
linear combinations of the original variables, and they become the new axes of the dataset in low-dimensional space
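To make "linear combinations of the original variables" concrete, here is a hedged sketch using scikit-learn's `components_` attribute (the toy data is illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

pca = PCA(n_components=2).fit(X)

# Each principal component is a unit-length linear combination of the
# original variables; together they form the new axes.
print(pca.components_.shape)  # (2, 3): two new axes in the original 3-D space

# A point's coordinate on the first new axis is the dot product of the
# mean-centered point with the first component vector.
X_low = pca.transform(X)
x0_centered = X[0] - pca.mean_
print(np.allclose(x0_centered @ pca.components_[0], X_low[0, 0]))  # True
```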
3 goals of PCA
Feature reduction: reduce the number of features used to represent the data
The reduced feature set should explain a large amount of the information in the data (i.e., maximize the retained variance)
Make visible the latent information in the data
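One hedged way to check the second goal numerically is scikit-learn's `explained_variance_ratio_`, which reports the fraction of total variance each retained component explains (the toy data below is illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Correlated 3-D data: most of the spread lies along one latent direction
X = rng.normal(size=(200, 1)) @ np.array([[2.0, 1.0, 0.5]]) \
    + 0.1 * rng.normal(size=(200, 3))

pca = PCA(n_components=2).fit(X)
# Fraction of the total variance each retained component explains;
# a high sum means little information is lost by dropping a dimension.
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())  # close to 1.0 for this data
```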
PCA creates
projections (principal components) in the directions that capture the most variance
Sparser data has
greater variance (points are more spread out)
Denser data has
less variance (points are clustered together)
The projections will always be
orthogonal to each other
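A short sketch, assuming scikit-learn, that demonstrates both claims at once: the variance captured decreases from the first component to the last, and the component directions are mutually orthogonal (the toy data is illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.multivariate_normal(mean=[0, 0, 0],
                            cov=[[3, 1, 0], [1, 2, 0], [0, 0, 0.5]],
                            size=500)

pca = PCA(n_components=3).fit(X)

# Variance captured decreases from the first component to the last...
print(pca.explained_variance_)  # sorted in decreasing order

# ...and the component directions are mutually orthogonal:
# components_ @ components_.T is (numerically) the identity matrix.
print(np.allclose(pca.components_ @ pca.components_.T, np.eye(3)))  # True
```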
Mathematics behind PCA
Eigenvalues and Eigenvectors
The eigenvalue equation
Ax = λx, where A is a matrix, x is an eigenvector of A, and λ is the corresponding eigenvalue
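A quick numerical check of the equation with NumPy; the matrix below is an arbitrary illustrative choice:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Eigendecomposition: columns of V are eigenvectors, w holds eigenvalues
w, V = np.linalg.eig(A)

# Verify A @ x = lambda * x for the first eigenpair
x, lam = V[:, 0], w[0]
print(A @ x)        # same as...
print(lam * x)      # ...this
print(np.allclose(A @ x, lam * x))  # True
```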
Eigenvalue and Eigenvector meaning
An eigenvector of a matrix is a nonzero vector that, when multiplied by the matrix, does not change direction; it is simply scaled by some factor, and that scale factor is the eigenvalue
Eigenvectors are vectors that
remain unchanged when multiplied by A, except for a change in magnitude: their direction is preserved when the linear transformation is applied to them
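Tying the two halves of the deck together: PCA can be computed by eigendecomposing the covariance matrix of the centered data. Here is a minimal from-scratch sketch (the function name and toy data are illustrative):

```python
import numpy as np

def pca_from_scratch(X, n_components):
    """Toy PCA via eigendecomposition of the covariance matrix."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)          # covariance of the features
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]               # sort by decreasing variance
    components = eigvecs[:, order[:n_components]]   # top eigenvectors = new axes
    return X_centered @ components                  # project onto the new axes

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X_low = pca_from_scratch(X, n_components=2)
print(X_low.shape)  # (100, 2)
```

In practice, library implementations such as scikit-learn compute PCA from the SVD of the centered data rather than forming the covariance matrix explicitly, which is numerically more stable.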