PCA Final Flashcards
PCA
Principal Component Analysis
PCA is a
dimensionality reduction technique
Big idea 1: Take a dataset in high-dimensional space
and transform it so it can be represented in low-dimensional space, with minimal or no loss of information
Big idea 2: Extract
latent information from the data
The PCA transformation results in
a smaller number of principal components that capture the maximum amount of the original dataset's variation, but in low-dimensional space
These principal components are
linear combinations of the original variables, and become the new axes of the dataset in the low-dimensional space
3 goals of PCA
Feature reduction: reduce the number of features used to represent the data
The reduced feature set should explain a large amount of information (or maximize variance)
Make visible the latent information in the data
PCA creates
projections (principal components) in the directions that capture the most variance
Sparser data has
greater variance (spread out)
Denser data has
less variance (clustered together)
The projections will always be
orthogonal to each other
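A minimal R sketch of this orthogonality, using prcomp on the built-in USArrests dataset (chosen here purely as a convenient example):

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# The principal component directions (columns of rotation) are orthonormal:
# their cross-product is, up to rounding, the identity matrix
round(crossprod(pca$rotation), 10)

# Consequently the scores are uncorrelated: off-diagonal covariances are ~0
round(cov(pca$x), 10)
```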
Mathematics behind PCA
Eigenvalues and Eigenvectors
Mathematics equation
A x = λ x: matrix A times eigenvector x equals eigenvalue λ times the same eigenvector
Eigenvalue and Eigenvector meaning
An eigenvector of a matrix is a nonzero vector that, when it is multiplied by the matrix, does not change its direction. Instead, the vector is simply scaled by some factor, and that scaling factor is the eigenvalue
Eigenvectors are vectors that
remain unchanged when multiplied by A, except for a change in magnitude. Their direction remains unchanged when the linear transformation is applied to them
When we eigendecompose a matrix (perform an eigendecomposition)
if the matrix has n columns (n dimensions), we get n eigenvalues and n eigenvectors
Our matrix/dataset gets decomposed into
Eigenvectors
Eigenvalues
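A short R sketch of an eigendecomposition using the base eigen() function; the 3-column matrix A below is made up purely for illustration:

```r
# Hypothetical 3-column (3-dimensional) data matrix; values are for illustration only
set.seed(1)
A <- matrix(rnorm(30), ncol = 3)

# Eigendecomposition of the covariance matrix of A
decomp <- eigen(cov(A))

decomp$values   # 3 eigenvalues, ordered from largest to smallest
decomp$vectors  # 3 eigenvectors, stored as the columns of this matrix
```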
Should we standardize for PCA?
Yes, always standardize (center and scale), so that variables measured on different scales do not dominate the principal components
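A minimal sketch of how standardization is typically requested in a prcomp call (center and scale. are the standardization arguments; USArrests is used only as an example dataset):

```r
# center = TRUE subtracts the column means; scale. = TRUE divides by the column
# standard deviations, so all variables contribute on a comparable scale
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
```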
five fields returned from prcomp(A,…)
sdev
rotation
center
scale
x
sdev
Square roots of the eigenvalues (the standard deviations of the principal components), ordered from largest to smallest
rotation
Matrix whose columns contain the eigenvectors (also called principal loadings)
center
Means of the columns of the matrix A (used for centering)
scale
Standard deviations of the columns of the matrix A (used for scaling)
x
Data from matrix A in rotated space (also called principal component scores)
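A sketch that fits a standardized PCA and inspects the five fields, again assuming USArrests as an example dataset:

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

pca$sdev      # square roots of the eigenvalues, largest first
pca$rotation  # eigenvectors / principal component loadings, one column per PC
pca$center    # column means of the original data
pca$scale     # column standard deviations of the original data
pca$x         # principal component scores (the data in rotated space)
```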
How is the data in rotated space computed?
By a dot product: the centered (and scaled) data matrix is multiplied by the rotation matrix of eigenvectors, as sketched below
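A minimal check of that dot-product relationship, assuming the same standardized prcomp fit on USArrests:

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Scores = (centered and scaled data) %*% rotation matrix
scores_by_hand <- scale(as.matrix(USArrests),
                        center = pca$center, scale = pca$scale) %*% pca$rotation

all.equal(scores_by_hand, pca$x)  # TRUE, up to floating-point differences
```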
Top and right axes indicate
where the loading vectors fall, i.e., the scale used to read the variable arrows
Bottom and left axes indicate
the scale of the principal component scores that situate each data point in the rotated space
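These two axis pairs match what R's biplot() draws for a prcomp fit; a minimal sketch, using USArrests only as an example:

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Observations are plotted against the bottom/left axes (their PC scores);
# the loading arrows for the variables are read against the top/right axes
biplot(pca)
```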
How many principal components do we need?
As many as explain most of the variance; beyond that point, adding more components yields only diminishing gains in explained variance
Key idea: What is the proportion of variance
contributed by each principal component loading?
Total Variation
the sum of the variances of all the principal components (equivalently, the sum of the eigenvalues)
Proportion of variance explained by the ith principal component
Var(PCi) / Total Variation
variance is
the square of the standard deviation (for prcomp, the variance of the ith component is sdev[i]^2)
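A short sketch computing the proportion of variance explained from sdev, assuming the USArrests example fit:

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

pc_var <- pca$sdev^2            # variance of each PC = squared standard deviation
pve    <- pc_var / sum(pc_var)  # proportion of variance explained by each PC

pve          # individual proportions
cumsum(pve)  # cumulative proportion: keep enough PCs to cover "most" of the variance
summary(pca) # reports the same standard deviations and proportions
```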
What do you have to do before attempting to use observations in any model?
Transform all of your observations (in-sample and out-of-sample) from their natural representation to principal component scores
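A minimal sketch of scoring an out-of-sample observation with predict(); the new observation's values are made up for illustration and assume the USArrests columns:

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# A hypothetical new (out-of-sample) observation with the same columns as USArrests
new_obs <- data.frame(Murder = 10, Assault = 200, UrbanPop = 65, Rape = 25)

# predict() reuses the stored center, scale, and rotation to return
# the observation as principal component scores
predict(pca, newdata = new_obs)
```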