PCA Flashcards
What is the purpose of Principal Component Analysis (PCA)?
PCA transforms the original variables X_1, …, X_p into p new variables Z_1, …, Z_p called principal components (PCs).
how are the new variables created by PCA ordered?
The new variables are ordered by how much the variation is accounted for by that variable.
That is: Var(Z_1) ≥ Var(Z_2) ≥ … ≥ Var(Z_p).
how is the importance of PCs determined?
The variables which account for more variation are more important. If some subset of the variables accounts for most of the variation, the convention is that we can forget about the rest of the variables!
what are PCs in a linear algebra sense?
PCs are linear combinations of X_1, …, X_p, i.e.,
Z_1 = a_11 X_1 + a_12 X_2 + … + a_1p X_p,
Z_2 = a_21 X_1 + a_22 X_2 + … + a_2p X_p, etc.
Example of a Linear Combination:
Z = 2 × [1 0] + 3 × [0 1].
Because the coefficient (or weight) of [0 1] is higher, Z points more in the direction of [0 1] than [1 0].
Application to Scalars:
Even though the X_i are usually scalars, the same idea applies.
Let X_1 = 3, X_2 = 4, a_11 = 2, and a_12 = -1.
Then Z_1 = 2(3) + (-1)(4) = 2: the difference between X_1 and X_2, with X_1 weighted twice as much as X_2.
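A minimal numpy sketch of both toy examples above (nothing here is part of PCA itself; it just shows what a weighted sum of inputs looks like):

```python
import numpy as np

# Vector example: Z = 2*[1 0] + 3*[0 1]
Z = 2 * np.array([1, 0]) + 3 * np.array([0, 1])
print(Z)        # [2 3] -- leans more toward the [0 1] direction

# Scalar example: X_1 = 3, X_2 = 4 with weights a_11 = 2, a_12 = -1
X = np.array([3, 4])
a1 = np.array([2, -1])
Z1 = a1 @ X     # 2*3 + (-1)*4
print(Z1)       # 2
```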
why should we first normalize our data so that the variances are all 1?
Normalizing the data so that variances are all 1 ensures that each variable contributes equally to the analysis.
Without normalization, variables with larger variances (often due to differences in units or scales) could dominate the principal components, skewing the results and reducing the interpretability of the analysis.
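As a rough sketch, standardizing each column to unit variance before PCA might look like this (the data here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])  # columns on very different scales

# Subtract each column's mean and divide by its standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.var(axis=0))  # each variance is now (approximately) 1
```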
how is the first PC chosen?
The first principal component, Z_1, is chosen so that Var(Z_1) is as large as possible among all linear combinations of X_1, …, X_p.
how is Var(Z_1) made as large as possible?
This is achieved by maximizing |C a_1|, where a_1 = [a_11, …, a_1p]ᵀ is the vector of weights for Z_1.
However, this optimization is not interesting unless we enforce the constraint |a_1| = 1 (otherwise the maximum is unbounded, since scaling a_1 scales |C a_1|).
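A tiny illustration of why the constraint is needed: without it, the quantity being maximized grows without bound as the weight vector is scaled up (the covariance matrix here is made up):

```python
import numpy as np

C = np.array([[2.0, 0.5],
              [0.5, 1.0]])      # toy covariance matrix

a = np.array([1.0, 1.0])
for scale in (1, 10, 100):
    print(np.linalg.norm(C @ (scale * a)))   # |C a| keeps growing as a is scaled
```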
how are the PCs after the first one chosen?
Subsequent PCs are chosen so that:
They have maximal variance (|C a_i| is as large as possible)
The squares of the weights sum to 1 (which means |a_i| = 1)
And each PC is totally uncorrelated with the previous PCs.
i.e.
Each new Principal Component (PC) is chosen to capture as much variation as possible in the data. Each PC is a weighted sum of the original variables, but the weights are normalized so their squares sum to 1.
PCs are constructed to be completely uncorrelated with each other.
what do the solutions for a_1, …, a_p turn out to be?
The weight vectors a_i turn out to be eigenvectors of the sample covariance matrix C, and the variances turn out to be the corresponding eigenvalues.
That is, Var(Z_i) = λ_i, where λ_i is the i-th largest eigenvalue of C.
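A rough numpy check of these claims on made-up data (eigh is used because C is symmetric; this is a sketch, not a full PCA implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))   # made-up correlated data
X = X - X.mean(axis=0)                                    # center the columns

C = np.cov(X, rowvar=False)                          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)                 # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # largest first

Z = X @ eigvecs                                      # PC scores: Z_i uses weight vector a_i

print(np.var(Z, axis=0, ddof=1))                     # matches eigvals: Var(Z_i) = lambda_i
print(eigvals)
print(np.round(np.cov(Z, rowvar=False), 6))          # diagonal matrix: PCs are uncorrelated
```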
How do we decide how many principal components to keep?
- Scree Plot: Look for an elbow point where variance explained drops off.
- 80% Rule: Keep enough components to explain at least 80% of total variation.
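A short sketch of the 80% rule applied to a made-up set of eigenvalues:

```python
import numpy as np

eigvals = np.array([4.2, 2.1, 0.9, 0.5, 0.3])   # eigenvalues of C, largest first (made up)

explained = eigvals / eigvals.sum()             # proportion of variance per PC
cumulative = np.cumsum(explained)

k = np.searchsorted(cumulative, 0.80) + 1       # smallest k reaching 80%
print(cumulative)   # [0.525  0.7875 0.9    0.9625 1.    ]
print(k)            # keep 3 components to explain >= 80% of the variation
```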
what does this eqn mean?
C = QVQᵀ
C is the covariance matrix
Q is an orthogonal matrix (its columns are orthonormal eigenvectors), so multiplying by Q doesn't change lengths
V is the diagonal matrix of eigenvalues, which are the variances
Qᵀ is the transpose of Q
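A quick numerical check of this factorization on a toy symmetric matrix (not from the cards):

```python
import numpy as np

C = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.5, 0.2],
              [0.3, 0.2, 1.0]])            # any symmetric covariance-like matrix

eigvals, Q = np.linalg.eigh(C)             # columns of Q are orthonormal eigenvectors
V = np.diag(eigvals)                       # diagonal matrix of eigenvalues

print(np.allclose(C, Q @ V @ Q.T))         # True: C = Q V Q^T
print(np.allclose(Q.T @ Q, np.eye(3)))     # True: Q^T Q = I
v = np.array([1.0, 2.0, 3.0])
print(np.linalg.norm(Q @ v), np.linalg.norm(v))   # multiplying by Q preserves length
```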
what is a principal component?
A principal component is a new variable created by PCA that combines the original variables in a way that captures the most important patterns and variation in the data while reducing complexity.
true or false, PCs are uncorrelated with each other
true
explain this eqn: ∑ Var(X_i) = ∑ λ_i
The sum of the variances of the original variables is the sum of the eigenvalues along the diagonal of V.
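A small check of this identity, reusing a toy covariance matrix (assumed, not from the cards):

```python
import numpy as np

C = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.5, 0.2],
              [0.3, 0.2, 1.0]])

eigvals = np.linalg.eigvalsh(C)

# Sum of the original variances (the diagonal of C) equals the sum of the eigenvalues
print(np.trace(C), eigvals.sum())          # both are 4.5
```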
true or false? PCA doesn't do much to reduce the dimension of data which is largely uncorrelated.
true
when do we use spectral decomposition and when do we use singular value decomposition (SVD)?
Computing eigenvectors via the spectral decomposition relies on the matrix being invertible.
If the matrix isn't invertible, singular value decomposition (SVD) works better.
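A sketch of the SVD route, working directly on the centered data matrix instead of forming (or inverting) a covariance matrix; the data are made up:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 5))   # made-up data, n = 50
Xc = X - X.mean(axis=0)                                   # center the columns

# SVD of the data matrix itself
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

pc_weights = Vt.T                        # columns are the PC weight vectors
pc_variances = s**2 / (len(Xc) - 1)      # equal the eigenvalues of the covariance matrix

print(pc_variances)
print(np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1])   # same values, for comparison
```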