PCA Literature Flashcards
What does a PCA do?
It analyzes a data table representing observations described by several dependent variables, which are, in general, inter-correlated.
What is the goal of a PCA?
Extracting the important information from the data table and expressing this information as a set of new orthogonal variables called “principal components”.
How are matrices, vectors and elements denoted?
- Matrices in upper case bold.
- Vectors in lower case bold.
- Elements in lower case italic.
- Note: a matrix, its vectors, and its elements all use the same letter.
What does the PCA data table consist of?
I observations described by J variables, represented by the I x J matrix X, whose generic element is x(ij) (row i, column j).
What is a covariance PCA?
When the columns of X are centered and each element of X is then divided by sqrt(I) or sqrt(I-1), so that X^T X is a covariance matrix.
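A minimal NumPy sketch of this preprocessing, assuming a small hypothetical 4 x 3 data table. The helper name `covariance_preprocess` and its `ddof` parameter are illustrative, not from the source. The check confirms that X^T X reproduces NumPy's own covariance matrix.

```python
import numpy as np

def covariance_preprocess(X, ddof=1):
    """Hypothetical helper: center the columns of X, then divide every
    element by sqrt(I - ddof). ddof=1 gives sqrt(I - 1), ddof=0 gives sqrt(I).
    Either way, X^T X becomes a covariance matrix of the original variables."""
    X = np.asarray(X, dtype=float)
    I = X.shape[0]                       # number of observations (rows)
    Xc = X - X.mean(axis=0)              # center each column
    return Xc / np.sqrt(I - ddof)

# Hypothetical data table: I = 4 observations, J = 3 variables
X = np.array([[2.0, 0.0, 1.0],
              [4.0, 1.0, 3.0],
              [3.0, 5.0, 2.0],
              [7.0, 2.0, 8.0]])

Xs = covariance_preprocess(X, ddof=1)
# np.cov also divides by I - 1 by default, so the two matrices should agree
print(np.allclose(Xs.T @ Xs, np.cov(X, rowvar=False)))  # True
```

With `ddof=0` the result matches `np.cov(X, rowvar=False, bias=True)`, which divides by I instead.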
What is a correlation PCA?
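When, in addition to being centered, each variable is standardized to unit norm by dividing it by its norm (the square root of the sum of its squared elements). X^T X is then a correlation matrix. This is customary when the variables are measured in different units.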
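A minimal NumPy sketch, assuming the same kind of small hypothetical data table; `correlation_preprocess` is an illustrative name, not from the source. After centering and dividing each column by its norm, every column has unit norm and X^T X equals the correlation matrix.

```python
import numpy as np

def correlation_preprocess(X):
    """Hypothetical helper: center each column of X, then divide it by its
    norm so every column has unit norm and X^T X is a correlation matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                # center each column
    norms = np.linalg.norm(Xc, axis=0)     # sqrt of the sum of squares, per column
    return Xc / norms

# Hypothetical data table: I = 4 observations, J = 3 variables
X = np.array([[2.0, 0.0, 1.0],
              [4.0, 1.0, 3.0],
              [3.0, 5.0, 2.0],
              [7.0, 2.0, 8.0]])

Z = correlation_preprocess(X)
print(np.allclose(np.linalg.norm(Z, axis=0), 1.0))         # True: unit-norm columns
print(np.allclose(Z.T @ Z, np.corrcoef(X, rowvar=False)))  # True: correlation matrix
```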
What is the singular value decomposition of the matrix X, and what do its terms mean?
X = P(delta)Q^T
- P is the I x L matrix of left singular vectors.
- Q is the J x L matrix of right singular vectors.
- Delta is the L x L diagonal matrix of singular values.
- L is the rank of X, and both P and Q have orthonormal columns.
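A minimal NumPy sketch of the decomposition, assuming a hypothetical centered 4 x 3 table. `np.linalg.svd` returns Q^T rather than Q, and its thin form keeps min(I, J) singular vectors; the sketch trims them to the rank L so the shapes match the card.

```python
import numpy as np

# Hypothetical data table: I = 4 observations, J = 3 variables
X = np.array([[2.0, 0.0, 1.0],
              [4.0, 1.0, 3.0],
              [3.0, 5.0, 2.0],
              [7.0, 2.0, 8.0]])
Xc = X - X.mean(axis=0)                   # PCA is usually run on the centered table

# Thin SVD: P is I x min(I, J), delta holds the singular values, Qt is Q^T
P, delta, Qt = np.linalg.svd(Xc, full_matrices=False)
Q = Qt.T

L = np.linalg.matrix_rank(Xc)             # number of nonzero singular values
P, delta, Q = P[:, :L], delta[:L], Q[:, :L]  # keep the I x L and J x L parts
Delta = np.diag(delta)                    # L x L diagonal matrix of singular values

print(np.allclose(P @ Delta @ Q.T, Xc))   # True: X = P Delta Q^T
print(np.allclose(P.T @ P, np.eye(L)))    # True: orthonormal left singular vectors
print(np.allclose(Q.T @ Q, np.eye(L)))    # True: orthonormal right singular vectors
```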
What is the inertia of a column?
The sum of the squared elements of this column, computed as γ(j)^2 = Σ_i x(ij)^2, summing over the I observations.
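A minimal NumPy sketch, assuming a hypothetical data table that is centered first, as PCA usually does. For a centered column, the inertia equals (I - 1) times the sample variance of that variable.

```python
import numpy as np

# Hypothetical data table: I = 4 observations, J = 3 variables
X = np.array([[2.0, 0.0, 1.0],
              [4.0, 1.0, 3.0],
              [3.0, 5.0, 2.0],
              [7.0, 2.0, 8.0]])
Xc = X - X.mean(axis=0)                  # centered table

# gamma(j)^2 = sum over i of x(ij)^2: one inertia value per column
column_inertia = (Xc ** 2).sum(axis=0)
print(column_inertia)                    # [14. 14. 29.]
```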
What is the inertia (total inertia) of a table?
The sum of the inertias of all the columns, denoted ℐ (a script I, distinct from the number of observations I). It is equal to the sum of the squared singular values of the data table.
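A minimal NumPy check of this identity on a hypothetical centered table: the total inertia (the sum of all squared elements) equals the sum of the squared singular values.

```python
import numpy as np

# Hypothetical data table: I = 4 observations, J = 3 variables
X = np.array([[2.0, 0.0, 1.0],
              [4.0, 1.0, 3.0],
              [3.0, 5.0, 2.0],
              [7.0, 2.0, 8.0]])
Xc = X - X.mean(axis=0)

total_inertia = (Xc ** 2).sum()          # sum of all column inertias (14 + 14 + 29)
singular_values = np.linalg.svd(Xc, compute_uv=False)

print(total_inertia)                                           # 57.0
print(np.isclose(total_inertia, (singular_values ** 2).sum())) # True
```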
What is the center of gravity of the rows (centroid or barycenter)?
Denoted g: the 1 x J vector of the means of each column of X.
- When X is centered, its center of gravity is equal to the 1 x J row vector 0^T.
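A minimal NumPy sketch, assuming a hypothetical data table: g is the vector of column means, and subtracting it from every row (via broadcasting) centers the table so its new center of gravity is the zero vector.

```python
import numpy as np

# Hypothetical data table: I = 4 observations, J = 3 variables
X = np.array([[2.0, 0.0, 1.0],
              [4.0, 1.0, 3.0],
              [3.0, 5.0, 2.0],
              [7.0, 2.0, 8.0]])

g = X.mean(axis=0)       # center of gravity: column means 4, 2 and 3.5
Xc = X - g               # subtract g from every row (broadcasting)

# The centered table has the 1 x J zero vector as its center of gravity
print(np.allclose(Xc.mean(axis=0), 0.0))   # True
```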
What are the four goals of a PCA?
- Extracting the most important information from the data table.
- Compressing the size of the dataset by keeping only this important information.
- Simplifying the description of the dataset.
- Analyzing the structure of the observations and the variables.
What are principal components?
They are linear combinations of the original variables.
What is the order of the principal components?
The first component is required to have the largest possible variance (i.e., inertia); this component therefore “explains” or “extracts” the largest part of the inertia of the data table.
- The second component is computed under the constraint of being orthogonal to the first component and of having the largest possible inertia. The other components are computed likewise.
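A minimal NumPy sketch of this ordering, assuming a hypothetical centered table and taking the component directions from the right singular vectors of the SVD. It checks that the inertia extracted by each component is δ^2 and non-increasing, that the components are orthogonal, and that no random unit direction extracts more inertia than the first component.

```python
import numpy as np

# Hypothetical data table: I = 4 observations, J = 3 variables
X = np.array([[2.0, 0.0, 1.0],
              [4.0, 1.0, 3.0],
              [3.0, 5.0, 2.0],
              [7.0, 2.0, 8.0]])
Xc = X - X.mean(axis=0)

P, delta, Qt = np.linalg.svd(Xc, full_matrices=False)
Q = Qt.T                                  # columns of Q: component directions

# Inertia extracted by each component = squared norm of its scores = delta^2
inertia_per_component = ((Xc @ Q) ** 2).sum(axis=0)
print(np.allclose(inertia_per_component, delta ** 2))  # True
print(np.all(np.diff(delta) <= 0))                     # True: largest first

# Components are mutually orthogonal
print(np.allclose(Q.T @ Q, np.eye(Q.shape[1])))        # True

# No other unit-length direction extracts more inertia than the first component
rng = np.random.default_rng(0)
U = rng.normal(size=(Xc.shape[1], 1000))
U /= np.linalg.norm(U, axis=0)            # 1000 random unit vectors
print(np.all(((Xc @ U) ** 2).sum(axis=0) <= delta[0] ** 2 + 1e-9))  # True
```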
What are factor scores?
The values of the new variables for the observations.
- These factor scores can be interpreted geometrically as the projections of the observations onto the principal components.
How are components obtained in PCA?
From the singular value decomposition of the data table X.
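- The I x L matrix of factor scores, denoted F, is obtained as F = P(delta). Because Q has orthonormal columns, this also equals XQ, the projections of the observations onto the components.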
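A minimal NumPy sketch, assuming a hypothetical centered table: it computes the factor scores as PΔ and confirms they equal XQ, i.e., the projections of the observations onto the components.

```python
import numpy as np

# Hypothetical data table: I = 4 observations, J = 3 variables
X = np.array([[2.0, 0.0, 1.0],
              [4.0, 1.0, 3.0],
              [3.0, 5.0, 2.0],
              [7.0, 2.0, 8.0]])
Xc = X - X.mean(axis=0)

P, delta, Qt = np.linalg.svd(Xc, full_matrices=False)
Q = Qt.T

F = P @ np.diag(delta)    # factor scores: F = P Delta
F_proj = Xc @ Q           # projections of the observations onto the components
print(np.allclose(F, F_proj))   # True
print(F.shape)                  # (4, 3): one row per observation
```

The signs of singular vectors are arbitrary, so different software may return factor scores whose columns differ in sign; the geometry is unchanged.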