PCA Literature Flashcards

Question 1

Q

What does a PCA do?

Answer

A

It analyzes a data table representing observations described by several dependent variables, which are, in general, inter-correlated.

Question 2

Q

What is the goal of a PCA?

Answer

A

To extract the important information from the data table and express it as a set of new orthogonal variables called “principal components”.

Question 3

Q

How are matrices, vectors and elements denoted?

Answer

A

Matrices in upper case bold.
Vectors in lower case bold.
Elements in lower case italic.
- Note: matrices, vectors, and elements from the same matrix all use the same letter.

Question 4

Q

What does the PCA data table consist of?

Answer

A

I observations described by J variables. It is represented by the I x J matrix X, whose generic element is x_ij (row i, column j).

Question 5

Q

What is a covariance PCA?

Answer

A

A PCA in which, in addition to being centered, each element of X is divided by sqrt(I) or sqrt(I-1). The matrix X^TX is then a covariance matrix.
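
As a rough illustration of this preprocessing, the sketch below centers a small made-up table and divides by sqrt(I-1). The data and the names `raw`, `n_obs` and `cov_from_X` are assumptions chosen for the example, not part of the source material.

```python
import numpy as np

# Hypothetical raw data: I = 5 observations (rows), J = 3 variables (columns).
raw = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
    [5.0, 7.0, 2.0],
    [6.0, 9.0, 1.0],
    [9.0, 14.0, 6.0],
])
n_obs = raw.shape[0]

# Center each column, then divide every element by sqrt(I - 1).
X = (raw - raw.mean(axis=0)) / np.sqrt(n_obs - 1)

# With this scaling, X^T X is the sample covariance matrix of the raw variables.
cov_from_X = X.T @ X
```

Dividing by sqrt(I) instead would give the population covariance matrix (the one obtained from `np.cov` with `ddof=0`).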

Question 6

Q

What is a correlation PCA?

Answer

A

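A PCA in which, in addition to being centered, each variable is standardized to unit norm by dividing it by its norm (the square root of the sum of its squared elements). The matrix X^TX is then a correlation matrix.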
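
A minimal sketch of this standardization, assuming a small made-up table (the data and the names `raw`, `norms` and `corr_from_X` are illustrative only):

```python
import numpy as np

# Hypothetical raw data: 5 observations (rows), 3 variables (columns).
raw = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
    [5.0, 7.0, 2.0],
    [6.0, 9.0, 1.0],
    [9.0, 14.0, 6.0],
])

# Center each column, then divide it by its norm
# (square root of the sum of its squared elements).
centered = raw - raw.mean(axis=0)
norms = np.sqrt((centered ** 2).sum(axis=0))
X = centered / norms

# Every column of X now has unit norm, and X^T X is the correlation matrix.
corr_from_X = X.T @ X
```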

Question 7

Q

What is the singular value decomposition for the matrix X and what do the values mean?

Answer

A

X = P(delta)Q^T
- P is the I x L matrix of left singular vectors.
- Q is the J x L matrix of right singular vectors.
- Delta is the L x L diagonal matrix of singular values.
- L is the rank of X.
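
The decomposition can be checked numerically. The sketch below uses `numpy.linalg.svd` on a made-up centered table; the data and variable names are assumptions for illustration. NumPy returns the singular values as a vector and the right singular vectors already transposed, so both are converted to match the notation of this card.

```python
import numpy as np

# Hypothetical data: 5 observations, 3 variables, centered column by column.
raw = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
    [5.0, 7.0, 2.0],
    [6.0, 9.0, 1.0],
    [9.0, 14.0, 6.0],
])
X = raw - raw.mean(axis=0)

# Thin SVD: P is I x K, delta holds K singular values, Qt is K x J,
# with K = min(I, J). K equals the rank L when X has full rank, as here.
P, delta, Qt = np.linalg.svd(X, full_matrices=False)
Q = Qt.T                # right singular vectors as columns
Delta = np.diag(delta)  # diagonal matrix of singular values

# Multiplying the three factors back together recovers X.
X_rebuilt = P @ Delta @ Q.T
```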

Question 8

Q

What is the inertia of a column?

Answer

A

The sum of the squared elements of this column, computed as (see notes).
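
The formula itself lives in the notes this card points to. The sketch below follows only the verbal definition given here; the function name `column_inertia` and the sample column are hypothetical.

```python
def column_inertia(column):
    """Return the sum of the squared elements of one column."""
    return sum(value * value for value in column)


# Hypothetical centered column with five observations.
column = [-3.0, -2.0, 0.0, 1.0, 4.0]
gamma_sq = column_inertia(column)  # 9 + 4 + 0 + 1 + 16
```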

Question 9

Q

What is the inertia (total inertia) of a table?

Answer

A

The sum of the inertias of all its columns, denoted by a script I (not to be confused with I, the number of observations). It equals the sum of the squared singular values of the data table.
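
The equality with the squared singular values can be verified on a small example. This is a sketch on made-up data, with illustrative variable names:

```python
import numpy as np

# Hypothetical centered table: 5 observations, 3 variables.
X = np.array([
    [-3.0, -4.0, -1.0],
    [-2.0, -2.0, -2.0],
    [0.0, -1.0, 0.0],
    [1.0, 1.0, -1.0],
    [4.0, 6.0, 4.0],
])

# Inertia of each column, then the total inertia of the table.
column_inertias = (X ** 2).sum(axis=0)
total_inertia = column_inertias.sum()

# Singular values only (no singular vectors needed for this check).
singular_values = np.linalg.svd(X, compute_uv=False)
sum_sq_singular = (singular_values ** 2).sum()
```

Both `total_inertia` and `sum_sq_singular` come out equal up to floating-point rounding.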

Question 10

Q

What is the center of gravity of the rows (centroid or barycenter)?

Answer

A

The vector of the means of each column of X, denoted g.
- When X is centered, its center of gravity is equal to the 1 x J row vector 0^T.
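
A short sketch of both statements, assuming a small made-up table (`raw`, `g` and `g_centered` are illustrative names):

```python
import numpy as np

# Hypothetical raw data: 5 observations (rows), 3 variables (columns).
raw = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
    [5.0, 7.0, 2.0],
    [6.0, 9.0, 1.0],
    [9.0, 14.0, 6.0],
])

# Center of gravity: the mean of each column.
g = raw.mean(axis=0)

# Centering subtracts g from every row, which moves the
# center of gravity of the table to the origin.
X = raw - g
g_centered = X.mean(axis=0)
```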

Question 11

Q

What are the four goals of a PCA?

Answer

A

Extracting the most important information from the data table.
Compressing the size of the dataset by keeping only this important information.
Simplifying the description of the dataset.
Analyzing the structure of the observations and the variables.

Question 12

Q

What are principal components?

Answer

A

They are linear combinations of the original variables.

Question 13

Q

What is the order of the principal components?

Answer

A

The first component is required to have the largest possible variance (i.e., inertia), so it “explains” or “extracts” the largest part of the inertia of the data table.
- The second component is computed under the constraint of being orthogonal to the first component while having the largest possible inertia.
- The remaining components are computed in the same way.
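
The ordering can be observed numerically. The sketch below assumes the components are obtained from the singular value decomposition of a centered table, with the scores computed as the left singular vectors scaled by the singular values; the data and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical centered table: 5 observations, 3 variables.
X = np.array([
    [-3.0, -4.0, -1.0],
    [-2.0, -2.0, -2.0],
    [0.0, -1.0, 0.0],
    [1.0, 1.0, -1.0],
    [4.0, 6.0, 4.0],
])

# NumPy returns the singular values in decreasing order.
P, delta, Qt = np.linalg.svd(X, full_matrices=False)

# Scores of the observations on each component (one column per component).
F = P * delta

# Inertia carried by each component, and its share of the total.
component_inertia = (F ** 2).sum(axis=0)
explained = component_inertia / component_inertia.sum()

# Off-diagonal entries are (numerically) zero: the components are orthogonal.
gram = F.T @ F
```

Here `component_inertia` is non-increasing from the first component to the last, and `explained` sums to 1.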

Question 14

Q

What are factor scores?

Answer

A

The values of the new variables for the observations.
- These factor scores can be interpreted geometrically as the projections of the observations onto the principal components.

Question 15

Q

How are components obtained in PCA?

Answer

A

From the singular value decomposition of the data table X.
- The I x L matrix of factor scores, denoted F, is obtained as: F = P(delta)
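
A minimal sketch of this computation with NumPy, on made-up centered data (all names are illustrative):

```python
import numpy as np

# Hypothetical centered table: I = 5 observations, J = 3 variables.
X = np.array([
    [-3.0, -4.0, -1.0],
    [-2.0, -2.0, -2.0],
    [0.0, -1.0, 0.0],
    [1.0, 1.0, -1.0],
    [4.0, 6.0, 4.0],
])

# Singular value decomposition of the data table.
P, delta, Qt = np.linalg.svd(X, full_matrices=False)

# Factor scores: left singular vectors times the diagonal matrix
# of singular values. One row per observation, one column per component.
F = P @ np.diag(delta)
```

Writing `P * delta` gives the same result through broadcasting and avoids building the diagonal matrix.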

Question 16

Q

What does the matrix Q give?

Answer

A

It gives the coefficients of the linear combinations used to compute the factor scores.
- Q can also be interpreted as a projection matrix, because multiplying X by Q gives the projections of the observations onto the principal components.
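
Both readings of Q can be checked on a small example. This is a sketch on made-up centered data; the variable names are assumptions.

```python
import numpy as np

# Hypothetical centered table: 5 observations, 3 variables.
X = np.array([
    [-3.0, -4.0, -1.0],
    [-2.0, -2.0, -2.0],
    [0.0, -1.0, 0.0],
    [1.0, 1.0, -1.0],
    [4.0, 6.0, 4.0],
])

P, delta, Qt = np.linalg.svd(X, full_matrices=False)
Q = Qt.T  # one column of coefficients per component

# Projection reading: multiplying X by Q gives the factor scores.
F_from_projection = X @ Q

# The same scores obtained from the left singular vectors.
F_from_svd = P @ np.diag(delta)

# Linear-combination reading: the first column of scores is a weighted
# sum of the original variables, with weights taken from column 0 of Q.
first_scores = sum(X[:, j] * Q[j, 0] for j in range(X.shape[1]))
```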

Question 17

Q

What is the bilinear decomposition of X?

Answer

A

The interpretation of X as the product of the factor score matrix F and the transpose of the loading matrix Q: X = FQ^T, with F^TF = (delta)^2 and Q^TQ = I (here I is the identity matrix, not the number of observations).
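
The three identities can be verified numerically. The sketch below uses made-up centered data and illustrative variable names:

```python
import numpy as np

# Hypothetical centered table: 5 observations, 3 variables.
X = np.array([
    [-3.0, -4.0, -1.0],
    [-2.0, -2.0, -2.0],
    [0.0, -1.0, 0.0],
    [1.0, 1.0, -1.0],
    [4.0, 6.0, 4.0],
])

P, delta, Qt = np.linalg.svd(X, full_matrices=False)
Q = Qt.T          # loading matrix
F = P * delta     # factor score matrix

# X as the product of the factor scores and the transposed loadings.
X_rebuilt = F @ Q.T

# F^T F is the diagonal matrix of squared singular values.
FtF = F.T @ F

# Q^T Q is the identity matrix.
QtQ = Q.T @ Q
```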

Question 18

Q

What is a vector?

Answer

A

A one-dimensional set of elements of the same datatype (e.g., all numbers).

Question 19

Q

What is a matrix?

Answer

A

A two-dimensional set of elements of the same datatype.

Question 20

Q

What is an array?

Answer

A

An n-dimensional set of elements of the same datatype.

Question 21

Q

What is a dataframe?

Answer

A

A two-dimensional set of elements that may consist of different datatypes.

Question 22

Q

What is a list (in R)?

Answer

A

A one-dimensional set of elements which can hold different data types.
