PCA Literature Flashcards

1
Q

What does a PCA do?

A

It analyzes a data table representing observations described by several dependent variables, which are, in general, inter-correlated.

2
Q

What is the goal of a PCA?

A

Extracting the important information from the data table and expressing this information as a set of new orthogonal variables called “principal components”.

3
Q

How are matrices, vectors and elements denoted?

A
  1. Matrices in upper case bold.
  2. Vectors in lower case bold.
  3. Elements in lower case italic.
    • Note: matrices, vectors, and elements from the same matrix all use the same letter.
4
Q

What does the PCA data table consist of?

A

I observations that are described by J variables. It is represented by the I × J matrix X, whose generic element is x(i,j).

5
Q

What is a covariance PCA?

A

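When the columns of X are centered (each column has mean 0) and each element of X is then divided by sqrt(I) or sqrt(I-1). In this case the matrix X^TX is a covariance matrix.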
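
The preprocessing behind a covariance PCA can be sketched in a few lines of plain Python. The toy data table below is hypothetical (it is not from the flashcards), and the divisor sqrt(I-1) is one of the two options the card mentions; treat this as an illustration rather than a reference implementation.

```python
import math

# Hypothetical toy data: I = 4 observations (rows), J = 2 variables (columns).
X = [[2.0, 1.0],
     [4.0, 3.0],
     [6.0, 2.0],
     [8.0, 6.0]]
I, J = len(X), len(X[0])

# Center each column so that its mean is 0.
means = [sum(row[j] for row in X) / I for j in range(J)]

# Divide every centered element by sqrt(I - 1) (sqrt(I) is the other option).
scale = math.sqrt(I - 1)
Z = [[(row[j] - means[j]) / scale for j in range(J)] for row in X]

# Z^T Z: with this scaling it is the sample covariance matrix of the variables.
ZtZ = [[sum(Z[i][a] * Z[i][b] for i in range(I)) for b in range(J)]
       for a in range(J)]
```

The diagonal of `ZtZ` holds the variances of the two variables and the off-diagonal entries hold their covariance.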

6
Q

What is a correlation PCA?

A

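When, in addition to being centered, each variable is standardized to unit norm. This is done by dividing each variable by its norm (the square root of the sum of its squared elements). In this case the matrix X^TX is a correlation matrix.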
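
As a rough illustration of the standardization step, the sketch below centers each column of a hypothetical toy table and divides it by its norm. The data and variable names are assumptions made for the example.

```python
import math

# Hypothetical toy data: I = 4 observations (rows), J = 2 variables (columns).
X = [[2.0, 1.0],
     [4.0, 3.0],
     [6.0, 2.0],
     [8.0, 6.0]]
I, J = len(X), len(X[0])

# Center each column.
means = [sum(row[j] for row in X) / I for j in range(J)]
centered = [[row[j] - means[j] for j in range(J)] for row in X]

# Norm of a column: square root of the sum of its squared elements.
norms = [math.sqrt(sum(centered[i][j] ** 2 for i in range(I))) for j in range(J)]

# Divide each column by its norm so that every variable has unit norm.
Z = [[centered[i][j] / norms[j] for j in range(J)] for i in range(I)]

# Z^T Z: ones on the diagonal, correlations between variables elsewhere.
ZtZ = [[sum(Z[i][a] * Z[i][b] for i in range(I)) for b in range(J)]
       for a in range(J)]
```

Because every column of `Z` has unit norm, the diagonal of `ZtZ` is 1 and each off-diagonal entry is the correlation between two variables.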

7
Q

What is the singular value decomposition for the matrix X and what do the values mean?

A

X = PΔQ^T
- P is the I × L matrix of left singular vectors (L is the rank of X).
- Q is the J × L matrix of right singular vectors.
- Δ (delta) is the diagonal matrix of singular values.
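
A minimal sketch of this decomposition, assuming NumPy is available and using a hypothetical toy table. Note that `np.linalg.svd` returns Q already transposed and returns min(I, J) singular values, which matches L only when X has full rank (as it does here).

```python
import numpy as np

# Hypothetical toy data: I = 4 observations, J = 2 variables, then centered.
X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 2.0],
              [8.0, 6.0]])
X = X - X.mean(axis=0)

# Thin SVD: P holds left singular vectors, delta the singular values,
# and Qt is the transpose of the right singular vectors.
P, delta, Qt = np.linalg.svd(X, full_matrices=False)
Q = Qt.T
Delta = np.diag(delta)  # diagonal matrix of singular values

# Multiplying the three factors back together recovers X.
X_rebuilt = P @ Delta @ Q.T
```

Here `P` has shape (4, 2) and `Q` has shape (2, 2); the singular values in `delta` come back sorted from largest to smallest.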

8
Q

What is the inertia of a column?

A

The sum of the squared elements of this column, computed as (see notes).

9
Q

What is the inertia (total inertia) of a table?

A

The sum of the inertias of all its columns. Denoted by a script I (not to be confused with I, the number of observations). Note that this is equal to the sum of the squared singular values of the data table.
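
The two definitions of inertia can be illustrated with a small worked example. The centered toy table below is hypothetical and the variable names are chosen for the example only.

```python
# Hypothetical centered data table: I = 4 observations, J = 2 variables.
X = [[-3.0, -2.0],
     [-1.0,  0.0],
     [ 1.0, -1.0],
     [ 3.0,  3.0]]
J = len(X[0])

# Inertia of a column: the sum of its squared elements.
column_inertia = [sum(row[j] ** 2 for row in X) for j in range(J)]

# Total inertia of the table: the sum of the column inertias.
total_inertia = sum(column_inertia)

print(column_inertia)  # [20.0, 14.0]
print(total_inertia)   # 34.0
```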

10
Q

What is the center of gravity of the rows (centroid or barycenter)?

A

Denoted with g; the vector of the means of each column of X.
- When X is centered, its center of gravity is equal to the 1 x J row vector 0^T.
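
A short sketch of the center of gravity, using a hypothetical toy table: it computes g as the vector of column means, centers the table, and shows that the center of gravity of the centered table is the zero vector.

```python
# Hypothetical toy data: I = 4 observations (rows), J = 2 variables (columns).
X = [[2.0, 1.0],
     [4.0, 3.0],
     [6.0, 2.0],
     [8.0, 6.0]]
I, J = len(X), len(X[0])

# Center of gravity g: the mean of each column.
g = [sum(row[j] for row in X) / I for j in range(J)]

# Centering subtracts g from every row.
centered = [[row[j] - g[j] for j in range(J)] for row in X]

# The center of gravity of the centered table is the zero vector.
g_centered = [sum(row[j] for row in centered) / I for j in range(J)]

print(g)           # [5.0, 3.0]
print(g_centered)  # [0.0, 0.0]
```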

11
Q

What are the four goals of a PCA?

A
  1. Extracting the most important information from the data table.
  2. Compressing the size of the dataset by keeping only this important information.
  3. Simplifying the description of the dataset.
  4. Analyzing the structure of the observations and the variables.
12
Q

What are principal components?

A

They are linear combinations of the original variables.

13
Q

What is the order of the principal components?

A

The first component is required to have the largest possible variance (i.e. inertia), so it “explains” or “extracts” the largest part of the inertia of the data table.
- The second component is computed under the constraint of being orthogonal to the first component and of having the largest possible inertia.

14
Q

What are factor scores?

A

The values of the new variables for the observations.
- These factor scores can be interpreted geometrically as the projections of the observations onto the principal components.

15
Q

How are components obtained in PCA?

A

From the singular value decomposition of the data table X.
- The I × L matrix of factor scores, denoted F, is obtained as: F = PΔ, where Δ (delta) is the diagonal matrix of singular values.
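
As an illustration, the factor scores can be computed from the SVD with NumPy (assumed to be available). The toy data are hypothetical, and `component_inertia` is a name introduced for this example.

```python
import numpy as np

# Hypothetical toy data: I = 4 observations, J = 2 variables, then centered.
X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 2.0],
              [8.0, 6.0]])
X = X - X.mean(axis=0)

# Thin SVD of the centered table (NumPy returns Q transposed).
P, delta, Qt = np.linalg.svd(X, full_matrices=False)

# Factor scores: scale each column of P by its singular value.
# Broadcasting makes P * delta equivalent to P @ np.diag(delta).
F = P * delta

# Inertia extracted by each component: sum of its squared factor scores,
# which equals the corresponding squared singular value.
component_inertia = (F ** 2).sum(axis=0)
```

Each row of `F` gives the coordinates of one observation on the components, and `component_inertia` decreases from the first component to the last.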

16
Q

What does the matrix Q give?

A

It gives the coefficients of the linear combinations used to compute the factor scores.
- Can also be interpreted as a projection matrix because multiplying X by Q gives the values of the projections of the observations on the principal components.
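
The projection reading of Q follows in one line from the singular value decomposition X = PΔQ^T and from the orthonormality of the right singular vectors (Q^TQ is the identity matrix). A short derivation, written in LaTeX with the same symbols as the cards:

```latex
\mathbf{X}\mathbf{Q}
  = \mathbf{P}\boldsymbol{\Delta}\mathbf{Q}^{\mathsf{T}}\mathbf{Q}
  = \mathbf{P}\boldsymbol{\Delta}
  = \mathbf{F}
```

So multiplying the data table by Q returns exactly the factor scores.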

17
Q

What is the bilinear decomposition of X?

A

When the matrix X is interpreted as the product of the factor score matrix (F) by the loading matrix (Q): X = FQ^T, with F^TF = Δ^2 and Q^TQ = I (here Δ (delta) is the diagonal matrix of singular values and I is the identity matrix).
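
Both properties can be checked from the singular value decomposition X = PΔQ^T with F = PΔ, using the fact that the left singular vectors are orthonormal (P^TP is the identity matrix). A brief derivation in LaTeX:

```latex
\mathbf{F}\mathbf{Q}^{\mathsf{T}}
  = \mathbf{P}\boldsymbol{\Delta}\mathbf{Q}^{\mathsf{T}}
  = \mathbf{X},
\qquad
\mathbf{F}^{\mathsf{T}}\mathbf{F}
  = \boldsymbol{\Delta}\mathbf{P}^{\mathsf{T}}\mathbf{P}\boldsymbol{\Delta}
  = \boldsymbol{\Delta}^{2}
```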

18
Q

What is a vector?

A

A one-dimensional set of elements of the same datatype (e.g. all numbers).

19
Q

What is a matrix?

A

A two-dimensional set of elements of the same datatype.

20
Q

What is an array?

A

An n-dimensional set of elements of the same datatype.

21
Q

What is a dataframe?

A

A two-dimensional set of elements that may consist of different datatypes.

22
Q

What is a list (in R)?

A

A one-dimensional set of elements which can hold different data types.