High-Dimensionality Reduction (PCA) Flashcards

1
Q

Principal component analysis (PCA) converts …

Explain the principal components.
The principal components are …

Short note
* On expanding the determinant |A – λI| (where I is the identity matrix), we get a polynomial in λ.
* This polynomial is called the characteristic polynomial of A.
* The equation |A – λI| = 0 is called the characteristic equation of A.

A

a set of possibly correlated variables into a (possibly smaller) set of values of linearly uncorrelated variables called principal components.

orthogonal (they are the Eigenvectors of the symmetric covariance matrix).
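
A minimal numpy sketch of the short note above, using a made-up symmetric 2×2 matrix (not one from the course): the roots of the characteristic polynomial are exactly the Eigenvalues.

```python
import numpy as np

# Made-up symmetric example matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Coefficients of the (monic) characteristic polynomial of A;
# for this A it is lambda^2 - 4*lambda + 3.
char_poly = np.poly(A)

# Its roots are the Eigenvalues of A (1 and 3 here), matching np.linalg.eigvalsh.
print(np.sort(np.roots(char_poly)), np.linalg.eigvalsh(A))
```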

2
Q

Principal components are the… They show the directions of maximum variance.
The Eigenvalues λ explain the …
So PCA simply takes points expressed in the standard basis and…

A

Eigenvectors of the covariance or correlation matrix

amount of variance along the corresponding axis, and thus the proportion of the overall variance explained by each PC.

transforms them into points expressed in an eigenvector basis.
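
A minimal sketch of this card in numpy, assuming nothing beyond the card itself: eigendecompose the covariance matrix of the centered data and re-express the points in the Eigenvector basis. The data X is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # made-up data: 100 observations, p = 3 variables

Xc = X - X.mean(axis=0)                  # center the data
C = np.cov(Xc, rowvar=False)             # p x p covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)     # symmetric matrix -> orthogonal Eigenvectors
order = np.argsort(eigvals)[::-1]        # sort by variance explained, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                    # points expressed in the Eigenvector basis
explained_ratio = eigvals / eigvals.sum()  # proportion of overall variance per PC
```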

3
Q

Correlation and covariance matrices are …

A covariance matrix implicitly involves… If variables are measured in different units, use the …

Eigenvectors are the …

A

all positive semi-definite matrices. Thus their eigenvalues are always positive or zero.

centering of the data already; correlation matrix.

axes! The rotation is equivalent to a change of basis to an orthonormal (eigenvector) basis.
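
A small numpy sketch of these points on made-up data whose two variables live on very different scales: both matrices are positive semi-definite (non-negative eigenvalues), and the correlation matrix is the one to use when units differ.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) * np.array([1.0, 1000.0])  # second variable in "other units"

cov = np.cov(X, rowvar=False)        # centering of the data is implicit here
corr = np.corrcoef(X, rowvar=False)  # additionally rescales each variable to unit variance

print(np.linalg.eigvalsh(cov))       # all >= 0 (up to rounding)
print(np.linalg.eigvalsh(corr))      # all >= 0; preferred when units differ
```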

4
Q

Now, if you like, you can decide to ignore the components of lesser significance.
You do lose some information, but if the Eigenvalues are small, you don’t lose much:
* …
* calculate 𝑝 Eigenvectors and Eigenvalues
* choose only the first 𝑘 < 𝑝 Eigenvectors
* final data set has only 𝑘 dimensions

Matrix 𝚽 also allows aggregating similar attributes.
Each element of the Eigenvectors represents the contribution of a given variable to a component.
* In the example below, the attributes volume, length, width, and depth all have high impact on the first component. They could just as well be considered representations of a …
* Similarly, one can determine latent variables such as “social status” in customer data.

A

𝑝 dimensions in your data

latent variable “size”, while speed1 and speed2 could be aggregated to “speed”.
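
A minimal sketch of the recipe above, assuming 𝚽 is the p × k matrix of the first k Eigenvectors (an assumption based on this card, not on the original slides); the data is made up.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))                  # made-up data with p = 5 dimensions
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]             # p Eigenvalues/Eigenvectors, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                         # choose only the first k < p Eigenvectors
Phi = eigvecs[:, :k]                          # p x k loading matrix
X_reduced = Xc @ Phi                          # final data set has only k dimensions

# Each entry of Phi is the contribution of one variable to one component,
# which is what allows reading off latent variables such as "size" or "speed".
```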

5
Q

Assumptions of PCA
PCA assumes…
* cloud of points in 𝑝-dimensional space has linear dimensions that can be …
* If the structure in the data is NONLINEAR, the …
PCA uses the … With discrete variables, special techniques are in order (e.g., correspondence analysis).

A

relationships among variables are LINEAR.

effectively summarized by the principal axes.

principal axes will not be an efficient and informative summary of the data.

Euclidean distance among points, assuming continuous variables.
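
A small numpy illustration of the linearity assumption, using made-up points on a circle: the structure is nonlinear, so neither principal axis summarizes the data well (each explains roughly half the variance).

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
X = np.column_stack([np.cos(t), np.sin(t)])   # ring-shaped, clearly nonlinear cloud
Xc = X - X.mean(axis=0)

eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
print(eigvals / eigvals.sum())                # roughly [0.5, 0.5]: no dominant axis
```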

6
Q

SVD

A = U S V^t

Singular values are related to the eigenvalues of the covariance matrix via …

How to compute SVD?

A

Singular values of the SVD of the matrix 𝑨 are the square roots of the Eigenvalues of the matrix (𝑨𝑨𝒕) or (𝑨𝒕𝑨), sorted by size (descending).

The columns of V are the principal axes, while the columns of US are principal
component scores of the centered matrix 𝑿 in PCA.

eigenvalue_i = (singular value_i)² / (n − 1), where n is the number of observations (rows of the centered data matrix).

  • 𝑼 and 𝑽 are orthogonal matrices, such that 𝑽^𝑻 = 𝑽^−𝟏 and 𝑼^𝑻 = 𝑼^−𝟏
  • Finding 𝑼, 𝑺, 𝑽 requires finding Eigenvectors of 𝑨𝒕𝑨.
  • The Eigenvalues 𝜆 are the roots of the characteristic equation |𝑨𝒕𝑨 – 𝜆𝑰| = 0; the corresponding Eigenvectors are found by using these values of 𝜆 in the equation (𝑨𝒕𝑨 – 𝜆𝑰)𝒗 = 0, providing 𝑽.
  • 𝑺 has the square roots of the Eigenvalues 𝜆 on its diagonal.
  • Knowing that 𝑨𝑽 = 𝑼𝑺, one can derive 𝑼.
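
A minimal numpy check of the relations on this card, on made-up centered data with n = 100 observations; it confirms that eigenvalue_i = (singular value_i)² / (n − 1) and that the columns of 𝑼𝑺 equal the principal component scores 𝑿𝑽.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))                 # made-up data: n = 100, p = 4
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)   # Xc = U S V^t, s sorted by size

# Singular values squared / (n - 1) = Eigenvalues of the covariance matrix
cov_eigvals = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]
print(np.allclose(s**2 / (n - 1), cov_eigvals))     # True

# Columns of U S are the principal component scores, i.e. Xc projected onto V
print(np.allclose(U * s, Xc @ Vt.T))                # True
```
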
7
Q

c) A colleague plans to run a linear regression model using the three features x1, x2, and x3. Discuss what problem occurs and how this problem can be mitigated.
d) Now another datapoint (4, 2.6, 1.2) is added to the dataset. Do the eigenvalues, the principal components, and the ratio of explained variance of each component change? Discuss your reasoning.

A

Issue: perfect multicollinearity (one eigenvalue is 0, meaning one direction carries no variance, i.e. one feature is a linear combination of the others and adds no information). GMP violated → perform the regression using only 2 principal components, or remove a feature. Another issue is overfitting, since #features is almost as large as #observations.

The way to approach this problem is to first CENTER THE DATAPOINT using the previous centres. The obtained vector should then be tested for whether it is a scalar multiple of the principal vector (eigenvector) calculated before. If it is, the eigenvector stays the same, while a new eigenvalue results, changing the ratio of explained variance.
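
A hedged sketch of the procedure described above; the centre and the first eigenvector below are hypothetical placeholders, since the exercise's actual dataset is not reproduced on this card.

```python
import numpy as np

centre = np.array([3.0, 2.0, 1.0])       # hypothetical column means of the old data
v1 = np.array([2.0, 1.2, 0.4])
v1 = v1 / np.linalg.norm(v1)             # hypothetical first eigenvector

x_new = np.array([4.0, 2.6, 1.2])
x_c = x_new - centre                     # 1) centre the new point with the OLD centres

# 2) test whether the centered point is a scalar multiple of v1
residual = x_c - (x_c @ v1) * v1         # part of x_c orthogonal to v1
print(np.allclose(residual, 0.0))        # True here: eigenvector unchanged, but the
                                         # eigenvalues and explained-variance ratios change
```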

8
Q

c) Now another point (a, b) is added to the dataset. As a result, the principal components do not change. Determine a possible point (a, b).

d) We now adjust the original dataset in the following three ways
1) Multiply both coordinates of the first point by some factor k ∈ R.

2) Multiply all coordinates of all points by the same factor k ∈ R.

3) Flip all coordinates of all points, e.g. (1, −4) → (−4, 1).
In each case, will the eigenvalues and/or principal components change? Discuss your reasoning.

A

c) The principal components will not change if the point (a, b) is perfectly explained by the first principal component. We can thus pick any point along the direction of the first eigenvector. For example, any of the projected points from b) satisfy this requirement, e.g. (a, b) = (1.40, −3.73)

1) This will change both eigenvalues and eigenvectors. Scaling the coordinates of just one point changes the mean (and hence the centered data) and the covariance matrix, resulting in different eigenvalues and eigenvectors in general.

2) The eigenvalues will change, but the eigenvectors will be identical. Multiplying every coordinate by the same k scales the covariance matrix by k² (see slides), so the eigenvalues are scaled by k² as well.
Normalizing the resulting eigenvectors leads to the same eigenvectors v1 and v2 regardless of the chosen k.

3) “Flipping” the coordinates is equivalent to an isometric affine transformation / a reflection across the line y = x. The “structure” of the data thus does not change. As a result, the eigenvalues are identical, but the eigenvectors also undergo the reflection, i.e. the coordinates of the principal components are swapped as well.
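
A small numpy check of cases 2) and 3) on made-up 2-D data (case 1) is not checked here, since its outcome depends on the chosen point and factor).

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2)) @ np.array([[2.0, 0.5],   # made-up, correlated 2-D data
                                          [0.0, 1.0]])
Xc = X - X.mean(axis=0)
vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))

# 2) multiply ALL coordinates by k: eigenvalues scale by k^2, eigenvectors unchanged
k = 3.0
vals_k, vecs_k = np.linalg.eigh(np.cov(k * Xc, rowvar=False))
print(np.allclose(vals_k, k**2 * vals), np.allclose(np.abs(vecs_k), np.abs(vecs)))

# 3) flip coordinates, (x, y) -> (y, x): eigenvalues unchanged, eigenvectors reflected
vals_f, vecs_f = np.linalg.eigh(np.cov(Xc[:, ::-1], rowvar=False))
print(np.allclose(vals_f, vals), np.allclose(np.abs(vecs_f), np.abs(vecs[::-1, :])))
```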
