11 | DW-1 | PCA Flashcards

1
Q

(QUIZ 6)
PCA was invented by ______ in ______.

A

Karl Pearson, 1901

2
Q

(QUIZ 4)
The goal of PCA is to replace a ______ number of ______ variables with a ______ number of ______ variables while capturing as much information in the ______ variables as possible. Principal components are ______ combinations of the ______ variables. PC1 is the ______ combination of the k observed variables that accounts for most of the variance in the original set of variables. PC2 is ______ to PC1.

A

The goal of PCA is to replace a large number of correlated variables with a small number of uncorrelated variables while capturing as much information in the original variables as possible. Principal components are linear combinations of the observed variables. PC1 is the weighted combination of the k observed variables that accounts for most of the variance in the original set of variables. PC2 is orthogonal to PC1.
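A minimal R sketch of this point, using the iris data that appears later in the deck (prcomp is assumed as the PCA routine):
~~~
# PC scores are linear combinations of the centred original variables
data(iris)
pca <- prcomp(iris[, 1:4])                          # unscaled PCA, as used later in this deck
Xc  <- scale(iris[, 1:4], center = TRUE, scale = FALSE)
pc1 <- Xc %*% pca$rotation[, 1]                     # weighted (linear) combination = PC1 scores
all.equal(as.numeric(pc1), as.numeric(pca$x[, 1]))  # TRUE
~~~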

3
Q

(QUIZ 4)
The amount of variance kept in the PCs can be visualized in a ______. To decide how many components to investigate, we usually look at the ______ variance. A common threshold is the 90% limit: we use as many components as are needed to reach this limit. The positions of the samples in this new coordinate system are visualized in a so-called ______. If we would like to show both the positions of the samples and the ______ of the ______ variables, we can use a ______.

A

The amount of variance kept in the PCs can be visualized in a scree plot. To decide how many components to investigate, we usually look at the cumulative variance. A common threshold is the 90% limit: we use as many components as are needed to reach this limit. The positions of the samples in this new coordinate system are visualized in a so-called score plot. If we would like to show both the positions of the samples and the correlations of the original variables, we can use a biplot.
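A minimal R sketch of these plots (base R function names; the 90% cut-off is read off the cumulative variance):
~~~
data(iris)
pca <- prcomp(iris[, 1:4])
screeplot(pca, type = "lines")               # variance kept per PC
cumsum(pca$sdev^2) / sum(pca$sdev^2)         # cumulative variance -> keep PCs up to ~90%
plot(pca$x[, 1], pca$x[, 2],                 # score plot: samples in the new coordinate system
     xlab = "PC1", ylab = "PC2")
biplot(pca)                                  # samples plus loadings of the original variables
~~~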

4
Q

(QUIZ 4)
Please interpret the results:
~~~
> data(iris)
> pca=prcomp(iris[,1:4])
> summary(pca)
Importance of components:
                          PC1     PC2    PC3     PC4
Standard deviation     2.0563 0.49262 0.2797 0.15439
Proportion of Variance 0.9246 0.05307 0.0171 0.00521
Cumulative Proportion  0.9246 0.97769 0.9948 1.00000
> pca$rotation
                     PC1         PC2         PC3        PC4
Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574
~~~
The major variance is in the first ______ component(s). The variable contributing most to the first component is ______; the second component is the ______ component.
The third component is ______ as it contributes ______ to the total variance.

A

The major variance is in the first (one) component. The variable contributing most to the first component is Petal.Length; the second component is the Sepal component.
The third component is not important, as it contributes less (Detlef said more) than 5% to the total variance.
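A small check of this interpretation, continuing from the prcomp call in the question:
~~~
summary(pca)$importance["Cumulative Proportion", ]  # PC1 alone explains ~92% of the variance
which.max(abs(pca$rotation[, "PC1"]))               # Petal.Length: largest loading on PC1
pca$rotation[, "PC2"]                               # PC2 is dominated by the Sepal variables
~~~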

5
Q

(QUIZ 4)
PCA is a ______ projection method. PCA will fail if ______ data are to be processed. In that case, ______ may be the better choice. To determine deviation from Gaussianity, ______ can be applied. An advantageous property is its application of a ______ distance metric.

A

PCA is a linear projection method. PCA will fail if non-Gaussian data are to be processed. In that case, ICA may be the better choice. To determine deviation from Gaussianity, kurtosis can be applied. An advantageous property is its application of a density-aware distance metric.
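A minimal base-R sketch of using excess kurtosis as a Gaussianity check; the formula here is the usual definition and is not taken from the slides:
~~~
# excess kurtosis: ~0 for Gaussian data, clearly non-zero otherwise
excess_kurtosis <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^4) - 3
}
excess_kurtosis(rnorm(1e5))   # close to 0   -> Gaussian, PCA is fine
excess_kurtosis(runif(1e5))   # around -1.2  -> non-Gaussian, ICA may be the better choice
~~~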

6
Q

Motivation for PCA / dimensionality reduction

A
  • How do the different samples group together?
  • Which molecules (genes, metabolites, … variables) are important with regard to sample separation, and which ones are noise only?
  • Which molecules show a correlated behavior and can thus be treated as one?
  • Is there a way to "view" the data in a meaningful way?
7
Q

What assumption is used for finding the (first) principal component?

A

  • The direction of greatest variance (σ²) captures the most relevant information about the system
  • This direction is called the first principal component

8
Q

Signal to Noise Ratio (SNR)

A
  • SNR = σ²(signal) / σ²(noise)
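A one-line illustration in R (the signal and noise vectors are made up for the example):
~~~
signal <- sin(seq(0, 10, length.out = 1000))   # illustrative signal
noise  <- rnorm(1000, sd = 0.1)                # illustrative noise
snr    <- var(signal) / var(noise)             # SNR = variance of signal / variance of noise
snr
~~~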
9
Q

PCA Qualitatively – first step?

A
  • Find the centroid (mean along all coordinates) = origin of the new basis
10
Q

PCA Qualitatively – second step?

A
  • Find the direction d along which the variance is maximal
11
Q

PCA Qualitatively – third step?

A
  • Find the direction of greatest variance in the plane that is perpendicular to d
12
Q

PCA Qualitatively – fourth step?

A
  • Repeat n times, where n is the number of original dimensions (here 3). (The last vector is determined by the orthogonality criterion.)
13
Q

PCA Qualitatively - How to express coordinate of every point using new basis

A
  • The new coordinates correspond to projections of the old coordinates onto the PCs. This is equivalent to rotating the old coordinate system around the centroid so that the directions of the old and new base vectors line up.
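A compact R sketch of the four steps plus the re-expression in the new basis; this mirrors what prcomp does internally, up to the sign of each eigenvector:
~~~
data(iris)
X        <- as.matrix(iris[, 1:4])
centroid <- colMeans(X)                    # step 1: centroid = origin of the new basis
Xc       <- sweep(X, 2, centroid)          # centre the data on the centroid
e        <- eigen(cov(Xc))                 # steps 2-4: directions of maximal variance
scores   <- Xc %*% e$vectors               # projection onto the PCs = rotated coordinates
all.equal(abs(scores), abs(prcomp(X)$x),   # same as prcomp, up to sign
          check.attributes = FALSE, tolerance = 1e-6)
~~~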
14
Q

What is a score plot?

A
  • Score Plot = plot of data points in new coordinate system
15
Q

Correlation between variables is _______

A
  • Redundancy. We don’t need both variables to know the position, just one
16
Q

Covariance is

A
  • The same construct as variance, but for two variables
  • Includes variance as the special case of a variable with itself
  • Not scaled, i.e. scale-dependent
17
Q

Variance is

A

  • The mean squared deviation of a single variable from its mean, i.e. its spread along one dimension

18
Q

Correlation is

A
  • Normalised covariance = the covariance scaled by the standard deviations (square roots of the variances)
  • This gives it the property of lying between -1 and 1
19
Q

Relation of cov and cor

A
  • Both measures capture redundancy
  • Correlation is what you would get if you standardise your original observations (Z-transformation = standardisation) and measuring the covariance
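A quick R check of this relationship (any two numeric variables would do):
~~~
x <- iris$Sepal.Length
y <- iris$Petal.Length
cor(x, y)                  # correlation
cov(scale(x), scale(y))    # covariance of the z-transformed variables: same value
~~~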
20
Q

Why the n-1 in the denominator of var / cov?

A

  • Bessel's correction: the sample mean is estimated from the same data, which costs one degree of freedom, so dividing by n-1 instead of n makes the variance/covariance estimate unbiased

21
Q

How do we decide if dimensions are redundant?

A
  • If the covariance / correlation is high, one of the dimensions is redundant
  • If there is real dispersion between the two variables (low correlation), we need both dimensions
22
Q

The covariance matrix?

A
  • The complete covariance matrix in 3D, C, is symmetric: the diagonal holds the variances of x, y and z, and the off-diagonal entries hold the pairwise covariances (see the cheat sheet for the annotated matrix)
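Written out (a reconstruction of the matrix the colours referred to: variances on the diagonal, pairwise covariances off the diagonal):
~~~
C = \begin{pmatrix}
      \mathrm{Var}(x)   & \mathrm{Cov}(x,y) & \mathrm{Cov}(x,z) \\
      \mathrm{Cov}(x,y) & \mathrm{Var}(y)   & \mathrm{Cov}(y,z) \\
      \mathrm{Cov}(x,z) & \mathrm{Cov}(y,z) & \mathrm{Var}(z)
    \end{pmatrix}
~~~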

23
Q

How do we use the covariance matrix to produce a more formal formulation of PCA?

A
  • We want to find a transformation P of the original coordinates, and thus a new coordinate system, for which the new covariance matrix C' is diagonal (see the cheat sheet)
  • Cov(X', Y') = Cor(X', Y') = 0 → redundancy is removed!
  • Var(X') >= 0, Var(Y') >= 0
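A quick R check that PCA indeed delivers such a transformation: the covariance matrix of the scores is diagonal:
~~~
pca <- prcomp(iris[, 1:4])
round(cov(pca$x), 10)   # off-diagonal entries are 0: Cov(PCi, PCj) = 0, redundancy removed
~~~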
24
Q

What do we know from linear algebra about a symmetrical matrix A and a matrix of eigenvectors of A?

A
  • A = E D E^T
  • A – any symmetric matrix
  • E – matrix of eigenvectors of A
  • D – diagonal matrix (all off-diagonal elements are zero!)
  • E^T – transpose of E (rows and columns swapped)
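A small R sketch of this decomposition, using a covariance matrix as the symmetric matrix A:
~~~
A  <- cov(iris[, 1:4])            # any symmetric matrix
ed <- eigen(A)
E  <- ed$vectors                  # eigenvectors of A (columns)
D  <- diag(ed$values)             # diagonal matrix of eigenvalues
all.equal(A, E %*% D %*% t(E), check.attributes = FALSE)   # A = E D E^T
~~~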
25
Q

Eigenvector, eigenvalue problem

A
  • An important equation in math and physics!
  • In words: "matrix times a vector equals a scalar times this vector"
  • C Ei = λi Ei   (C is the covariance matrix, Ei an eigenvector)
  • It has at most min(m, n-1) meaningful solutions (λi > 0)
  • The λi are the eigenvalues associated with the eigenvectors Ei
  • http://www.cs.princeton.edu/picasso/mats/PCA-Tutorial-Intuition_jp.pdf
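The same equation checked numerically in R for the first eigenpair of a covariance matrix:
~~~
C  <- cov(iris[, 1:4])
ev <- eigen(C)
all.equal(as.numeric(C %*% ev$vectors[, 1]),   # C * E1
          ev$values[1] * ev$vectors[, 1])      # lambda1 * E1
~~~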
26
Q

λ are ______ along the PCs. This means that ______.

A
  • Variances
  • Eigenvector with largest eigenvalue (variance) explains most of the total variance (=signal)
27
Q

Components of eigenvectors are the (unit-scaled) ______

A
  • Loadings
  • i.e. contributions of original variables to PC
28
Q

Length of eigenvectors =

A

  • 1 (eigenvectors are normalised to unit length)

29
Q

PCs/eigenvectors are ______ _________ of the original variables

A
  • Linear combinations
30
Q

What is explained variance?

A
  • = how much of the total variance is captured by a particular PC
31
Q

How many PCs to consider?

A
  • V_T = Σj λj = total variance
  • Keep as many PCs as are needed to explain a chosen fraction of V_T (e.g. the 90% limit mentioned above)
32
Q

Explained variance by PCi

A
  • = λi / V_T × 100%
  • If the PCA is based on the correlation matrix: PCs with λ > 1 are significant ("Kaiser-Harris criterion"), or PCs whose λ exceeds the λ obtained for randomized data ("parallel analysis")
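In R (the λ are the squared standard deviations returned by prcomp; scale. = TRUE corresponds to the correlation-matrix case):
~~~
pca    <- prcomp(iris[, 1:4], scale. = TRUE)   # PCA on the correlation matrix
lambda <- pca$sdev^2                           # eigenvalues = variances along the PCs
lambda / sum(lambda) * 100                     # explained variance per PC in %
which(lambda > 1)                              # Kaiser-Harris criterion: keep PCs with lambda > 1
~~~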
33
Q

Dimensionality reduction:

A
  • It is sufficient to consider PC1 coordinates only; i.e. projection of original points onto PC1
34
Q

General Goal of PCA:

A
  • Replace a large number of correlated variables with a smaller number of uncorrelated variables while capturing as much information in the original variables as possible.
35
Q

Scree plot?

A
  • Plot showing how much of the variance in the original data is explained by the different PCs
  • A Scree Plot is a simple line segment plot that shows the eigenvalues for each individual PC
36
Q

R Stuff
Where are loadings stored?

A

In pca_name$rotation

37
Q

R:
We have 300 genes and 80 samples
m=matrix(nrow=300,ncol=80)
>p=prcomp(m) # yields _____ in coordinates of _____

A

> p=prcomp(m) # yields genes in coordinates of samples

38
Q

R:
We have 300 genes and 80 samples
m=matrix(nrow=300,ncol=80)
>p=prcomp(t(m)) # yields _____ in coordinates of _____

A

> p=prcomp(t(m)) # yields samples in coordinates of genes
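
A hedged sketch checking the two cases, using random numbers instead of the empty matrix (prcomp cannot handle the NA-filled matrix above):
~~~
set.seed(1)
m <- matrix(rnorm(300 * 80), nrow = 300, ncol = 80)   # 300 genes x 80 samples
dim(prcomp(m)$x)      # 300 x 80: one score row per gene   (genes in sample-derived coordinates)
dim(prcomp(t(m))$x)   #  80 x 80: one score row per sample (samples in gene-derived coordinates)
~~~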