Compressing Data via Dimensionality Reduction Flashcards
What is PCA?
Principal Component Analysis - for unsupervised data compression.
What is LDA?
Linear Discriminant Analysis - a supervised dimensionality reduction technique for maximizing class separability.
Describe PCA in a nutshell.
PCA aims to find the directions of maximum variance in high-dimensional data and projects it onto a new subspace with equal or fewer dimensions than the original one. The orthogonal axes (principal components) of the new subspace can be interpreted as the directions of maximum variance given the constraint that the new feature axes are orthogonal to each other.
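A minimal usage sketch with scikit-learn's PCA (the toy feature matrix X and the choice of two components are assumptions for illustration):

    import numpy as np
    from sklearn.decomposition import PCA

    # Toy feature matrix: 5 samples, 3 features (illustrative values only)
    X = np.array([[2.5, 2.4, 0.5],
                  [0.5, 0.7, 1.9],
                  [2.2, 2.9, 0.4],
                  [1.9, 2.2, 0.8],
                  [3.1, 3.0, 0.2]])

    pca = PCA(n_components=2)       # keep the 2 directions of largest variance
    X_pca = pca.fit_transform(X)    # project onto the new 2-dimensional subspace
    print(X_pca.shape)              # (5, 2)
    print(pca.explained_variance_ratio_)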
What is PCA sensitive to?
Data scaling. We need to standardize the features prior to PCA if the features were measured on different scales and we want to assign equal importance to all features.
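A minimal standardization sketch using scikit-learn's StandardScaler (the toy training and test matrices are assumptions for illustration):

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(100, 3)) * [1.0, 10.0, 100.0]  # features on very different scales
    X_test = rng.normal(size=(20, 3)) * [1.0, 10.0, 100.0]

    sc = StandardScaler()
    X_train_std = sc.fit_transform(X_train)  # fit mean/std on the training data, then transform it
    X_test_std = sc.transform(X_test)        # reuse the training parameters on the test data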
Give a summary of the PCA steps.
1) Standardize the d-dimensional dataset.
2) Construct the covariance matrix.
3) Decompose the covariance matrix into its eigenvectors and eigenvalues.
4) Select k eigenvectors that correspond to the k largest eigenvalues, where k is the dimensionality of the new feature subspace (k ≤ d).
5) Construct a projection matrix W from the selected k eigenvectors.
6) Transform the d-dimensional input dataset X using the projection matrix W to obtain the new k-dimensional feature subspace (see the sketch after this list).
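A minimal NumPy sketch of these steps (the toy matrix X and the choice k = 2 are assumptions for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(150, 4))                  # toy dataset: 150 samples, d = 4 features

    # 1) Standardize
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # 2) Covariance matrix (features in columns)
    cov_mat = np.cov(X_std, rowvar=False)

    # 3) Eigen-decomposition (eigh: covariance matrices are symmetric)
    eigen_vals, eigen_vecs = np.linalg.eigh(cov_mat)

    # 4) Sort eigenvectors by decreasing eigenvalue and pick the top k
    order = np.argsort(eigen_vals)[::-1]
    k = 2
    W = eigen_vecs[:, order[:k]]                   # 5) d x k projection matrix

    # 6) Project onto the new k-dimensional subspace
    X_pca = X_std @ W
    print(X_pca.shape)                             # (150, 2)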
Describe the function for covariance between two features xj and xk
cov(xj, xk) = (1/n) * SUM over i of (xj^(i) - mean(xj)) * (xk^(i) - mean(xk)), where mean(xj) and mean(xk) are the sample means of features j and k. Note the sample means are zero if we standardize the dataset. A positive covariance between two features indicates that the features increase or decrease together, whereas a negative covariance indicates that the features vary in opposite directions.
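A small NumPy check of this formula against np.cov (the two toy feature vectors are assumptions for illustration):

    import numpy as np

    xj = np.array([2.5, 0.5, 2.2, 1.9, 3.1])
    xk = np.array([2.4, 0.7, 2.9, 2.2, 3.0])

    n = len(xj)
    cov_manual = np.sum((xj - xj.mean()) * (xk - xk.mean())) / n
    cov_numpy = np.cov(xj, xk, bias=True)[0, 1]    # bias=True uses 1/n instead of 1/(n-1)
    print(cov_manual, cov_numpy)                   # the two values agree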
What role do eigenvectors and eigenvalues play in PCA?
The eigenvectors of the covariance matrix represent the principal components (the directions of maximum variance), whereas the corresponding eigenvalues will define their magnitude.
What is the variance explained ratio of an eigenvalue?
The ratio of an eigenvalue to the total sum of all the eigenvalues; it tells us how much of the total variance is captured by the corresponding principal component.
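A minimal NumPy sketch of the ratio (the random, already-standardized toy matrix is an assumption for illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    X_std = rng.normal(size=(100, 4))              # pretend this is already standardized

    eigen_vals = np.linalg.eigvalsh(np.cov(X_std, rowvar=False))
    var_exp = eigen_vals / eigen_vals.sum()        # variance explained ratio per eigenvalue
    print(np.sort(var_exp)[::-1])                  # largest components first
    print(np.cumsum(np.sort(var_exp)[::-1]))       # cumulative explained variance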
What is the general concept behind LDA?
The goal in LDA is to find the feature subspace that optimizes class separability.
What are assumptions in LDA?
That the data is normally distributed. Also, that the classes have identical covariance matrices and that the features are statistically independent of each other.
Summarize the key steps of the LDA approach.
1) Standardize the d-dimensional dataset (d is the number of features)
2) For each class, compute the d-dimensional mean vector.
3) Construct the between-class scatter matrix Sb and the within-class scatter matrix Sw.
4) Compute the eigenvectors and corresponding eigenvalues of the matrix Sw^-1*Sb
5) Choose the k eigenvectors that correspond to the k largest eigenvalues to construct a dxk-dimensional transformation matrix W; the eigenvectors are the columns of this matrix.
6) Project the samples onto the new feature subspace using the transformation matrix W (see the sketch after this list).
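A minimal NumPy sketch of these steps (the toy three-class dataset and the choice k = 2 are assumptions for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy dataset: 3 classes, d = 4 features, shifted class means so the classes are separable
    X = np.vstack([rng.normal(loc=m, size=(50, 4)) for m in (0.0, 1.0, 2.0)])
    y = np.repeat([0, 1, 2], 50)

    # 1) Standardize
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    d = X_std.shape[1]

    # 2) Per-class mean vectors
    mean_vecs = [X_std[y == c].mean(axis=0) for c in np.unique(y)]
    overall_mean = X_std.mean(axis=0)

    # 3) Within-class (Sw) and between-class (Sb) scatter matrices
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c, m in zip(np.unique(y), mean_vecs):
        X_c = X_std[y == c]
        S_w += (X_c - m).T @ (X_c - m)             # sum of the individual class scatter matrices
        diff = (m - overall_mean).reshape(d, 1)
        S_b += X_c.shape[0] * (diff @ diff.T)

    # 4) Eigen-decomposition of Sw^-1 * Sb (not symmetric, so eig rather than eigh)
    eigen_vals, eigen_vecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)

    # 5) Build W from the k eigenvectors with the largest eigenvalues (k = 2 here)
    order = np.argsort(eigen_vals.real)[::-1]
    W = eigen_vecs[:, order[:2]].real

    # 6) Project onto the new feature subspace
    X_lda = X_std @ W
    print(X_lda.shape)                             # (150, 2)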
What is a mean vector?
Each mean vector Mi stores the mean feature values with respect to the samples of class i: Mi = (1/ni) * SUM(x) over the samples x in class i, where ni is the number of samples in class i.
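A small NumPy sketch of per-class mean vectors (the toy X and y are assumptions for illustration):

    import numpy as np

    X = np.array([[1.0, 2.0], [2.0, 3.0], [8.0, 9.0], [9.0, 8.0]])
    y = np.array([0, 0, 1, 1])

    for c in np.unique(y):
        print(c, X[y == c].mean(axis=0))   # 0 -> [1.5 2.5], 1 -> [8.5 8.5]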
Describe maximum likelihood
In general, for a fixed set of data and underlying statistical model, the method of maximum likelihood selects the set of values of the model parameters that maximizes the likelihood function. Intuitively, this maximizes the “agreement” of the selected model with the observed data, and for discrete random variables it indeed maximizes the probability of the observed data under the resulting distribution.
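A small numerical illustration (with an assumed toy Gaussian sample): for i.i.d. Gaussian data with known variance, the likelihood is maximized at the sample mean, which the grid search below recovers.

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(loc=3.0, scale=1.0, size=200)      # toy sample, true mean 3.0

    def log_likelihood(mu, x, sigma=1.0):
        # Gaussian log-likelihood of the data for a candidate mean mu
        return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2))

    candidates = np.linspace(0.0, 6.0, 601)
    mle = candidates[np.argmax([log_likelihood(mu, data) for mu in candidates])]
    print(mle, data.mean())    # the grid maximizer is (approximately) the sample mean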
What is the individual scatter matrix Si of each class i?
Si = SUM over x in class i of (x - Mi)(x - Mi)^T, where (x - Mi) is treated as a column vector, so each term is a d x d outer product.
How do you compute the within-class scatter matrix Sw?
Sw = SUM from i=1 to c of Si, where Si is the scatter matrix of individual class i and c is the number of classes.
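A minimal NumPy sketch of Si and Sw, reusing the same toy X and y as in the mean-vector sketch above (assumptions for illustration):

    import numpy as np

    X = np.array([[1.0, 2.0], [2.0, 3.0], [8.0, 9.0], [9.0, 8.0]])
    y = np.array([0, 0, 1, 1])
    d = X.shape[1]

    S_w = np.zeros((d, d))
    for c in np.unique(y):
        X_c = X[y == c]
        m_c = X_c.mean(axis=0)
        S_i = (X_c - m_c).T @ (X_c - m_c)   # sum over the class of (x - Mi)(x - Mi)^T
        S_w += S_i                          # within-class scatter: sum of the class scatter matrices
    print(S_w)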