Lecture 5 - Dimensionality Reduction - Principal Component Analysis, Linear Discriminant Analysis, Singular Value Decomposition Flashcards
What is meant by “Degrees of freedom”?
Degrees of Freedom refers to the maximum number of logically independent values, which are values that have the freedom to vary, in the data sample
What is dimensionality reduction?
Dimensionality reduction is the process of deriving a set of degrees of freedom which can be used to reproduce most of the variability of a data set
What is the goal of Dimensionality Reduction? And in broad terms how does it work?
Goal: To reduce dimensions by removing redundant and dependent features
How: By transforming features from higher dimensional space to a lower dimensional space
What are the different methods that can help us reduce dimensions?
Unsupervised, where no class labels are needed:
- Independent Component Analysis (ICA)
- Non-negative Matrix Factorization (NMF)
- Principal Component Analysis (PCA)
- Ideal for visualization and noise removal
Supervised, where class labels are used:
- Mixture Discriminant Analysis (MDA)
- Linear Discriminant Analysis (LDA)
- Ideal for biometrics, bioinformatics, and chemistry
What is Principal Component Analysis (PCA)?
PCA is
- A popular technique for dimensionality reduction.
- A “classical” approach that only characterizes linear sub-spaces in data
- Involves a dataset with observations on numerical variables
- An exploratory data analysis tool
- A simple, non-parametric method of extracting relevant information from data sets
How does PCA reduce dimensions?
PCA reduces dimensions by exposing underlying information in data sets
- An unsupervised approach
- Aims to explain most of the variability in data with a smaller number of variables
- Identifies the axes that account for the largest amount of variance in the training set (see the sketch below)
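A minimal sketch of this in practice, assuming scikit-learn is available (the data matrix X here is made up for illustration, not the lecture's example):
```python
# Hypothetical sketch: reducing a toy dataset to 2 principal components with scikit-learn
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 samples, 5 numerical features (made up)

pca = PCA(n_components=2)              # keep the 2 axes with the most variance
X_reduced = pca.fit_transform(X)       # unsupervised: no labels needed

print(X_reduced.shape)                 # (100, 2)
print(pca.explained_variance_ratio_)   # share of variance each PC explains
```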
You should not use PCA if the data is…
non-linear, since PCA only characterizes linear sub-spaces
What are the three different types of PCA?
- Randomized PCA quickly finds an approximation of the first d principal components.
- Issue: the whole training set needs to fit in memory
- Incremental PCA (IPCA) splits the training set into mini-batches and feeds the IPCA algorithm one mini-batch at a time
- Kernel PCA helps perform complex nonlinear projections for dimensionality reduction (see the sketch below)
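A minimal sketch of these variants, assuming scikit-learn (the data, batch count, and kernel parameters are made-up illustrations):
```python
# Hypothetical sketch of Randomized PCA, Incremental PCA and Kernel PCA with scikit-learn
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA, KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                      # made-up training set

# Randomized PCA: quickly approximates the first d principal components
X_rand = PCA(n_components=3, svd_solver="randomized").fit_transform(X)

# IPCA: feed the training set in mini-batches so it never has to fit in memory at once
ipca = IncrementalPCA(n_components=3)
for batch in np.array_split(X, 10):
    ipca.partial_fit(batch)
X_ipca = ipca.transform(X)

# Kernel PCA: nonlinear projection via the kernel trick (here an RBF kernel)
X_kpca = KernelPCA(n_components=3, kernel="rbf", gamma=0.05).fit_transform(X)
```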
How can we calculate the PCA?
Primary PCA calculation steps:
- Calculate covariance matrix
- Calculate ordered eigenvalues and eigenvectors of the matrix
- Compute the principal components by projecting the data onto the leading eigenvectors (see the sketch below)
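A minimal sketch of those steps with NumPy, on a made-up data matrix (not the lecture's own example):
```python
# Hypothetical sketch of the manual PCA calculation steps
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))               # made-up data: 200 samples, 4 features

Xc = X - X.mean(axis=0)                     # center the data
cov = np.cov(Xc, rowvar=False)              # 1) covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)      # 2) eigenvalues/eigenvectors (symmetric matrix)
order = np.argsort(eigvals)[::-1]           # sort into decreasing eigenvalue order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
components = eigvecs[:, :k]                 # top-k eigenvectors = principal directions
X_pca = Xc @ components                     # 3) project data onto the principal components
```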
How do we calculate the Principal Components?
Overall PC calculation process:
- For each PC:
- PCA finds a unit vector, on the zero-centered data, pointing in the direction of the PC.
- The direction of the unit vectors returned by PCA is not stable
- If you perturb the training set slightly and run PCA again
- Unit vectors may point in the opposite direction to the original vectors
- Still, they will lie on the same axes (see the sketch below)
(Don’t know how important it is to remember this)
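A minimal sketch of that sign instability, assuming scikit-learn (the data and the perturbation scale are made up):
```python
# Hypothetical sketch: principal axes found on a slightly perturbed training set
# may flip sign, but they still span the same axes
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                                     # made-up data

c1 = PCA(n_components=2).fit(X).components_
c2 = PCA(n_components=2).fit(X + rng.normal(scale=1e-3, size=X.shape)).components_

# Compare axes up to sign: |cosine| close to 1 means the same axis, possibly flipped
for a, b in zip(c1, c2):
    print(abs(a @ b))   # ~1.0 for each pair of corresponding components
```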
What are the key characteristics of Linear Discriminant Analysis(LDA)?
Linear Discriminant analysis
- Works as a pre-processing step
- Is a supervised technique
What are the different types of LDA?
Types to deal with classes: Class-dependent and class-independent
Class-dependent LDA: A separate lower dimensional space is calculated for each class, and that class projects its data onto it
Class-independent LDA: Each class is considered against all the other classes together; there is just one lower dimensional space onto which all classes project their data
What are the steps of calculating LDA?
Goal: Project original data matrix onto a lower dimensional space.
Step 1: Between-class variance/matrix: Calculate the separability between different classes (i.e. the distance between the means of the different classes)
Step 2: Within-class variance/matrix: Calculate the distance between the mean and the samples of each class
Step 3: Construct the lower dimensional space by maximizing the between-class variance and minimizing the within-class variance (see the sketch below)
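A minimal sketch of these three steps (class-independent variant) with NumPy, on made-up data and labels:
```python
# Hypothetical sketch of the LDA calculation steps
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))               # made-up data: 150 samples, 4 features
y = rng.integers(0, 3, size=150)            # made-up labels for 3 classes

overall_mean = X.mean(axis=0)
d = X.shape[1]
S_B = np.zeros((d, d))                      # Step 1: between-class scatter matrix
S_W = np.zeros((d, d))                      # Step 2: within-class scatter matrix
for c in np.unique(y):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    diff = (mc - overall_mean).reshape(-1, 1)
    S_B += len(Xc) * diff @ diff.T          # distance between class mean and overall mean
    S_W += (Xc - mc).T @ (Xc - mc)          # distance between samples and their class mean

# Step 3: maximize between-class over within-class variance:
# the eigenvectors of inv(S_W) @ S_B give the projection axes
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:2]].real              # keep at most (classes - 1) = 2 axes
X_lda = X @ W
```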
What are the issues with LDA?
Issues:
Small Sample Problem (SSP): Fails to find a lower dimensional space
- If the number of dimensions > the number of samples
- In that case the within-class matrix becomes singular
Linearity problem: Cannot discriminate between classes
- If different classes are non-linearly separable
What are the differences in how LDA works vs PCA?
PCA detects the directions of maximal variance
LDA finds subspace that maximizes class separability
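A minimal sketch of that contrast, assuming scikit-learn and its bundled iris dataset purely for illustration:
```python
# Hypothetical sketch: PCA is unsupervised (max-variance directions),
# LDA is supervised (max class separability)
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)                             # ignores labels
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)   # uses labels
```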