Lecture 5 - Dimensionality Reduction - Principal Component Analysis, Linear Discriminant Analysis, Singular Value Decomposition Flashcards
What is meant by “Degrees of freedom”?
Degrees of Freedom refers to the maximum number of logically independent values, which are values that have the freedom to vary, in the data sample
What is dimensionality reduction?
Dimensionality reduction is the process of deriving a set of degrees of freedom which can be used to reproduce most of the variability of a data set
What is the goal of Dimensionality Reduction? And in broad terms how does it work?
Goal: To reduce dimensions by removing redundant and dependent features
How: By transforming features from higher dimensional space to a lower dimensional space
What are the different methods that can help us reduce dimensions?
Unsupervised, where there is no need for labelling classes of data:
- Independent Component Analysis (ICA)
- Non-negative Matrix Factorization (NMF)
- Principal Component Analysis (PCA)
- Ideal for visualization and noise removal
Supervised, where class labels are considered:
- Mixture Discriminant Analysis (MDA)
- Linear Discriminant Analysis (LDA)
- Ideal for biometrics, bioinformatics, and chemistry
What is Principal Component Analysis (PCA)?
PCA is
- A popular technique for dimensionality reduction.
- A “classical” approach that only characterizes linear subspaces in data
- Involves a dataset with observations on numerical variables
- An exploratory data analysis tool
- A simple, non-parametric method of extracting relevant information from data sets
How does PCA reduce dimensions?
PCA reduces dimensions by exposing underlying information in data sets
- An unsupervised approach
- Aims to explain most of the variability in data with a smaller number of variables
- Identifies the axis that accounts for the largest amount of variance in the training set
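As a rough illustration (not from the lecture), this is what the unsupervised projection looks like with scikit-learn's PCA; the toy dataset and the choice of 2 components are assumptions:

```python
# Minimal sketch: project made-up data onto its 2 highest-variance axes.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 observations, 5 numerical features (made-up data)

pca = PCA(n_components=2)              # keep the 2 axes with the largest variance
X_reduced = pca.fit_transform(X)       # project onto the first 2 principal components

print(X_reduced.shape)                 # (100, 2)
print(pca.explained_variance_ratio_)   # fraction of variance captured by each component
```

Here explained_variance_ratio_ reports how much of the total variance each retained component accounts for.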
You should not use PCA if the data is…
showing significant non-linearity, since PCA can only characterize linear subspaces
What are the three different types of PCA?
- Randomized PCA quickly finds an approximation of the first d principal components.
- Issue: the whole training set needs to fit in memory
- Incremental PCA (IPCA) splits the training set into mini-batches and feeds an IPCA algorithm one mini-batch at a time
- Kernel PCA helps perform complex nonlinear projections for dimensionality reduction
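A minimal sketch of the three variants, assuming scikit-learn and a made-up dataset (the batch count, component counts, and gamma value are arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA, KernelPCA

X = np.random.default_rng(1).normal(size=(1000, 20))

# Randomized PCA: fast approximation of the first d principal components.
rand_pca = PCA(n_components=5, svd_solver="randomized", random_state=42)
X_rand = rand_pca.fit_transform(X)

# Incremental PCA: feed the training set in mini-batches
# (useful when the whole set does not fit in memory).
inc_pca = IncrementalPCA(n_components=5)
for batch in np.array_split(X, 10):
    inc_pca.partial_fit(batch)
X_inc = inc_pca.transform(X)

# Kernel PCA: nonlinear projection via the kernel trick (here an RBF kernel).
kernel_pca = KernelPCA(n_components=5, kernel="rbf", gamma=0.04)
X_kernel = kernel_pca.fit_transform(X)
```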
How can we calculate the PCA?
Primary PCA calculation steps:
- Calculate covariance matrix
- Calculate ordered eigenvalues and eigenvectors of the matrix
- Compute principal components
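A minimal NumPy sketch of these three steps on a small made-up dataset (the sizes and the choice of k are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                      # 200 samples, 4 features
X_centered = X - X.mean(axis=0)                    # PCA assumes zero-centered data

# 1) Covariance matrix of the features
cov = np.cov(X_centered, rowvar=False)             # shape (4, 4)

# 2) Eigenvalues/eigenvectors, ordered from largest to smallest eigenvalue
eigvals, eigvecs = np.linalg.eigh(cov)             # eigh: the covariance matrix is symmetric
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 3) Principal components: project the data onto the top-k eigenvectors
k = 2
X_pca = X_centered @ eigvecs[:, :k]                # shape (200, 2)
```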
How do we calculate the Principal Components?
Overall PC calculation process:
- For each PC:
- PCA finds a zero-centered unit vector pointing in the direction of the PC.
- The direction of the unit vectors returned by PCA is not stable
- If you perturb the training set slightly and run PCA again, the unit vectors may point in the opposite direction from the original vectors
- Still, they will lie on the same axes
(Don’t know how important it is to remember this)
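A small sketch (my own, not from the slides) that illustrates the sign instability by perturbing the data slightly and comparing the first component:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

pc1_a = PCA(n_components=1).fit(X).components_[0]
pc1_b = PCA(n_components=1).fit(X + rng.normal(scale=1e-3, size=X.shape)).components_[0]

# The dot product is close to +1 or -1: the axis is the same, but the sign may flip.
print(np.dot(pc1_a, pc1_b))
```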
What are the key characteristics of Linear Discriminant Analysis(LDA)?
Linear Discriminant analysis
- Works as a pre-processing step
- Is a supervised technique
What are the different types of LDA?
Types to deal with classes: Class-dependent and class-independent
Class-dependent LDA: A separate lower dimensional space is calculated for each class, and that class's data is projected onto it
Class-independent LDA: Each class is considered as one class against all the other classes; there is just one lower dimensional space onto which all classes project their data
What are the steps of calculating LDA?
Goal: Project original data matrix onto a lower dimensional space.
Step 1: Between-class variance/matrix: Calculate the separability between different classes (i.e. the distance between the means of the different classes)
Step 2: Within-class variance/matrix: Calculate the distance between the mean and the samples of each class
Step 3: Construct the lower dimensional space by maximizing the between-class variance and minimizing the within-class variance
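A rough NumPy sketch of these three steps for class-independent LDA on made-up two-class data (all names and sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(50, 3)), rng.normal(2, 1, size=(50, 3))])
y = np.array([0] * 50 + [1] * 50)

overall_mean = X.mean(axis=0)
S_B = np.zeros((3, 3))   # between-class scatter (separability between class means)
S_W = np.zeros((3, 3))   # within-class scatter (spread of samples around their class mean)

for c in np.unique(y):
    Xc = X[y == c]
    mean_c = Xc.mean(axis=0)
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_B += len(Xc) * diff @ diff.T
    S_W += (Xc - mean_c).T @ (Xc - mean_c)

# Maximize between-class variance relative to within-class variance:
# take the leading eigenvectors of S_W^-1 S_B and project onto them.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:1]].real                     # at most (number of classes - 1) discriminant axes
X_lda = X @ W
```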
What are the issues with LDA?
Issues:
Small Sample Problem (SSP): LDA fails to find the lower dimensional space
- If the number of dimensions > the number of samples
- Here the within-class matrix becomes singular
Linearity problem: Cannot discriminate between classes
- If the different classes are not linearly separable
What are the differences in how LDA works vs PCA?
PCA detects the directions of maximal variance
LDA finds subspace that maximizes class separability
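A side-by-side sketch, assuming scikit-learn and the Iris dataset as stand-in data: PCA ignores the labels, while LDA uses them.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA: directions of maximal variance (unsupervised, y is never used)
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: subspace that maximizes class separability (supervised, y is required)
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
```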
What is Singular Value Decomposition (SVD)?
SVD is a method for transforming correlated variables into a set of uncorrelated variables
- To better expose various relationships among original data items
SVD is a method for identifying and ordering dimensions along which data points exhibit most variations
SVD can also be seen as a method for data reduction
What are the basic steps of SVD?
Consider a high dimensional, highly variable set of data points
Reduce it to a lower dimensional space that exposes the substructure of the original data more clearly
Order the dimensions from most variation to least
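A minimal sketch of these steps with NumPy's SVD; the toy data and the rank k = 2 are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))

# Singular values in s come back already ordered from largest to smallest
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
X_reduced = U[:, :k] * s[:k]                       # coordinates in the top-k singular directions
X_approx = X_reduced @ Vt[:k]                      # low-rank reconstruction of the original data
```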
True or False: SVD is fast even when number of features grows
False. The SVD approach can get very slow when the number of features grows
True or False: SVD is fast even when number of samples grows
True. SVD can handle large training sets efficiently, provided they can fit in memory
True or False: Training a Linear Regression model with a large number of features is faster using Gradient Descent than using SVD
True
What is variability in data?
Variability (or dispersion) is the extent to which a distribution is stretched or squeezed. Common measures include the variance, standard deviation, interquartile range, and (I’ve also seen) range
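A quick NumPy sketch of these dispersion measures on a made-up sample:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(np.var(x))                                    # variance
print(np.std(x))                                    # standard deviation
print(np.percentile(x, 75) - np.percentile(x, 25))  # interquartile range
print(x.max() - x.min())                            # range
```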
How does a PCA plot get plotted?
A PCA plot converts the correlations (or lack thereof) among all of the features into a 2D graph (or more dimensions, it depends)
Observations that are highly correlated cluster together
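A sketch of such a plot, assuming matplotlib and the Iris data as a stand-in (not from the lecture):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
X_2d = PCA(n_components=2).fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)           # similar observations tend to cluster together
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
```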
How does PCA plot lines?
PCA finds the best fitting line by maximizing the sum of the squared distances from the projected points to the origin.
How many PC should you use?
In a general n(observations) x p(variables) data matrix X, there are up to min(n-1, p) PCs
But there is no fixed method you should use
I think you should use the number of PCs that you consider adequate to capture most of the variability of the data
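One common heuristic (an assumption on my part, not a fixed rule) is to keep enough PCs to explain, say, 95% of the variance; a sketch with scikit-learn:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

pca = PCA().fit(X)                                 # fit with all min(n-1, p) components
cumulative = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumulative >= 0.95) + 1              # smallest number of PCs reaching 95% variance
print(d)

# Equivalently, scikit-learn accepts a target variance ratio directly:
X_reduced = PCA(n_components=0.95).fit_transform(X)
```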
How do you define PC 1 and PC 2
PC 1 is the linear combination of features that has the highest variance
PC 2 is the linear combination of features that has the second-highest variance; it is not correlated with PC 1 and is orthogonal (perpendicular) to PC 1
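A small sketch (assuming scikit-learn and the Iris data) that checks both properties:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2).fit(X)

pc1, pc2 = pca.components_                         # each PC is a unit-length linear combination of features
print(pca.explained_variance_)                     # variance along PC 1 >= variance along PC 2
print(np.dot(pc1, pc2))                            # ~0: the two directions are orthogonal
```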