Lecture 5 - Dimensionality Reduction - Principal Component Analysis, Linear Discriminant Analysis, Singular Value Decomposition Flashcards

1
Q

What is meant by “Degrees of freedom”?

A

Degrees of freedom refers to the maximum number of logically independent values, i.e. values that are free to vary, in a data sample

2
Q

What is dimensionality reduction?

A

Dimensionality reduction is the process of deriving a set of degrees of freedom which can be used to reproduce most of the variability of a data set

3
Q

What is the goal of Dimensionality Reduction? And in broad terms how does it work?

A

Goal: To reduce dimensions by removing redundant and dependent features

How: By transforming features from a higher dimensional space to a lower dimensional space
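
A minimal sketch of that transformation in Python, using scikit-learn's PCA on synthetic data (the 100 × 10 data and the 2 target dimensions are arbitrary choices, not from the lecture):

```python
# Hedged sketch: project synthetic 10-dimensional data down to 2 dimensions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))      # 100 samples in a 10-dimensional space

pca = PCA(n_components=2)           # the target lower dimensional space
X_low = pca.fit_transform(X)        # features transformed to 2 dimensions

print(X.shape, "->", X_low.shape)   # (100, 10) -> (100, 2)
```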

4
Q

What are the different methods that can help us reduce dimensions?

A

Unsupervised, where there is no need for labelling classes of data:

  • Independent Component Analysis (ICA)
  • Non-negative Matrix Factorization (NMF)
  • Principal Component Analysis (PCA)
    • Ideal for visualization and noise removal

Supervised, where class labels are considered:

  • Mixture Discriminant Analysis (MDA)
  • Linear Discriminant Analysis (LDA)
    • Ideal for biometrics, bioinformatics, and chemistry
5
Q

What is Principal Component Analysis (PCA)?

A

PCA is

  • A popular technique for dimensionality reduction.
  • A “classical” approach that only characterizes linear sub-spaces in data
  • Applied to a dataset with observations on numerical variables

  • An exploratory data analysis tool
  • A simple, non-parametric method of extracting relevant information from data sets
6
Q

How does PCA reduce dimensions?

A

PCA reduces dimensions by exposing underlying information in data sets

  • An unsupervised approach
  • Aims to explain most of the variability in data with a smaller number of variables
  • Identifies the axis that accounts for the largest amount of variance in the training set
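
A sketch of this in Python on synthetic correlated data; explained_variance_ratio_ and components_ are scikit-learn's actual PCA attributes, while the data itself is made up:

```python
# Synthetic 3-D data in which most variance lies along one direction.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
base = rng.normal(size=(200, 1))
X = np.hstack([base,
               0.5 * base + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])

pca = PCA().fit(X)                    # unsupervised: no labels involved
print(pca.explained_variance_ratio_)  # share of variance per axis, largest first
print(pca.components_[0])             # the axis with the largest variance
```
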
7
Q

You should not use PCA if the data is…

A

showing non-linearity, since PCA only characterizes linear sub-spaces

8
Q

What are the three different types of PCA?

A
  1. Randomized PCA quickly finds an approximation of the first d principal components.
    • Issue: The whole training set needs to fit in memory
  2. Incremental PCA (IPCA) splits the training set into mini-batches and feeds an IPCA algorithm one mini-batch at a time
  3. Kernel PCA helps perform complex nonlinear projections for dimensionality reduction
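
A sketch of all three variants with scikit-learn's real classes (PCA with svd_solver="randomized", IncrementalPCA, KernelPCA); the data, the 10 components, and the batch split are arbitrary choices:

```python
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA, KernelPCA

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 50))

# 1. Randomized PCA: fast approximation of the first d principal components,
#    but the whole training set must fit in memory.
rpca = PCA(n_components=10, svd_solver="randomized")
X_r = rpca.fit_transform(X)

# 2. Incremental PCA: fed one mini-batch at a time, so the full training
#    set never has to sit in memory at once.
ipca = IncrementalPCA(n_components=10)
for batch in np.array_split(X, 10):
    ipca.partial_fit(batch)
X_i = ipca.transform(X)

# 3. Kernel PCA: complex nonlinear projections via the kernel trick.
kpca = KernelPCA(n_components=10, kernel="rbf")
X_k = kpca.fit_transform(X)
```
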
9
Q

How can we calculate the PCA?

A

Primary PCA calculation steps:

  • Calculate covariance matrix
  • Calculate ordered eigenvalues and eigenvectors of the matrix
  • Compute principal components
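
A from-scratch sketch of those three steps with NumPy (synthetic data; the 100 × 4 size and k = 2 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
Xc = X - X.mean(axis=0)                 # center the data first

# Step 1: covariance matrix (features x features)
cov = np.cov(Xc, rowvar=False)

# Step 2: eigenvalues/eigenvectors, ordered by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: the covariance matrix is symmetric
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 3: principal components = projection onto the top-k eigenvectors
k = 2
pcs = Xc @ eigvecs[:, :k]
print(pcs.shape)                        # (100, 2)
```
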
10
Q

How do we calculate the Principal Components?

A

Overall PC calculation process:

  • For each PC:
    • PCA finds a zero-centered unit vector pointing in the direction of the PC.
    • The direction of the unit vectors returned by PCA is not stable
  • If you perturb the training set slightly and run PCA again
    • The unit vectors may point in the opposite direction to the original vectors
      • Still, they will lie on the same axes

(Don’t know how important it is to remember this)

11
Q

What are the key characteristics of Linear Discriminant Analysis (LDA)?

A

Linear Discriminant Analysis

  • Works as a pre-processing step
  • Is a supervised technique
12
Q

What are the different types of LDA?

A

Two types, depending on how classes are handled: class-dependent and class-independent

Class-dependent LDA: A separate lower dimensional space is calculated for each class, onto which that class's data is projected

Class-independent LDA: Each class is considered as a separate class against the other classes
- There is just one lower dimensional space onto which all classes project their data
13
Q

What are the steps of calculating LDA?

A

Goal: Project original data matrix onto a lower dimensional space.

Step 1: Between-class variance/matrix:
    - Calculate the separability between different classes (i.e. the distance between the means of different classes)
Step 2: Within-class variance/matrix:
    - Calculate the distance between the mean and the samples of each class
Step 3: Construct the lower dimensional space:
    - By maximizing the between-class variance and minimizing the within-class variance
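
A from-scratch sketch of the three steps with NumPy for a toy two-class problem (class-independent variant; the sizes and class means are made up):

```python
import numpy as np

rng = np.random.default_rng(4)
X0 = rng.normal(loc=0.0, size=(50, 3))   # class 0 samples
X1 = rng.normal(loc=2.0, size=(50, 3))   # class 1 samples
mean_all = np.vstack([X0, X1]).mean(axis=0)

# Step 1: between-class scatter (distance between class means)
S_B = np.zeros((3, 3))
for Xc in (X0, X1):
    d = (Xc.mean(axis=0) - mean_all).reshape(-1, 1)
    S_B += len(Xc) * (d @ d.T)

# Step 2: within-class scatter (samples around their own class mean)
S_W = np.zeros((3, 3))
for Xc in (X0, X1):
    D = Xc - Xc.mean(axis=0)
    S_W += D.T @ D

# Step 3: directions that maximize between-class variance relative to
# within-class variance, i.e. the eigenvectors of inv(S_W) @ S_B
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
W = eigvecs[:, np.argsort(eigvals.real)[::-1][:1]].real  # top direction
X_low = np.vstack([X0, X1]) @ W                          # projected data
```
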
14
Q

What are the issues with LDA?

A

Issues:
Small Sample Problem (SSP): Fails to find the lower dimensional space
- If the number of dimensions > the number of samples
- In that case the within-class matrix becomes singular

Linearity problem: Cannot discriminate between classes
- If the different classes are non-linearly separable

15
Q

What are the differences in how LDA works vs PCA?

A

PCA detects the directions of maximal variance

LDA finds subspace that maximizes class separability

16
Q

What is Singular Value Decomposition (SVD)?

A

SVD is a method for transforming correlated variables into a set of uncorrelated variables
- To better expose various relationships among original data items

SVD is a method for identifying and ordering the dimensions along which data points exhibit the most variation

SVD can also be seen as a method for data reduction

17
Q

What are the basic steps of SVD?

A

Consider a high dimensional, highly variable set of data points.

Reduce it to a lower dimensional space that exposes the substructure of the original data more clearly.

Order the dimensions from most variation to least.
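
A minimal NumPy sketch of those steps via a truncated SVD (synthetic data; keeping k = 3 dimensions is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 20))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(s[:5])                   # singular values come ordered, most variation first

k = 3                          # keep the k dimensions with the most variation
X_low = U[:, :k] * s[:k]       # data expressed in the reduced space
X_approx = X_low @ Vt[:k, :]   # low-rank approximation exposing the substructure
```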

18
Q

True or False: SVD is fast even when number of features grows

A

False. The SVD approach can get very slow when the number of features grows

19
Q

True or False: SVD is fast even when number of samples grows

A

True. SVD can handle large training sets efficiently, provided they can fit in memory

20
Q

True or False: Training a linear regression model with a large number of features is faster using Gradient Descent than using SVD

A

True

21
Q

What is variability in data?

A

Variability (or dispersion) is the extent to which a distribution is stretched or squeezed. Common measures include the variance, standard deviation, interquartile range, and (sometimes) the range
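
A tiny NumPy sketch computing those measures on a made-up sample:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(np.var(x))                                    # variance
print(np.std(x))                                    # standard deviation
print(np.percentile(x, 75) - np.percentile(x, 25))  # interquartile range
print(x.max() - x.min())                            # range
```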

22
Q

How does a PCA plot get plotted?

A

A PCA plot converts the correlations (or lack thereof) among all of the features into a 2D graph (or more dimensions, depending on how many PCs you plot)

Observations that are highly correlated cluster together
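
A minimal sketch of such a plot with scikit-learn and matplotlib, assuming synthetic data with two clusters:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 1, size=(50, 5)),    # one cluster
               rng.normal(3, 1, size=(50, 5))])   # another cluster

X2 = PCA(n_components=2).fit_transform(X)         # all features -> 2D
plt.scatter(X2[:, 0], X2[:, 1])
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()   # highly correlated observations cluster together in this view
```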

23
Q

How does PCA plot lines?

A

PCA finds the best fitting line by maximizing the sum of the squared distances from the projected points to the origin (which is equivalent to minimizing the distances from the points to the line).

24
Q

How many PCs should you use?

A

In a general n × p data matrix X (n observations, p variables), there are up to min(n − 1, p) PCs

But there is no fixed method for choosing how many you should use

I think you should use the number of PCs you consider adequate to capture most of the variability in the data
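
A sketch of one common heuristic: keep enough PCs to reach a cumulative explained-variance threshold (the 95% here is an arbitrary choice, not from the lecture):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 30))

pca = PCA().fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)
n_pcs = int(np.argmax(cumvar >= 0.95)) + 1   # first index reaching the threshold
print(n_pcs, "PCs explain at least 95% of the variance")
```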

25
Q

How do you define PC 1 and PC 2?

A

PC 1 is the linear combination of features that has the highest variance

PC 2 is the linear combination of features that has the second-highest variance; it is not correlated with PC 1, and it is orthogonal (perpendicular) to PC 1
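
A small NumPy/scikit-learn sketch checking both properties numerically on synthetic correlated data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(8)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated features

pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)
print(scores[:, 0].var() >= scores[:, 1].var())           # PC 1 variance is highest
print(np.isclose(pca.components_[0] @ pca.components_[1], 0.0))  # orthogonal
```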