Module 2: 5. Dimensionality Reduction and Visualization Flashcards

1
Q

Name two techniques to perform dimensionality reduction.

A

PCA and t-SNE
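Both techniques are available in scikit-learn. A minimal sketch (scikit-learn and the load_digits dataset are assumptions for illustration, not part of the card) reducing 64-dimensional digit images to 2-D with each method:

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    X, y = load_digits(return_X_y=True)    # X has shape (1797, 64)

    # PCA: linear projection onto the directions of maximal variance
    X_pca = PCA(n_components=2).fit_transform(X)

    # t-SNE: non-linear embedding that tries to preserve local neighborhoods
    X_tsne = TSNE(n_components=2, perplexity=30.0, random_state=0).fit_transform(X)

    print(X_pca.shape, X_tsne.shape)       # (1797, 2) (1797, 2)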

2
Q

True or False.
By default, a vector is a row vector.

A

False. By convention, a vector is a column vector (d × 1) unless stated otherwise.

3
Q

Given 2 vectors x1 = [2.2, 4.2] and x2 = [1.2, 3.2].
Calculate x1 + x2 and the mean vector x̄.

A

x1 + x2 = [3.4, 7.4]
x̄ = (1/n) Σ xi = (1/2)(x1 + x2) = [1.7, 3.7]
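The same arithmetic as a quick NumPy check (NumPy is an assumption here; the card itself is pen-and-paper):

    import numpy as np

    x1 = np.array([2.2, 4.2])
    x2 = np.array([1.2, 3.2])

    print(x1 + x2)          # [3.4 7.4]
    print((x1 + x2) / 2)    # [1.7 3.7] -> x-bar, the mean vector (n = 2)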

4
Q

Explain Feature/Column Normalization.

A

a1, a2, …, an → the n values of feature fj

ai' = (ai − amin) / (amax − amin) → ai' ∈ [0, 1]

Check the endpoints:
amin' = (amin − amin) / (amax − amin) = 0
amax' = (amax − amin) / (amax − amin) = 1

a1, a2, …, an → column normalization → a1', a2', …, an'
such that ai' ∈ [0, 1]

Basically we transform the data and move it into the unit square (a unit hypercube in higher dimensions).
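A minimal NumPy sketch of column normalization (the toy matrix is made up for illustration):

    import numpy as np

    X = np.array([[1.0, 200.0],
                  [2.0, 300.0],
                  [4.0, 250.0]])

    a_min = X.min(axis=0)                  # per-column minimum
    a_max = X.max(axis=0)                  # per-column maximum
    X_norm = (X - a_min) / (a_max - a_min)

    print(X_norm)                          # every entry lies in [0, 1]

sklearn.preprocessing.MinMaxScaler implements the same transformation.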

5
Q

Explain Feature/Column Standardization.

A

a1, a2, …, an → column standardization → a1', a2', …, an'
such that mean(ai') = 0 and std-dev(ai') = 1

Formula:
ai' = (ai − ā) / σ, where ā = mean of the ai's and σ = std-dev of the ai's

Column standardization, summarized:
1. Move the mean to the origin.
2. Squish/expand the data so that the std-dev of every feature is 1.
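A matching NumPy sketch (same made-up toy matrix as in the normalization card; sklearn.preprocessing.StandardScaler does the same thing):

    import numpy as np

    X = np.array([[1.0, 200.0],
                  [2.0, 300.0],
                  [4.0, 250.0]])

    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    print(X_std.mean(axis=0))   # ~[0. 0.] -> mean moved to the origin
    print(X_std.std(axis=0))    # [1. 1.]  -> unit std-dev per feature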

6
Q

Explain covariance of a data matrix.

A
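For a column-standardized data matrix X of shape n × d (every column has mean 0), the covariance matrix is
S = (1/n) XᵀX,
a d × d matrix whose entry Sij is the covariance between features fi and fj (see the covariance-matrix card below).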
7
Q

How do you convert a 28 × 28 matrix of pixels to a vector?

A

28 × 28 matrix of pixels → row flattening → vector (784 × 1)
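A one-line version in NumPy (the random image stands in for, e.g., an MNIST digit):

    import numpy as np

    img = np.random.rand(28, 28)     # a 28 x 28 image
    vec = img.reshape(-1, 1)         # row-major ("row") flattening -> (784, 1)

    print(vec.shape)                 # (784, 1)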

8
Q

Explain PCA with 2 examples.

A

PCA stands for Principal Component Analysis.
We use PCA for dimensionality reduction.

Eg 1.
f1 : blackness of hair
f2 : height
The spread on f2 is high and the spread on f1 is minimal.
If we have to convert this to 1-D, we can consider skipping f1, since the spread on f1 is minimal.
We should preserve the feature with maximal spread.

Eg 2.
X → a 2-D dataset, say it is column standardized,
i.e. mean(f1) = mean(f2) = 0 and variance(f1) = variance(f2) = 1.
The spread on both f1 and f2 is significant.
If we consider rotated axes f1' and f2', spread(f2') << spread(f1'), with f1' perpendicular to f2'.
So basically we rotate (f1, f2) by an angle θ such that the variance of the xi's projected onto f1' is maximal, and then drop f2'.
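A sketch of Eg 2 on synthetic data (the covariance values and θ = 45° are assumptions chosen so the rotation works out; NumPy assumed):

    import numpy as np

    rng = np.random.default_rng(0)
    # correlated 2-D data, then column standardized
    X = rng.multivariate_normal([0, 0], [[1.0, 0.9], [0.9, 1.0]], size=1000)
    X = (X - X.mean(axis=0)) / X.std(axis=0)

    theta = np.pi / 4                          # 45 degrees fits this symmetric cloud
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    X_rot = X @ R                              # coordinates on the (f1', f2') axes

    print(X_rot.var(axis=0))                   # ~[1.9 0.1]: spread(f2') << spread(f1')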

9
Q

Derive the mathematical objective function of PCA.

A

u1 : unit vector (same direction as f1'), ||u1|| = 1
X → column standardized
xi' = projection of xi on u1
    = (u1 · xi) / ||u1||
    = u1ᵀ xi
x̄' = u1ᵀ x̄

Find u1 s.t. variance{projection of xi on u1} is maximal:
var{u1ᵀ xi} = (1/n) Σ (u1ᵀ xi − u1ᵀ x̄)²
Since X is column standardized, x̄ = [0, 0, …, 0], so
var{u1ᵀ xi} = (1/n) Σ (u1ᵀ xi)²

Objective function of PCA: max over u1 of (1/n) Σ (u1ᵀ xi)²
s.t. u1ᵀ u1 = ||u1||² = 1
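The maximizer of this objective is the top eigenvector of the covariance matrix S = (1/n) XᵀX (the standard result via Lagrange multipliers). A hedged NumPy sketch on the same synthetic data as in the previous card:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal([0, 0], [[1.0, 0.9], [0.9, 1.0]], size=1000)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # column standardize

    S = (X.T @ X) / len(X)                     # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    u1 = eigvecs[:, -1]                        # top eigenvector: the optimal u1

    # the objective value at u1 is the top eigenvalue lambda_1
    print((X @ u1).var(), eigvals[-1])         # both ~lambda_1 (~1.9 here)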

10
Q

Derive the alternative formulation of PCA: distance minimization.

A

di : the distance from xi to the line along u1, where u1 is a unit vector (u1ᵀ u1 = ||u1||² = 1)
Minimize the total squared distance: min over u1 of Σ di²
By Pythagoras, di² = ||xi||² − (u1ᵀ xi)²
                   = xiᵀ xi − (u1ᵀ xi)²

Distance-minimization PCA:
min over u1 of Σ ( xiᵀ xi − (u1ᵀ xi)² )
s.t. u1ᵀ u1 = ||u1||² = 1

Since Σ xiᵀ xi does not depend on u1, minimizing this is the same as maximizing Σ (u1ᵀ xi)², i.e. the variance-maximization formulation above.
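A numerical confirmation (brute force over candidate angles in the 2-D case; the setup is synthetic) that both formulations pick the same u1:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal([0, 0], [[1.0, 0.9], [0.9, 1.0]], size=1000)
    X = (X - X.mean(axis=0)) / X.std(axis=0)

    thetas = np.linspace(0, np.pi, 1800)
    U = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)  # candidate unit u1's

    proj_sq = (X @ U.T) ** 2                        # (u1^T xi)^2 for every candidate
    variance = proj_sq.mean(axis=0)                 # objective 1: maximize
    sq_dist = (X ** 2).sum() - proj_sq.sum(axis=0)  # objective 2: sum of di^2, minimize

    print(thetas[variance.argmax()] == thetas[sq_dist.argmin()])   # True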

11
Q

Explain Covariance matrix.

A

The covariance matrix is a square, symmetric matrix whose diagonal elements are the variances of the individual features and whose off-diagonal elements are the covariances between pairs of features.
Sij = Covariance(fi, fj) ; i : 1 → d ; j : 1 → d
Covariance(X, Y) = (1/n) Σ (xi − μx)(yi − μy)
Covariance(X, X) = Variance(X) → (1)
Covariance(fi, fj) = Covariance(fj, fi) → (2)
Therefore,
Sij = Covariance(fi, fj) = Covariance(fj, fi) = Sji
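A quick check on synthetic data that S = (1/n) XᵀX for column-standardized X agrees with np.cov, and that S is symmetric:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    X = (X - X.mean(axis=0)) / X.std(axis=0)        # column standardize

    S = (X.T @ X) / len(X)
    print(np.allclose(S, np.cov(X.T, bias=True)))   # True (bias=True -> 1/n)
    print(np.allclose(S, S.T))                      # True: Sij = Sji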

15
Q

What are the limitations of PCA?

A
  1. When λ1 ≈ λ2, the information lost is very high (see the sketch below).
     e.g. a sine wave, a circular distribution of points, etc.
  2. PCA can be used for dimensionality reduction but is not well suited for visualization (t-SNE typically preserves structure better for that purpose).
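A sketch of limitation 1 (points on a unit circle, made up for illustration): both eigenvalues are equal, so keeping only the top component retains only about half of the variance:

    import numpy as np

    t = np.linspace(0, 2 * np.pi, 500, endpoint=False)
    X = np.stack([np.cos(t), np.sin(t)], axis=1)   # circular distribution, zero mean

    S = (X.T @ X) / len(X)                         # covariance matrix
    eigvals = np.linalg.eigvalsh(S)
    print(eigvals)                                 # [0.5 0.5] -> lambda_1 ~ lambda_2
    print(eigvals[-1] / eigvals.sum())             # ~0.5 of the variance kept in 1-D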