Module 2: 5. Dimensionality Reduction and Visualization Flashcards
Name two techniques to perform dimensionality reduction.
PCA and t-SNE
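A minimal sketch of both techniques using scikit-learn (assumed available; the toy data is made up for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.rand(100, 10)          # toy data: 100 samples, 10 features

X_pca = PCA(n_components=2).fit_transform(X)                   # linear projection to 2-D
X_tsne = TSNE(n_components=2, perplexity=30).fit_transform(X)  # non-linear embedding

print(X_pca.shape, X_tsne.shape)     # (100, 2) (100, 2)
```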
True or False.
By default, a vector is a row vector.
False; by default, a vector is a column vector.
Given 2 vectors x1 = [2.2, 4.2] and x2 = [1.2, 3.2].
Calculate x1 + x2 and the mean vector x̄.
x1 + x2 = [3.4, 7.4]
x̄ = (1/n) Σ xi = (1/2) (x1 + x2) = [1.7, 3.7]
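Checking the arithmetic above with NumPy:

```python
import numpy as np

x1 = np.array([2.2, 4.2])
x2 = np.array([1.2, 3.2])

print(x1 + x2)        # [3.4 7.4]
print((x1 + x2) / 2)  # [1.7 3.7]  -- the mean vector x̄, here n = 2
```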
Explain Feature/Column Normalization.
Let a1, a2, …, an be the n values of a feature fj.
ai’ = (ai - amin)/(amax - amin), so that ai’ ∈ [0, 1]
amin’ = (amin - amin)/(amax - amin) = 0
amax’ = (amax - amin)/(amax - amin) = 1
a1, a2, …, an --col normalization--> a1’, a2’, …, an’
such that ai’ ∈ [0, 1]
Geometrically, normalization transforms the data so it fits inside a unit square (a unit hypercube in higher dimensions).
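A minimal NumPy sketch of column normalization (the toy matrix is made up for illustration):

```python
import numpy as np

# 3 samples, 2 features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

a_min = X.min(axis=0)               # per-feature minimum
a_max = X.max(axis=0)               # per-feature maximum
X_norm = (X - a_min) / (a_max - a_min)

print(X_norm)                       # every column now lies in [0, 1]
```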
Explain Feature/Column Standardization.
a1, a2, …, an --col standardization--> a1’, a2’, …, an’
such that mean{ai’} = 0 and std-dev{ai’} = 1
Formula:
ai’ = (ai - ā)/σ, where ā = mean(a1, …, an) and σ = std-dev(a1, …, an)
Column standardization, summarized:
1. Move the mean to the origin.
2. Squish/expand the data so that the std-dev of every feature is 1.
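A minimal NumPy sketch of column standardization (same made-up toy matrix as in the normalization card):

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# (ai - ā) / σ applied per column
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))   # ~[0. 0.]  -- mean moved to the origin
print(X_std.std(axis=0))    # [1. 1.]   -- unit std-dev for every feature
```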
Explain the covariance of a data matrix.
For a column-standardized data matrix X (n samples as rows, d features as columns), the covariance matrix is S = (1/n) XT.X, a d x d matrix whose entry Sij is the covariance between features fi and fj (see the covariance-matrix card below).
How do you convert a 28 x 28 matrix of pixels into a vector?
28 x 28 matrix of pixels --row flattening--> vector (784 x 1)
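A one-liner version of row flattening in NumPy (the image here is a random stand-in for an MNIST digit):

```python
import numpy as np

image = np.random.rand(28, 28)            # stand-in for a 28 x 28 pixel digit
vector = image.flatten().reshape(784, 1)  # rows concatenated -> column vector

print(vector.shape)                       # (784, 1)
```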
Explain PCA with 2 examples.
PCA stands for Principal Component Analysis.
We use PCA for dimensionality reduction.
Eg1.
f1 : blackness of hair
f2 : height
The spread along f2 is high while the spread along f1 is minimal.
If we have to reduce this to 1-D, we can consider skipping f1, since the spread along f1 is minimal.
We should preserve the direction with maximal spread.
Eg2.
X -> a 2-D dataset; say it is column standardized,
i.e. mean{f1} = mean{f2} = 0 and variance{f1} = variance{f2} = 1
Spread on both f1 and f2 is significant.
If we rotate the axes to obtain f1’ and f2’, then spread(f2’) << spread(f1’)
Also, f1’ is perpendicular to f2’
So essentially we rotate f1, f2 by an angle θ such that the variance of the xi’s projected onto f1’ is maximal, and then drop f2’.
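A minimal sketch of this rotate-and-drop idea with scikit-learn's PCA (the correlated toy data is made up for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
f1 = rng.normal(size=200)
f2 = 0.9 * f1 + 0.1 * rng.normal(size=200)  # f2 strongly correlated with f1
X = np.column_stack([f1, f2])

pca = PCA(n_components=1)      # keep only f1', the max-variance direction
X_1d = pca.fit_transform(X)

print(X_1d.shape)                      # (200, 1)
print(pca.explained_variance_ratio_)  # close to 1: little spread is lost
```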
Derive the mathematical objective function of PCA.
u1 : unit vector (same direction as f1’)
||u1|| = 1
X —> column standardized
xi’ = projection of xi on u1
= (u1 . xi)/||u1||
= u1T.xi (since ||u1|| = 1)
x̄’ = u1T.x̄
Find u1 s.t. variance{projection of xi on u1} is maximal
var{u1T.xi} = (1/n) Σ ( u1T.xi - u1T.x̄ )^2
Since X is column standardized, x̄ = [0, 0, …, 0]
var{u1T.xi} = (1/n) Σ ( u1T.xi )^2
Objective function of PCA —-> max u1 (1/n) Σ ( u1T.xi )^2
s.t. u1Tu1 = 1 = ||u1||^2
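The maximizer of this objective is the eigenvector of the covariance matrix belonging to its largest eigenvalue. A NumPy sketch checking this numerically (the toy data is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 1.0],
                                          [0.0, 0.5]])
X = X - X.mean(axis=0)                 # column-centered, so x̄ = 0

S = (X.T @ X) / len(X)                 # covariance matrix (1/n) XT.X
eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
u1 = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue

proj_var = np.mean((X @ u1) ** 2)      # (1/n) Σ (u1T.xi)^2
print(proj_var, eigvals[-1])           # the two values agree
```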
Derive the alternative formulation of PCA: distance minimization.
xi -> di : the perpendicular distance from xi to the line along u1
min u1 Σ di^2
u1 : unit vector
u1Tu1 = 1 = ||u1||^2
di^2 = ||xi||^2 - ( u1T.xi )^2 (by the Pythagorean theorem)
= xiT.xi - ( u1T.xi )^2
Distance-minimization formulation of PCA:
min over u1 : Σ ( xiT.xi - ( u1T.xi )^2 )
s.t. u1Tu1 = 1 = ||u1||^2
Since Σ xiT.xi is a constant independent of u1, minimizing the total squared distance is equivalent to maximizing Σ ( u1T.xi )^2, i.e., the variance-maximization objective.
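A quick numerical check (toy data made up for illustration) that the two formulations pick the same u1, by scanning unit vectors over all angles:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0],
                                          [1.0, 0.5]])
X = X - X.mean(axis=0)

angles = np.linspace(0, np.pi, 1000)
units = np.column_stack([np.cos(angles), np.sin(angles)])  # candidate u1's

proj_sq_sum = ((X @ units.T) ** 2).sum(axis=0)   # Σ (u1T.xi)^2 per candidate
dist_sq_sum = (X ** 2).sum() - proj_sq_sum       # Σ di^2 per candidate

print(units[proj_sq_sum.argmax()])   # variance maximizer ...
print(units[dist_sq_sum.argmin()])   # ... equals the distance minimizer
```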
Explain Covariance matrix.
The covariance matrix of a d-dimensional dataset is a d x d square, symmetric matrix whose diagonal elements are the variances of the individual features and whose off-diagonal elements are the covariances between pairs of features.
Sij = Covariance(fi,fj) ; i : 1 -> d ; j : 1 -> d
Covariance(X,Y) = (1/n) Σ (xi - μx) * (yi - μy)
Covariance(X,X) = Variance(X) —-> (1)
Covariance(fi,fj) = Covariance(fj,fi) —-> (2)
Therefore,
Sij = Covariance(fi,fj) = Covariance(fj,fi) = Sji
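A NumPy sketch computing S by the formula above and checking it against the built-in np.cov (toy data made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))       # 100 samples, d = 3 features

Xc = X - X.mean(axis=0)             # subtract the per-feature mean μ
S = (Xc.T @ Xc) / len(X)            # Sij = (1/n) Σ (xi - μx)(yi - μy)

print(np.allclose(S, np.cov(X.T, bias=True)))  # True: matches np.cov
print(np.allclose(S, S.T))                     # True: S is symmetric
```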
What are the limitations of PCA?
- When λ1 ≈ λ2 (i.e., the top eigenvalues of the covariance matrix are comparable), the information lost is very high, e.g., for a sine wave or a circular distribution of points; PCA’s linear projection discards such non-linear structure. A sketch of this failure case follows below.
- PCA can be used for dimensionality reduction but is not well suited for visualization (techniques like t-SNE are preferred there).
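A sketch of the λ1 ≈ λ2 failure case: points on a circle have equal spread in every direction, so PCA has no good 1-D axis to keep:

```python
import numpy as np
from sklearn.decomposition import PCA

theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])  # circular distribution

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)  # ~[0.5 0.5]: λ1 ≈ λ2, so dropping
                                      # either axis loses half the variance
```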