Dimensionality reduction Flashcards
Data preprocessing for PCA
Mean normalization and feature scaling
Mean normalization
Calculate the mean of each feature across the data set and subtract it from every data point, so each feature has zero mean
Feature scaling
Scale each feature so all features have a comparable range of values (e.g. divide by the standard deviation or by the range); see the sketch below
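A minimal Octave sketch of both preprocessing steps, assuming X is an m x n data matrix with one example per row; the variable names are illustrative:
(Octave code)
mu = mean(X);               % 1 x n row vector of per-feature means
sigma = std(X);             % 1 x n row vector of per-feature standard deviations
X_norm = (X - mu) ./ sigma; % zero-mean, comparably scaled features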
2D PCA vs. linear regression
1) Linear regression tries to predict y from x, so y is treated specially; PCA treats all features equally. 2) Linear regression minimizes the squared vertical distance between the points and the line; PCA minimizes the orthogonal projection distance from the line
Steps in PCA
1) Compute covariance matrix (Σ)
2) Compute the eigenvectors of Σ (e.g. via [U,S,V] = svd(Sigma)); the first k columns of U provide our new basis vectors
3) Project the data onto those basis vectors: z = Ureduce' * x (see the combined sketch below)
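Taken together, a minimal Octave sketch of these steps, assuming X is the mean-normalized m x n data matrix (rows are examples) and k is already chosen; variable names are illustrative:
(Octave code)
m = size(X, 1);          % number of examples
Sigma = (1/m) * X' * X;  % n x n covariance matrix
[U, S, V] = svd(Sigma);  % columns of U are the eigenvectors of Sigma
Ureduce = U(:, 1:k);     % first k eigenvectors form the reduced basis
Z = X * Ureduce;         % m x k projected data (z = Ureduce' * x per example)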
Compute Eigenvectors of Σ
(Octave code)
[U,S,V] = svd(Sigma);
Compute Σ
(Octave code)
Sigma = (1/m) * X' * X;
Define new reduced basis vectors
(Octave code)
Ureduce = U(:,1:k);
Convert n-dim data x to k-dim data z
(Octave code)
z = Ureduce' * x;
% of Variance Retained
1 - (average squared projection error / total variation in the data). Equivalently, the sum of the first k diagonal entries of S divided by the sum of all n diagonal entries of S, where S comes from [U,S,V] = svd(Sigma) (see the sketch below)
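A sketch of computing this from the S matrix returned by svd(Sigma), assuming k has already been chosen:
(Octave code)
s = diag(S);                              % eigenvalues of Sigma, largest first
variance_retained = sum(s(1:k)) / sum(s); % fraction of variance retained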
Determine k dim to retain
1) Run PCA
2) Pick the smallest k s.t. at least 99% (or another chosen threshold) of the variance is retained (see the loop sketch below)
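A sketch of the selection loop, assuming S comes from a single call to svd(Sigma) and n is the number of features:
(Octave code)
s = diag(S);                      % eigenvalues of Sigma, largest first
for k = 1:n
  if sum(s(1:k)) / sum(s) >= 0.99 % at least 99% of variance retained
    break;                        % smallest such k
  end
end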
How to avoid overfitting
Use regularization, not PCA. PCA throws away information without looking at the labels y, so it is not a good way to address overfitting
When to use PCA
Always try running the learning process on the raw data first; use PCA only if there is a good reason to (e.g. learning is too slow or the data uses too much memory)
Good uses of PCA
Compression: Reduce memory / disk space needed; speed up learning
Visualization: Reduce to 2/3 dimensions