Dimensionality reduction Flashcards

1
Q

Data preprocessing for PCA

A

Mean normalization and feature scaling

2
Q

Mean normalization

A

Calculate the mean of each feature over the data set and subtract it from that feature in every data point

3
Q

Feature scaling

A

Scale features to have a comparable range of values (e.g., divide by the standard deviation or the range)
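
Mean normalization and feature scaling can be sketched together in Octave. This is a minimal illustration, not a prescribed recipe: the data matrix `X` and the names `mu`, `sigma`, and `X_norm` are assumptions made for the example, and it scales by the standard deviation rather than the range.

```octave
% Hypothetical 4 x 2 data set: rows are examples, columns are features.
X = [1 100; 2 200; 3 300; 4 400];

mu = mean(X);      % 1 x n row vector of per-feature means
sigma = std(X);    % 1 x n row vector of per-feature standard deviations

% Subtract the mean, then divide by the standard deviation.
% Octave broadcasts the row vectors across all rows of X.
X_norm = (X - mu) ./ sigma;
```

After this step each column of `X_norm` has zero mean and unit standard deviation, so both features contribute on a comparable scale.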

4
Q

2D PCA vs. linear regression

A

1) Linear regression tries to predict y based on x; PCA treats all features equally
2) Linear regression minimizes the squared vertical distance to the line; PCA minimizes the squared orthogonal (projection) distance to the line
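
The difference shows up numerically even on a tiny data set. The sketch below uses hypothetical zero-mean points (`x`, `y`) and compares the least-squares slope with the slope of the first principal direction; all variable names are assumptions made for the example.

```octave
% Hypothetical zero-mean 2D data.
x = [-2; -1; 0; 1; 2];
y = [-1; -2; 0; 2; 1];
m = numel(x);

% Linear regression through the origin: minimizes vertical distances.
slope_reg = (x' * y) / (x' * x);      % 8 / 10 = 0.8

% PCA: first principal direction minimizes orthogonal distances.
D = [x y];                            % m x 2 data matrix
Sigma = (1/m) * D' * D;
[U, S, V] = svd(Sigma);
slope_pca = U(2, 1) / U(1, 1);        % direction [1; 1] gives slope 1
```

Here the two fitted lines differ (slope 0.8 versus slope 1) because the two methods minimize different distances.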

5
Q

Steps in PCA

A

1) Compute the covariance matrix (Σ)
2) Compute the eigenvectors of Σ. The first k columns of U provide our new basis vectors
3) Project the data onto the new basis: z = Ureduce' * x
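
Put together, the whole procedure might look like the following Octave sketch. The data matrix `X` and the choice `k = 1` are hypothetical, and the projection is written for all examples at once as `X * Ureduce`, which stacks the per-example results `Ureduce' * x` as rows.

```octave
% Hypothetical data: m = 5 examples (rows), n = 2 features (columns).
X = [2.5 2.4; 0.5 0.7; 2.2 2.9; 1.9 2.2; 3.1 3.0];
[m, n] = size(X);
k = 1;                      % target dimension (assumed for the example)

X = X - mean(X);            % mean normalization (preprocessing)
Sigma = (1/m) * X' * X;     % n x n covariance matrix
[U, S, V] = svd(Sigma);     % columns of U are the principal directions
Ureduce = U(:, 1:k);        % n x k reduced basis
Z = X * Ureduce;            % m x k; row i equals (Ureduce' * x_i)'
```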

6
Q

Covariance matrix

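A

Σ = (1/m) * (sum over i = 1..m of x(i) * x(i)'), an n x n matrix, where each x(i) is a mean-normalized example
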
7
Q

Compute Eigenvectors of Σ

(Octave code)

A

[U,S,V] = svd(Sigma);
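
Because Σ is symmetric and positive semidefinite, `svd` can stand in for an eigendecomposition here: the columns of `U` are eigenvectors and the diagonal of `S` holds the eigenvalues in descending order. A small check on a hypothetical 2 x 2 matrix (the values are assumptions for illustration):

```octave
% Hypothetical symmetric positive definite matrix.
Sigma = [2 1; 1 2];

[U, S, V] = svd(Sigma);

% Eigenvalues from eig, sorted to match the descending order used by svd.
eigvals = sort(eig(Sigma), 'descend');

% Reconstruct Sigma from U and S alone.
Sigma_rebuilt = U * S * U';
```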

8
Q

Compute Σ

(Octave code)

A

Sigma = (1/m) * X' * X;

9
Q

Define new reduced basis vectors

(Octave code)

A

Ureduce = U(:,1:k);

10
Q

Convert n-dim data x to k-dim data z

(Octave code)

A

z = Ureduce' * x;

11
Q

% of Variance Retained

A

1 - (avg. squared projection error / total variation in the data)
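
The fraction of variance retained is one minus the ratio of projection error to total variation, and it can also be read off the diagonal of `S`. The sketch below computes it both ways on hypothetical zero-mean data; `X_approx` (the projection mapped back to n dimensions) and the other names are assumptions introduced for the example.

```octave
% Hypothetical data, already zero-mean: m = 4 examples, n = 2 features.
X = [2 0; 0 1; -2 0; 0 -1];
m = size(X, 1);
k = 1;

Sigma = (1/m) * X' * X;
[U, S, V] = svd(Sigma);
Ureduce = U(:, 1:k);

Z = X * Ureduce;                 % project to k dimensions
X_approx = Z * Ureduce';         % map back to n dimensions

proj_err  = sum(sum((X - X_approx) .^ 2)) / m;   % avg squared projection error
total_var = sum(sum(X .^ 2)) / m;                % total variation in the data
retained_from_error = 1 - proj_err / total_var;

s = diag(S);
retained_from_S = sum(s(1:k)) / sum(s);          % same quantity via S
```

Both expressions give 0.8 for this data set, i.e. 80% of the variance is retained with k = 1.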

12
Q

Determine the number of dimensions k to retain

A

1) Run PCA
2) Pick the smallest k such that a target fraction (e.g., 99%) of the variance is retained
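
One way to pick k without rerunning PCA for each candidate is to use the cumulative sum of the diagonal of `S`. This is a sketch under assumptions: the vector `s` below is a made-up stand-in for `diag(S)` from `svd(Sigma)`, and `threshold` is the chosen target fraction.

```octave
% Hypothetical diagonal of S, in descending order.
s = [6; 3; 0.95; 0.04; 0.01];
threshold = 0.99;                      % target fraction of variance to retain

retained = cumsum(s) / sum(s);         % variance retained for k = 1, 2, ..., n
k = find(retained >= threshold, 1);    % smallest k that meets the threshold
```

For these values the retained fractions are roughly 0.6, 0.9, 0.995, 0.999, and 1, so the smallest k meeting 99% is 3.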

13
Q

How to avoid overfitting

A

Use regularization, not PCA. PCA discards information without looking at the labels y.

14
Q

When to use PCA

A

Always try running the learning algorithm on the raw data first. Use PCA only if there is a good reason to.

15
Q

Good uses of PCA

A

Compression: Reduce memory / disk space needed; speed up learning

Visualization: Reduce to 2/3 dimensions
