PCA Flashcards

1
Q

What is the basic idea behind dimensionality reduction or PCA?

A

Replace a large number of predictors with a smaller number that still represents the data well.

2
Q

When is reducing dimensionality important?

A

When an ML model runs out of memory or takes too long to train because the data has too many predictors.

Another use is simply to visualize the data in 2 or 3 dimensions. E.g., a labeled scatterplot lets you see which data points (like customers) are close to each other in space, even if the new/reduced axes of that space aren’t meaningful on their own.

3
Q

Dimension reduction / PCA: After reducing the data to, say, 4 dimensions, what do the axes represent?

A

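The axes are rotated to point along the directions of highest variance in the data. Each new axis is a linear combination of the original features and is orthogonal to the others; the first axis captures the most variance, the second the next most, and so on.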
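
A minimal sketch of what "rotated axes" means in practice, assuming scikit-learn's PCA and a made-up random dataset (the data, seed, and variable names are illustrative only). It checks that the kept axes are orthogonal unit vectors and that the variance along them is sorted from largest to smallest.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples, 6 features with very different spreads
X = rng.normal(size=(200, 6)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.1])

pca = PCA(n_components=4).fit(X)
axes = pca.components_  # shape (4, 6): one unit vector per new axis

# The new axes are mutually orthogonal unit vectors
is_orthonormal = np.allclose(axes @ axes.T, np.eye(4))

# Variance along each new axis, ordered from largest to smallest
variances = pca.explained_variance_
is_sorted = bool(np.all(np.diff(variances) <= 0))

print(is_orthonormal, is_sorted)
```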

4
Q

PCA: main syntax for creating it, fitting, and transforming? (3 lines)

  • What are the arguments for the model?
A

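from sklearn.decomposition import PCA

# Main argument: n_components, an int (number of axes to keep)
# or a float in (0, 1) (fraction of variance to keep)
my_pca = PCA(n_components=2)
my_pca.fit(X) # fitted attributes (ending in "_") become available after this step
X_trans = my_pca.transform(X) # X is your (n_samples, n_features) data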

5
Q

How to run a PCA transform with however many dimensions capture at least 90% of the original features’ variance?

A

PCA(n_components=.9)
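
A short sketch of the float form of n_components, using made-up random data (the data and variable names are assumptions for illustration). After fitting, n_components_ reports how many axes were actually kept, and the kept ratios should sum to at least the requested fraction.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 300 samples, 10 features with decreasing spread
X = rng.normal(size=(300, 10)) * np.arange(10, 0, -1)

pca90 = PCA(n_components=0.9)      # keep enough axes for 90% of the variance
X_trans = pca90.fit_transform(X)   # fit and transform in one call

n_kept = pca90.n_components_                    # number of axes actually kept
captured = pca90.explained_variance_ratio_.sum()  # share of variance they explain

print(n_kept, round(captured, 3))
```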

6
Q

After running my_pca.fit(X), what attributes become available?

A

.components_ # principal axes’ vectors

.explained_variance_ratio_ # each axis’ relative contribution to explaining the original variance
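
As a quick illustration of these fitted attributes, here is a sketch on made-up data (the array sizes and names are assumptions). Fitted attributes in scikit-learn carry a trailing underscore and only exist after fit.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))  # hypothetical data: 100 samples, 5 features

my_pca = PCA(n_components=3)
my_pca.fit(X)

components_shape = my_pca.components_.shape  # (n_components, n_features)
ratios = my_pca.explained_variance_ratio_    # one value per kept axis

print(components_shape)
print(ratios)
```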

7
Q

What is not a good reason to use dimensionality reduction / PCA?

A

Trying to fix overfitting caused by redundant or collinear features (features correlated with each other). Including all of them in a model as-is might result in overfitting, but simply running PCA on them first won’t fix that issue.

8
Q

Is PCA a predictor?

A

No, it is a transformer: it has fit and transform methods, but no predict.

9
Q

What happens if you run PCA without specifying n_components?

A

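It keeps all components: n_components defaults to min(n_samples, n_features), which equals X’s N features whenever there are at least as many samples as features. The data is still rotated so the N axes lie along the directions of maximum variance. So X doesn’t get reduced, but it still gets transformed.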
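
A small sketch of the default behavior, on made-up data with more samples than features (the data and names are assumptions). The output keeps the same shape and all of the variance, and the rotated columns come out uncorrelated.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))  # hypothetical data: 50 samples, 4 features

full_pca = PCA()              # n_components left unspecified
X_rot = full_pca.fit_transform(X)

total_ratio = full_pca.explained_variance_ratio_.sum()  # all variance kept

# Covariance of the rotated data: off-diagonal entries should be ~0
cov = np.cov(X_rot, rowvar=False)
off_diag = cov - np.diag(np.diag(cov))

print(X_rot.shape, full_pca.n_components_, round(total_ratio, 6))
```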

10
Q

You did PCA then KMeans clustering and found the cluster centers in the reduced/transformed space. How do you get the centroids’ coordinates in the original feature space? In other words, how do you “reverse” PCA?

A

my_pca.inverse_transform(centroids)
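
A sketch of the full round trip, assuming made-up data and arbitrary choices of 2 components and 3 clusters. Note that when components were dropped, inverse_transform gives the centroids' positions on the reduced subspace expressed in original-feature coordinates, so it is an approximation rather than an exact undo.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 6))  # hypothetical data: 150 samples, 6 features

my_pca = PCA(n_components=2).fit(X)
X_trans = my_pca.transform(X)

# Cluster in the reduced space
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_trans)
centroids = km.cluster_centers_                       # shape (3, 2)

# Map the centroids back to the original 6 features
centroids_orig = my_pca.inverse_transform(centroids)  # shape (3, 6)

print(centroids.shape, centroids_orig.shape)
```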

11
Q

PCA: How do you decide how many dimensions?

A

Usually you want to capture at least some minimum share (e.g., 90%) of the variance. The smallest number of components that reaches that threshold is the number you settle on.
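
One way to find that number by hand is to fit PCA with all components and look at the cumulative explained variance. This is a sketch on made-up data; the 0.90 threshold and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
# Hypothetical data: 300 samples, 8 features with decreasing spread
X = rng.normal(size=(300, 8)) * np.array([8, 6, 4, 3, 2, 1, 0.5, 0.2])

full = PCA().fit(X)  # keep every component so the full curve is available
cumulative = np.cumsum(full.explained_variance_ratio_)

threshold = 0.90
# Index of the first cumulative value at or above the threshold, plus one
n_needed = int(np.argmax(cumulative >= threshold)) + 1

print(np.round(cumulative, 3))
print(n_needed)
```

Plotting cumulative against the component count gives the usual curve for eyeballing the same decision.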
