Unsupervised Flashcards

1
Q

How to evaluate label predictions vs true labels using pandas?

A

pd.crosstab(df['preds'], df['true'])
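A minimal sketch of the cross-tabulation, assuming a DataFrame with a 'preds' column of cluster labels and a 'true' column of known labels. The data below is made up for illustration.

```python
import pandas as pd

# Hypothetical predictions and ground-truth labels, for illustration only.
df = pd.DataFrame({
    "preds": [0, 0, 1, 1, 2, 2],
    "true": ["a", "a", "b", "b", "c", "a"],
})

# Rows are predicted labels, columns are true labels, cells are counts.
ct = pd.crosstab(df["preds"], df["true"])
print(ct)
```

A good clustering puts most of each row's count in a single column.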

2
Q

What is inertia?

A

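A measure of how far points are from their centroids: the sum of squared distances from each sample to its nearest centroid. Lower is better.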
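As a rough illustration, scikit-learn's KMeans exposes this quantity as inertia_. The sketch below, on four made-up points, recomputes it by hand as the sum of squared distances from each point to its assigned centroid.

```python
import numpy as np
from sklearn.cluster import KMeans

# Four made-up points forming two tight pairs.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Sum of squared distances from each point to its assigned centroid.
assigned_centroids = model.cluster_centers_[model.labels_]
manual_inertia = ((X - assigned_centroids) ** 2).sum()

print(model.inertia_, manual_inertia)
```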

3
Q

How to choose best number of clusters?

A

Plot inertia against the number of clusters and pick the elbow point, where inertia stops dropping quickly.
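One way to look for the elbow is to fit k-means for a range of cluster counts and compare the inertias. This is a sketch on synthetic data with three well-separated groups; the data, the range of k, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three well-separated blobs of 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

ks = list(range(1, 7))
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in ks
]

# Inertia keeps falling as k grows; the elbow is where the drop flattens out.
for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 1))
```

On this data the drop should flatten after k = 3, which matches the number of blobs generated.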

4
Q

What is the problem with feature variance for kmeans?

A

In k-means, a feature's variance determines its influence on the clustering, so features need to be scaled to comparable variance first.

5
Q

What does StandardScaler do?

A

It standardizes features by removing the mean and scaling to unit variance.
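A small sketch of the effect, using made-up data where the second feature has a far larger spread than the first. After scaling, each column has mean 0 and unit variance.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: the second feature has a much larger spread than the first.
X = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0], [4.0, 7000.0]])

X_scaled = StandardScaler().fit_transform(X)

print(X.std(axis=0))         # very different spreads before scaling
print(X_scaled.mean(axis=0)) # approximately 0 for each feature
print(X_scaled.std(axis=0))  # approximately 1 for each feature
```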

6
Q

What does Normalizer do?

A

It rescales each sample (row) independently of the others so that it has unit norm.
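For contrast with per-feature scaling, here is a sketch of Normalizer on three made-up rows. With the default L2 norm, each row is divided by its own length.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Made-up samples; each row is rescaled on its own.
X = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 5.0]])

X_norm = Normalizer(norm="l2").fit_transform(X)

print(X_norm)
print(np.linalg.norm(X_norm, axis=1))  # every row now has length 1
```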

7
Q

What is the linkage method?

A

It defines how the distance between clusters is measured in hierarchical clustering.

8
Q

What is the difference between single and complete linkages?

A

In complete linkage, the distance between clusters is the distance between the furthest points of the clusters. In single linkage, the distance between clusters is the distance between the closest points of the clusters.
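The difference shows up in the merge heights. This sketch uses SciPy's linkage() on four made-up points on a line and compares the height of the final merge under each method.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Four made-up one-dimensional points.
X = np.array([[0.0], [1.0], [3.0], [7.0]])

Z_single = linkage(X, method="single")
Z_complete = linkage(X, method="complete")

# Column 2 of the linkage matrix holds the distance at which clusters merged.
print(Z_single[:, 2])    # closest-point distances
print(Z_complete[:, 2])  # furthest-point distances
```

The last merge joins {0, 1, 3} with {7}: the closest pair is 3 and 7 (distance 4), while the furthest pair is 0 and 7 (distance 7).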

9
Q

How to extract hierarchical cluster labels at a given height?

A

Using fcluster() from scipy.cluster.hierarchy with criterion='distance'.
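A minimal sketch, assuming SciPy's fcluster() applied to a linkage matrix with criterion='distance'. The points and the cut height of 2.5 are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Four made-up one-dimensional points.
X = np.array([[0.0], [1.0], [3.0], [7.0]])

# Complete linkage merges at heights 1, 3 and 7 for this data.
Z = linkage(X, method="complete")

# Cut the dendrogram at height 2.5: only the merge at height 1 survives.
labels = fcluster(Z, t=2.5, criterion="distance")
print(labels)
```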

10
Q

What is t-SNE?

A

t-distributed Stochastic Neighbour Embedding. It maps high-dimensional samples to 2 or 3 dimensions for visualization.

11
Q

What are reasonable learning rate values for t-SNE?

A

50 to 200
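A sketch of where the learning rate is set, assuming scikit-learn's TSNE. The random data, the perplexity, and the value 100 are illustrative choices; in practice it is worth trying a few values in the range and comparing the plots.

```python
import numpy as np
from sklearn.manifold import TSNE

# Made-up high-dimensional data: 30 samples with 10 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))

# perplexity must be smaller than the number of samples.
model = TSNE(n_components=2, learning_rate=100, perplexity=5, random_state=0)

# TSNE has no separate transform(); fit_transform() does both steps.
embedding = model.fit_transform(X)
print(embedding.shape)
```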

12
Q

What does PCA do?

A

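PCA de-correlates the data by shifting it to mean 0 and rotating it onto its principal components. For dimension reduction, it then discards the PCA features with low variance (assumed to be noise) and keeps those with high variance (assumed to be informative).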
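A short sketch of the de-correlation, assuming scikit-learn's PCA and two made-up, strongly correlated features. After the transform, the PCA features have mean 0 and are uncorrelated, and most of the variance sits in the first one.

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up data: the second feature is roughly twice the first plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.3, size=200)
X = np.column_stack([x, y])

pca = PCA()
X_pca = pca.fit_transform(X)

print(np.corrcoef(X.T)[0, 1])      # strongly correlated before PCA
print(np.corrcoef(X_pca.T)[0, 1])  # approximately 0 after PCA
print(pca.explained_variance_)     # variance per PCA feature, largest first
```

Passing n_components=1 to PCA would keep only the high-variance PCA feature.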

13
Q

What is NMF?

A

Non-negative matrix factorization. It can only be applied when all values are >= 0.

14
Q

How can NMF be used for text classification?

A

NMF components represent topics, and each document's NMF features describe it as a combination of those topics.
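A toy sketch of the idea, assuming TF-IDF word weights as the non-negative input. The four documents and the choice of two components are made up for illustration.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up corpus with two rough themes.
docs = [
    "cats and dogs are pets",
    "dogs chase cats",
    "stocks and bonds are investments",
    "investors buy stocks",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)  # non-negative document-term matrix

# Vocabulary terms ordered by their column index in the matrix.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

model = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(tfidf)  # documents expressed as topic weights
H = model.components_           # topics expressed as word weights

for topic in H:
    top_words = [terms[i] for i in topic.argsort()[::-1][:3]]
    print(top_words)
```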

15
Q

What is the difference between an outer and inner join?

A

An outer join is the union of indices while an inner join is the intersection of indices.
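A small pandas sketch of both joins on two made-up DataFrames whose indices overlap only at 'y'.

```python
import pandas as pd

# Made-up frames: only the index label "y" appears in both.
left = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
right = pd.DataFrame({"b": [3, 4]}, index=["y", "z"])

outer = left.join(right, how="outer")  # union of indices, gaps become NaN
inner = left.join(right, how="inner")  # intersection of indices

print(outer)
print(inner)
```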
