KMeans Flashcards

1
Q

Good clustering will produce clusters with:

  • [High / Low] intra-class similarity.
  • [High / Low] inter-class similarity.
A

  • High intra-class similarity.
  • Low inter-class similarity.

2
Q

Distance measures other than Euclidean distance that are used in clustering include:

A

Minkowski distance
Pearson correlation distance
Spearman correlation distance
Kendall correlation distance
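
To make two of these measures concrete, here is a minimal pure-Python sketch. The function names `minkowski_distance` and `pearson_distance` are illustrative, and defining the Pearson correlation distance as 1 - r is one common convention, assumed here rather than taken from the card.

```python
def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p between two equal-length points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def pearson_distance(a, b):
    """Pearson correlation distance, assumed here to be 1 - r."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = sum((x - mean_a) ** 2 for x in a) ** 0.5
    spread_b = sum((y - mean_b) ** 2 for y in b) ** 0.5
    return 1 - cov / (spread_a * spread_b)


a, b = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski_distance(a, b, p=1)  # p=1 is the Manhattan distance
euclidean = minkowski_distance(a, b, p=2)  # p=2 is the Euclidean distance
correlated = pearson_distance([1, 2, 3], [2, 4, 6])  # perfectly correlated
```

Minkowski distance generalizes both Manhattan and Euclidean distance through the order `p`, while the correlation-based distances compare the shape of two profiles rather than their magnitude.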

3
Q

Challenges with k-Means Clustering

k-Means is very sensitive to the randomly chosen initial cluster centers (this is known as the ________).

A

random initialization trap

4
Q

The _______ initialization approach mitigates the effects of the random initialization trap.

A

k-means++
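
A minimal sketch of the seeding idea, assuming points are tuples of numbers and all points are distinct. The first center is picked uniformly at random; each later center is picked with probability proportional to its squared distance from the nearest center chosen so far. The name `kmeans_pp_init` and the `seed` parameter are illustrative, not part of any library.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers with k-means++ style weighted sampling."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center,
        # so far-away points are more likely to become the next center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
initial_centers = kmeans_pp_init(points, k=2)
```

Because an already chosen point has weight zero, it cannot be selected twice, and the centers tend to start spread out instead of clumped together.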

5
Q

Methods for choosing the right K include:

A
Elbow Method
Information Criterion Approach
Silhouette method
Jump method
Gap statistic
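
As an illustration of the silhouette method, the sketch below computes the mean silhouette coefficient for a labelled data set in pure Python. For each point, a is the mean distance to the other members of its own cluster and b is the lowest mean distance to any other cluster; the score is (b - a) / max(a, b). The function name and the choice to score singleton clusters as 0 are assumptions made for this example, and at least two clusters are required.

```python
from math import dist


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two distinct labels."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other points in the same cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the closest other cluster
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = mean_silhouette(points, [0, 0, 1, 1])  # labels match the two groups
bad = mean_silhouette(points, [0, 1, 0, 1])   # labels cut across the groups
```

To choose K this way, one would cluster the data for several candidate values of K and prefer the value with the highest mean silhouette.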
6
Q

WCSS stands for ________ and is associated with the _____ method for choosing K.

A

Within-Cluster Sum of Squares

Elbow Method
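
The sketch below shows how WCSS could be computed for several values of K, using a deliberately small pure-Python k-means with plain random initialization. The helper names, the toy data, and the fixed iteration count are all assumptions for illustration; in practice a library implementation would normally be used.

```python
import random
from math import dist


def kmeans(points, k, iters=50, seed=0):
    """Tiny Lloyd-style k-means; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # plain random initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers, clusters


def wcss(centers, clusters):
    """Within-Cluster Sum of Squares: squared distance of each point to its center."""
    return sum(
        dist(p, center) ** 2
        for center, members in zip(centers, clusters)
        for p in members
    )


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]
wcss_by_k = {k: wcss(*kmeans(points, k)) for k in range(1, 5)}
for k, value in wcss_by_k.items():
    print(k, round(value, 2))
```

Plotting WCSS against K and looking for the point where the curve stops dropping sharply (the "elbow") gives a candidate K. The toy data was built from three groups, so the elbow would be expected near K = 3, though a poor random start can blur it.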

7
Q

Strengths of K-Means Clustering?

A
- Uses simple non-statistical principles.
- Very flexible and malleable algorithm.
- Wide set of real-world applications.
8
Q

Weaknesses of K-Means Clustering?

A
  • Simplistic algorithm.
  • Relies on chance (the k initial centroids are chosen randomly).
  • Sometimes requires some domain knowledge in
    determining the ideal number of clusters.
  • Not ideal for non-spherical clusters.
  • Works with numeric data only.