KMeans Flashcards
Good clustering will produce clusters with:
- [High / Low] intra-class similarity.
- [High / Low] inter-class similarity.
High
Low
Besides Euclidean distance, other distance measures used in clustering include:
Minkowski distance
Pearson correlation distance
Spearman correlation distance
Kendall correlation distance
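A minimal sketch of two of these measures, for illustration only. The function names are hypothetical, and the Pearson correlation distance is taken here as 1 minus the Pearson correlation coefficient, which is one common convention (an assumption, since the card does not define it). Minkowski distance of order p reduces to Manhattan distance at p = 1 and Euclidean distance at p = 2.

```python
import math


def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def pearson_distance(a, b):
    """Assumed convention: 1 - Pearson correlation coefficient.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


# p=1 gives Manhattan distance, p=2 gives Euclidean distance.
print(minkowski_distance([0, 0], [3, 4], p=1))
print(minkowski_distance([0, 0], [3, 4], p=2))
# Perfectly correlated vectors have a correlation distance of about 0.
print(pearson_distance([1, 2, 3], [2, 4, 6]))
```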
Challenges with k-Means Clustering
k-Means is very sensitive to the randomly chosen initial cluster centers (this is known as the ________).
random initialization trap
The _______ initialization approach mitigates the effects of the random initialization trap.
K-means++
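A rough sketch of how k-means++ seeding works, assuming points are tuples of numbers and that the data contains at least k distinct points. The first center is picked uniformly at random; each later center is sampled with probability proportional to its squared distance from the nearest center already chosen, which tends to spread the initial centers out. The function names and the `seed` parameter are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers from points using k-means++ style seeding."""
    rng = random.Random(seed)
    # First center: uniform random choice.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight of each point: squared distance to its nearest chosen center.
        # Already-chosen points get weight 0 and cannot be picked again.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Toy data (made up for illustration): three loose groups of points.
demo_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.1, 0.0)]
print(kmeans_pp_init(demo_points, k=3, seed=1))
```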
Methods for choosing the right K include:
Elbow method
Information criterion approach
Silhouette method
Jump method
Gap statistic
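As one illustration of these methods, the silhouette method scores a clustering by comparing, for each point, the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), using s = (b - a) / max(a, b). The K with the highest mean score is preferred. The sketch below assumes Euclidean distance, at least two clusters, and scores singleton clusters as 0 by convention; the function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster: contributes a score of 0
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it does not affect the sum).
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Toy data (made up): two tight pairs of points far apart.
demo = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_score(demo, [0, 0, 1, 1]))  # sensible grouping
print(silhouette_score(demo, [0, 1, 0, 1]))  # poor grouping
```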
WCSS stands for ________ and is associated with the _____ method for choosing K
Within-Cluster Sum of Squares
Elbow Method
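A small sketch of how WCSS feeds the elbow method: run k-means for several values of K, compute the sum of squared distances from each point to its cluster centroid, and look for the K after which WCSS stops dropping sharply. The k-means loop here is a bare-bones version with plain random initialization, written only for illustration; the function names, `iters`, and `seed` are assumptions.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(cluster):
    n = len(cluster)
    return tuple(sum(coord) / n for coord in zip(*cluster))


def kmeans(points, k, iters=50, seed=0):
    """Bare-bones k-means with random initialization (not k-means++)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


def wcss(centroids, clusters):
    """Within-Cluster Sum of Squares for a given clustering."""
    return sum(
        sq_dist(p, c) for c, cluster in zip(centroids, clusters) for p in cluster
    )


# Toy data (made up): print WCSS for K = 1..4 and look for the "elbow".
demo_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
for k in range(1, 5):
    centroids, clusters = kmeans(demo_points, k)
    print(k, wcss(centroids, clusters))
```

Because this sketch uses plain random initialization, a given K can land in a poor local optimum, so in practice each K is usually run several times and the lowest WCSS kept.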
Strengths of K-Means Clustering?
- Uses simple non-statistical principles.
- Very flexible and malleable algorithm.
- Wide set of real-world applications.
Weaknesses of K-Means Clustering?
- Simplistic algorithm.
- Relies on chance (initial k centroids).
- Sometimes requires some domain knowledge in determining the ideal number of clusters.
- Not ideal for non-spherical clusters.
- Works with numeric data only.