Lecture 10 - Clustering: K-Means Flashcards
What is the main goal of K-Means clustering?
To partition a dataset into k clusters by minimizing the sum of squared distances (reconstruction error) between data points and their respective cluster centroids.
What are the steps in the K-Means algorithm?
- Initialize k centroids randomly.
- Assign each data point to the nearest centroid.
- Recalculate the centroids as the mean of all points in a cluster.
- Repeat steps 2 and 3 until convergence (e.g., no changes in cluster assignments).
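The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the names `k_means` and `squared_distance`, the choice to initialize from randomly sampled data points, and the `max_iter` and `seed` parameters are all assumptions made for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, max_iter=100, seed=0):
    """Return (centroids, assignments) for a list of equal-length tuples."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids (assumed init scheme).
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop once no assignment changes between iterations.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = k_means(points, k=2)
print(centroids)
print(assignments)
```

Which cluster gets index 0 depends on the random initialization, so only the grouping of points is meaningful, not the label values.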
How is convergence determined in K-Means?
When the reconstruction error (sum of squared distances) stabilizes.
When the change in error between iterations is below a threshold.
When the maximum number of iterations is reached (a stopping rule that halts the algorithm even if it has not truly converged).
When cluster assignments stop changing.
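The criteria above can be combined into a single check, sketched below. The function name `should_stop`, its arguments, and the default values for `tol` and `max_iter` are hypothetical and chosen only for illustration.

```python
def should_stop(prev_error, error, prev_labels, labels, iteration,
                tol=1e-6, max_iter=100):
    """Return the reason for stopping as a string, or None to keep iterating."""
    # Cluster assignments stopped changing.
    if labels == prev_labels:
        return "assignments unchanged"
    # Reconstruction error changed by less than the tolerance.
    if prev_error is not None and abs(prev_error - error) < tol:
        return "error change below tolerance"
    # Iteration budget exhausted (iteration is counted from 0).
    if iteration + 1 >= max_iter:
        return "max iterations reached"
    return None

print(should_stop(10.0, 5.0, [0, 1], [0, 1], iteration=3))
print(should_stop(None, 5.0, None, [0, 1], iteration=0))
```

Checking assignments first is a design choice: unchanged assignments imply the centroids and the error will not change either.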
What is the "elbow method" in K-Means?
A technique to find the optimal number of clusters k by plotting the reconstruction error against k. The "elbow point" indicates diminishing returns in error reduction.
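A rough sketch of the idea: run K-Means for several values of k and tabulate the final reconstruction error. The helper `fit_error`, the toy data with three blobs, and the fixed iteration count are assumptions for illustration. The values are printed rather than plotted to keep the example dependency-free.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def fit_error(points, k, n_iter=50, seed=0):
    """Run a basic K-Means and return the final reconstruction error."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k random data points
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, labels) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Error: squared distance from each point to its nearest centroid.
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

# Toy data: three small, well-separated blobs.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

errors = {k: fit_error(points, k) for k in range(1, 7)}
for k, err in errors.items():
    print(k, round(err, 2))
```

Because K-Means can settle in a local minimum, the error for a given k depends on the initialization. In practice one would typically rerun with several seeds and keep the lowest error per k before looking for the elbow.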
What is a major limitation of K-Means?
It assumes clusters are spherical and evenly sized, making it unsuitable for datasets with arbitrary cluster shapes or varying densities.
What is reconstruction error in K-Means?
The sum of squared distances between data points and their assigned cluster centroid.
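Written as a formula, under assumed notation (n data points x_i, centroids \mu_1, ..., \mu_k, and c(i) the index of the cluster that x_i is assigned to; these symbols are not from the lecture):

```latex
J = \sum_{i=1}^{n} \left\lVert x_i - \mu_{c(i)} \right\rVert^{2}
```

A smaller J means the points lie closer to their assigned centroids.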
What are the types of cluster representations in K-Means?
Hard clustering: Each point belongs to exactly one cluster (this is what standard K-Means produces).
Soft clustering: Points have a degree of belonging to multiple clusters.
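The difference can be illustrated with a small sketch. The hard assignment is the usual nearest-centroid rule. The soft weighting shown here (a softmax over negative squared distances with a hypothetical `stiffness` parameter) is only one possible choice, assumed for illustration; it is not part of standard K-Means.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def hard_assignment(point, centroids):
    """Index of the single nearest centroid."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def soft_assignment(point, centroids, stiffness=1.0):
    """Membership weights summing to 1 (assumed softmax weighting)."""
    d = [sq_dist(point, c) for c in centroids]
    d_min = min(d)  # subtract the minimum for numerical stability
    w = [math.exp(-stiffness * (x - d_min)) for x in d]
    total = sum(w)
    return [x / total for x in w]

centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
print(hard_assignment(point, centroids))  # one cluster index
print(soft_assignment(point, centroids))  # one weight per cluster
```

A point equidistant from both centroids receives equal soft weights, while the hard rule still has to pick a single cluster.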