Lecture 10 - Clustering: K-Means Flashcards

Question 1

Q

What is the main goal of K-Means clustering?

Answer

A

To partition a dataset into 𝑘 clusters by minimizing the sum of squared distances (reconstruction error) between data points and their respective cluster centroids.
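Written as a formula, the goal above can be sketched as follows. The symbols J (the objective), C_j (the set of points in cluster j), and μ_j (the centroid of cluster j) are notation assumed here for illustration and do not come from the lecture.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

K-Means looks for the assignment of points to clusters, together with the centroids, that makes J as small as possible.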

Question 2

Q

What are the steps in the K-Means algorithm?

Answer

A

1. Initialize 𝑘 centroids randomly.
2. Assign each data point to the nearest centroid.
3. Recalculate each centroid as the mean of all points assigned to its cluster.
4. Repeat steps 2–3 until convergence (e.g., no changes in cluster assignments).
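The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names, the choice to draw initial centroids from the data points, the handling of empty clusters, and the toy data are all assumptions made for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids (assumed init scheme).
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop once no assignment has changed.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, assignments

# Hypothetical toy data: two small groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, assignments = kmeans(points, k=2)
print(centroids, assignments)
```

Because the starting centroids are random, different seeds can give different cluster labels and, on harder data, different final clusters.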

Question 3

Q

How is convergence determined in K-Means?

Answer

A

When the reconstruction error (sum of squared distances) stabilizes, i.e., its change between iterations falls below a threshold.

When cluster assignments stop changing.

When the maximum number of iterations is reached. This is a fallback stopping rule: it ends the run even if the algorithm has not actually converged.
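These checks could be combined into a single helper along the following lines. The function name, the order in which the checks are tried, and the default values for tol and max_iters are assumptions for illustration only.

```python
def stopping_reason(prev_error, error, prev_assignments, assignments,
                    iteration, tol=1e-4, max_iters=100):
    """Return why the loop should stop, or None if it should continue."""
    # Cluster assignments did not change since the previous iteration.
    if assignments == prev_assignments:
        return "assignments unchanged"
    # Reconstruction error changed by less than the tolerance.
    if prev_error is not None and abs(prev_error - error) < tol:
        return "error change below tolerance"
    # Iteration budget is used up (a fallback, not true convergence).
    if iteration + 1 >= max_iters:
        return "max iterations reached"
    return None

print(stopping_reason(10.0, 9.0, [0, 1], [0, 1], iteration=3))
print(stopping_reason(None, 5.0, None, [0, 1], iteration=0))
```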

Question 4

Q

What is the “elbow method” in K-Means?

Answer

A

A heuristic for choosing the number of clusters 𝑘: plot the reconstruction error against 𝑘 and look for the “elbow point”, the value of 𝑘 beyond which adding more clusters yields only small reductions in error (diminishing returns).
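A rough sketch of the idea: run K-Means for several values of 𝑘, record the final error for each, and inspect where the curve flattens. The helper names, the compact K-Means inside kmeans_error, the use of several seeds per 𝑘, and the toy data are assumptions for this example. The errors are printed rather than plotted to keep the sketch dependency-free.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_error(points, k, seed=0, max_iters=100):
    """Run a basic K-Means and return its final reconstruction error."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        new = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new == assignments:
            break
        assignments = new
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignments))

# Hypothetical toy data with three visible groups.
points = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6),
          (5.0, 5.0), (5.4, 5.1), (5.1, 5.5),
          (10.0, 0.0), (10.3, 0.4), (9.8, 0.5)]

# For each k, keep the best (lowest) error over a few random restarts.
errors = {k: min(kmeans_error(points, k, seed=s) for s in range(5))
          for k in range(1, 6)}
for k, err in errors.items():
    print(k, round(err, 3))
```

On data like this, the error should drop sharply up to the number of visible groups and change little afterwards. The elbow is read off by eye, so the method is a guide rather than a guarantee.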

Question 5

Q

What is a major limitation of K-Means?

Answer

A

It implicitly assumes clusters are roughly spherical and similar in size, making it a poor fit for datasets with arbitrary cluster shapes or varying densities.

Question 6

Q

What is reconstruction error in K-Means?

Answer

A

The sum of squared distances between data points and their assigned cluster centroid.
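A small worked example of this quantity, using made-up points, centroids, and assignments (all hypothetical values chosen so the arithmetic is easy to check by hand):

```python
def reconstruction_error(points, centroids, assignments):
    """Sum of squared distances from each point to its assigned centroid."""
    total = 0.0
    for point, cluster in zip(points, assignments):
        centroid = centroids[cluster]
        total += sum((a - b) ** 2 for a, b in zip(point, centroid))
    return total

points = [(0.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
centroids = [(0.0, 1.0), (5.0, 5.0)]
assignments = [0, 0, 1]  # index of the centroid each point belongs to

# Each of the first two points is at distance 1 from (0, 1); the third sits on its centroid.
print(reconstruction_error(points, centroids, assignments))  # 1 + 1 + 0 = 2.0
```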

Question 7

Q

What are the types of cluster representations in K-Means?

Answer

A

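Hard clustering: Each point belongs to exactly one cluster. This is what standard K-Means produces.

Soft clustering: Points have a degree of belonging to multiple clusters. This is used by related methods such as fuzzy c-means and Gaussian mixture models rather than by standard K-Means.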
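The difference can be illustrated for a single point and two fixed centroids. The hard assignment picks the nearest centroid. The soft memberships here use inverse squared distances normalized to sum to 1, which is the fuzzy c-means membership rule with fuzzifier 2. That rule is swapped in purely for illustration and is not part of standard K-Means; the function names and the eps guard are also assumptions.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def hard_assignment(point, centroids):
    """Index of the single nearest centroid."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def soft_memberships(point, centroids, eps=1e-12):
    """Degrees of belonging that sum to 1 (fuzzy c-means rule, fuzzifier 2)."""
    # eps avoids division by zero when the point lies exactly on a centroid.
    inverse = [1.0 / (sq_dist(point, c) + eps) for c in centroids]
    total = sum(inverse)
    return [w / total for w in inverse]

centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

hard = hard_assignment(point, centroids)   # nearest centroid is index 0
soft = soft_memberships(point, centroids)  # roughly [0.9, 0.1]
print(hard, [round(m, 3) for m in soft])
```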

(7 cards)