10 Clustering Flashcards
List the four iterative steps of standard K‑means.
(1) Randomly initialize k centroids. (2) Assign each point to its nearest centroid. (3) Recompute each centroid as the mean of its assigned points. (4) Repeat assignment + update until no assignments change or max iterations reached.
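A minimal NumPy sketch of these four steps (the function name `kmeans` and the sampling-based initialization are illustrative, not a reference implementation):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # (1) Randomly initialize k centroids by sampling distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iters):
        # (2) Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # (4) Stop as soon as no assignment changes.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # (3) Recompute each centroid as the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```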
What objective does K‑means explicitly minimize?
The within‑cluster sum of squared Euclidean distances (total inertia).
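scikit-learn's `KMeans` exposes this objective as the `inertia_` attribute after fitting; a quick sanity check, with `make_blobs` toy data purely for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Within-cluster sum of squared distances to each point's assigned centroid.
manual = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
print(km.inertia_, manual)  # agree up to floating-point error
```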
Why is the global optimum of K‑means difficult to obtain?
Finding the global minimum requires jointly optimizing the discrete point–centroid assignments and the continuous centroid locations, which is NP‑hard; the objective is non‑convex with many local minima, and Lloyd's iterations only converge to one of them.
Define single‑linkage distance between two clusters.
Minimum pairwise distance between any member of cluster A and any member of cluster B (nearest‑neighbor criterion).
Define complete‑linkage distance between two clusters.
Maximum pairwise distance between any member of cluster A and any member of cluster B (furthest‑neighbor criterion).
What is Ward’s linkage rule in agglomerative clustering?
Merge the pair of clusters whose union yields the smallest increase in total within‑cluster variance (error sum of squares).
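All three linkage rules above are implemented in SciPy's hierarchy module; a short sketch comparing them on the same toy data (dataset and cut level are illustrative):

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)

for method in ("single", "complete", "ward"):
    Z = linkage(X, method=method)                    # agglomerative merge tree
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 flat clusters
    print(method, labels[:10])
```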
Give the high‑level pipeline of spectral clustering (normalized form).
(1) Build a similarity graph with affinity matrix W. (2) Form the normalized Laplacian \(L_{\text{sym}} = I - D^{-1/2} W D^{-1/2}\) (or \(L_{\text{rw}} = I - D^{-1}W\)). (3) Compute the eigenvectors of the k smallest eigenvalues. (4) Row‑normalize them (needed for the \(L_{\text{sym}}\) variant). (5) Run K‑means on the rows.
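A compact NumPy sketch of the \(L_{\text{sym}}\) variant of this pipeline (the fully connected RBF affinity and the bandwidth `sigma` are illustrative choices):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    # (1) Similarity graph: fully connected RBF affinity matrix.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dists / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)
    # (2) Normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    L_sym = np.eye(len(X)) - d_inv_sqrt @ W @ d_inv_sqrt
    # (3) Eigenvectors of the k smallest eigenvalues (eigh sorts ascending).
    _, eigvecs = np.linalg.eigh(L_sym)
    U = eigvecs[:, :k]
    # (4) Row-normalize the spectral embedding.
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # (5) K-means on the embedded rows.
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
```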
What property links graph connectivity and Laplacian eigenvalues?
The number of connected components equals the multiplicity of the eigenvalue 0 of L, \(L_{\text{rw}}\), or \(L_{\text{sym}}\).
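A quick numerical check on a graph with two obvious components (the toy adjacency matrix is made up for illustration):

```python
import numpy as np

# Two disconnected triangles: the adjacency matrix is block-diagonal.
triangle = np.ones((3, 3)) - np.eye(3)
W = np.block([[triangle, np.zeros((3, 3))],
              [np.zeros((3, 3)), triangle]])
L = np.diag(W.sum(axis=1)) - W  # unnormalized Laplacian L = D - W

eigvals = np.linalg.eigvalsh(L)
print(np.sum(np.isclose(eigvals, 0.0)))  # prints 2 = number of components
```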
Explain the eigengap heuristic for choosing k.
Pick k where a large gap appears between \(\lambda_k\) and \(\lambda_{k+1}\); small first k eigenvalues followed by a big gap suggest k well‑separated clusters.
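The heuristic in code, for a symmetric Laplacian `L` such as the one built in the previous sketch (the cap `k_max` is an illustrative choice):

```python
import numpy as np

def choose_k_by_eigengap(L, k_max=10):
    # Eigenvalues sorted ascending: lambda_1 <= lambda_2 <= ...
    eigvals = np.linalg.eigvalsh(L)[: k_max + 1]
    gaps = np.diff(eigvals)           # gaps lambda_{k+1} - lambda_k
    return int(np.argmax(gaps)) + 1   # k with the largest eigengap
```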
State one key advantage of a k‑nearest‑neighbor graph for spectral clustering.
It adapts to local density and can connect points across varying scales without a global distance threshold.
When prefer a mutual k‑nearest‑neighbor graph over standard k‑NN?
When you want to avoid connecting regions of very different density; an edge is kept only if both endpoints are among each other’s k nearest neighbors.
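Both graph variants can be built from scikit-learn's `kneighbors_graph`, symmetrizing with sparse element-wise `maximum` (either direction suffices) or `minimum` (both directions required):

```python
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
A = kneighbors_graph(X, n_neighbors=10, mode="connectivity")

knn_graph = A.maximum(A.T)   # edge if EITHER endpoint lists the other
mutual_knn = A.minimum(A.T)  # edge only if BOTH endpoints list each other
print(knn_graph.nnz, mutual_knn.nnz)  # the mutual graph is sparser
```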
What does a silhouette coefficient near +1, 0, and –1 indicate?
+1: point is well inside its cluster; 0: on the border; –1: likely mis‑clustered.
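scikit-learn reports both the per-point coefficients and their mean (the blob data is illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(silhouette_score(X, labels))        # mean coefficient in [-1, 1]
print(silhouette_samples(X, labels)[:5])  # per-point coefficients
```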
State the random‑walk view of the Normalized Cut objective.
\(\operatorname{Ncut}(A,\bar A) = P(\bar A \mid A) + P(A \mid \bar A)\); minimizing it finds a split that a stationary random walk on the graph rarely crosses.
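Spelled out in graph-cut notation (a short sketch; \(\operatorname{cut}(A,\bar A)\) is the total weight of edges crossing the split and \(\operatorname{vol}(A)\) the sum of degrees in A):

```latex
\[
  \operatorname{Ncut}(A,\bar A)
    = \frac{\operatorname{cut}(A,\bar A)}{\operatorname{vol}(A)}
    + \frac{\operatorname{cut}(A,\bar A)}{\operatorname{vol}(\bar A)},
  \qquad
  P(\bar A \mid A) = \frac{\operatorname{cut}(A,\bar A)}{\operatorname{vol}(A)},
\]
% where P is the one-step transition probability of the stationary random
% walk, so a small Ncut means the walk rarely crosses between A and its
% complement.
```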
Give two reasons spectral clustering can beat plain K‑means on complex shapes.
(i) Uses graph connectivity rather than raw Euclidean geometry, handling non‑convex clusters. (ii) Eigenvector embedding separates intertwined structures before K‑means.
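The classic two-moons dataset makes both points concrete, comparing plain `KMeans` with scikit-learn's `SpectralClustering` on a nearest-neighbor graph (parameter values are illustrative):

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=400, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0).fit_predict(X)

# Spectral clustering recovers the intertwined moons; K-means splits them
# with a straight line because it relies on Euclidean distance to centroids.
print(adjusted_rand_score(y, km), adjusted_rand_score(y, sc))
```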
True/False: The normalized Laplacians \(L_{\text{rw}}\) and \(L_{\text{sym}}\) are always positive semidefinite.
True — all their eigenvalues are non‑negative with the smallest equal to 0.