10 Clustering Flashcards

1
Q

List the four iterative steps of standard K‑means.

A

(1) Randomly initialize k centroids. (2) Assign each point to its nearest centroid. (3) Recompute each centroid as the mean of its assigned points. (4) Repeat assignment + update until no assignments change or max iterations reached.
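
A minimal NumPy sketch of the four steps (the function name, the sample-k-points initialization, and the convergence check are illustrative choices, not a fixed reference implementation):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # (1) Randomly initialize k centroids by sampling k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    assignments = None
    for _ in range(max_iters):
        # (2) Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new = dists.argmin(axis=1)
        # (4) Stop once no assignment changes (or max_iters is hit).
        if assignments is not None and np.array_equal(new, assignments):
            break
        assignments = new
        # (3) Recompute each centroid as the mean of its assigned points.
        for j in range(k):
            if np.any(assignments == j):
                centroids[j] = X[assignments == j].mean(axis=0)
    return centroids, assignments
```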

2
Q

What objective does K‑means explicitly minimize?

A

The within‑cluster sum of squared Euclidean distances (total inertia).
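
The objective in code, assuming `labels` and `centroids` come from some fitted clustering (the helper name is made up); scikit-learn's KMeans exposes the same quantity as the fitted model's `inertia_` attribute:

```python
import numpy as np

def inertia(X, labels, centroids):
    # Sum over all points of ||x_i - centroid(label_i)||^2:
    # the within-cluster sum of squared Euclidean distances.
    diffs = X - centroids[labels]
    return float(np.sum(diffs ** 2))
```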

3
Q

Why is the global optimum of K‑means difficult to obtain?

A

Jointly optimizing the point–centroid assignments and the centroid locations is NP‑hard; the objective is non‑convex with many local minima, so the iterative algorithm only converges to a local optimum.
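
A quick way to see the local minima in practice (a sketch; the make_blobs data and seed count are arbitrary): run K‑means once per seed with purely random initialization and compare the final objective values.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=6, random_state=0)

# One run per seed with no restarts (n_init=1), so each run can settle
# into a different local minimum of the inertia objective.
inertias = [KMeans(n_clusters=6, init='random', n_init=1, random_state=s)
            .fit(X).inertia_ for s in range(10)]
print(sorted(round(v, 1) for v in inertias))  # typically not all equal
```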

4
Q

Define single‑linkage distance between two clusters.

A

Minimum pairwise distance between any member of cluster A and any member of cluster B (nearest‑neighbor criterion).
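
In code (clusters A and B are toy arrays made up for illustration):

```python
import numpy as np
from scipy.spatial.distance import cdist

A = np.array([[0.0, 0.0], [1.0, 1.0]])
B = np.array([[4.0, 0.0], [8.0, 8.0]])

# Single linkage: minimum over all cross-cluster pairwise distances.
d_single = cdist(A, B).min()
print(d_single)  # ~3.16: the (1,1)-(4,0) pair is the closest
```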

5
Q

Define complete‑linkage distance between two clusters.

A

Maximum pairwise distance between any member of cluster A and any member of cluster B (furthest‑neighbor criterion).
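
The mirror image of the previous card: swap the minimum for a maximum. Both criteria are also selectable by name in SciPy's hierarchical clustering (toy arrays again):

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.cluster.hierarchy import linkage

A = np.array([[0.0, 0.0], [1.0, 1.0]])
B = np.array([[4.0, 0.0], [8.0, 8.0]])

# Complete linkage: maximum over all cross-cluster pairwise distances.
d_complete = cdist(A, B).max()
print(d_complete)  # ~11.31: the (0,0)-(8,8) pair is the farthest

# SciPy selects the criterion via the `method` argument.
X = np.vstack([A, B])
Z_single = linkage(X, method='single')
Z_complete = linkage(X, method='complete')
```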

6
Q

What is Ward’s linkage rule in agglomerative clustering?

A

Merge the pair of clusters whose union yields the smallest increase in total within‑cluster variance (error sum of squares).
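
The increase has a closed form: merging clusters A and B raises the error sum of squares by \(\frac{|A|\,|B|}{|A|+|B|}\,\lVert \mu_A - \mu_B \rVert^2\). A small sketch (toy arrays, made-up helper name):

```python
import numpy as np

def ward_merge_cost(A, B):
    # Increase in error sum of squares if A and B are merged:
    # |A||B| / (|A|+|B|) * ||mean(A) - mean(B)||^2
    mu_a, mu_b = A.mean(axis=0), B.mean(axis=0)
    return len(A) * len(B) / (len(A) + len(B)) * float(np.sum((mu_a - mu_b) ** 2))

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[5.0, 0.0], [6.0, 0.0]])
print(ward_merge_cost(A, B))  # 2*2/4 * 5^2 = 25.0
```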

7
Q

Give the high‑level pipeline of spectral clustering (normalized form).

A

(1) Build a similarity graph with weight matrix \(W\). (2) Form the normalized Laplacian \(L_{\text{rw}} = I - D^{-1}W\), where \(D\) is the diagonal degree matrix. (3) Compute the first \(k\) eigenvectors (smallest eigenvalues). (4) Row‑normalize them. (5) Run K‑means on the rows.
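
A compact sketch of all five steps, assuming a symmetric similarity matrix W with no isolated vertices; it uses the equivalence between the eigenvectors of \(L_{\text{rw}}\) and the generalized problem \(Lu = \lambda Du\):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    # (1) W: symmetric nonnegative similarity/weight matrix (given).
    d = W.sum(axis=1)
    D = np.diag(d)
    L = D - W  # unnormalized Laplacian
    # (2)+(3) Eigenvectors of L_rw = I - D^{-1}W solve L u = lambda D u;
    # take the k eigenvectors with the smallest eigenvalues.
    vals, U = eigh(L, D, subset_by_index=[0, k - 1])
    # (4) Row-normalize the spectral embedding.
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # (5) K-means on the rows; each row is one data point's embedding.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```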

8
Q

What property links graph connectivity and Laplacian eigenvalues?

A

The number of connected components equals the multiplicity of eigenvalue 0 of \(L\), \(L_{\text{rw}}\), or \(L_{\text{sym}}\).
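
A numeric check on a small graph built to have exactly two components:

```python
import numpy as np

# Adjacency: component {0,1} (an edge) and component {2,3,4} (a triangle).
W = np.array([[0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 1, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W  # unnormalized Laplacian

# Count (near-)zero eigenvalues: their multiplicity is the component count.
eigvals = np.linalg.eigvalsh(L)
print(int(np.sum(eigvals < 1e-10)))  # 2
```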

9
Q

Explain the eigengap heuristic for choosing k.

A

Pick \(k\) where a large gap appears between \(\lambda_k\) and \(\lambda_{k+1}\); small first \(k\) eigenvalues followed by a big gap suggest \(k\) well‑separated clusters.
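
In code, on illustrative eigenvalues (the array is made up to show the pattern):

```python
import numpy as np

# Sorted Laplacian eigenvalues: three near zero, then a jump.
eigvals = np.array([0.0, 0.01, 0.02, 0.85, 0.90, 1.10])

# Eigengap heuristic: choose the k that maximizes lambda_{k+1} - lambda_k.
gaps = np.diff(eigvals)
k = int(np.argmax(gaps)) + 1
print(k)  # 3
```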

10
Q

State one key advantage of a k‑nearest‑neighbor graph for spectral clustering.

A

It adapts to local density and can connect points across varying scales without a global distance threshold.
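
A sketch of the contrast with an epsilon-graph on data at two very different scales (the blob parameters are arbitrary):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

rng = np.random.default_rng(0)
# A tight blob and a diffuse blob.
X = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
               rng.normal(5.0, 2.0, size=(50, 2))])

# Epsilon-graph: one global threshold. A radius tuned to the tight blob
# leaves much of the diffuse blob disconnected.
A_eps = radius_neighbors_graph(X, radius=0.3, mode='connectivity')
# k-NN graph: every point gets k neighbors at its own local scale.
A_knn = kneighbors_graph(X, n_neighbors=5, mode='connectivity')
print(A_eps.sum(), A_knn.sum())
```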

11
Q

When prefer a mutual k‑nearest‑neighbor graph over standard k‑NN?

A

When you want to avoid connecting regions of very different density; an edge is kept only if both endpoints are among each other’s k nearest neighbors.
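
A sketch of the construction: starting from the directed k-NN adjacency, the standard graph keeps an edge if either endpoint lists the other, the mutual graph only if both do:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
A = kneighbors_graph(X, n_neighbors=10, mode='connectivity').toarray()

W_knn = np.maximum(A, A.T)     # "either direction" symmetrization
W_mutual = np.minimum(A, A.T)  # edge kept only if present in both directions
print(W_mutual.sum() <= W_knn.sum())  # True: the mutual graph is never denser
```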

12
Q

What does a silhouette coefficient near +1, 0, and –1 indicate?

A

+1: point is well inside its cluster; 0: on the border; –1: likely mis‑clustered.
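
scikit-learn computes the per-point coefficients and their mean directly (the blob data is illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

s = silhouette_samples(X, labels)   # one value in [-1, 1] per point
print(s.min(), s.max())             # border points near 0, core points near 1
print(silhouette_score(X, labels))  # mean silhouette over all points
```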

13
Q

State the random‑walk view of the Normalized Cut objective.

A

\(\mathrm{Ncut}(A,\bar{A}) = P(\bar{A}\mid A) + P(A\mid\bar{A})\); minimizing it finds a split where a random walk rarely crosses clusters.
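
A numeric check on a graph with two dense blocks joined by one weak edge (the weights are made up):

```python
import numpy as np

W = np.array([[0.0, 2.0, 2.0, 0.1, 0.0, 0.0],
              [2.0, 0.0, 2.0, 0.0, 0.0, 0.0],
              [2.0, 2.0, 0.0, 0.0, 0.0, 0.0],
              [0.1, 0.0, 0.0, 0.0, 2.0, 2.0],
              [0.0, 0.0, 0.0, 2.0, 0.0, 2.0],
              [0.0, 0.0, 0.0, 2.0, 2.0, 0.0]])

A = [0, 1, 2]                # one side of the split
B = [3, 4, 5]                # its complement
deg = W.sum(axis=1)

cut = W[np.ix_(A, B)].sum()  # weight crossing the split
# Random-walk form: P(B|A) = cut/vol(A) and P(A|B) = cut/vol(B),
# where vol is the total degree of the subset.
ncut = cut / deg[A].sum() + cut / deg[B].sum()
print(ncut)  # ~0.017: the walk rarely crosses the weak edge
```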

14
Q

Give two reasons spectral clustering can beat plain K‑means on complex shapes.

A

(i) Uses graph connectivity rather than raw Euclidean geometry, handling non‑convex clusters. (ii) Eigenvector embedding separates intertwined structures before K‑means.
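
The classic demonstration is the two-moons data set (the parameters here are typical choices, not canonical):

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=400, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                        n_neighbors=10, random_state=0).fit_predict(X)

# K-means draws a straight boundary through the moons; spectral clustering
# follows graph connectivity and recovers the intertwined shapes.
print(adjusted_rand_score(y, km))  # well below 1
print(adjusted_rand_score(y, sc))  # close to 1
```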

15
Q

True/False: The normalized Laplacians \(L_{\text{rw}}\) and \(L_{\text{sym}}\) are always positive semidefinite.

A

True — all their eigenvalues are non‑negative with the smallest equal to 0.
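
A quick numeric sanity check (random similarity matrix; \(L_{\text{rw}}\) is similar to \(L_{\text{sym}}\), so they share the same spectrum):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((6, 6))
W = (W + W.T) / 2        # symmetric similarities
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)

L_sym = np.eye(6) - W / np.sqrt(np.outer(d, d))  # I - D^{-1/2} W D^{-1/2}
L_rw = np.eye(6) - W / d[:, None]                # I - D^{-1} W

# All eigenvalues are >= 0 (up to floating point); the smallest is 0.
print(np.linalg.eigvalsh(L_sym).round(8))
print(np.sort(np.linalg.eig(L_rw)[0].real).round(8))
```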
