Week 9 Flashcards by Ryan Storey

Common dissimilarity metrics for clustering

Euclidean distance
Squared Euclidean distance
Manhattan distance
Cosine distance
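
As a rough illustration of the four metrics above, here is a minimal sketch in plain Python. The function names are chosen for this example only, and cosine distance is taken here as 1 minus cosine similarity, which is one common convention.

```python
import math

def squared_euclidean(a, b):
    # Sum of squared coordinate differences.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    # Square root of the squared Euclidean distance.
    return math.sqrt(squared_euclidean(a, b))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # Assumed convention: 1 - cosine similarity; undefined for zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

For the points (0, 0) and (3, 4), the Euclidean distance is 5, the squared Euclidean distance is 25, and the Manhattan distance is 7.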

What is k-means clustering?

Partitioning the input data into K > 0 distinct clusters, so that each point belongs to exactly one cluster

How does k-means work?

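An iterative algorithm that uses squared Euclidean distance as the dissimilarity and the centroid (average) of each cluster as its representative. It alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points.
Because of this, it can only accurately model roughly spherical clusters.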
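
The iteration can be sketched as follows. This is a minimal, unoptimized version, assuming points are tuples of numbers and initial centroids are sampled at random from the data; the names `k_means`, `mean_point`, `max_iter`, and `seed` are illustrative choices, not part of the card.

```python
import random

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    # Coordinate-wise average of a non-empty list of points.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Assumption: initialize centroids as k distinct data points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = mean_point(members)
    return centroids, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2)
```

On this toy data the two well-separated groups end up in different clusters.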

Is k-means good?

It typically finds sub-optimal (locally optimal) solutions, because the k-means clustering problem is NP-hard and the result depends on the initial centroids
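
A small example of getting stuck in a sub-optimal solution, assuming the same assign-then-update iteration with hand-picked starting centroids. The helper `lloyd` and the four-corner dataset are made up for this illustration.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd(points, centroids, max_iter=100):
    # Runs assign/update steps from the given centroids and returns
    # the final within-cluster sum of squared distances.
    centroids = list(centroids)
    for _ in range(max_iter):
        labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        updated = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(c)
        if updated == centroids:
            break
        centroids = updated
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

# Corners of a wide rectangle: the natural split is left versus right.
corners = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
bad_cost = lloyd(corners, [(2.0, 0.0), (2.0, 1.0)])   # stuck with bottom vs top
good_cost = lloyd(corners, [(0.0, 0.5), (4.0, 0.5)])  # left vs right
```

Both runs are stable under further iterations, yet the first has a cost of 16 and the second a cost of 1. A common mitigation is to rerun from several initializations and keep the lowest-cost result.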

What is hierarchical clustering?

Generates a hierarchy of clusters at different resolutions
Computationally heavy
Two approaches: agglomerative (bottom-up, merging clusters) or divisive (top-down, splitting clusters)
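
A naive sketch of the agglomerative variant, assuming single linkage (cluster distance is the smallest distance between any two members) and Euclidean distance. The card does not specify a linkage, so that choice and the function name `single_linkage` are assumptions. The nested search over all cluster pairs also hints at why the method is computationally heavy.

```python
import math

def single_linkage(points):
    # Start with every point in its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]
    merges = []  # records (cluster_a, cluster_b, distance) for each merge
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters under single linkage.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges

points = [(0.0,), (1.0,), (5.0,), (6.5,)]
merges = single_linkage(points)
```

Each entry in `merges` is one level of the hierarchy; cutting the sequence at a chosen distance gives a clustering at that resolution.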

How do we validate cluster quality?

Compactness (low variance within each cluster) and separation (large spread between clusters)
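
One simple way to put numbers on these two ideas is sketched below. It assumes compactness is measured as the mean squared distance of points to their own centroid and separation as the smallest distance between any two centroids; other definitions exist, and the function names are illustrative.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def compactness(clusters):
    # Mean squared distance from each point to its own cluster centroid
    # (lower means tighter clusters).
    total, count = 0.0, 0
    for members in clusters:
        c = centroid(members)
        total += sum(math.dist(p, c) ** 2 for p in members)
        count += len(members)
    return total / count

def separation(clusters):
    # Smallest distance between any two cluster centroids
    # (higher means clusters are further apart).
    cents = [centroid(m) for m in clusters]
    return min(
        math.dist(a, b)
        for i, a in enumerate(cents)
        for b in cents[i + 1:]
    )

clusters = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(10.0, 0.0), (10.0, 2.0)],
]
```

For this toy input the compactness is 1 and the separation is 10, so the clusters are tight relative to how far apart they sit.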

Week 9 Flashcards

(6 cards)