Week 9 Flashcards
Common dissimilarity metrics for clustering
Euclidean distance
Squared Euclidean distance
Manhattan distance
Cosine distance
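The four metrics above can be written out as a minimal sketch in plain Python. The function names are illustrative rather than taken from any library, and cosine distance is assumed here to mean 1 minus cosine similarity, with both vectors non-zero.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean(a, b):
    # Same ordering as Euclidean distance, but without the square root.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # Assumption: defined as 1 - cosine similarity; both vectors must be non-zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

For the points (0, 0) and (3, 4), the Euclidean distance is 5, the squared Euclidean distance is 25, and the Manhattan distance is 7.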
What is k-means clustering?
Partitioning the input data into K > 0 disjoint clusters, so that each point belongs to exactly one cluster
How does k-means work?
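Iterative algorithm: assign each point to the nearest centroid (by squared Euclidean distance), recompute each centroid as the average of its assigned points, and repeat until the assignments stop changing.
Works well only for roughly spherical clusters; elongated or irregularly shaped clusters are modelled poorly.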
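A minimal sketch of this iteration in plain Python, assuming points are tuples of numbers and initial centroids are drawn at random from the data. The function names, the `iters` cap, and the `seed` parameter are illustrative choices, not part of the flashcards.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    # Label each point with the index of its nearest centroid.
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random data points as initial centroids
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                # New centroid is the coordinate-wise average of the members.
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep the old centroid if a cluster ends up empty.
                updated.append(centroids[j])
        if updated == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = updated
    return centroids, assign(points, centroids)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

On this toy data set the first three points end up in one cluster and the last three in the other.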
Is k-means good?
It typically converges to a local optimum that depends on the initial centroids; finding the globally optimal k-means clustering is NP-hard.
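A small sketch of how the starting centroids can trap the algorithm in a poor solution. The four points and both sets of initial centroids are made up for illustration, and `lloyd` is a hypothetical helper name for the assign-and-update loop.

```python
def sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd(points, centroids, max_iters=100):
    # Run the assign/update loop from the given initial centroids.
    for _ in range(max_iters):
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[j].append(p)
        updated = [
            tuple(sum(c) / len(m) for c in zip(*m)) if m else centroids[i]
            for i, m in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Cost: sum of squared distances from each point to its nearest centroid.
    cost = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, cost

# Corners of a wide rectangle: the natural split is left pair vs right pair.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

_, cost_bad = lloyd(points, [(0, 0), (0, 1)])   # converges to a top/bottom split
_, cost_good = lloyd(points, [(0, 0), (4, 0)])  # converges to a left/right split
print(cost_bad, cost_good)  # prints 16.0 1.0
```

Both runs converge, but the first stops at a cost of 16 while the second reaches 1. A common remedy is to run the algorithm from several initializations and keep the lowest-cost result.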
What is hierarchical clustering?
Generate a hierarchy of clusters with different resolutions
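Computationally heavy: standard agglomerative methods need all pairwise distances, so cost grows at least quadratically with the number of points
Two approaches: agglomerative (bottom-up, repeatedly merge the closest clusters) and divisive (top-down, repeatedly split clusters)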
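A naive sketch of the agglomerative (bottom-up) variant in plain Python. Single linkage (distance between the closest pair of members) is assumed as the merge criterion, which is only one of several possible choices, and the function name and `num_clusters` stopping rule are illustrative.

```python
import math

def agglomerative(points, num_clusters):
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    merges = []  # record of (cluster_a, cluster_b, distance) for each merge
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges

points = [(0,), (1,), (5,), (6,), (20,)]
clusters, merges = agglomerative(points, num_clusters=2)
```

The `merges` list is the hierarchy: cutting it at different merge distances gives clusterings at different resolutions. The nested loops make this sketch cubic in the number of points, which illustrates why the method is expensive.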
How do we validate cluster quality?
Compactness (low variance within each cluster) and separation (large distances between clusters)
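One possible way to put numbers on these two ideas, as a sketch: compactness as the mean squared distance of points to their own centroid, and separation as the smallest distance between any two centroids. These particular definitions and the function names are assumptions for illustration; other validity measures exist.

```python
import math

def centroid(cluster):
    # Coordinate-wise average of the points in a cluster.
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def compactness(clusters):
    # Mean squared distance from each point to its own centroid (lower is better).
    total, count = 0.0, 0
    for cluster in clusters:
        c = centroid(cluster)
        total += sum(math.dist(p, c) ** 2 for p in cluster)
        count += len(cluster)
    return total / count

def separation(clusters):
    # Smallest distance between any two centroids (higher is better).
    cents = [centroid(cluster) for cluster in clusters]
    return min(math.dist(a, b) for i, a in enumerate(cents) for b in cents[i + 1:])

good = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]  # tight, far-apart groups
bad = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]   # stretched, close-together groups
```

Here `good` scores lower on `compactness` and higher on `separation` than `bad`, matching the intuition that it is the better clustering of the same four points.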