Session 6.3 Flashcards
K-means clustering
- Popular method
- Centroid-based
- Finds a prototype data point for each cluster
K-means clustering
Prototype-based clustering
• Each cluster represented by a prototype
• Other names: centroid clustering, centre-based clustering
• Example: customer segmentation. Each segment has a prototype customer, and
customers similar to that prototype are associated with the cluster.
Centroid - a real or imaginary data point whose characteristics are the mean of all
the data points within the cluster
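As a quick illustration (a minimal NumPy sketch with made-up toy data), the centroid is just the coordinate-wise mean of the cluster's points, and need not coincide with any real data point:

```python
import numpy as np

# Toy cluster: three 2-D data points assigned to the same cluster.
points = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])

# The centroid is the coordinate-wise mean; here (3, 4) happens to be
# a real data point, but in general it is an "imaginary" point.
centroid = points.mean(axis=0)
print(centroid)  # [3. 4.]
```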
K-means clustering - how to?
1. Select a proximity measure and specify the number of clusters (k).
2. Initiate the process by selecting the initial centroids.
3. Assign each data point to the "nearest" centroid to form clusters.
4. Recalculate the centroid of each cluster.
5. Iterate over steps 3 and 4 until the stopping criteria are fulfilled
   (e.g. the centroids no longer move).
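The steps above can be sketched in a few lines of NumPy (a minimal illustration of Lloyd's algorithm, assuming Euclidean distance and random data points as initial centroids; the function name and toy data are invented for the example):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means sketch (Lloyd's algorithm), Euclidean distance assumed."""
    rng = np.random.default_rng(seed)
    # Step 2: initialise by picking k distinct data points as centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its cluster
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated groups of 2-D points.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels)
```

On this toy data the first three points end up in one cluster and the last three in the other, whichever points are drawn as initial centroids.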
Why use k-means?
Strengths:
- Simple to understand and implement
- Computationally efficient
K-means weaknesses
- The value of k must be specified in advance - how to determine it?
- May converge to a locally optimal solution, depending on the initial centroids.
- Assumes globular/spherical clusters; struggles with clusters of arbitrary shape.
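One common heuristic for choosing k is the elbow method: compute the within-cluster sum of squares (inertia) for several values of k and pick the k where the curve flattens. A minimal NumPy sketch (helper names and toy data are invented for illustration; the multiple random restarts also hedge against the local-optimum weakness):

```python
import numpy as np

def inertia(X, k, seed=0, n_iters=50):
    """Within-cluster sum of squares after one basic k-means run."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return float((d.min(axis=1) ** 2).sum())

def best_inertia(X, k, restarts=10):
    # Restarts hedge against bad local optima from unlucky initial centroids.
    return min(inertia(X, k, seed=s) for s in range(restarts))

# Three well-separated blobs: inertia drops sharply up to k = 3,
# then flattens out -- the "elbow" suggests choosing k = 3.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [20, 0], [20, 1], [21, 0]], dtype=float)
for k in (1, 2, 3, 4):
    print(k, round(best_inertia(X, k), 2))
```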
Hierarchical clustering
- Creates a collection of ways to group the points.
* Output: a hierarchy of potential clusterings, visualised as a dendrogram.
Why use hierarchical clustering?
Strengths:
• Clusters can be of any size and shape.
• Does not require prespecifying the number of clusters.
Hierarchical clustering weaknesses
- Still need to decide where to cut the dendrogram to obtain a final clustering.
* Computationally inefficient compared with k-means (the pairwise distance matrix
  alone is O(n²)).
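A minimal sketch of the whole pipeline, assuming SciPy is available (the toy data is invented): agglomerative clustering builds the dendrogram with `linkage`, and "deciding where to split" corresponds to cutting it with `fcluster`:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated groups of 2-D points.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)

# Agglomerative clustering: 'ward' repeatedly merges the pair of clusters
# that least increases within-cluster variance. Z encodes the dendrogram.
Z = linkage(X, method="ward")

# Cut the dendrogram into (at most) 2 flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` would draw the full hierarchy instead of committing to a single cut.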