Clustering Flashcards
On a basic level, how is clustering different from classification?
Clustering does not have pre-defined class labels.
Each cluster is a set of examples with similar attribute values. There is no special class attribute.
What are the two clustering algorithms that were covered in the lecture?
K-means and clustering based on minimum spanning trees.
What are agglomerative algorithms?
Hierarchical algorithms that build a tree of clusters bottom-up: each example starts in its own cluster, and the two closest clusters are repeatedly merged.
How does the K-means algorithm work?
Choose, at random, K points to be the initial centroids of the K clusters.
Repeat: assign each example to the cluster with the nearest centroid, then recompute each centroid as the mean of the examples assigned to it.
Until no example changes cluster.
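A minimal sketch of these steps in Python (NumPy only; the function name and interface here are illustrative, not from the lecture):

```python
import numpy as np

def k_means(X, k, seed=0):
    rng = np.random.default_rng(seed)
    # Choose K examples at random as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    while True:
        # Assign each example to the cluster with the nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Until no example changes cluster.
        if np.array_equal(new_labels, labels):
            return labels, centroids
        labels = new_labels
        # Recompute each centroid as the mean of the examples assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
```

For example, `k_means(np.random.default_rng(1).random((100, 2)), k=3)` returns a cluster label for each of the 100 examples plus the 3 final centroids.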
What is the Repeat-Until loop?
It is the iterative core of the K-means algorithm: the assignment and centroid-update steps are repeated until no example changes cluster.
What are the limitations of K-means?
- Tends to only discover clusters of hyper-spherical shape
- Sensitive to outliers
- Requires predefined number of clusters (often not natural)
- Different initial centroids can lead to different clustering results (so it is recommended to run K-means MANY times, with a different set of initial centroids on each run, and check whether all runs lead to similar clustering results)
- Uses a simple iterative approach that may find a “local” rather than “global” minimum of its objective (the total distance between each cluster centroid and its associated examples, summed over all clusters)
What is a minimum spanning tree?
The spanning tree with the minimum total edge weight, out of all spanning trees of the graph.
What are the steps in Partitional Clustering based on minimum spanning trees?
1) Construct the Minimum Spanning Tree (MST) for the data
2) Identify “inconsistent” edges in the MST
3) Remove inconsistent edges and consider each of the connected components as a cluster
When is an edge inconsistent in Partitional Clustering based on MST?
If its weight (the distance between its two end nodes) is significantly larger than the average weight of the nearby edges.
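A sketch of the three steps above in Python (using SciPy; the factor-times-average test below is a simplification of the “nearby edges” criterion, comparing against the average weight of all MST edges instead, and the factor of 2 is an arbitrary illustration):

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_clusters(X, factor=2.0):
    # 1) Construct the MST of the complete pairwise-distance graph.
    dist = squareform(pdist(X))
    mst = minimum_spanning_tree(dist).toarray()
    # 2) Flag "inconsistent" edges: here, edges whose weight exceeds
    #    factor * the average MST edge weight (a global simplification
    #    of the lecture's nearby-edges rule).
    threshold = factor * mst[mst > 0].mean()
    # 3) Remove them; each connected component of what remains is a cluster.
    mst[mst > threshold] = 0
    _, labels = connected_components(mst, directed=False)
    return labels
```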
What does K-means attempt to do?
It tries to minimise the total distance between each cluster’s centroid and its associated examples, summed over all clusters.
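As a formula (a standard way to write it, assuming squared Euclidean distance; the lecture may state it with plain distances):

$$ J = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2 $$

where C_k is the set of examples in cluster k and \mu_k is its centroid.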
What does Graph-Based Clustering try to do?
First, it constructs a minimum spanning tree connecting all the examples (the nodes of the graph), then it removes some edges from that MST to create separate clusters.
What are the 3 Agglomerative Clustering methods of computing the distance between two clusters?
Single-link agglomerative clustering
Complete-link agglomerative clustering
Average-link agglomerative clustering
What is single-link agglomerative clustering?
The distance between two clusters is the distance between the NEAREST pair of examples where each example belongs to a different cluster.
What is complete-link agglomerative clustering?
The distance between two clusters is the distance between the MOST distant pair of examples where each object belongs to a different cluster.
What is average-link agglomerative clustering?
The distance between two clusters is the AVERAGE distance over all pairs of examples where each example belongs to a different cluster.
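A small NumPy sketch of the three criteria (the function name and interface are mine, for illustration):

```python
import numpy as np

def cluster_distance(A, B, method="single"):
    # All pairwise distances, one example from each cluster
    # (rows of A and B are examples).
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    if method == "single":      # nearest pair
        return d.min()
    if method == "complete":    # most distant pair
        return d.max()
    if method == "average":     # average over all pairs
        return d.mean()
    raise ValueError(f"unknown method: {method}")
```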
What are the limitations of hierarchical clustering?
The decision to merge or split two clusters is greedy and cannot be undone later during construction of the dendrogram.
Computationally expensive.
After creating the dendrogram we still have to decide where to cut it, in order to produce a set of clusters (see the sketch below).
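For example, with SciPy’s hierarchical clustering, cutting the dendrogram is a separate decision made after it is built (the cut height of 1.5 here is an arbitrary illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.random.default_rng(0).random((20, 2))
Z = linkage(X, method="average")  # greedily build the dendrogram
# The dendrogram alone is not a clustering: we still choose where to cut.
labels = fcluster(Z, t=1.5, criterion="distance")
```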
What are the challenges in evaluating the quality of clustering solutions?
There is no ground truth, so there is no objective measure of accuracy.
We can minimise the total distance between examples within each cluster, but it can also be minimised trivially by assigning each example to its own cluster - useless: too many clusters and no generalisation.
Trade-off: we want to minimise both the total within-cluster distance and the number of clusters (see the sketch after this list).
Most partitional algorithms discover clusters of a given shape, imposing that structure on the data.
Humans cannot visualise solutions in more than 3 dimensions, which is where the real value of clustering often lies.
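A quick way to see the trade-off above (a sketch using scikit-learn; `inertia_` is scikit-learn’s name for the total within-cluster squared distance):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).random((200, 2))
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ always shrinks as k grows, reaching 0 when every
    # example gets its own cluster - hence the trade-off.
    print(k, km.inertia_)
```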
What methods can you use to evaluate Clustering Solutions?
Re-run the same algorithm with different random parameter settings (e.g. different initial centroids) and check whether the results are stable.
Apply the algorithm to random data to check what structure it imposes.
Apply the algorithm to slightly different versions of the data, e.g. with one or a few attributes removed, and compare the results with those obtained when all attributes are used.