Clustering Flashcards
Hard clustering
Every object is a member of exactly one cluster.
Soft clustering
Objects can be members of more than one cluster.
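A minimal sketch of the difference, assuming each object already has a non-negative score for every cluster (the scores and the names `hard_assign` and `soft_assign` are hypothetical). Hard clustering keeps only the single best cluster; soft clustering keeps a membership weight for every cluster, here normalized to sum to 1.

```python
from typing import List

def hard_assign(scores: List[float]) -> int:
    """Hard clustering: return the index of the one cluster with the highest score."""
    return max(range(len(scores)), key=lambda k: scores[k])

def soft_assign(scores: List[float]) -> List[float]:
    """Soft clustering: return a weight per cluster (assumed scheme: scores / total)."""
    total = sum(scores)
    return [s / total for s in scores]

scores = [1.0, 3.0, 4.0]  # hypothetical scores of one object against three clusters
print(hard_assign(scores))  # 2
print(soft_assign(scores))  # [0.125, 0.375, 0.5]
```

Under soft clustering the object belongs partly to all three clusters; under hard clustering it belongs only to cluster 2.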
Hierarchical Clustering
The most similar pair of clusters is merged repeatedly until all objects are joined in a single hierarchy (tree) of clusters.
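The merging loop can be sketched as a naive bottom-up (agglomerative) procedure. This assumes a precomputed object-by-object similarity matrix; the function name `agglomerative` and the example matrix are hypothetical. The `linkage` argument turns the cross-cluster pair similarities into one cluster similarity: `max` gives single link, `min` gives complete link.

```python
from typing import Callable, List, Tuple

def agglomerative(sim: List[List[float]],
                  linkage: Callable[[List[float]], float] = max
                  ) -> List[Tuple[frozenset, frozenset, float]]:
    """Repeatedly merge the most similar pair of clusters; return the merges in order."""
    # Start with every object in its own singleton cluster.
    clusters = [frozenset([i]) for i in range(len(sim))]
    merges = []
    while len(clusters) > 1:
        best = None  # (similarity, index_a, index_b) of the most similar pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster similarity = linkage over all cross-cluster object pairs.
                s = linkage([sim[i][j] for i in clusters[a] for j in clusters[b]])
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        merges.append((clusters[a], clusters[b], s))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [clusters[a] | clusters[b]]
    return merges

example_sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
for left, right, s in agglomerative(example_sim):
    print(sorted(left), sorted(right), s)
# [0] [1] 0.9
# [2] [3] 0.8
# [0, 1] [2, 3] 0.3
```

The sequence of merges is the hierarchy: objects 0 and 1 join first, then 2 and 3, and finally the two pairs.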
Non-hierarchical clustering
Results in flat clusters of “similar” documents
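k-means is one common non-hierarchical method; it is used here only as an illustration, not as the method these cards assume. The sketch below works on small 2-D points rather than real document vectors, takes the initial centroids as given, and runs a fixed number of iterations (all assumptions); the name `kmeans` is hypothetical.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Flat clustering: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Euclidean distance to each current centroid; pick the closest.
            nearest = min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # New centroid = mean vector of its members (keep the old one if a cluster is empty).
        centroids = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centroids[k]
            for k, c in enumerate(clusters)
        ]
    return clusters, centroids

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
clusters, centroids = kmeans(points, [(0, 0), (10, 10)])
print(clusters)   # [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```

The result is two flat groups with no tree structure relating them, unlike the merge sequence of hierarchical clustering.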
Proximity Measure
Gives a measure of how close (similar) two documents are. Because proximity measures are symmetric, the document-by-document proximity matrix is symmetric, so only one triangle of it needs to be stored.
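A minimal sketch of storing only the triangle, assuming documents are represented as sets of terms and using raw overlap as the measure (the names `proximity_triangle` and `lookup` are hypothetical). Each unordered pair i < j is computed once, giving N(N-1)/2 entries, and lookups normalize the index order so (i, j) and (j, i) hit the same entry. The diagonal (a document against itself) is not stored.

```python
from itertools import combinations

def proximity_triangle(docs, measure):
    """Compute measure(docs[i], docs[j]) once per unordered pair with i < j."""
    return {(i, j): measure(docs[i], docs[j]) for i, j in combinations(range(len(docs)), 2)}

def lookup(tri, i, j):
    """Symmetric lookup: (i, j) and (j, i) map to the same stored entry."""
    return tri[(min(i, j), max(i, j))]

docs = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
tri = proximity_triangle(docs, lambda x, y: len(x & y))  # raw overlap
print(len(tri))            # 3 entries for 3 documents
print(lookup(tri, 2, 0))   # 2, same as lookup(tri, 0, 2)
```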
Raw overlap
|X ∩ Y|
Dice’s coefficient
2|X ∩ Y| / (|X| + |Y|)
Jaccard’s coefficient
|X ∩ Y| / |X ∪ Y|
Overlap coefficient
|X ∩ Y| / min(|X|, |Y|)
Cosine overlap
|X ∩ Y| / (sqrt(|X|) · sqrt(|Y|))
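The five set-based measures translate directly into code, assuming X and Y are non-empty sets of terms (binary document vectors); the function names are hypothetical. The example uses a one-term set Y that is a subset of X, which shows how differently the normalizations behave on the same overlap.

```python
import math

def raw_overlap(x: set, y: set) -> int:
    return len(x & y)

def dice(x: set, y: set) -> float:
    return 2 * len(x & y) / (len(x) + len(y))

def jaccard(x: set, y: set) -> float:
    return len(x & y) / len(x | y)

def overlap_coefficient(x: set, y: set) -> float:
    # Equals 1 whenever one set is a subset of the other.
    return len(x & y) / min(len(x), len(y))

def cosine(x: set, y: set) -> float:
    return len(x & y) / (math.sqrt(len(x)) * math.sqrt(len(y)))

X = {"a", "b", "c", "d"}
Y = {"a"}
for f in (raw_overlap, dice, jaccard, overlap_coefficient, cosine):
    print(f.__name__, f(X, Y))
# raw_overlap 1
# dice 0.4
# jaccard 0.25
# overlap_coefficient 1.0
# cosine 0.5
```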
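Single Link Function
Cluster similarity is the similarity of the two most similar members (one from each cluster).
Complete Link Function
Cluster similarity is the similarity of the two least similar members (one from each cluster).
Group Average Function
Cluster similarity is the average similarity over all pairs of group members.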
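The three link functions differ only in how they summarize the pairwise similarities between two clusters. In this sketch the objects are numbers on a line and similarity is assumed to be negative distance, so larger means closer; the function names are hypothetical. The group average shown here averages over cross-cluster pairs (one member from each cluster); some texts instead average over all pairs within the merged cluster.

```python
from itertools import product
from statistics import mean

def pair_sims(a, b, sim):
    """Similarities of every pair with one member from a and one from b."""
    return [sim(x, y) for x, y in product(a, b)]

def single_link(a, b, sim):
    return max(pair_sims(a, b, sim))    # two most similar members

def complete_link(a, b, sim):
    return min(pair_sims(a, b, sim))    # two least similar members

def group_average(a, b, sim):
    return mean(pair_sims(a, b, sim))   # average over cross-cluster pairs

def neg_distance(x, y):
    return -abs(x - y)

a, b = [0, 1], [3, 5]
print(single_link(a, b, neg_distance))    # -2
print(complete_link(a, b, neg_distance))  # -5
print(group_average(a, b, neg_distance))  # -3.5
```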
Single Link vs. Complete Link
Time complexity for N objects: O(N^2) for single link vs. O(N^3) for complete link.
Centroid
The average (mean) vector of the documents in a cluster.
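Computing a centroid is a component-wise mean, sketched here assuming all document vectors in the cluster have the same length (the name `centroid` is hypothetical):

```python
def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

print(centroid([[1, 0, 2], [3, 2, 0]]))  # [2.0, 1.0, 1.0]
```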