Week 6 Flashcards
K means algo
If 2 mean points are equidistant from point, choose one with lower index
K median algo
As image except 2.1.2 is not abs value squared
Eg for (3,4) and (4,2) manhattan dist is 3
K means ++
Same as K means but to initialise mean points use
When selecting centroids 2,…, k, you are again choosing a random number each time to pick a point to use as a centroid. The probability distribution can be thought of as a number line and you are picking the number immediately to the right of the randomly generated P
How to choose k for k means etc
K means score is sum of square distances of points from associated centroids
K means and K median are examples of what kind of clustering
Flat
2 types of hierarchical clustering
Agglomerative (bottom up)
Divisive (top down)
Agglomerative clustering on a graph
Assuming each edge represents the similarity of the 2 nodes it connects
Begin with each node is a cluster
Find the single edge with largest value
Let those 2 nodes be one cluster
Repeat until k = # of clusters
How to visualise the stages of clustering in Agglomerative
Bisection K means
(Divisive) begin at one cluster and split using sparsest cut
Sparsest cut
Single agglomerative clustering
Single: edge with highest similarity
Average linkage agglomerative clustering
Average: sum all edges between 2 clusters divided by # of edges between 2clusters
Complete linkage agglomerative clustering
Complete linkage: minimax , combine clusters with highest least similar edge