Clustering Flashcards
How does the K-means algorithm work?
1) First, choose K and a way to set the initial centers: manually, at the points farthest from each other, or at random.
2) Then each point is assigned to the cluster of its closest center.
3) Then each center moves to the mean of the points assigned to it in the previous step.
4) Repeat steps 2 and 3 until convergence, that is, until the assignments stop changing.
When the initial centers are chosen at random, the final clusters are not guaranteed to be the same on every run, even with the same data and the same K.
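The four steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function names (kmeans, assign, squared_distance), the seed and max_iter parameters, and the sample points are all assumptions made for this card, and points are assumed to be tuples of numbers.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Step 2: label each point with the index of its closest center."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]

def kmeans(points, k, seed=None, max_iter=100):
    rng = random.Random(seed)
    # Step 1: pick k distinct data points at random as the initial centers.
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        # Step 3: move each center to the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(coord) / len(members) for coord in zip(*members))
                new_centers.append(mean)
            else:
                # An empty cluster keeps its old center (one simple choice).
                new_centers.append(centers[j])
        # Step 4: stop once the centers no longer move; the assignments
        # would not change on the next pass either.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

# Hypothetical data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2, seed=0)
print(centers, labels)
```

Changing the seed changes which points start as centers. On data this well separated every start ends in the same two clusters, but on harder data different seeds can end in different clusters.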
How can we deal with the problem that different initial centers give different clusters?
Usually we can deal with it in two ways:
1) Choosing better initial centers with a seeding algorithm (like K-means++).
2) Running the clustering many times (with different initial centers) and keeping the run with the lowest inertia: the sum of squared distances from each point to the center of its cluster.
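Both ideas can be combined in one short sketch: seed each run in the K-means++ style, then keep the run with the lowest inertia. The helper names (kmeans_pp_init, lloyd, inertia, best_of_n) and the n_init and seed parameters are assumptions made for this card, and the seeding assumes the data contains at least k distinct points.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_init(points, k, rng):
    """K-means++ style seeding: the first center is a random point; each later
    center is drawn with probability proportional to its squared distance
    from the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def lloyd(points, centers, max_iter=100):
    """Alternate the assign and move steps until the centers stop moving."""
    for _ in range(max_iter):
        labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def inertia(points, centers):
    """Sum of squared distances from each point to its closest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def best_of_n(points, k, n_init=10, seed=0):
    """Run the clustering n_init times and keep the lowest-inertia result."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centers = lloyd(points, kmeans_pp_init(points, k, rng))
        score = inertia(points, centers)
        if best is None or score < best[0]:
            best = (score, centers)
    return best

# Hypothetical data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
score, centers = best_of_n(points, k=2)
print(score, centers)
```

Inertia only compares runs that use the same K: it keeps decreasing as K grows, so it cannot be used on its own to choose K.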
How important is it to scale the features?
Very important! K-means is based on distances, so features measured on different scales can give unreliable results: the feature with the largest range dominates the distance.
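A small sketch of why scale matters, using standardization (subtract the mean, divide by the standard deviation) as one common way to scale. The data (age in years, income in dollars) and the helper names (standardize, last_feature_share) are made up for this card.

```python
import statistics

def standardize(rows):
    """Rescale every column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple(
            (x - m) / s if s > 0 else 0.0  # a constant column becomes all zeros
            for x, m, s in zip(row, means, stdevs)
        )
        for row in rows
    ]

def last_feature_share(p, q):
    """Fraction of the squared distance between p and q that comes from
    the last feature alone."""
    total = sum((a - b) ** 2 for a, b in zip(p, q))
    return (p[-1] - q[-1]) ** 2 / total

# Hypothetical rows: (age in years, income in dollars).
rows = [(25, 30000), (60, 31000), (27, 90000)]
scaled = standardize(rows)

# Rows 0 and 1 differ by 35 years of age but only 1000 dollars of income.
print(last_feature_share(rows[0], rows[1]))      # raw: income share of the distance
print(last_feature_share(scaled[0], scaled[1]))  # scaled: income share of the distance
```

On the raw data almost all of the distance between the first two rows comes from income, even though their ages are very different. After standardization the age difference carries the distance instead.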