Clustering Flashcards

1
Q

How does the K-means algorithm work?

A

1) Choose K and a way to set the initial centers: manually, at the points farthest from each other, or at random.

2) Assign each point to the cluster of its closest center.

3) Move each center to the mean of the points assigned to it in the previous step.

4) Repeat steps 2 and 3 until convergence, that is, until the assignments stop changing.

When the initial centers are chosen at random, the final clusters can differ from run to run, even with the same data and the same K.
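
The steps above can be sketched in plain Python. This is only an illustration, not a reference implementation: the function names (`kmeans`, `sq_dist`), the `init` and `seed` parameters, and the toy points are assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, init=None, max_iter=100, seed=0):
    # Step 1: use the given initial centers, or pick k points at random.
    centers = list(init) if init is not None else random.Random(seed).sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each point to its closest center.
        labels = [min(range(k), key=lambda i: sq_dist(p, centers[i])) for p in points]
        # Step 3: move each center to the mean of its assigned points.
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[i])  # empty cluster: keep the old center
        # Step 4: stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Two well-separated groups of four points each (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, labels = kmeans(points, k=2, init=[(0, 0), (10, 10)])
```

Passing `init` makes the run reproducible; leaving it out falls back to random initial centers, which is where the run-to-run differences come from.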

2
Q

How can we deal with the problem that different initial centers give different clusters?

A

Usually in two ways:
1) Choosing better initial centers with a seeding algorithm (like K-means++).

2) Running the clustering many times (with different initial centers) and keeping the run with the lowest inertia: the sum of squared distances from each point to its assigned center.
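
A minimal sketch of the second approach, assuming inertia means the sum of squared distances from each point to its nearest final center. The helper names (`kmeans_from`, `inertia`, `best_run`) and the data are made up for this example, and the initial centers are fixed by hand only to keep the result reproducible; in practice they would be drawn at random. It does not implement K-means++.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_from(points, centers, max_iter=100):
    """Run K-means from the given initial centers and return the final centers."""
    k = len(centers)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda i: sq_dist(p, centers[i])) for p in points]
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[i])  # empty cluster: keep the old center
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def inertia(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def best_run(points, inits):
    """Run K-means once per initialization and keep the lowest-inertia result."""
    runs = [kmeans_from(points, list(init)) for init in inits]
    return min(runs, key=lambda centers: inertia(points, centers))


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
poor_init = [(0, 1), (1, 0)]    # both centers start inside the same group
good_init = [(0, 0), (11, 11)]  # one center starts in each group
best = best_run(points, [poor_init, good_init])
```

On this toy data the first initialization gets stuck in a split that mixes the two groups, while the second separates them, so comparing inertia picks the second run.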

3
Q

How important is it to scale the features?

A

Very important! K-means is based on distances, so features measured on larger scales dominate the distance calculation and can give unreliable results.
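
A small sketch of the effect, using z-score standardization as one common way to scale features. The `standardize` helper and the (age, income) rows are invented for illustration, and the helper assumes no feature has zero variance.

```python
from statistics import mean, pstdev


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def standardize(rows):
    """Rescale every feature (column) to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) for c in cols]  # assumes no column is constant
    return [tuple((x - m) / s for x, m, s in zip(row, mus, sigmas)) for row in rows]


# Hypothetical rows: (age in years, income in dollars).
a, b, c = (25, 50000), (60, 50500), (26, 50600)

# Raw features: income dominates, so a looks closer to b despite the 35-year age gap.
raw_a_closer_to_b = sq_dist(a, b) < sq_dist(a, c)

# Standardized features: both columns count equally, and a is now closer to c.
sa, sb, sc = standardize([a, b, c])
scaled_a_closer_to_c = sq_dist(sa, sc) < sq_dist(sa, sb)
```

The nearest neighbor of the first row changes once both features are on the same scale, which is the kind of shift that changes cluster assignments.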
