Taken Out Flashcards
Session 2
What is the K-Means Algorithm?
0) Input: “D” is a dataset containing n objects
1) Randomly choose k objects from D as the initial cluster centroids
2) For each of the objects in “D”:
a. Compute the distance between the current object and each of the k cluster centroids
b. Assign the current object to the cluster whose centroid is closest
3) Compute the “cluster centre” (mean) of each cluster. These become the new cluster centroids
4) Repeat steps 2-3 until a convergence criterion is satisfied, then stop
5) Output: A set of “k” clusters
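The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (k_means, euclidean, mean_point), the max_iter and seed parameters, the choice of Euclidean distance, and the rule that an empty cluster keeps its old centroid are all assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    """L2 distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Attribute-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def k_means(D, k, max_iter=100, seed=0):
    """Cluster the list of points D into k clusters (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 1: randomly choose k objects from D as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(D, k)]
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign every object to the cluster with the closest centroid.
        new_assignment = [
            min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in D
        ]
        # Step 4: stop when no object moved to a different cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(D, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = mean_point(members)
    # Step 5: output the k clusters (plus their centroids for convenience).
    clusters = [[p for p, a in zip(D, assignment) if a == j] for j in range(k)]
    return clusters, centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    clusters, centroids = k_means(data, k=2)
    print(clusters)
    print(centroids)
```

Because the initial centroids are chosen at random, different seeds can give different results on less clearly separated data.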
Session 2
Explain K-Means
• Each object is described by a set of attributes A = {A1, A2, …, Am}, where each Ai is a continuous-valued attribute
• Distance Computation: Any distance measure such as L1 or L2, or a dissimilarity derived from cosine similarity (e.g. 1 - cosine similarity). Cosine similarity measures the similarity between two non-zero vectors of an inner product space and is defined as the cosine of the angle between them
• Minimum Distance: The measure of closeness between an object and a centroid; the object is assigned to the centroid at the smallest distance
• Mean Calculation: The mean value of each attribute, taken over all objects in the cluster
• Convergence Criteria: Any of the following can serve as a termination condition for the algorithm
o The maximum permitted number of iterations is reached
o No change in centroid values in any cluster
o Zero (or no significant) movement(s) of objects from one cluster to another
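The distance measures and termination conditions listed above can be sketched as follows. This is an illustration under stated assumptions: all function names, the tol and max_moved thresholds, and the use of 1 - cosine similarity as a distance are choices made for the example, not part of the original notes.

```python
import math


def l1_distance(a, b):
    """L1 (Manhattan) distance: sum of absolute attribute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """L2 (Euclidean) distance: square root of summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Assumed conversion from similarity to a dissimilarity."""
    return 1.0 - cosine_similarity(a, b)


def has_converged(iteration, max_iter, old_centroids, new_centroids,
                  n_moved, tol=1e-9, max_moved=0):
    """Return True if any of the three termination conditions holds."""
    # Condition 1: the maximum permitted number of iterations is reached.
    if iteration >= max_iter:
        return True
    # Condition 2: no centroid moved by more than tol.
    shift = max(l2_distance(o, n)
                for o, n in zip(old_centroids, new_centroids))
    if shift <= tol:
        return True
    # Condition 3: zero (or no significant number of) objects changed cluster.
    return n_moved <= max_moved


if __name__ == "__main__":
    print(l1_distance((0, 0), (3, 4)))
    print(l2_distance((0, 0), (3, 4)))
    print(cosine_distance((1, 0), (0, 1)))
```

Setting max_moved above zero gives the "no significant movement" variant of the last condition.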