k-means / k-medoids Flashcards
Is k-means / k-medoids supervised or unsupervised learning?
Unsupervised learning
Primary goal of k-means algo
Primary goal of k-means clustering is to minimise the distance between the points in the same cluster.
While the algorithm indirectly increases the separation between clusters by minimising intra-cluster distances, its direct objective is not to maximise inter-cluster distances.
Formally, k-means minimises the sum of squared distances between each data point and the centroid of the cluster it is assigned to.
How the k-means algo groups data
It groups the data into k clusters by minimising the variance within each cluster.
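The grouping happens by alternating two steps: assignment (each point goes to its nearest centroid) and update (each centroid becomes the mean of its assigned points). A minimal pure-Python sketch of one such iteration, assuming Euclidean distance, 2-D points stored as tuples, and made-up data and starting centroids; the helper names `assign` and `update` are hypothetical.

```python
import math

def assign(points, centroids):
    """Assignment step: index of the nearest centroid (Euclidean) for each point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def update(points, labels, k):
    """Update step: each new centroid is the mean of the points assigned to it."""
    new_centroids = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        # assumes no cluster becomes empty in this toy example
        new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return new_centroids

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
centroids = [(1.0, 1.0), (8.0, 8.0)]   # hypothetical starting centroids
labels = assign(points, centroids)
centroids = update(points, labels, k=2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(1.25, 1.5), (8.5, 8.25)]
```

Repeating these two steps until nothing changes is the full algorithm.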
k-means clustering is __________
k-means clustering is partitioning (partitional) clustering: it splits the data into k non-overlapping clusters.
SSE in k-means clustering
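SSE stands for sum of squared errors.
$SSE = \sum_{i=1}^{k} \sum_{x \in C_i} d(x, c_i)^2$
here d: distance (usually Euclidean), x: data point, C_i: the i-th cluster, c_i: centroid of C_i, k: number of clusters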
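A small sketch of the SSE formula, assuming d is Euclidean distance and each cluster is given as a list of coordinate tuples together with its centroid; the function name `sse` and the data are hypothetical.

```python
def sse(clusters, centroids):
    """Sum over clusters of the squared Euclidean distance from each point to its centroid."""
    total = 0.0
    for cluster, c in zip(clusters, centroids):
        for x in cluster:
            total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))  # d(x, c)^2
    return total

clusters = [[(1.0, 1.0), (1.0, 3.0)], [(6.0, 5.0), (8.0, 5.0)]]
centroids = [(1.0, 2.0), (7.0, 5.0)]   # the mean of each cluster
print(sse(clusters, centroids))  # 4.0
```

Every point here is exactly 1 unit from its centroid, so each contributes 1 and the SSE is 4.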
Explain SSE for k=1, k=2 and k=3
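Graph of SSE against k
Elbow curve: SSE decreases as k increases, and the value of k at the "elbow", where the curve starts to flatten, is a common choice for the number of clusters.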
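To put numbers on the SSE-for-k=1,2,3 and elbow-curve cards, the sketch below computes the smallest possible SSE for k = 1 to 4 on made-up 1-D data with three obvious groups. It relies on the fact that in one dimension the optimal k-means clusters are contiguous once the data is sorted, so it simply tries every set of cut points. This brute force only works for tiny inputs; real k-means uses the iterative algorithm instead. The function names are hypothetical.

```python
from itertools import combinations

def cluster_sse(values):
    """SSE of one cluster: squared deviations from the cluster mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_sse_1d(data, k):
    """Minimum SSE over all ways to cut sorted 1-D data into k contiguous groups."""
    xs = sorted(data)
    best = float("inf")
    # choose k-1 cut positions between consecutive sorted values
    for cuts in combinations(range(1, len(xs)), k - 1):
        bounds = (0, *cuts, len(xs))
        total = sum(cluster_sse(xs[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = min(best, total)
    return best

data = [1, 2, 3, 10, 11, 12, 20, 21, 22]   # hypothetical toy data: three groups
for k in range(1, 5):
    print(k, best_sse_1d(data, k))
```

For this data the SSE falls from about 548 (k = 1) to 127.5 (k = 2) and 6 (k = 3), but only to 4.5 at k = 4. The optimal SSE never increases as k grows (with k = n it reaches 0), so the elbow at k = 3, after which the drop becomes small, is the sensible choice.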
Point to remember in k-means problem solving
If a data point is at the same distance from the first and the second centroid (a tie), you can put it in either cluster. Because the means change after the update step, the point is usually assigned to the proper cluster in the next iteration.
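A sketch of the tie case, assuming 1-D data and made-up numbers: point 5 is exactly 2 away from both starting centroids 3 and 7. The hypothetical flag `prefer_last_on_tie` decides which cluster gets a tied point. In this example both choices end in the same clustering once the means are recomputed; that is typical but not guaranteed for every data set.

```python
def kmeans_1d(points, centroids, prefer_last_on_tie=False, max_iter=100):
    """Plain 1-D k-means. A point equally close to several centroids goes to the
    first of them, or to the last if prefer_last_on_tie is True.
    Assumes no cluster ever becomes empty."""
    labels = None
    for _ in range(max_iter):
        new_labels = []
        for p in points:
            dists = [abs(p - c) for c in centroids]
            tied = [j for j, d in enumerate(dists) if d == min(dists)]
            new_labels.append(tied[-1] if prefer_last_on_tie else tied[0])
        if new_labels == labels:        # assignments unchanged: converged
            break
        labels = new_labels
        # update step: each centroid becomes the mean of its assigned points
        centroids = [sum(p for p, lab in zip(points, labels) if lab == j) / labels.count(j)
                     for j in range(len(centroids))]
    return labels, centroids

points = [1, 3, 5, 9, 11]
start = [3, 7]   # point 5 is tied between these two centroids
print(kmeans_1d(points, start))                           # ([0, 0, 0, 1, 1], [3.0, 10.0])
print(kmeans_1d(points, start, prefer_last_on_tie=True))  # ([0, 0, 0, 1, 1], [3.0, 10.0])
```

With the second choice, point 5 first joins the right cluster, whose mean becomes about 8.33 while the left mean drops to 2; in the next iteration 5 is closer to 2 and moves back.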
Which is more efficient (k-means / hierarchical clustering)?
k-means
Time complexity of k-means algo
O(tkn)
Here
n: number of data points
k: number of clusters
t: number of iterations
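Since k and t are usually much smaller than n, k-means is considered a linear algorithm (linear in n). This treats the number of features as a constant; with d features the cost is O(tknd).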
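The O(tkn) bound comes from the assignment step: in each of t iterations, every one of the n points is compared with all k centroids. The sketch below counts those distance evaluations. It assumes 1-D data, uses the first k points as hypothetical starting centroids, and runs a fixed number of iterations instead of testing for convergence.

```python
import random

def kmeans_distance_evaluations(points, k, t):
    """Run 1-D k-means for exactly t iterations and return how many
    point-to-centroid distances were computed."""
    centroids = points[:k]                   # hypothetical initialisation: first k points
    evaluations = 0
    for _ in range(t):
        labels = []
        for p in points:                     # n points ...
            dists = [abs(p - c) for c in centroids]   # ... each compared with k centroids
            evaluations += len(dists)
            labels.append(dists.index(min(dists)))
        for j in range(k):                   # update step: mean of each cluster
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                      # keep the old centroid if a cluster is empty
                centroids[j] = sum(members) / len(members)
    return evaluations

rng = random.Random(0)
points = [rng.random() for _ in range(1000)]           # n = 1000 made-up points
print(kmeans_distance_evaluations(points, k=3, t=10))  # 30000 = t * k * n
```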
Stopping/ convergence criterion of k-means algo
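Commonly used stopping criteria for k-means are: the cluster assignments stop changing, no centroid moves more than a small tolerance, or a maximum number of iterations is reached. A sketch of k-means with all three checks, assuming Euclidean distance, made-up 2-D data, and hypothetical parameter names `max_iter` and `tol`; it also returns the number of iterations run.

```python
import math

def kmeans(points, centroids, max_iter=100, tol=1e-6):
    """Lloyd-style k-means that stops when assignments no longer change,
    when no centroid moves more than tol, or after max_iter iterations."""
    labels = None
    for it in range(1, max_iter + 1):
        new_labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            # keep the old centroid if a cluster becomes empty
            new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members))
                                 if members else old)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        converged = new_labels == labels or shift <= tol
        labels, centroids = new_labels, new_centroids
        if converged:
            return labels, centroids, it
    return labels, centroids, max_iter

points = [(1, 1), (2, 1), (4, 3), (5, 4)]
print(kmeans(points, [(1, 1), (2, 1)]))  # ([0, 0, 1, 1], [(1.5, 1.0), (4.5, 3.5)], 3)
```

Here the assignments change in iterations 1 and 2 and stay the same in iteration 3, so the loop stops there.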
k-means for categorical data
For categorical data, we use k-modes instead of k-means.
Each cluster centre is represented by the most frequent value (the mode) of each attribute.
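A sketch of the two k-modes ingredients, assuming records are tuples of categorical values. The cluster centre takes the most frequent value of each attribute. The distance between a record and a centre is the number of attributes that do not match (simple matching dissimilarity, which k-modes uses in place of Euclidean distance). The function names and data are hypothetical.

```python
from collections import Counter

def mode_centre(records):
    """k-modes cluster centre: the most frequent value of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

def mismatches(a, b):
    """Simple matching dissimilarity: number of attributes whose values differ."""
    return sum(x != y for x, y in zip(a, b))

cluster = [("red", "small", "round"),
           ("red", "large", "round"),
           ("blue", "small", "round")]
centre = mode_centre(cluster)
print(centre)                                           # ('red', 'small', 'round')
print(mismatches(("blue", "large", "square"), centre))  # 3
```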
Outliers vs k-means
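k-means is sensitive to outliers: the centroid is a mean, so a single extreme value can pull it far away from the rest of the cluster, and squaring the distances magnifies the effect. k-medoids, which uses actual data points as cluster centres, is more robust to outliers.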
* Outliers are data points that are very far away from other data points.
* Outliers could be errors in data recording or special data points with very different values.
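A quick 1-D illustration with made-up numbers of why a mean-based centroid suffers from an outlier while a medoid (the kind of centre k-medoids uses) does not:

```python
def mean(values):
    return sum(values) / len(values)

def medoid(values):
    """The data point with the smallest total distance to all points in the cluster."""
    return min(values, key=lambda m: sum(abs(m - v) for v in values))

cluster = [1, 2, 3, 4, 100]   # 100 is an outlier
print(mean(cluster))    # 22.0
print(medoid(cluster))  # 3
```

The single outlier drags the mean to 22, far from the four typical points, while the medoid stays at 3.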
WCSS in k-means
* full form
* Definition
* Meaning of low and high WCSS
WCSS: within-cluster sum of squares
WCSS is a metric used to evaluate the quality of the clusters formed by the k-means clustering algorithm.
It measures the sum of squared distances between each data point and the centroid of the cluster to which it belongs.
A lower WCSS value indicates that the data points are closer to their respective cluster centroids, which means the clusters are more compact and better defined. A higher WCSS value means the points are spread farther from their centroids, i.e. the clusters are looser.
WCSS formulas (2)
Also: the relation between both formulas, with an example
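Assuming the two formulas meant here are the centroid form $WCSS = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - c_i \rVert^2$ and the pairwise form $WCSS = \sum_{i=1}^{k} \frac{1}{2|C_i|} \sum_{x \in C_i} \sum_{y \in C_i} \lVert x - y \rVert^2$, they are equal: for each cluster, the squared distances to the centroid add up to the sum of squared distances over all ordered pairs of its points divided by 2|C_i|. The sketch below checks this on made-up data; the function names are hypothetical.

```python
from itertools import product

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def wcss_centroid(clusters):
    """Formula 1: squared distance of each point to its cluster centroid."""
    total = 0.0
    for pts in clusters:
        c = tuple(sum(col) / len(pts) for col in zip(*pts))
        total += sum(sq_dist(p, c) for p in pts)
    return total

def wcss_pairwise(clusters):
    """Formula 2: squared distances over all ordered pairs in a cluster, divided by 2*|C_i|."""
    return sum(sum(sq_dist(p, q) for p, q in product(pts, repeat=2)) / (2 * len(pts))
               for pts in clusters)

clusters = [[(0, 0), (2, 0), (0, 2), (2, 2)], [(5, 5), (7, 5)]]
print(wcss_centroid(clusters))  # 10.0
print(wcss_pairwise(clusters))  # 10.0
```

In the square cluster each point is at squared distance 2 from the centroid (1, 1), giving 8; its 16 ordered pairs sum to 64, and 64 / (2 · 4) = 8 as well. The second cluster adds 2 either way.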