k-means / k-medoids Flashcards
k-means/ k-medoids is supervised/ unsupervised learning
Unsupervised learning
Primary goal of k-means algo
Primary goal of k-means clustering is to minimise the distance between the points in the same cluster.
While the algo indirectly increases the separation between clusters by minimising intra-cluster distances, its direct objective is not to maximise inter-cluster distances.
The goal of k-means is to minimise the sum of the squared distances between each data point and its cluster centroid.
How the k-means algo groups data
It groups the data into k-clusters by minimising the variance within each cluster.
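The grouping loop can be sketched in pure Python — a minimal illustration of Lloyd's algorithm on toy 2-D points (the `kmeans` function and the data are made up for this example, not from any library):

```python
import math
import random

def kmeans(points, k, iters=10, seed=0):
    """Minimal sketch of Lloyd's algorithm: assign each point to its
    nearest centroid, then recompute each centroid as the cluster mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # pick k data points as initial centroids
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster,
        # which minimises the variance within that cluster.
        for j, c in enumerate(clusters):
            if c:
                centroids[j] = tuple(sum(dim) / len(c) for dim in zip(*c))
    return centroids, clusters

points = [(1, 1), (1.5, 2), (8, 8), (9, 9)]
centroids, clusters = kmeans(points, k=2)
```

On this toy data the two tight groups end up in separate clusters after a couple of iterations.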
k means clustering is __________
k-means clustering is a partitioning clustering method.
SSE in k-means clustering
SSE stands for sum of squared error
SSE = ∑ d(x, c)²
here d: distance, x: data point, c: centroid of the cluster x belongs to
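The formula translates directly into code (the `sse` function and the toy points are illustrative):

```python
import math

def sse(points, centroids, assignment):
    """SSE = sum of squared distances d(x, c)^2 from each data point x
    to the centroid c of the cluster it is assigned to."""
    return sum(math.dist(p, centroids[assignment[i]]) ** 2
               for i, p in enumerate(points))

points = [(0, 0), (2, 0), (10, 0)]
centroids = [(1, 0), (10, 0)]
assignment = [0, 0, 1]          # cluster index of each point
print(sse(points, centroids, assignment))  # 1 + 1 + 0 = 2.0
```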
Explain SSE for k=1, k=2 and k=3
As k increases from 1 to 3, each point lies closer to its nearest centroid, so SSE decreases; the drop is largest for the first few increases in k.
k value and SSE graph
Elbow curve
Point remember in k-means problem solving
If a data point is at the same distance from the first centroid and the second centroid, you can assign it to either cluster. In the next iteration it will automatically be classified properly, because the mean values change.
Which is more efficient (k means/ hierarchical clustering)
k means
Time complexity of k-means algo
O(tkn)
Here
n: number of data points
k: number of clusters
t: number of iterations
Since both k and t are small, k-means is considered a linear algorithm.
Stopping/ convergence criterion of k-means algo
The algorithm stops when one of the following holds:
* no (or minimal) re-assignment of data points to different clusters,
* no (or minimal) change of centroids, or
* minimal decrease in SSE (or a maximum number of iterations is reached).
k-means for categorical data
For categorical data, we use k-modes instead of k-means.
The centroid is represented by the most frequent value (mode) of each attribute.
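The mode "centroid" is easy to compute per attribute (a small sketch; `mode_centroid` and the colour/size data are illustrative):

```python
from collections import Counter

def mode_centroid(rows):
    """k-modes 'centroid': the most frequent value (mode) of each
    categorical attribute across the cluster's rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
print(mode_centroid(cluster))  # → ('red', 'small')
```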
Outliers vs k-means
k-means is sensitive to outliers.
* Outliers are data points that are very far away from other data points.
* Outliers could be errors in data recording or special data points with very different values.
WCSS in k-means
* full form
* Definition
* Low WCSS and high WCSS means
WCSS- Within cluster sum of squares
WCSS is a metric used to evaluate the quality of the clusters formed by the k-means clustering algorithm.
It measures the sum of squared distances between each data point and the centroid of the cluster to which it belongs.
A lower WCSS value indicates that the data points are closer to their respective cluster centroids, which means the clusters are more compact and better defined.
WCSS formulas (2)
Also relation between both formulas with example
For a single cluster Cᵢ with centroid cᵢ: WCSSᵢ = ∑ d(x, cᵢ)² over the points x in Cᵢ
For all k clusters: WCSS = WCSS₁ + WCSS₂ + … + WCSSₖ
The total WCSS is just the sum of the per-cluster values, e.g. if WCSS₁ = 3 and WCSS₂ = 5 for k = 2, the total WCSS = 8.
Goal of k-means in terms of WCSS
Minimization of WCSS is the goal of the k-means algorithm. The algorithm iteratively tries to adjust the positions of the centroids to minimise the WCSS.
Application of WCSS
WCSS is used to determine the optimal number of clusters via the elbow method, which helps find a balance between the number of clusters and how well the data is clustered.
Methods to determine the optimal number of clusters (K) in k-means algo
1.) The elbow method
2.) The silhouette method
The elbow method (definition)
It is based on the idea that increasing the number of clusters K will reduce WCSS, but after a certain point, the improvement will diminish, forming an ‘elbow’ in the curve.
How to identify the elbow
The elbow is where the plot starts to bend or flatten out; this point indicates the optimal number of clusters.
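To see the elbow shape numerically, one can compute the best possible WCSS for several values of k. The sketch below uses brute-force search over all assignments instead of actual k-means, which is feasible only for toy 1-D data (all names here are illustrative):

```python
import math
from itertools import product

def wcss(clusters):
    """Within-cluster sum of squares for a list of 1-D clusters."""
    total = 0.0
    for c in clusters:
        mean = sum(c) / len(c)              # 1-D centroid
        total += sum((x - mean) ** 2 for x in c)
    return total

def best_wcss(points, k):
    """Lowest achievable WCSS for k clusters, found by brute force
    over every possible assignment (toy data only)."""
    best = math.inf
    for labels in product(range(k), repeat=len(points)):
        clusters = [[p for p, lbl in zip(points, labels) if lbl == i]
                    for i in range(k)]
        if any(len(c) == 0 for c in clusters):
            continue                        # every cluster must be non-empty
        best = min(best, wcss(clusters))
    return best

data = [1.0, 1.2, 5.0, 5.3, 9.0]
curve = [best_wcss(data, k) for k in (1, 2, 3, 4)]
```

With three natural groups in `data`, the WCSS values drop sharply up to k = 3 and then flatten, which is exactly the elbow.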
The silhouette method
It measures how similar a data point is to its own cluster (cohesion) compared to other clusters (separation).
It computes the silhouette coefficient for each point, which quantifies how well a point fits into its assigned cluster.
The average silhouette score across all points can be used to evaluate different values of k.
silhouette coefficient
For a point: a = mean distance to the other points in its own cluster, b = mean distance to the points of the nearest other cluster; then s = (b − a) / max(a, b).
silhouette coefficient range
[-1,1]
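The coefficient and its range can be checked with a tiny sketch (1-D data for brevity; the `silhouette` function and the numbers are illustrative):

```python
def silhouette(point, own, others):
    """Silhouette coefficient of one point:
    a = mean distance to the other points in its own cluster,
    b = mean distance to the nearest other cluster,
    s = (b - a) / max(a, b), which always lies in [-1, 1]."""
    a = sum(abs(point - q) for q in own) / len(own)
    b = min(sum(abs(point - q) for q in c) / len(c) for c in others)
    return (b - a) / max(a, b)

# A point sitting tightly inside its own cluster, far from the other
# cluster, scores close to +1.
s = silhouette(1.0, own=[1.2, 0.9], others=[[8.0, 8.5]])
```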
Three points about silhouette coefficient
* s close to 1: the point is well matched to its own cluster.
* s near 0: the point lies on the boundary between two clusters.
* s close to −1: the point has likely been assigned to the wrong cluster.
Comparison of elbow and silhouette methods
The elbow method plots WCSS against k and picks k visually at the bend, so it is simple but somewhat subjective; the silhouette method scores cohesion vs separation for each k and picks the k with the highest average silhouette score, so it is more quantitative but more expensive to compute.
k-medoids - supervised/unsupervised
unsupervised
How k-medoid different from k-means
k-medoids is an improved version of the k-means algo, mainly designed to deal with its sensitivity to outliers.
Compare k-medoids with other partitioning algos
Compared to other partitioning algos, the algorithm is simple, fast and easy to implement.
methods difference of k-means and k-medoid
Unlike k-means, which minimises the sum of squared distances, the k-medoids clustering method minimises the sum of pairwise dissimilarities between each point and its cluster medoid.
What k-medoids uses instead of the centroids we have in k-means
Also define it
Instead of centroids, k-medoid approach makes use of medoids.
Medoids are points in the dataset whose sum of distances to the other points in their cluster is minimal.
Unlike K-Means, where clusters are represented by centroids (which may not be actual data points), K-Medoids selects actual data points as cluster centers.
A medoid is the data point in a cluster that has the smallest total dissimilarity to all other points in that cluster.
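Selecting a medoid is a one-liner over the cluster (a sketch; the `medoid` function and the toy points are illustrative):

```python
import math

def medoid(points):
    """Pick the actual data point whose total distance to all other
    points in the cluster is smallest (a medoid, unlike a centroid,
    is always a real member of the dataset)."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

cluster = [(0, 0), (2, 0), (1, 2), (8, 8)]
print(medoid(cluster))  # → (1, 2), the most central actual point
```

Note that the outlier (8, 8) pulls a mean-based centroid toward it, but cannot become the medoid, which is why k-medoids is more robust to outliers.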