Book - Chapter 4 clustering Flashcards
What is clustering
The use of unsupervised techniques for grouping similar objects
What is the centre of a K means cluster
The arithmetic average (mean) of the data points assigned to that cluster
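As a small illustration of the answer above, the centre of a cluster can be computed by averaging each coordinate over the cluster's members. This is a minimal sketch; the function name `centroid` and the sample points are made up for illustration.

```python
def centroid(points):
    """Return the coordinate-wise arithmetic mean of equal-length points."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


# Hypothetical cluster of three 2-D points
cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
print(centroid(cluster))  # (3.0, 4.0)
```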
In K means, are the clustered attributes numerical or categorical
Numerical
What is the input of K means
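A set of numerical data points and a chosen value of K; closeness between points is typically measured by Euclidean distance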
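For reference, Euclidean distance is the square root of the summed squared differences between coordinates. A minimal sketch, with the helper name `euclidean` chosen here for illustration:

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean((0.0, 0.0), (3.0, 4.0)))  # 5.0
```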
What is the outcome of K means
The cluster centres (centroids), together with the assignment of each data point to its nearest centre
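To make the input and outcome concrete, here is a rough sketch of the usual K means loop: assign each point to its nearest centre, recompute each centre as the mean of its points, and repeat until the assignments stop changing. The function name `kmeans`, the random initialisation from the data, and the parameters `iterations` and `seed` are assumptions for illustration, not something taken from the chapter.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Return (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k distinct data points
    labels = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, labels


data = [(0.0,), (1.0,), (2.0,), (100.0,), (101.0,), (102.0,)]
centres, labels = kmeans(data, k=2)
print(sorted(centres))  # [(1.0,), (101.0,)]
```

The result depends on the starting centres in general, so practical implementations often rerun with several initialisations and keep the best one.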
Clustering is primarily an exploratory technique to discover what
Hidden structures of the data, possibly as a prelude to more focused analysis or decision processes
What are the use cases of K means
Image processing, medical, and customer segmentation
How would you find out the value of K
By using the within sum of squares (WSS): compute it for a range of K values and choose the K beyond which WSS stops dropping substantially
What is WSS
The sum of the squares of the distances between each data point and its closest centroid
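The definition above translates directly into code. This is a minimal sketch; the function name `wss` and the toy points and centroids are invented for illustration.

```python
def wss(points, centroids):
    """Sum over all points of the squared distance to the closest centroid."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
        for p in points
    )


points = [(0.0,), (2.0,), (10.0,), (12.0,)]
print(wss(points, [(6.0,)]))           # 104.0 with one centroid
print(wss(points, [(1.0,), (11.0,)]))  # 4.0 with two centroids
```

In this toy data the drop from one centroid to two is large, which is the kind of change to look for when comparing WSS across candidate values of K.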
What do you do if you’re missing expected splits
Increase K
What do you do if clusters have few data points
Decrease K
What do you do if the centroids are close together
Decrease K
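The last two checks can be sketched as simple diagnostics on a clustering result: count the points per cluster and find the smallest distance between any two centroids. The helper names and sample values below are hypothetical, and what counts as "few" points or "close" centroids is a judgment call that depends on the data.

```python
import math
from collections import Counter
from itertools import combinations


def cluster_sizes(labels):
    """Number of data points assigned to each cluster label."""
    return Counter(labels)


def closest_centroid_distance(centroids):
    """Smallest Euclidean distance between any pair of centroids."""
    return min(math.dist(a, b) for a, b in combinations(centroids, 2))


# Hypothetical result from a run with K = 3
labels = [0, 0, 0, 1, 2, 2]
centroids = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)]

print(cluster_sizes(labels)[1])              # 1, a very small cluster
print(closest_centroid_distance(centroids))  # 0.5, two centroids nearly overlap
```

Either signal suggests rerunning with a smaller K and comparing the results.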