Clustering Flashcards
What is the difference between unsupervised learning and supervised learning?
10 / 2
What is clustering? What are the most common techniques?
10 / 2-3
Formulate the clustering problem (input & output, distance and similarity)
10 / 10
Define the k-means clustering problem and specify the different objective (cost) functions
10 / 14 and 16
Write down the formula for the center of a given cluster C_i
10 / 17
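Answer sketch, assuming the squared Euclidean cost (the slides may use different notation): the center of a cluster is its centroid, i.e. the mean of its points. The symbols c_i and x are assumptions, not taken from the slides.

```latex
c_i = \frac{1}{|C_i|} \sum_{x \in C_i} x
```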
Describe in words a naive way to solve the k-means clustering problem. What is the complexity of this approach?
10 / 18-19
How does Lloyd’s algorithm work? What are the most commonly used convergence criteria? What is its complexity? What is the upper bound on the number of iterations?
10 / 20-24
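Answer sketch: a minimal, standard-library version of Lloyd's algorithm, alternating an assignment step and a center-update step. The function name `lloyd`, the parameters `max_iter` and `seed`, the random choice of initial centers among the input points, and the "assignments unchanged" stopping rule are assumptions made for illustration, not necessarily the choices in the slides.

```python
import math
import random


def lloyd(points, k, max_iter=100, seed=0):
    """Cluster a list of equal-length tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centers are k input points picked at random.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Convergence criterion used here: assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center becomes the mean of its cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels
```

Other common stopping rules are a cap on the number of iterations (here `max_iter`) or a threshold on the decrease of the cost between two iterations.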
Describe the k-means++ algorithm. What problem is it trying to solve?
10 / 26
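Answer sketch: k-means++ only replaces the initialization of the centers. The first center is drawn uniformly at random, and each further center is drawn with probability proportional to the squared distance from the nearest center chosen so far. The name `kmeans_pp_init` and the `seed` parameter are hypothetical, and the sketch assumes the input contains at least k distinct points.

```python
import math
import random


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers from a list of equal-length tuples."""
    rng = random.Random(seed)
    # First center: uniformly at random among the input points.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance of every point to its nearest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        # Next center: sampled with probability proportional to d2,
        # so points far from the current centers are favored.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers
```

The returned centers can then be used as the starting point of Lloyd's algorithm in place of a purely random initialization.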
How does linkage-based clustering work? What are the two parameters that it needs?
10 / 29-30
What is a dendrogram? What does it represent?
10 / 31
What is the common approach for choosing the number k of clusters? What is the most common score for evaluating a clustering?
10 / 36
Write down all the formulas that define the silhouette of a clustering C and explain their meaning
10 / 37-38
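Answer sketch, using the usual definition of the silhouette (the slides may differ in details): for each point, a is the mean distance to the other points of its own cluster, b is the smallest mean distance to the points of any other cluster, and the point's score is (b - a) / max(a, b). The silhouette of the clustering is the mean of these scores. The function name `silhouette`, the convention that a singleton cluster scores 0, and the requirement of at least two clusters with not all points identical are assumptions.

```python
import math


def silhouette(points, labels):
    """Mean silhouette score of a clustering given as parallel lists."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other points of the same cluster
        # (the point itself contributes distance 0 to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the points of another cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Scores close to 1 indicate compact, well-separated clusters, while negative scores indicate points that are on average closer to another cluster than to their own.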