Session 6.3 Flashcards
K-means clustering
- Popular method
- Centroid-based
- Finds a prototype data point for each cluster
K-means clustering
Prototype-based clustering
• Each cluster represented by a prototype
• Other names: centroid clustering, centre-based clustering
• Example: customer segmentation. Each segment has a prototype customer, and
customers similar to that prototype are associated with the cluster.
Centroid - a real or imaginary data point whose characteristics are the mean of all
the data points within the cluster
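As a quick illustration (a minimal NumPy sketch with made-up toy data), the centroid is just the coordinate-wise mean of the cluster's points, and need not coincide with any real data point:

```python
import numpy as np

# Toy cluster: three 2-D data points assigned to the same cluster.
points = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])

# The centroid is the coordinate-wise mean; here (3, 4) happens to be
# a real data point, but in general it is an "imaginary" point.
centroid = points.mean(axis=0)
print(centroid)  # [3. 4.]
```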
K-means clustering - how to?
1. Select a proximity measure and specify the number of clusters (k).
2. Initiate the process by selecting the initial centroids.
3. Assign each data point to the "nearest" centroid to form clusters.
4. Recalculate the centroid of each cluster.
5. Iterate over steps 3 and 4 until the stopping criteria are fulfilled
   (e.g. the centroids no longer move).
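The steps above can be sketched in a few lines of NumPy (a minimal illustration of Lloyd's algorithm, assuming Euclidean distance and random data points as initial centroids; the function name and toy data are invented for the example):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means sketch (Lloyd's algorithm), Euclidean distance assumed."""
    rng = np.random.default_rng(seed)
    # Step 2: initialise by picking k distinct data points as centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its cluster
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated groups of 2-D points.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels)
```

On this toy data the first three points end up in one cluster and the last three in the other, whichever points are drawn as initial centroids.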
Why use k-means?
Strengths:
- Simple to understand and implement
- Computationally efficient
K-means weaknesses
- The value of k must be specified in advance - how to determine it?
- May converge to a locally optimal solution, depending on the initial centroids.
- Assumes globular/spherical clusters; struggles with clusters of arbitrary shape.
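One common heuristic for choosing k is the elbow method: compute the within-cluster sum of squares (inertia) for several values of k and pick the k where the curve flattens. A minimal NumPy sketch (helper names and toy data are invented for illustration; the multiple random restarts also hedge against the local-optimum weakness):

```python
import numpy as np

def inertia(X, k, seed=0, n_iters=50):
    """Within-cluster sum of squares after one basic k-means run."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return float((d.min(axis=1) ** 2).sum())

def best_inertia(X, k, restarts=10):
    # Restarts hedge against bad local optima from unlucky initial centroids.
    return min(inertia(X, k, seed=s) for s in range(restarts))

# Three well-separated blobs: inertia drops sharply up to k = 3,
# then flattens out -- the "elbow" suggests choosing k = 3.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [20, 0], [20, 1], [21, 0]], dtype=float)
for k in (1, 2, 3, 4):
    print(k, round(best_inertia(X, k), 2))
```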
Hierarchical clustering
- Creates a collection of ways to group the points.
* Output: a hierarchy of potential clusterings, visualised as a dendrogram.
Why use hierarchical clustering?
Strengths:
• Clusters can be of any size and shape.
• Does not require prespecifying the number of clusters.
Hierarchical clustering weaknesses
- Still need to decide where to cut the dendrogram to obtain a final clustering.
* Computationally inefficient compared with k-means (the pairwise distance matrix
  alone is O(n²)).
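A minimal sketch of the whole pipeline, assuming SciPy is available (the toy data is invented): agglomerative clustering builds the dendrogram with `linkage`, and "deciding where to split" corresponds to cutting it with `fcluster`:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated groups of 2-D points.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)

# Agglomerative clustering: 'ward' repeatedly merges the pair of clusters
# that least increases within-cluster variance. Z encodes the dendrogram.
Z = linkage(X, method="ward")

# Cut the dendrogram into (at most) 2 flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` would draw the full hierarchy instead of committing to a single cut.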