Session 6.3 Flashcards

1
Q

K-means clustering

A
  • Popular clustering method
  • Centroid-based
  • Finds a prototype data point for each cluster
2
Q

K-means clustering

Prototype-based clustering

A

• Each cluster is represented by a prototype.
• Other names: centroid clustering, centre-based clustering.
• Example: customer segmentation. Each segment has a prototype customer, and similar customers are associated with that cluster.

Centroid: a real or imaginary data point with the mean characteristics of all the data points within the cluster.
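The centroid definition above can be shown concretely: a minimal sketch (the cluster values below are made up for illustration) computing a centroid as the coordinate-wise mean of the cluster's members.

```python
import numpy as np

# A small hypothetical cluster of 2-D data points.
cluster = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0]])

# The centroid is the mean of each coordinate across the cluster;
# it need not coincide with any real data point.
centroid = cluster.mean(axis=0)  # -> array([3., 4.])
```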

3
Q

K-means clustering - how to?

A
  1. Select a proximity measure and specify the number of clusters (k).
  2. Initialise the process by selecting k initial centroids.
  3. Assign each data point to the “nearest” centroid to form clusters.
  4. Recalculate the centroid of each cluster.
  5. Iterate over steps 3 and 4 until the stopping criterion is fulfilled.
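The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: it assumes Euclidean distance as the proximity measure, picks random data points as initial centroids, and stops when the centroids stop moving.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Minimal k-means sketch: Euclidean proximity, random initialisation."""
    rng = np.random.default_rng(seed)
    # Step 2: initialise by picking k distinct data points as centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 3: assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its cluster.
        new_centroids = np.array([points[labels == i].mean(axis=0)
                                  for i in range(k)])
        # Step 5: stop when the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated blobs; k-means should recover them as two clusters.
pts = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
labels, cents = kmeans(pts, k=2)
```

Note that in practice the final clustering depends on the initial centroids (see the weaknesses card), so libraries typically run several random restarts.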
4
Q

Why use k-means?

A

Strengths:

  • Simple to understand and implement
  • Computationally efficient
5
Q

K-means weaknesses

A
  • The value of k must be chosen in advance - how to determine it?
  • Converges to a locally (not necessarily globally) optimal solution.
  • Assumes globular/spherical clusters.
6
Q

Hierarchical clustering

A
  • Creates a collection of ways to group the points.
  • Output: a hierarchy of potential clusterings, visualised as a dendrogram.
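The hierarchy can be seen by tracing the merge sequence of a bottom-up (agglomerative) run - the sequence a dendrogram draws. Below is a minimal sketch (the function name `single_linkage_merges` is illustrative, not from the course) that repeatedly merges the two closest clusters under single linkage and records each merge level.

```python
import numpy as np

def single_linkage_merges(points):
    """Agglomerative clustering sketch: start with each point as its own
    cluster, then repeatedly merge the two closest clusters, recording
    each merge. The merge list is the hierarchy a dendrogram visualises."""
    clusters = {i: [i] for i in range(len(points))}
    merges = []  # (members of cluster a, members of cluster b, distance)
    while len(clusters) > 1:
        best = None
        for a in clusters:
            for b in clusters:
                if a >= b:
                    continue
                # Single linkage: distance between the closest members.
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), round(d, 3)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

# Two tight pairs far apart: the pairs merge first, then the two groups.
pts = np.array([[0.0], [0.1], [5.0], [5.2]])
merges = single_linkage_merges(pts)
```

Cutting this merge sequence at a chosen distance level yields a flat clustering, which is the "where to split" decision mentioned in the weaknesses card.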

7
Q

Why use hierarchical clustering?

A

Strengths:
• Clusters can be of any size and shape.
• Does not require prespecifying the number of clusters.

8
Q

Hierarchical clustering weaknesses

A
  • Still need to decide where to split the hierarchy to obtain clusters.
  • Computationally inefficient (pairwise distances scale poorly with the number of points).
