Week 9 Flashcards

1
Q

What are common dissimilarity metrics for clustering?

A

Euclidean distance
Squared Euclidean distance
Manhattan distance
Cosine distance
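
A minimal sketch of the four metrics above in plain Python, for comparison on one pair of points. The function names and the sample points `p` and `q` are illustrative assumptions, and cosine distance is taken here as 1 minus cosine similarity (undefined for zero vectors).

```python
import math

def euclidean(a, b):
    # Straight-line distance between a and b.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean(a, b):
    # Same ordering as Euclidean distance, but without the square root.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; depends on direction only, not magnitude.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical sample points.
p, q = (1.0, 0.0), (4.0, 4.0)
for metric in (euclidean, squared_euclidean, manhattan, cosine_distance):
    print(metric.__name__, metric(p, q))
```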

2
Q

What is k-means clustering?

A

Partitioning the input data into K > 0 distinct (non-overlapping) clusters, with each point assigned to the cluster whose centroid is nearest

3
Q

How does k-means work?

A

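Iterative algorithm: repeatedly assigns each point to its nearest centroid (by squared Euclidean distance), then recomputes each centroid as the average of its assigned points, until the assignments stop changing.
Because of this distance and representative, it can only accurately model roughly spherical clusters.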
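
The assign/update loop can be sketched as below. This is a minimal illustration, not a reference implementation: the function names, the toy data, and the choice to pass the initial centroids in explicitly (so the run is deterministic) are all assumptions.

```python
def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    # Coordinate-wise average of a non-empty list of points.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, centroids, max_iter=100):
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = mean(members)
    return labels, centroids

# Hypothetical toy data: two well-separated groups.
data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(data, centroids=[data[0], data[3]])
print(labels, centroids)
```

In practice the initial centroids are usually chosen at random, so different runs can end in different solutions.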

4
Q

Is k-means good?

A

Typically finds a sub-optimal (locally optimal) solution that depends on the initial centroids, because the k-clustering problem is NP-hard

5
Q

What is hierarchical clustering?

A

Generates a hierarchy of clusters at different resolutions
Computationally heavy
Two approaches: agglomerative (bottom-up, merging clusters) or divisive (top-down, splitting clusters)
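
A rough sketch of the agglomerative (bottom-up) variant: start with one cluster per point and repeatedly merge the two closest clusters. Single linkage (closest pair of points) is an assumption made here for illustration, since the card does not name a linkage, and the function names and data are hypothetical. The naive pairwise search also hints at why the method is computationally heavy.

```python
import math

def single_linkage(c1, c2):
    # Distance between clusters = distance between their closest members.
    return min(math.dist(a, b) for a in c1 for b in c2)

def agglomerative(points):
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []                       # (cluster_a, cluster_b, merge_distance)
    while len(clusters) > 1:
        # Find the closest pair of clusters (checks every pair each round).
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        dist = single_linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical 1-D points; each merge is one level of the hierarchy.
merges = agglomerative([(0.0,), (1.0,), (5.0,), (6.5,)])
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

Cutting the list of merges at a chosen distance gives a clustering at that resolution.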

6
Q

How do we validate cluster quality?

A

Compactness (low variance within each cluster) and separation (large distance between different clusters)
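
One possible way to put numbers on these two ideas, as a sketch only: compactness as the mean squared distance of a cluster's points to its centroid, and separation as the smallest distance between any two centroids. These particular definitions, the function names, and the toy clusters are assumptions; other validity measures exist.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def compactness(cluster):
    # Mean squared distance to the centroid: lower means a tighter cluster.
    c = centroid(cluster)
    return sum(math.dist(p, c) ** 2 for p in cluster) / len(cluster)

def separation(clusters):
    # Smallest centroid-to-centroid distance: higher means better separated.
    cents = [centroid(c) for c in clusters]
    return min(
        math.dist(a, b)
        for i, a in enumerate(cents)
        for b in cents[i + 1:]
    )

# Hypothetical clustering result with two clusters.
clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
print([compactness(c) for c in clusters], separation(clusters))
```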
