K-Means Flashcards
Define monothetic cluster
Members have some common property
Define polythetic cluster
Cluster members are similar to each other
Define hard clustering
Clusters do not overlap
Define soft clustering
Clusters may overlap
Define flat clustering
A set of clusters without any explicit structure relating the clusters to each other
Define hierarchical clustering
Produces a hierarchy of clusters
K-D trees, monothetic or polythetic?
Monothetic, few cuts describe each member of the region
K-D trees, hard or soft boundaries?
Hard
K-D trees, flat or hierarchical?
Hierarchical, we can cut the tree at any depth
K-means, monothetic or polythetic?
Polythetic
K-means, hard or soft boundaries?
Hard
K-means, flat or hierarchical?
Flat
Gaussian mixtures, monothetic or polythetic?
Polythetic
Gaussian mixtures, hard or soft boundaries?
Soft
Gaussian mixtures, flat or hierarchical?
Flat
Agglomerative clustering, monothetic or polythetic?
Polythetic
Agglomerative clustering, hard or soft boundaries?
Hard
Agglomerative clustering, flat or hierarchical?
Hierarchical
What are some common use cases for K-means? (2)
- Discovering classes (unsupervised)
- Dimensionality reduction
How can k-means be used for dimensionality reduction?
Run k-means, replace features with a cluster number
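A minimal NumPy sketch of this reduction, assuming centroids have already been fitted: each d-dimensional feature vector is replaced by the index of its nearest centroid.

```python
import numpy as np

def reduce_to_cluster_ids(X, centroids):
    # Distance from every point to every centroid: shape (n, k)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # One integer (nearest-centroid index) per point
    return dists.argmin(axis=1)

X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
print(reduce_to_cluster_ids(X, centroids))  # → [0 0 1]
```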
K-means clustering algorithm
- Place centroids c1, …, ck randomly
- Repeat until convergence:
  - For each point xi:
    - Find the nearest centroid cj
    - Assign point xi to cluster j
  - For each cluster j, move the centroid cj to be closest to all the data points in the cluster (for Euclidean distance this is the mean)
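The loop above can be sketched in NumPy; random initialization on k distinct data points is one common choice (assumed here), and an empty cluster keeps its old centroid.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Place centroids "randomly": here, on k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point x_i to its nearest centroid c_j
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return centroids, labels

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids, labels = kmeans(X, 2)
```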
What does it mean that K-means converges to a local minimum?
Different starting points can produce different clusters
What's the variance for K-means?
The sum of distances from each point to its assigned centroid (commonly the squared Euclidean distance)
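A sketch of this variance, assuming the standard squared-Euclidean form of the objective:

```python
import numpy as np

def kmeans_variance(X, labels, centroids):
    # Sum over all points of the squared distance to their assigned centroid
    return float(np.sum((X - centroids[labels]) ** 2))

X = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0]])
labels = np.array([0, 0, 1])
centroids = np.array([[0.0, 1.0], [4.0, 0.0]])
print(kmeans_variance(X, labels, centroids))  # → 2.0
```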
What is a scree plot?
A plot of the variance V against the number of clusters K; V drops steeply at first, then flattens
How can you pick a good value of K for k-means? (2)
- class labels may suggest a value (digits recognition - 10 digits)
- optimize V visually from a scree plot (where mountain ends and rubble begins)
How can we extrinsically evaluate a clustering algorithm?
Use it as part of another problem (e.g. removing outliers for digit recognition), and see if it helped
How can we intrinsically evaluate a clustering algorithm (with reference clusters)?
- Align reference clusters Rj with system produced clusters Ci
- Measure the accuracy
How do we measure the accuracy in intrinsic clustering evaluation (with reference clusters)?
The sum of overlaps between each aligned pair of system and reference clusters, divided by the total number of instances
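A minimal sketch of this measure, assuming each system cluster Ci is already aligned with reference cluster Ri (same list index):

```python
def clustering_accuracy(system_clusters, reference_clusters, n_instances):
    # Sum of overlaps |C_i ∩ R_i| over aligned pairs, divided by instance count
    overlap = sum(len(set(c) & set(r))
                  for c, r in zip(system_clusters, reference_clusters))
    return overlap / n_instances

system = [[0, 1, 2], [3, 4]]       # instance ids per system cluster
reference = [[0, 1], [2, 3, 4]]    # aligned reference clusters
print(clustering_accuracy(system, reference, 5))  # → 0.8
```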
How can we align a reference cluster with a system cluster?
- Pick the pair of reference and system clusters with the maximum overlap
- Greedily reassign clusters until each reference cluster has a unique system cluster
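The greedy alignment above can be sketched as: repeatedly pick the (reference, system) pair with maximum overlap among still-unassigned clusters.

```python
def greedy_align(reference_clusters, system_clusters):
    ref_sets = [set(r) for r in reference_clusters]
    sys_sets = [set(c) for c in system_clusters]
    unassigned_ref = set(range(len(ref_sets)))
    unassigned_sys = set(range(len(sys_sets)))
    alignment = {}
    while unassigned_ref and unassigned_sys:
        # Pick the remaining pair with the maximum overlap
        j, i = max(((j, i) for j in unassigned_ref for i in unassigned_sys),
                   key=lambda p: len(ref_sets[p[0]] & sys_sets[p[1]]))
        alignment[j] = i
        unassigned_ref.remove(j)
        unassigned_sys.remove(i)
    return alignment  # reference index -> system index

reference = [[0, 1], [2, 3, 4]]
system = [[2, 3, 4], [0, 1]]
print(greedy_align(reference, system))  # → {1: 0, 0: 1}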
How can we intrinsically evaluate clusters with humans?
- Sample pairs
- Ask humans if they should be in the same cluster
- Count errors and compute accuracy
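A sketch of this pairwise evaluation; `human_says_same` is a stand-in for real annotator judgments (simulated below, not part of the original cards):

```python
import random

def pairwise_accuracy(labels, human_says_same, n_pairs=1000, seed=0):
    rng = random.Random(seed)
    n = len(labels)
    correct = 0
    for _ in range(n_pairs):
        # Sample a pair and compare the system's same-cluster decision
        # to the human judgment
        a, b = rng.randrange(n), rng.randrange(n)
        system_same = labels[a] == labels[b]
        correct += system_same == human_says_same(a, b)
    return correct / n_pairs

labels = [0, 0, 1, 1]                          # system cluster ids
human = lambda a, b: (a < 2) == (b < 2)        # simulated annotator
print(pairwise_accuracy(labels, human))        # → 1.0
```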
What is the advantage of intrinsic evaluation with humans vs reference clusters?
- Does not require cluster alignment strategy
- Can handle overlapping classes
How can we use K-Means as part of image recognition?
- Split image into regions
- Compute statistics of regions
- Use K-Means to cluster regions
- Put clusters together like bag-of-words { 4 * “C27”, 15 * “C44”, … }
- Use any algorithm on the flat representation
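The bag-of-clusters step above is just word counting over region cluster ids, e.g.:

```python
from collections import Counter

# Cluster id assigned to each image region (hypothetical ids)
region_cluster_ids = ["C27", "C44", "C44", "C27", "C44", "C3"]
# Count ids like words in bag-of-words: a flat, fixed-vocabulary
# representation any classifier can consume
bag = Counter(region_cluster_ids)
print(bag)  # → Counter({'C44': 3, 'C27': 2, 'C3': 1})
```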
What does K-Means minimize?
The aggregate intra-cluster distance (the variance V)