Course Session 5 (Clustering) Flashcards

1
Q

What is clustering?

A

Clustering is the most important unsupervised learning problem, aimed at finding structure in a collection of unlabeled data. It is the process of organizing data into groups (clusters) whose members are similar in some way.

2
Q

What is k-means clustering?

A

K-means clustering is a vector quantization technique used to partition n observations into k clusters by identifying k reference vectors (prototypes, codebook vectors, or codewords) that best represent the data. Each observation is assigned to the cluster with the nearest mean, which acts as the cluster’s prototype or centroid.

3
Q

Mathematically define k-means clustering

A

The objective function is the sum of squared distances between each data point x_i and the centroid m_j of its assigned cluster C_j:

J = \sum_{j=1}^k \sum_{x_i ∈ C_j} ∥x_i − m_j∥^2.

The k-means algorithm alternates between two steps until convergence:

Initialize m_i, i = 1, …, k, for example to k randomly chosen data points x^t.
Repeat:

(Assignment Step):
For all x^t ∈ X:
b_i^t ← {1 if ∥x^t − m_i∥ = min_j ∥x^t − m_j∥,
0 otherwise.

(Update Step):
For all m_i, i = 1, …, k:
m_i ← \sum_t b_i^t x^t / \sum_t b_i^t.

Until the m_i converge.
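A minimal NumPy sketch of these two steps (the function name kmeans and the index-based representation of b_i^t are illustrative choices, not from the course material):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means: X is an (n, d) data matrix, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Initialize the centroids m_i to k randomly chosen data points x^t.
    m = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # Assignment step: pick the nearest centroid for every point
        # (an index per point instead of the 0/1 matrix b_i^t).
        dists = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2)  # (n, k)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points;
        # a centroid with no assigned points is left where it is.
        new_m = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                          else m[i] for i in range(k)])
        if np.allclose(new_m, m):  # centroids stopped moving -> converged
            break
        m = new_m
    return m, labels
```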

4
Q

What is the reconstruction error for k-means clustering?

A

For data points x^t and reference vectors m_i, the reconstruction error is

E = \sum_t \sum_i b_i^t ∥x^t − m_i∥^2,

where b_i^t = 1 if ∥x^t − m_i∥ = min_j ∥x^t − m_j∥,
and 0 otherwise.
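A one-line NumPy sketch of this error, assuming centroids m and per-point cluster indices labels as produced by a k-means run:

```python
import numpy as np

def reconstruction_error(X, m, labels):
    """Sum of squared distances between each x^t and its assigned centroid m_i."""
    return np.sum(np.linalg.norm(X - m[labels], axis=1) ** 2)
```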

5
Q

How does vector quantization work?

A

Vector quantization represents a large set of points (vectors) by a smaller set of prototype points. The data are divided into groups with approximately the same number of points closest to each, and each group is represented by its centroid (prototype).
Vector quantization can be used for lossy data compression: to code a vector, find the closest prototype m_i and transmit only its index i.
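A small sketch of the encode/decode idea, assuming the codebook (e.g., the centroids found by k-means) is shared between sender and receiver; the function names are illustrative:

```python
import numpy as np

def vq_encode(X, codebook):
    """Code each vector as the index of its nearest prototype (codeword)."""
    dists = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)  # only these indices need to be transmitted

def vq_decode(indices, codebook):
    """Lossy reconstruction: look each index up in the shared codebook."""
    return codebook[indices]
```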

6
Q

What are the applications of k-means clustering?

A

Vector quantization
Discovering hidden categories

7
Q

What is model-based clustering?

A

Each cluster is mathematically represented by a parametric component distribution, such as a Gaussian. The entire data set is then modeled as a mixture of these distributions.

8
Q

What is a mixture density?

A

The data density is modeled as a mixture of component densities (e.g., a Gaussian mixture model, GMM):

p(x) = \sum_{i=1}^k p(x | G_i) P(G_i),

where the G_i are the components (groups/clusters),
P(G_i) are the mixture proportions (priors), and
p(x | G_i) are the component densities.

Fitting the mixture means finding the component density parameters that maximize the likelihood of the data. This is a method of model-based clustering.
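A short sketch of evaluating this mixture density for Gaussian components, assuming SciPy is available; the argument names are illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, means, covs, priors):
    """p(x) = sum_i p(x | G_i) P(G_i) with Gaussian component densities."""
    return sum(P * multivariate_normal.pdf(x, mean=mu, cov=S)
               for mu, S, P in zip(means, covs, priors))
```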

9
Q

What is the difference between supervised and unsupervised mixture densities?

A

Supervised / Unsupervised:
  • Classes / Clusters (groups)
  • Predict y given x / Discover structure or clusters in x
  • Uses labels y in optimization / No labels; relies on unsupervised methods
  • Typical tasks: regression / clustering, density estimation

10
Q

How do you estimate parameters in unsupervised mixture densities?

A

Find the component density parameters that maximize the likelihood. If there is no analytical solution, iterative optimization is used instead; the expectation-maximization (EM) algorithm is used for maximum likelihood estimation.

11
Q

How does the expectation-maximization algorithm work?

A

The expectation-maximization (EM) algorithm is an iterative method for finding (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters. After random initialization, it alternates between two steps:

  • E-step: Compute the posterior probability that each Gaussian generated each data point, i.e., which Gaussian is responsible for which point.
  • M-step: Assuming the data really were generated this way, update the parameters of each Gaussian to maximize the probability that it would generate the data it is currently responsible for.

When the parameters no longer change, the algorithm has converged. Simple k-means clustering is a special case of the EM algorithm (hard assignments instead of posterior probabilities).

12
Q

Define the expectation-maximization algorithm mathematically

A

Expectation Step (E-Step): Compute the expected value of the complete-data log-likelihood of (X, Z) under the current parameter estimate θ^{(t)}:

Q(θ, θ^{(t)}) = E_{Z | X, θ^{(t)}} [log p(X, Z | θ)].

Here, the expectation is taken over the conditional distribution p(Z | X, θ^{(t)}).

Maximization Step (M-Step): Maximize the Q-function with respect to θ:

θ^{(t+1)} = arg max_θ Q(θ, θ^{(t)}).

These two steps are repeated until convergence, typically when θ changes very little between iterations or when the likelihood p(X | θ) stabilizes.
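A compact sketch of EM for a one-dimensional Gaussian mixture, where the E-step computes the posterior responsibilities and the M-step re-estimates means, variances, and priors (a fixed iteration count stands in for a proper convergence check; names are illustrative):

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_gmm_1d(x, k, n_iters=100, seed=0):
    """EM for a 1-D Gaussian mixture: x is an (n,) array of observations."""
    rng = np.random.default_rng(seed)
    n = len(x)
    mu = rng.choice(x, size=k, replace=False)   # random initialization
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iters):
        # E-step: responsibility of component i for point t,
        # p(G_i | x^t) ∝ p(x^t | G_i) P(G_i).
        resp = np.array([pi[i] * gaussian_pdf(x, mu[i], var[i])
                         for i in range(k)]).T          # (n, k)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: maximize the expected complete-data log-likelihood.
        Nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / Nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / n
    return mu, var, pi
```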

13
Q

How does the expectation-maximization algorithm compare to dimensionality reduction?

A

Dimensionality reduction methods find correlations between features and group features.

Clustering methods (such as the EM algorithm) find similarities between instances and group instances. This allows for knowledge extraction through:
* the number of clusters,
* the prior probabilities,
* the cluster parameters, i.e., center and range of features,
* preprocessing for classification and regression.

14
Q

How can the number of clusters, k, be determined?

A
  • It can be defined by the application, as in image quantization.
  • Plot the data (after PCA) and check visually for clusters.
  • Incremental algorithm: add one cluster at a time until an "elbow" appears in the reconstruction error, log-likelihood, or intergroup distances (see the sketch below).
  • Manually check whether the clusters are meaningful.
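A sketch of the elbow approach, assuming scikit-learn and matplotlib are available (KMeans exposes the reconstruction error as inertia_); the synthetic make_blobs data is just for illustration:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 true clusters, purely for illustration.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Fit k-means for increasing k and record the reconstruction error.
ks = range(1, 10)
errors = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
          for k in ks]

plt.plot(ks, errors, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("reconstruction error")
plt.show()  # the 'elbow' is where the error stops dropping sharply
```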