Week 6 Flashcards
What is the aim of cluster analysis?
To create a grouping of objects such that objects within a group are similar and objects in different groups are dissimilar.
How is a cluster defined in K-means clustering?
By a representative point: the mean of the objects assigned to that cluster.
What is muk in K-means clustering?
The mean point of the k-th cluster.
What is znk in K-means clustering?
A binary indicator variable that is 1 if object n is assigned to cluster k and 0 otherwise.
What requirement leads to SIGMA k of znk = 1 in K-means clustering?
Each object has to be assigned to one and only one cluster.
What is the equation for muk in K-means clustering?
muk = (SIGMA n of znk xn) / (SIGMA n of znk)
What are the four steps performed in K-means clustering?
Start with initial values for the cluster means mu1, …, muK. Then:
- For each data object xn, find the closest cluster mean muk and set znk = 1 and znj = 0 for all j != k.
- Stop if all znk are unchanged compared to the previous iteration.
- Update each muk.
- Return to step 1.
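A minimal NumPy sketch of these steps (the function name kmeans and all variable names are my own; it assumes the data sit in an (N, d) array X):

```python
import numpy as np

def kmeans(X, K, max_iter=100, rng=None):
    """Basic K-means on an (N, d) data matrix X with K clusters."""
    rng = np.random.default_rng(rng)
    # Initialise the cluster means mu1..muK with K randomly chosen data points.
    means = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    assignments = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 1: assign each object xn to its closest cluster mean (znk).
        dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        new_assignments = dists.argmin(axis=1)
        # Step 2: stop if no assignment changed since the last iteration.
        if np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # Step 3: update each muk as the mean of its assigned objects.
        for k in range(K):
            if (assignments == k).any():
                means[k] = X[assignments == k].mean(axis=0)
        # Step 4: the for-loop returns to the assignment step.
    # Total distance D between objects and their cluster centres.
    D = ((X - means[assignments]) ** 2).sum()
    return means, assignments, D
```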
What is the equation for D, the total distance between objects and their cluster centers, in K-means clustering?
D = SIGMA n=1 to N SIGMA k=1 to K znk (xn - muk)^T (xn - muk)
What solution do we use to prevent K-means clustering from only reaching a local minimum?
We run the algorithm from several random starting points and use the solution that gives the lowest value of D, the total distance.
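A sketch of that restart strategy, reusing the kmeans function from the sketch above (the toy data and the seed loop are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))  # toy data; replace with your own

# Run K-means from several random starting points and keep the
# solution with the lowest total distance D.
results = [kmeans(X, K=3, rng=seed) for seed in range(10)]
means, assignments, D = min(results, key=lambda r: r[2])  # r[2] is D
```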
What is the problem with using D, the total distance, as a model selection criterion in K-means clustering?
It decreases as K increases, so adding clusters always appears to improve the result. Just like maximum likelihood, it rewards model complexity rather than penalising it, so it cannot be used on its own to choose K.
What is clustered in feature selection?
Features are clustered based on their values across objects rather than clustering the objects.
Parametric density estimation
When the class-conditional density p(x|Ci) is assumed to follow a single parametric model (e.g. one Gaussian).
Semiparametric density estimation
When p(x|Ci) is assumed to be a mixture of densities (e.g. a mixture of Gaussians).
Explain the elbow plot:
A plot with the number of clusters K on the x-axis and the within-cluster variation (or the reduction in it) on the y-axis. At some value of K the curve flattens out; that bend is called the elbow and suggests a suitable number of clusters.
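For illustration, an elbow plot can be drawn with scikit-learn, whose KMeans.inertia_ attribute is the within-cluster sum of squared distances (the D above); the toy data and the range of K are arbitrary choices:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))  # toy data; replace with your own

Ks = range(1, 9)
# inertia_ is the within-cluster sum of squared distances (D).
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in Ks]

plt.plot(Ks, inertias, marker="o")
plt.xlabel("number of clusters K")
plt.ylabel("within-cluster sum of squares D")
plt.show()
```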
How do you calculate the Euclidean distance between two points?
root((x2 - x1)^2 + (y2 - y1)^2) for the points (x1, y1) and (x2, y2).
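A quick NumPy check of that formula on two example points:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
# sqrt((4-1)^2 + (6-2)^2) = sqrt(9 + 16) = 5
print(np.sqrt(((a - b) ** 2).sum()))  # 5.0
print(np.linalg.norm(a - b))          # same thing, built in
```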
Describe the formula for the total distance / reconstruction error D in K-means clustering, using words:
Sum, over all datapoints and all clusters k, znk (the 0/1 indicator of whether the datapoint is in cluster k) times the squared Euclidean distance between the datapoint and the mean of cluster k.
Entropy of a random variable
The average level of uncertainty or information associated with the variable's possible outcomes.
What is the formula for the entropy H(X) of a discrete random variable X which takes values in the set S and is distributed according to p: S -> [0,1] ?
H(X) = - SIGMA x in S p(x) log p(x)
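A direct translation into Python (using log base 2, so the result is in bits; the base is my choice, since the card doesn't fix one):

```python
import numpy as np

def entropy(p):
    """H(X) = -sum_x p(x) log2 p(x), skipping zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # 0 * log 0 is taken to be 0
    return -(p * np.log2(p)).sum()

print(entropy([0.5, 0.5]))  # 1.0 bit: a fair coin
print(entropy([1.0, 0.0]))  # 0.0 bits: no uncertainty
```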
MI (abbreviation)
Mutual information
Mutual information (in probability theory) of two random variables
A measure of the mutual dependence between the two variables: it quantifies the amount of information obtained about one random variable by observing the other.
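A small sketch of computing MI from a joint probability table (the tables below are made-up examples):

```python
import numpy as np

def mutual_information(pxy):
    """MI = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) ) over nonzero cells."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1, keepdims=True)  # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)  # marginal p(y)
    mask = pxy > 0
    return (pxy[mask] * np.log2(pxy[mask] / (px * py)[mask])).sum()

print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0: independent
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0: fully dependent
```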
How does a statistical mixture represent each cluster?
As a probability density.
How does the EM algorithm deal with the summation inside the logarithm of the likelihood?
It derives a lower bound on the likelihood, which can be maximized instead of the log likelihood.
What step do you first need to take when you want to use Jensen's inequality to lower-bound the likelihood in the EM algorithm?
Make the right-hand side of the log likelihood look like the log of an expectation.
How do you obtain updates of each of the parameters in the bound B for each iteration in the EM algorithm?
Take the partial derivative of the bound B with respect to the relevant parameter, set it to zero and solve.
When do you specify the number of components (clusters) in the EM algorithm?
A priori.
How do you go about the EM algorithm after setting the number of components?
Initialise some of the parameters: randomly choose means and covariances of the mixture components and initialise pik by assuming a prior distribution over the components.
What do you compute in the E-step after initialising the mean, covariance and pik in the EM algorithm?
qnk
What do you do in the EM algorithm after you’ve initialised the means, covariances, pik and computed qnk?
Subsequently update pik, muk and SIGMAk.
What do the values of qnk represent in the EM algorithm?
The posterior probability that object n belongs to component k.
What are the four parameters that are updated in the EM algorithm?
pik, muk, SIGMAk and qnk.
What does pik represent in the EM algorithm?
p(znk = 1)
The probability that a point belongs to a certain component, i.e. a particular mixture model in the case of EM.
What do muk and SIGMAk denote in the EM algorithm?
The parameters of the k-th mixture model (the k-th Gaussian): the mean and the covariance.
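Pulling the EM cards together, a compact sketch of the full loop for a Gaussian mixture (scipy.stats.multivariate_normal supplies the component densities; the variable names mirror the cards, the small ridge added to each covariance is a numerical-stability trick of mine, and the fixed iteration count stands in for a proper convergence check):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, rng=None):
    """EM for a K-component Gaussian mixture on an (N, d) data matrix X."""
    rng = np.random.default_rng(rng)
    N, d = X.shape
    # Initialisation: random means, identity covariances, uniform prior pik.
    mu = X[rng.choice(N, size=K, replace=False)].astype(float)
    Sigma = np.stack([np.eye(d)] * K)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: qnk, the posterior probability that object n belongs to
        # component k, proportional to pik * N(xn | muk, Sigmak).
        q = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
            for k in range(K)
        ])
        q /= q.sum(axis=1, keepdims=True)
        # M-step: update pik, muk and Sigmak from the responsibilities qnk.
        Nk = q.sum(axis=0)
        pi = Nk / N
        mu = (q.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (q[:, k, None] * diff).T @ diff / Nk[k]
            Sigma[k] += 1e-6 * np.eye(d)  # ridge keeps the covariance invertible
    return pi, mu, Sigma, q
```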
Give Jensen’s inequality:
log E_p(z){f(z)} >= E_p(z){log f(z)}
What is the essence of Jensen’s inequality?
The log of the expected value of f(z) is always greater than or equal to the expected value of log f(z).
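A quick numerical illustration (the distribution p and the values of f are chosen arbitrarily):

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])  # a distribution p(z)
f = np.array([1.0, 4.0, 9.0])  # positive values f(z)

lhs = np.log((p * f).sum())    # log E_p{ f(z) }
rhs = (p * np.log(f)).sum()    # E_p{ log f(z) }
print(lhs, rhs, lhs >= rhs)    # Jensen's inequality: always True
```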
In the EM algorithm, why do we divide the expression inside the summation over k in the log likelihood by a new variable qnk?
Because multiplying and dividing by qnk (a probability distribution over the components, so it sums to 1 over k) turns the summation into an expectation with respect to qnk; the right-hand side then looks like the log of an expectation, so Jensen's inequality applies.