Week 6 Flashcards

1
Q

What is the aim of cluster analysis?

A

To create a grouping of objects such that objects within a group are similar and objects in different groups are not similar.

2
Q

How is a cluster defined in K-means clustering?

A

By a representative point: the mean of the objects that are assigned to the cluster.

3
Q

What is $\mu_k$ in K-means clustering?

A

The mean point of the $k$-th cluster.

4
Q

What is $z_{nk}$ in K-means clustering?

A

A binary indicator variable that is 1 if object $n$ is assigned to cluster $k$ and 0 otherwise.

5
Q

What requirement leads to $\sum_k z_{nk} = 1$ in K-means clustering?

A

Each object has to be assigned to one and only one cluster.

6
Q

What is the equation for $\mu_k$ in K-means clustering?

A

$\mu_k = \dfrac{\sum_n z_{nk} x_n}{\sum_n z_{nk}}$

7
Q

What are the four steps performed in K-means clustering?
You start with initial values for the cluster means, $\mu_1, \dots, \mu_K$.

A
  1. For each data object $x_n$, find the closest cluster mean $\mu_k$ and set $z_{nk} = 1$ and $z_{nj} = 0$ for all $j \neq k$.
  2. Stop if all $z_{nk}$ are unchanged compared to the previous iteration.
  3. Update each $\mu_k$.
  4. Return to step 1.
(A minimal code sketch of this loop follows below.)
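
A minimal sketch of these four steps in Python/NumPy, assuming a data matrix X of shape (N, D); the function and variable names are illustrative rather than taken from the course material:

```python
import numpy as np

def kmeans(X, K, max_iter=100, rng=None):
    """K-means: assign each point to its nearest mean, then recompute the means."""
    rng = np.random.default_rng(rng)
    N = X.shape[0]
    mu = X[rng.choice(N, size=K, replace=False)].astype(float)  # initial means mu_1..mu_K
    z = np.zeros((N, K), dtype=int)                             # assignment indicators z_nk
    for _ in range(max_iter):
        # Step 1: for each x_n, find the closest mean; set z_nk = 1 and z_nj = 0 for j != k.
        dist2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        new_z = np.zeros_like(z)
        new_z[np.arange(N), dist2.argmin(axis=1)] = 1
        # Step 2: stop if no assignment changed compared to the previous iteration.
        if np.array_equal(new_z, z):
            break
        z = new_z
        # Step 3: update each mu_k as the mean of the objects assigned to cluster k.
        for k in range(K):
            if z[:, k].any():
                mu[k] = X[z[:, k] == 1].mean(axis=0)
        # Step 4: return to step 1 (next loop iteration).
    return mu, z
```
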
8
Q

What is the equation for D, the total distance between objects and their cluster centers, in K-means clustering?

A

$D = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} (x_n - \mu_k)^\top (x_n - \mu_k)$
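
Using the hypothetical X, mu and z arrays from the sketch above, D can be computed in one line:

```python
# D: sum over n and k of z_nk times the squared distance between x_n and mu_k.
D = (z * ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)).sum()
```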

9
Q

What solution do we use to prevent K-means clustering from ending up in a poor local minimum?

A

We run the algorithm from several random starting points and use the solution that gives the lowest value of D, the total distance.
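
A sketch of this restart strategy, reusing the hypothetical kmeans function and the computation of D from the previous cards:

```python
def kmeans_restarts(X, K, n_starts=10):
    """Run K-means from several random starting points; keep the run with the lowest D."""
    best = None
    for _ in range(n_starts):
        mu, z = kmeans(X, K)
        D = (z * ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)).sum()
        if best is None or D < best[0]:
            best = (D, mu, z)
    return best
```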

10
Q

What is the problem with using D, the total distance, as a model selection criterion in K-means clustering?

A

It decreases as K increases, so adding more clusters always appears to give a better result. Like the maximum likelihood, it rewards model complexity, so it cannot be used on its own to choose K.

11
Q

What is clustered in feature selection?

A

The features are clustered, based on their values across the objects, rather than the objects themselves.

12
Q

Parametric density estimation

A

The class-conditional density $p(x \mid C_i)$ is assumed to follow a single parametric model, for example one Gaussian per class.

13
Q

Semiparametric density estimation

A

The class-conditional density $p(x \mid C_i)$ is assumed to be a mixture of densities, for example a mixture of Gaussians.

14
Q

Explain the elbow plot:

A

A plot with the number of clusters (K) on the x-axis and the within-cluster variation (or the reduction in it) on the y-axis. At some value of K the improvement from adding another cluster drops off sharply; that bend is called the elbow and suggests a suitable number of clusters.
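
A minimal sketch of producing such a plot, again assuming the hypothetical kmeans function from the earlier card and matplotlib for plotting:

```python
import matplotlib.pyplot as plt

Ks = range(1, 11)
totals = []
for K in Ks:
    mu, z = kmeans(X, K)
    # Within-cluster variation for this K (the total distance D).
    totals.append((z * ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)).sum())

plt.plot(list(Ks), totals, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("Within-cluster variation D")
plt.show()
```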

15
Q

How do you calculate the Euclidean distance between two points?

A

$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$ for two points $(x_1, y_1)$ and $(x_2, y_2)$; in general, the square root of the sum of the squared differences of the coordinates.
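
A quick numeric check in NumPy (the points are made up):

```python
import numpy as np

a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])
dist = np.sqrt(((a - b) ** 2).sum())  # sqrt(9 + 16) = 5.0
```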

16
Q

Describe the formula for the total distance / reconstruction error $D$ in K-means clustering, using words:

A

For every data point and every cluster $k$: the indicator $z_{nk}$ (1 if the point is assigned to cluster $k$, 0 otherwise) multiplied by the squared Euclidean distance between the point and the mean $\mu_k$ of that cluster, summed over all points and all clusters.

17
Q

Entropy of a random variable

A

The average level of uncertainty or information associated with the variable’s possible outcomes.

18
Q

What is the formula for the entropy $H(X)$ of a discrete random variable $X$ which takes values in the set $S$ and is distributed according to $p : S \to [0, 1]$?

A

$H(X) = -\sum_{x \in S} p(x) \log p(x)$
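
A small worked example in Python, using base-2 logs so the result is in bits (the distribution is made up):

```python
import numpy as np

# Entropy of a fair coin: H = -(0.5*log2(0.5) + 0.5*log2(0.5)) = 1 bit.
p = np.array([0.5, 0.5])
H = -(p * np.log2(p)).sum()  # 1.0
```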

19
Q

MI (abbreviation)

A

Mutual information

20
Q

Mutual information (in probability theory) of two random variables

A

A measure of the mutual dependence between the two variables: it quantifies the amount of information obtained about one random variable by observing the other.
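
A small sketch of computing mutual information from a joint probability table, using the standard definition $I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$ (the table values are made up):

```python
import numpy as np

# Joint distribution p(x, y); rows index X, columns index Y.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1, keepdims=True)  # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)  # marginal p(y)
mi = (p_xy * np.log2(p_xy / (p_x * p_y))).sum()  # about 0.278 bits
```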

21
Q

How does a statistical mixture model represent each cluster?

A

As a probability density.

22
Q

How does the EM algorithm deal with the summation that sits inside the logarithm in the log-likelihood?

A

It derives a lower bound on the log-likelihood, which can be maximised instead of the log-likelihood itself.

23
Q

What step do you first need to take when you want to use Jensen’s inequality to lower-bound the log-likelihood in the EM algorithm?

A

Make the right-hand side of the log likelihood look like the log of an expectation.

24
Q

How do you obtain updates of each of the parameters in the bound B for each iteration in the EM algorithm?

A

Take the partial derivative of the bound B with respect to the relevant parameter, set to zero and solve.

25
Q

When do you specify the number of components (clusters) in the EM algorithm?

A

A priori.

26
Q

How do you go about the EM algorithm after setting the number of components?

A

Initialise some of the parameters: randomly choose means and covariances of the mixture components and initialise $\pi_k$ by assuming a prior distribution over the components.

27
Q

What do you compute in the E-step after initialising the means, covariances and $\pi_k$ in the EM algorithm?

A

$q_{nk}$

28
Q

What do you do in the EM algorithm after you’ve initialised the means, covariances, $\pi_k$ and computed $q_{nk}$?

A

Subsequently update $\pi_k$, $\mu_k$ and $\Sigma_k$ (the M-step).
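
A minimal NumPy sketch of this E-step/M-step loop for a Gaussian mixture, assuming data X of shape (N, D); the initialisation choices and names are illustrative, not prescribed by the course material:

```python
import numpy as np

def em_gmm(X, K, n_iter=50, rng=None):
    """EM for a Gaussian mixture: the E-step computes q_nk, the M-step updates pi_k, mu_k, Sigma_k."""
    rng = np.random.default_rng(rng)
    N, D = X.shape
    # Initialise: random means, shared covariance, uniform pi_k (one choice of prior over components).
    mu = X[rng.choice(N, size=K, replace=False)].astype(float)
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(D)] * K)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: q_nk, the posterior probability that object n belongs to component k.
        q = np.zeros((N, K))
        for k in range(K):
            diff = X - mu[k]
            inv = np.linalg.inv(Sigma[k])
            norm = 1.0 / np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma[k]))
            q[:, k] = pi[k] * norm * np.exp(-0.5 * np.einsum("nd,de,ne->n", diff, inv, diff))
        q /= q.sum(axis=1, keepdims=True)
        # M-step: update pi_k, mu_k and Sigma_k from the responsibilities q_nk.
        Nk = q.sum(axis=0)
        pi = Nk / N
        mu = (q.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (q[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    return pi, mu, Sigma, q
```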

29
Q

What do the values of $q_{nk}$ represent in the EM algorithm?

A

The posterior probability that object $n$ belongs to component $k$.

30
Q

What are the four parameters that are updated in the EM algorithm?

A

$\pi_k$, $\mu_k$, $\Sigma_k$ and $q_{nk}$.

31
Q

What does $\pi_k$ represent in the EM algorithm?

A

$p(z_{nk} = 1)$
The probability that a point belongs to a certain component, i.e. a particular mixture component in the case of EM.

32
Q

What do $\mu_k$ and $\Sigma_k$ denote in the EM algorithm?

A

The parameters of the $k$-th mixture component (the $k$-th Gaussian): its mean and its covariance.

33
Q

Give Jensen’s inequality:

A

$\log \mathbb{E}_{p(z)}\{f(z)\} \geq \mathbb{E}_{p(z)}\{\log f(z)\}$
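
A quick numeric check of the inequality (the distribution and f are made up):

```python
import numpy as np

# p(z) uniform over two outcomes; f(z) takes the values 1 and 4.
p = np.array([0.5, 0.5])
f = np.array([1.0, 4.0])
lhs = np.log((p * f).sum())   # log E[f(z)] = log 2.5 ≈ 0.916
rhs = (p * np.log(f)).sum()   # E[log f(z)] = 0.5 * log 4 ≈ 0.693
assert lhs >= rhs
```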

34
Q

What is the essence of Jensen’s inequality?

A

The log of the expected value of f(z) is always greater than or equal to the expected value of log f(z).

35
Q

In the EM algorithm, why do we divide the expression inside the summation over $k$ in the log-likelihood by a new variable $q_{nk}$?

A

Because we need to make that side of the equation look like the log of an expectation.