Week 6 Flashcards
What is the aim of cluster analysis?
To create a grouping of objects such that objects within a group are similar and objects in different groups are dissimilar.
How is a cluster defined in K-means clustering?
By a representative point: the mean of the objects assigned to that cluster.
What is muk in K-means clustering?
The mean point of the k-th cluster.
What is znk in K-means clustering?
A binary indicator variable that is 1 if object n is assigned to cluster k and 0 otherwise.
What requirement leads to SIGMA k of znk = 1 in K-means clustering?
Each object has to be assigned to one and only one cluster.
What is the equation for muk in K-means clustering?
muk = (SIGMA n of znk xn) / (SIGMA n of znk)
What are the four steps performed in K-means clustering?
Start with initial values for the cluster means mu1, …, muK. Then:
- For each data object xn, find the closest cluster mean muk and set znk = 1 and znj = 0 for all j != k.
- Stop if all znk are unchanged compared to the previous iteration.
- Update each muk.
- Return to step 1.
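A minimal NumPy sketch of these steps (the function name kmeans and all variable names are my own; it assumes the data sit in an (N, d) array X):

```python
import numpy as np

def kmeans(X, K, max_iter=100, rng=None):
    """Basic K-means on an (N, d) data matrix X with K clusters."""
    rng = np.random.default_rng(rng)
    # Initialise the cluster means mu1..muK with K randomly chosen data points.
    means = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    assignments = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 1: assign each object xn to its closest cluster mean (znk).
        dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        new_assignments = dists.argmin(axis=1)
        # Step 2: stop if no assignment changed since the last iteration.
        if np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # Step 3: update each muk as the mean of its assigned objects.
        for k in range(K):
            if (assignments == k).any():
                means[k] = X[assignments == k].mean(axis=0)
        # Step 4: the for-loop returns to the assignment step.
    # Total distance D between objects and their cluster centres.
    D = ((X - means[assignments]) ** 2).sum()
    return means, assignments, D
```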
What is the equation for D, the total distance between objects and their cluster centers, in K-means clustering?
D = SIGMA n=1 to N SIGMA k=1 to K znk (xn - muk)^T (xn - muk)
What solution do we use to prevent K-means clustering from only reaching a local minimum?
We run the algorithm from several random starting points and use the solution that gives the lowest value of D, the total distance.
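A sketch of that restart strategy, reusing the kmeans function from the sketch above (the toy data and the seed loop are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))  # toy data; replace with your own

# Run K-means from several random starting points and keep the
# solution with the lowest total distance D.
results = [kmeans(X, K=3, rng=seed) for seed in range(10)]
means, assignments, D = min(results, key=lambda r: r[2])  # r[2] is D
```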
What is the problem with using D, the total distance, as a model selection criterion in K-means clustering?
It decreases as K increases, so adding clusters always appears to improve the result. Just like maximum likelihood, it rewards model complexity rather than penalising it, so it cannot be used on its own to choose K.
What is clustered in feature selection?
Features are clustered based on their values across objects rather than clustering the objects.
Parametric density estimation
When the class-conditional density p(x|Ci) is assumed to follow a single parametric model (e.g. one Gaussian).
Semiparametric density estimation
When p(x|Ci) is assumed to be a mixture of densities (e.g. a mixture of Gaussians).
Explain the elbow plot:
A plot with the number of clusters K on the x-axis and the within-cluster variation (or the reduction in it) on the y-axis. At some value of K the curve flattens out; that bend is called the elbow and suggests a suitable number of clusters.
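For illustration, an elbow plot can be drawn with scikit-learn, whose KMeans.inertia_ attribute is the within-cluster sum of squared distances (the D above); the toy data and the range of K are arbitrary choices:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))  # toy data; replace with your own

Ks = range(1, 9)
# inertia_ is the within-cluster sum of squared distances (D).
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in Ks]

plt.plot(Ks, inertias, marker="o")
plt.xlabel("number of clusters K")
plt.ylabel("within-cluster sum of squares D")
plt.show()
```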
How do you calculate the Euclidean distance between two points?
root((x2 - x1)^2 + (y2 - y1)^2) for the points (x1, y1) and (x2, y2).
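A quick NumPy check of that formula on two example points:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
# sqrt((4-1)^2 + (6-2)^2) = sqrt(9 + 16) = 5
print(np.sqrt(((a - b) ** 2).sum()))  # 5.0
print(np.linalg.norm(a - b))          # same thing, built in
```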
Describe the formula for the total distance / reconstruction error D in K-means clustering, using words:
Sum, over all datapoints and all clusters k, znk (the 0/1 indicator of whether the datapoint is in cluster k) times the squared Euclidean distance between the datapoint and the mean of cluster k.
Entropy of a random variable
The average level of uncertainty or information associated with the variable's possible outcomes.
What is the formula for the entropy H(X) of a discrete random variable X which takes values in the set S and is distributed according to p: S -> [0,1] ?
H(X) = - SIGMA x in S p(x) log p(x)
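A direct translation into Python (using log base 2, so the result is in bits; the base is my choice, since the card doesn't fix one):

```python
import numpy as np

def entropy(p):
    """H(X) = -sum_x p(x) log2 p(x), skipping zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # 0 * log 0 is taken to be 0
    return -(p * np.log2(p)).sum()

print(entropy([0.5, 0.5]))  # 1.0 bit: a fair coin
print(entropy([1.0, 0.0]))  # 0.0 bits: no uncertainty
```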
MI (abbreviation)
Mutual information
Mutual information (in probability theory) of two random variables
A measure of the mutual dependence between the two variables: it quantifies the amount of information obtained about one random variable by observing the other.
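A small sketch of computing MI from a joint probability table (the tables below are made-up examples):

```python
import numpy as np

def mutual_information(pxy):
    """MI = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) ) over nonzero cells."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1, keepdims=True)  # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)  # marginal p(y)
    mask = pxy > 0
    return (pxy[mask] * np.log2(pxy[mask] / (px * py)[mask])).sum()

print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0: independent
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0: fully dependent
```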
How does a statistical mixture represent each cluster?
As a probability density.
How does the EM algorithm deal with the summation inside the logarithm of the likelihood?
It derives a lower bound on the likelihood, which can be maximized instead of the log likelihood.
What step do you first need to take when you want to use Jensen's inequality to lower-bound the likelihood in the EM algorithm?
Make the right-hand side of the log likelihood look like the log of an expectation.
How do you obtain updates of each of the parameters in the bound B for each iteration in the EM algorithm?
Take the partial derivative of the bound B with respect to the relevant parameter, set it to zero and solve.
When do you specify the number of components (clusters) in the EM algorithm?
A priori.
How do you go about the EM algorithm after setting the number of components?
Initialise some of the parameters: randomly choose means and covariances of the mixture components and initialise pik by assuming a prior distribution over the components.
What do you compute in the E-step after initialising the mean, covariance and pik in the EM algorithm?
qnk
What do you do in the EM algorithm after you’ve initialised the means, covariances, pik and computed qnk?
Subsequently update pik, muk and SIGMAk.
What do the values of qnk represent in the EM algorithm?
The posterior probability that object n belongs to component k.
What are the four parameters that are updated in the EM algorithm?
pik, muk, SIGMAk and qnk.
What does pik represent in the EM algorithm?
p(znk = 1)
The probability that a point belongs to a certain component, i.e. a particular mixture model in the case of EM.
What do muk and SIGMAk denote in the EM algorithm?
The parameters of the k-th mixture model (the k-th Gaussian): the mean and the covariance.
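Pulling the EM cards together, a compact sketch of the full loop for a Gaussian mixture (scipy.stats.multivariate_normal supplies the component densities; the variable names mirror the cards, the small ridge added to each covariance is a numerical-stability trick of mine, and the fixed iteration count stands in for a proper convergence check):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, rng=None):
    """EM for a K-component Gaussian mixture on an (N, d) data matrix X."""
    rng = np.random.default_rng(rng)
    N, d = X.shape
    # Initialisation: random means, identity covariances, uniform prior pik.
    mu = X[rng.choice(N, size=K, replace=False)].astype(float)
    Sigma = np.stack([np.eye(d)] * K)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: qnk, the posterior probability that object n belongs to
        # component k, proportional to pik * N(xn | muk, Sigmak).
        q = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
            for k in range(K)
        ])
        q /= q.sum(axis=1, keepdims=True)
        # M-step: update pik, muk and Sigmak from the responsibilities qnk.
        Nk = q.sum(axis=0)
        pi = Nk / N
        mu = (q.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (q[:, k, None] * diff).T @ diff / Nk[k]
            Sigma[k] += 1e-6 * np.eye(d)  # ridge keeps the covariance invertible
    return pi, mu, Sigma, q
```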
Give Jensen’s inequality:
log E_p(z){f(z)} >= E_p(z){log f(z)}
What is the essence of Jensen’s inequality?
The log of the expected value of f(z) is always greater than or equal to the expected value of log f(z).
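A quick numerical illustration (the distribution p and the values of f are chosen arbitrarily):

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])  # a distribution p(z)
f = np.array([1.0, 4.0, 9.0])  # positive values f(z)

lhs = np.log((p * f).sum())    # log E_p{ f(z) }
rhs = (p * np.log(f)).sum()    # E_p{ log f(z) }
print(lhs, rhs, lhs >= rhs)    # Jensen's inequality: always True
```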
In the EM algorithm, why do we divide the expression inside the summation over k in the log likelihood by a new variable qnk?
Because multiplying and dividing by qnk (a probability distribution over the components, so it sums to 1 over k) turns the summation into an expectation with respect to qnk; the right-hand side then looks like the log of an expectation, so Jensen's inequality applies.