2020 A3 Flashcards
k-means only finds clusters corresponding to a local (but not global) optimum of its objective
function. What does this mean with regard to the quality of the clusters found by k-means?
Give a strategy for reducing the impact of this effect on the quality of the final model.
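k-means alternates between assigning points to their nearest centroid and recomputing the centroids, and this procedure can converge to a local rather than the global minimum of its objective (the within-cluster sum of squared distances). The clusters found therefore depend on the initial centroids and may be suboptimal. To reduce this effect, run k-means several times from different initializations and keep the run with the lowest objective value, or use a more careful scheme such as binary split initialization.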
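The restart strategy can be sketched in a few lines. This is a minimal illustration on 1-D data, assuming plain random initialization; the function names `kmeans` and `kmeans_restarts`, the parameter `n_init`, and the toy data are hypothetical and not part of the flashcard.

```python
import random

def kmeans(points, k, rng, iters=100):
    """One run of the assign/update loop on 1-D data; returns (centroids, sse)."""
    centroids = rng.sample(points, k)  # random initialization (an assumption)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new = [sum(c) / len(c) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:  # converged, possibly to a local optimum
            break
        centroids = new
    # Objective: sum of squared distances to the nearest centroid.
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, sse

def kmeans_restarts(points, k, n_init=10, seed=0):
    """Run k-means n_init times and keep the run with the lowest objective."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[1])

points = [0.8, 1.0, 1.2, 4.9, 5.0, 5.1, 8.8, 9.0, 9.2]
single_centroids, single_sse = kmeans(points, 3, random.Random(0))
best_centroids, best_sse = kmeans_restarts(points, 3, n_init=10, seed=0)
print(single_sse, best_sse)
```

Restarts cannot guarantee the global optimum, but the best of several runs is never worse than a single run from the same starting seed.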
k-means has difficulties effectively clustering data containing outliers. Explain why this is the
case, and suggest what could be done to alleviate the problem.
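Each centroid is the mean of its assigned points, and the squared-distance objective weights distant points heavily, so a few outliers can pull a centroid away from the bulk of its cluster. Outliers should therefore be detected and removed from the sample set during preprocessing; alternatively, a more robust variant such as k-medoids can be used.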
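A small numeric sketch of the effect, using made-up 1-D values. The helper `drop_outliers` and its threshold `max_dev` are hypothetical, and filtering by distance from the median is only one of many possible preprocessing rules.

```python
from statistics import mean, median

cluster = [0.8, 0.9, 1.0, 1.1, 1.2]
with_outlier = cluster + [50.0]

# A single extreme point drags the mean from about 1.0 to roughly 9.17,
# while the median only moves from 1.0 to 1.05.
print(mean(cluster), mean(with_outlier))
print(median(cluster), median(with_outlier))

def drop_outliers(values, max_dev):
    """Keep only values within max_dev of the median (illustrative rule)."""
    centre = median(values)
    return [v for v in values if abs(v - centre) <= max_dev]

cleaned = drop_outliers(with_outlier, max_dev=5.0)
print(mean(cleaned))
```

After filtering, the mean of the remaining points is back near the centre of the cluster.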
How many parameters are needed to specify a Gaussian mixture model for d-dimensional data
using K components:
◦in the general case (i.e. full covariance matrices)?
◦when the covariance matrices are diagonal?
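General case - Kd(d+3)/2: each component needs d parameters for the mean and d(d+1)/2 for the symmetric covariance matrix.
Diagonal - 2Kd: each component needs d parameters for the mean and d for the variances. Both counts exclude the mixing weights, which add a further K-1 free parameters.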
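The counts can be checked by adding up the pieces per component. This is a sketch only; the function name `gmm_param_count` and the `include_weights` flag are hypothetical, and the flag covers the K-1 free mixing weights that the formulas above leave out.

```python
def gmm_param_count(K, d, diagonal=False, include_weights=False):
    """Number of free parameters of a K-component GMM in d dimensions."""
    mean_params = d
    # Full symmetric covariance: d(d+1)/2 entries; diagonal: d variances.
    cov_params = d if diagonal else d * (d + 1) // 2
    total = K * (mean_params + cov_params)
    if include_weights:
        total += K - 1  # weights sum to 1, so only K-1 are free
    return total

K, d = 3, 2
print(gmm_param_count(K, d))                        # full: K*d*(d+3)/2
print(gmm_param_count(K, d, diagonal=True))         # diagonal: 2*K*d
print(gmm_param_count(K, d, include_weights=True))  # full plus K-1 weights
```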
Explain why it is often preferable for code working with likelihoods to represent quantities in
the log-domain.
When multiplying long sequences of probabilities, each between 0 and 1, the product shrinks with every factor and eventually underflows to zero in floating-point arithmetic.
Working in the log-domain avoids this: the log of a product is the sum of the logs, so the running value is a sum of moderately sized negative numbers rather than a vanishing product. When probabilities have to be added rather than multiplied (for example, when summing over mixture components), the log-sum-exp trick performs the addition without leaving the log-domain.
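A short sketch of both points, using made-up probabilities. The helper `logsumexp` is written out here for illustration under the usual subtract-the-maximum formulation; it is not taken from the flashcard.

```python
import math

probs = [1e-5] * 100

# Direct product: the true value 1e-500 is far below the smallest double,
# so the result underflows to exactly 0.0.
product = 1.0
for p in probs:
    product *= p

# Log-domain: the same quantity is a sum of logs, about -1151.29.
log_product = sum(math.log(p) for p in probs)
print(product, log_product)

def logsumexp(log_values):
    """Return log(sum(exp(v))) without underflow by factoring out the max."""
    m = max(log_values)
    if m == -math.inf:  # all terms are log(0)
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in log_values))

# exp(-1000) underflows to 0.0, so adding the two probabilities directly
# would lose them; in the log-domain the answer is -1000 + log(2).
print(logsumexp([-1000.0, -1000.0]))
```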