Lec6 - Unsupervised Learning Flashcards
What is the difference between Supervised and Unsupervised Learning?
In supervised learning the data is labelled while in unsupervised learning it is unlabelled.
What is clustering?
Clustering is the task of grouping similar data points together in the feature space.
A cluster is a collection of data items which are “similar” to each other and “dissimilar” to data items in other clusters.
Describe the k-means algorithm.
1. Choose the number of clusters k.
2. Randomly place k centroids in the feature space.
3. Assign each data point to the nearest centroid using a distance metric such as Euclidean distance.
4. Update each centroid's position to the mean position of all data points assigned to it.
5. Repeat steps 3-4 until convergence.
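A minimal NumPy sketch of these steps (the function name and the choice to initialise centroids at random data points are my own; empty-cluster handling is omitted):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means sketch: X is an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Step 2: initialise centroids at k randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```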
What is the elbow method?
The elbow method is a method of selecting the best k for k-means.
We run k-means for different values of k (e.g. 1-10) and plot the score for each. The score is the average distance from each data point to its assigned centroid. We then select the k where the rate of decrease of the score sharply shifts (which looks like an elbow joint on the graph).
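A sketch with scikit-learn (its inertia_ attribute is the sum of squared distances to the closest centroid, so dividing by the sample count gives the average-style score described above):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def elbow_plot(X, k_range=range(1, 11)):
    # Score each k by the mean squared distance to the assigned centroid
    scores = [KMeans(n_clusters=k, n_init=10).fit(X).inertia_ / len(X)
              for k in k_range]
    plt.plot(list(k_range), scores, marker="o")
    plt.xlabel("k")
    plt.ylabel("score (mean squared distance to centroid)")
    plt.show()
```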
What are two methods of selecting k for k-means?
- Elbow Method
- Cross Validation
Describe the strengths and weaknesses of k-means.
Strengths:
- Simple to understand and implement
- Efficient
- Popular
Weaknesses:
- We have to define k
- Algorithm only applicable when a mean is defined (e.g. not for categorical data).
- Sensitive to Initialisation
- Sensitive to Outliers
- Not suitable for discovering clusters which are not hyper-ellipsoids/spheres
What is one of the main applications of Density Estimation?
One of the main applications of density estimation is anomaly detection or novelty detection.
What is a probability density function (PDF)?
A PDF models the probability of a sample being generated in a specific region of the feature space.
It is very likely or usual to observe samples where the PDF is high. Conversely, it is rare to observe samples where the PDF is low.
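As a toy sketch of how this supports anomaly detection (fit a density, flag points where it is low; the threshold value 1e-3 is an arbitrary illustrative choice):

```python
import numpy as np
from scipy.stats import norm

# Fit a Gaussian to data assumed to be normal, then flag low-density points
data = np.random.default_rng(0).normal(loc=5.0, scale=1.0, size=500)
mu, sigma = data.mean(), data.std()

def is_anomaly(x, threshold=1e-3):  # threshold is an illustrative choice
    return norm.pdf(x, loc=mu, scale=sigma) < threshold

print(is_anomaly(5.2), is_anomaly(12.0))  # False, True
```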
Give the equation of a univariate Gaussian distribution and name its parameters.
Parameters:
mean μ, variance σ^2
N(x | μ, σ^2) = (1 / sqrt(2πσ^2)) exp(−(x − μ)^2 / (2σ^2))
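A direct transcription of this formula, checked against scipy (note that scipy parameterises by the standard deviation σ, not the variance σ^2):

```python
import numpy as np
from scipy.stats import norm

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian N(x | mu, sigma^2), following the card's formula."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Sanity check: mu = 0, variance 4 means standard deviation 2 for scipy
assert np.isclose(gaussian_pdf(1.0, 0.0, 4.0), norm.pdf(1.0, loc=0.0, scale=2.0))
```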
What is the likelihood p(x | θ) and why do we often compute the negative log likelihood instead of the likelihood directly?
The likelihood p(x | θ) is the probability of observing our data x given our parameters θ. We often compute the negative log likelihood instead because it is much more numerically stable: the log turns a product of many small terms into a sum of their logs. The negation is because optimisers conventionally minimise rather than maximise.
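A small demonstration of the numerical point, using standard-normal samples (illustrative only):

```python
import numpy as np
from scipy.stats import norm

x = np.random.default_rng(0).normal(loc=0.0, scale=1.0, size=1000)

# The direct product of 1000 densities underflows to 0.0 in float64...
likelihood = np.prod(norm.pdf(x))
# ...while the sum of log-densities stays comfortably representable.
nll = -np.sum(norm.logpdf(x))
print(likelihood, nll)  # 0.0 vs a moderate positive number
```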
What is a Gaussian Mixture Model (GMM)?
It is a weighted sum (convex combination) of several Gaussian distributions:

p(x) = Σ_k π_k N(x | μ_k, Σ_k)

where the mixture weights π_k are non-negative and sum to 1.
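A minimal sketch of evaluating such a density in 1-D (the component weights and parameters below are illustrative):

```python
import numpy as np
from scipy.stats import norm

def gmm_pdf(x, weights, means, sigmas):
    # p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2), with weights summing to 1
    return sum(w * norm.pdf(x, loc=m, scale=s)
               for w, m, s in zip(weights, means, sigmas))

# Two-component example
print(gmm_pdf(0.0, weights=[0.3, 0.7], means=[-1.0, 2.0], sigmas=[0.5, 1.0]))
```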
What is Expectation Maximisation (EM)?
EM is an iterative approach to finding parameters, which can be used with GMMs. It’s composed of two steps:
1. E-step: Compute the responsibilities r_nk (the posterior probability that data point n belongs to mixture component k)
2. M-step: Use the updated responsibilities to re-estimate the parameters θ.
Repeat both steps until convergence.
Describe the GMM-EM Algorithm
- Initialise the parameters (weights, means, covariances)
- E-Step:
  - Compute the responsibilities
- M-Step:
  - Update the weights
  - Update the means
  - Update the covariances
Repeat E and M steps until convergence
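A one-dimensional sketch of this loop (the function name, initialisation choices, and tolerance are my own; a full implementation would use multivariate Gaussians with covariance matrices):

```python
import numpy as np
from scipy.stats import norm

def gmm_em(x, k, n_iters=200, tol=1e-6, seed=0):
    """Sketch of EM for a 1-D GMM; x is a 1-D array of samples."""
    rng = np.random.default_rng(seed)
    # Initialise: uniform weights, means at random data points, unit variances
    pi = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)
    sigma = np.ones(k)
    prev_ll = -np.inf
    for _ in range(n_iters):
        # E-step: responsibilities r_nk proportional to pi_k * N(x_n | mu_k, sigma_k^2)
        dens = pi * norm.pdf(x[:, None], loc=mu, scale=sigma)   # shape (n, k)
        ll = np.log(dens.sum(axis=1)).sum()                     # log-likelihood under current params
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the responsibilities
        nk = r.sum(axis=0)
        pi = nk / len(x)                                        # update the weights
        mu = (r * x[:, None]).sum(axis=0) / nk                  # update the means
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)  # update the variances
        # Convergence: stop when the log-likelihood stagnates
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, mu, sigma
```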
Give two ways of determining convergence in GMMs.
- No significant variation of the parameters
- Stagnation of the likelihood.
Describe the differences between k-means and GMMs, and mention which is the key difference between them.
K-means:
- Objective function: Minimises sum of squared Euclidean distances
- Can be optimised by an EM algorithm
- E-step: assign points to clusters
- M-step: optimise clusters
- Performs hard assignment during E-step
- Assumes spherical clusters with equal probability of a cluster
GMM-EM:
- Objective function: Maximise log-likelihood
- EM algorithm
- E-step: Compute posterior probability of membership
- M-step: Optimise parameters
- Performs soft assignment during E-step
- Can be used for non-spherical clusters
- Can generate clusters with different probabilities

The key difference between them is the assignment: k-means makes hard assignments, whereas GMM-EM makes soft (probabilistic) assignments.