Clustering Flashcards
What is unsupervised learning?
Learning patterns in data without labeled outputs or a “teacher”.
What is the goal of clustering?
To partition data into groups or clusters based on similarity.
What does the K-means algorithm minimize?
The within-cluster point scatter/variance.
What are the two main steps of the K-means algorithm?
1) Assignment: assign each point to its nearest cluster center; 2) Update: recompute each center as the mean of its assigned points. Repeat until convergence.
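A minimal NumPy sketch of these two alternating steps (the function and variable names here are illustrative, not from any particular library):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-means sketch: alternate the assignment and update steps."""
    rng = np.random.default_rng(seed)
    # Plain random init: pick k distinct data points as starting centers
    # (see K-means++ for a smarter initialization).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # 1) Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 2) Update step: each center moves to the mean of its points
        # (keep the old center if a cluster happens to be empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # converged: centers stable
            break
        centers = new_centers
    return centers, labels
```

On two well-separated blobs this recovers one center per blob; each iteration never increases the within-cluster scatter.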
What is the K-means++ algorithm used for?
To initialize the cluster centers for K-means by spreading them out across the data, which improves both convergence speed and final cluster quality.
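A sketch of the K-means++ seeding rule: the first center is drawn uniformly from the data, and each subsequent center is drawn with probability proportional to its squared distance to the nearest center chosen so far (names are illustrative):

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    """K-means++ initialization sketch."""
    rng = np.random.default_rng(seed)
    # First center: a uniformly random data point.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        # Sample the next center with probability proportional to d2,
        # so far-away points (likely in uncovered clusters) are favored.
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```

The resulting centers are then handed to the usual K-means iterations.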
How can the number of clusters K be determined in non-probabilistic models?
By computing the sum of squared errors (SSE) for different values of K and applying the elbow method: pick the K at which the SSE curve's slope changes sharply and begins to flatten.
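The elbow method can be sketched by computing the SSE over a range of K values with a basic K-means run (a minimal illustration; `sse_for_k` is an assumed helper name, not a library function):

```python
import numpy as np

def sse_for_k(X, k, n_iters=50, seed=0):
    """Run basic K-means and return the sum of squared errors (SSE),
    i.e. the within-cluster scatter that the elbow method plots."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
    return float(np.sum((X - centers[labels]) ** 2))

# SSE always decreases as K grows; the "elbow" is where the drop levels off:
# sses = [sse_for_k(X, k) for k in range(1, 8)]
```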
What is a mixture model?
A probabilistic model that represents the presence of subpopulations within an overall population.
What are the two steps in generating samples from a Gaussian mixture model?
1) Draw a categorical variable Z to select a component, 2) Draw an observation from the selected Gaussian component.
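These two steps (ancestral sampling) can be sketched for a 1-D mixture; the function name and signature are illustrative:

```python
import numpy as np

def sample_gmm(n, weights, means, stds, seed=0):
    """Two-step sampling from a 1-D Gaussian mixture:
    1) draw component index z ~ Categorical(weights),
    2) draw observation x ~ N(means[z], stds[z]^2)."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(weights), size=n, p=weights)  # step 1: pick component
    x = rng.normal(means[z], stds[z])                # step 2: sample from it
    return z, x
```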
What is the EM algorithm used for in mixture models?
To estimate the parameters of the mixture model by maximizing the likelihood.
What are the two main steps of the EM algorithm?
The Expectation (E) step and the Maximization (M) step, repeated until convergence (stable assignments and parameters).
In the context of Gaussian Mixture Models, what does the E-step compute?
The E-step computes the expected values of the latent variables: the posterior probabilities (responsibilities) that each data point belongs to each Gaussian component, given the current parameters.
In the context of Gaussian Mixture Models, what does the M-step compute?
The M-step updates the model parameters (means, covariances, and mixing coefficients) to maximize the expected log-likelihood from the E-step, re-estimating each parameter as a responsibility-weighted fit to the data.
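Both steps can be sketched for a 1-D Gaussian mixture (a minimal illustration with an assumed quantile-based initialization, not the only valid one):

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iters=200):
    """EM sketch for a 1-D Gaussian mixture model."""
    # Simple deterministic init: spread the means over data quantiles.
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iters):
        # E-step: responsibilities r[i, j] = P(z_i = j | x_i, current params).
        dens = (pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
                / np.sqrt(2 * np.pi * var))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted maximum-likelihood updates.
        nk = r.sum(axis=0)                                # effective counts
        pi = nk / len(x)                                  # mixing weights
        mu = (r * x[:, None]).sum(axis=0) / nk            # weighted means
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk  # weighted vars
    return pi, mu, var
```

On data drawn from two well-separated Gaussians, the estimated means and weights approach the true values (up to permutation of the components, per the identifiability note below).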
How does K-means compare to Gaussian Mixture Models?
K-means is usually faster, requiring fewer iterations and less computation per iteration. K-means assumes spherical clusters with equal variance, while GMM clusters can have different shapes and sizes. GMM also uses soft assignments (a probability of belonging to each cluster), while K-means uses hard assignments.
What is an advantage of mixture models over K-means?
Mixture models allow for distributional assumptions and can assess the fit of the data by computing likelihood.
What is an issue with mixture models in terms of identifiability?
The likelihood is invariant to permutation of class memberships, making the estimators valid only up to permutation.
What is hierarchical agglomerative clustering?
A clustering method that starts with singleton clusters and merges the most similar clusters iteratively.
What is the purpose of the Bayes classifier in mixture models?
To obtain class posterior probabilities when the parameters Θ are known: P(y | x, Θ) ∝ f_Θ(x | y) π_y
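A sketch of this rule for known 1-D Gaussian components (the function name is illustrative): multiply each class-conditional density by its prior π_y, then normalize over classes:

```python
import numpy as np

def class_posterior(x, pis, means, stds):
    """Bayes classifier with known parameters:
    P(y | x) ∝ N(x; means[y], stds[y]^2) * pis[y], normalized over y."""
    # Class-conditional Gaussian densities f(x | y), one entry per class.
    dens = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    unnorm = dens * pis            # density times prior, per class
    return unnorm / unnorm.sum()   # normalize so the posteriors sum to 1
```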
What is the main challenge in maximizing the likelihood for mixture models?
The lack of a closed-form solution, requiring an iterative procedure like EM.
What is the relationship between K-means and Gaussian Mixture Models?
K-means is the limiting case of a GMM in which all components share the same spherical covariance σ²I as σ approaches 0, so the soft assignments harden.
What is a disadvantage of mixture models compared to K-means?
Mixture models require explicit distributional assumptions.
How does the EM algorithm’s performance depend on initialization?
EM’s performance can vary significantly based on parameter initialization due to multiple local maxima.
How can missing values be handled in mixture models?
Mixture models can naturally infer missing values as part of the model fitting process.