Lecture 11 - Clustering: EM Algorithm for GMM Flashcards
What is a Gaussian Mixture Model (GMM)?
A probabilistic model that represents data as a weighted mixture of Gaussian distributions, each with its own mean and covariance.
Priors - 𝜋𝑖
𝜋𝑖: Mixing weights, i.e. the prior probability that a point comes from cluster i; they sum to 1.
Means - 𝜇𝑖
𝜇𝑖: Centers of Gaussian clusters.
Covariances - Σ𝑖
Σ𝑖: Shapes and spreads of clusters.
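Putting the three parameter sets together, the mixture density (standard GMM form, with x a data point and k components) is:
p(x) = \sum_{i=1}^{k} \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i), \qquad \sum_{i=1}^{k} \pi_i = 1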
Why use GMM instead of K-Means?
GMM vs. K-Means
Soft Clustering: Assigns each point a probability of belonging to every cluster rather than a single hard label (see the sketch after this card).
Flexible Cluster Shapes: Covariance matrices allow ellipsoidal (elongated, rotated) clusters, whereas K-Means implicitly assumes spherical ones.
Probabilistic Reasoning: More robust for complex data distributions.
Mnemonic: “GMM Gives More Meaningful shapes.”
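A minimal sketch of the soft-vs-hard assignment difference, assuming scikit-learn's KMeans and GaussianMixture (the toy data and all names below are illustrative, not from the lecture):
```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Toy 2-D data: two overlapping blobs (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
               rng.normal([2.5, 0], 1.0, size=(100, 2))])

# K-Means: hard assignments only (each point gets exactly one label).
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# GMM: soft assignments -- a probability for each cluster per point.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
soft = gmm.predict_proba(X)        # shape (200, 2), each row sums to 1
print(kmeans_labels[:3])           # hard labels, e.g. [0 0 1]
print(soft[:3].round(2))           # probabilities, e.g. [[0.93 0.07] ...]
```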
What is the Expectation-Maximization (EM) Algorithm?
An iterative algorithm that fits the GMM parameters (π, μ, Σ) by maximizing the log-likelihood of the data.
What happens in the E-Step?
Compute the posterior probability h_{i,t} (the responsibility) that point x_t belongs to cluster i:
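In standard notation, with x_t the t-th data point and k clusters, the responsibility is:
h_{i,t} = \frac{\pi_i \, \mathcal{N}(x_t \mid \mu_i, \Sigma_i)}{\sum_{j=1}^{k} \pi_j \, \mathcal{N}(x_t \mid \mu_j, \Sigma_j)}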
What happens in the M-Step?
Update the parameters π_i, μ_i, Σ_i using the responsibilities h_{i,t} from the E-Step:
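For N data points, the standard updates weight each point by its responsibility from the E-Step:
\pi_i = \frac{1}{N} \sum_{t=1}^{N} h_{i,t}, \qquad
\mu_i = \frac{\sum_{t} h_{i,t} \, x_t}{\sum_{t} h_{i,t}}, \qquad
\Sigma_i = \frac{\sum_{t} h_{i,t} \, (x_t - \mu_i)(x_t - \mu_i)^{\top}}{\sum_{t} h_{i,t}}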
How does EM converge?
Each EM iteration never decreases the log-likelihood, so the algorithm converges when the log-likelihood stabilizes.
Alternative: Stop when the change in log-likelihood falls below a small threshold (formula below).
Tip: Imagine EM climbing a mountain of likelihood to the peak (maximum).
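The monitored quantity is the data log-likelihood; a common stopping rule (the threshold \epsilon is a user choice) is:
\ell = \sum_{t=1}^{N} \log \sum_{i=1}^{k} \pi_i \, \mathcal{N}(x_t \mid \mu_i, \Sigma_i), \qquad \text{stop when } |\ell^{\text{new}} - \ell^{\text{old}}| < \epsilon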
Challenges with GMM and EM
Initialization Sensitivity: Poor starts may lead to local optima.
Choosing 𝑘: Methods include the Elbow method, the Bayesian Information Criterion (BIC), and the Akaike Information Criterion (AIC); see the BIC sketch after this list.
Computational Cost: Iterative nature makes it slow for large datasets.
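A sketch of choosing k with BIC, assuming scikit-learn's GaussianMixture; the multiple restarts (n_init) also help with the initialization sensitivity noted above. The data, k range, and seeds are illustrative:
```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data: replace with your own (n_samples, n_features) array.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, size=(150, 2)),
               rng.normal([4, 4], 1.0, size=(150, 2))])

def pick_k_by_bic(X, k_range=range(1, 7)):
    """Fit a GMM for each candidate k and keep the one with the lowest BIC."""
    best_model, best_bic = None, float("inf")
    for k in k_range:
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              n_init=5, random_state=0).fit(X)
        bic = gmm.bic(X)  # lower BIC = better likelihood/complexity trade-off
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model

best = pick_k_by_bic(X)
print("Selected k:", best.n_components)
```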
Applications of GMM: Where is GMM used?
Image Segmentation: Dividing an image into meaningful regions.
Speech Recognition: Modeling phonemes or accents.
Anomaly Detection: Identifying outliers with low probabilities.
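For anomaly detection, a minimal sketch assuming scikit-learn: points with unusually low log-likelihood under the fitted mixture are flagged (the 1% cutoff is an arbitrary illustration):
```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(500, 2))      # toy "normal" data

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
log_lik = gmm.score_samples(X)               # per-point log-likelihood
threshold = np.percentile(log_lik, 1)        # flag the lowest 1%
anomalies = X[log_lik < threshold]
print(f"{len(anomalies)} candidate outliers out of {len(X)} points")
```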