Clustering & EM Flashcards
1
Q
Clustering
A
- N d-dimensional datapoints (no labels)
- goal: partition them into K disjoint sets based on similarity
2
Q
K-Means
A
- clusters are defined by minimum Euclidean distance to the cluster mean
- Algorithm:
  - initialize: K random points as initial cluster centers
  - Assignment (E-step): assign each point to the closest cluster center
  - Update (M-step): set each cluster center to the mean of all points assigned to it
  - repeat E- and M-steps until convergence (to a local minimum of the within-cluster sum of squared distances)
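The loop above can be sketched in pure Python as follows. This is a minimal sketch (Lloyd's algorithm), assuming points are equal-length tuples of floats; the function name `kmeans`, the `max_iter` cap, the fixed `seed`, and keeping a center in place when its cluster becomes empty are hypothetical choices of this sketch, not part of the card.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical helper: K-means on a list of d-dimensional tuples.

    Returns (centers, labels). Stops when the assignments no longer change,
    i.e. at a local minimum of the within-cluster sum of squared distances.
    """
    rng = random.Random(seed)
    # Initialization: K random datapoints as initial cluster centers
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # E (assignment): each point goes to its closest center
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments unchanged -> converged
            break
        labels = new_labels
        # M (update): each center becomes the mean of its assigned points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(pts, 2)
print(labels)
```

Different seeds can converge to different local minima on less well-separated data, which is why K-means is often restarted several times.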
3
Q
Gaussian Mixture Model
A
- assumes all datapoints are generated from a mixture of a finite number of Gaussian distributions with unknown parameters
- the GMM is fitted with EM, using either hard or soft cluster assignments
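Both variants of EM for a GMM can be sketched for 1-D data as follows. This is a minimal sketch, assuming scalar data; `em_gmm_1d` and its parameters are hypothetical. With `hard=False` the E-step keeps soft responsibilities; with `hard=True` each responsibility vector is replaced by a one-hot vector on its most probable component before the M-step. The quantile initialization, the fixed iteration count, and the variance floor `min_var` are assumptions of this sketch.

```python
import math


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, k, hard=False, n_iter=100, min_var=1e-6):
    """Hypothetical helper: fit a 1-D K-component GMM with EM.

    hard=False: soft assignments (responsibilities in [0, 1]).
    hard=True:  each point is assigned fully to its most probable component.
    """
    n = len(xs)
    # Assumed initialization: means at evenly spaced quantiles of the sorted data,
    # every variance equal to the overall data variance, equal mixing weights.
    srt = sorted(xs)
    mus = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    mean = sum(xs) / n
    vars_ = [sum((x - mean) ** 2 for x in xs) / n] * k
    pis = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):  # fixed iteration count instead of a convergence test
        # E-step: r[j] = pi_j N(x | mu_j, var_j) / sum_l pi_l N(x | mu_l, var_l)
        resp = []
        for x in xs:
            w = [pis[j] * normal_pdf(x, mus[j], vars_[j]) for j in range(k)]
            total = sum(w)
            r = [wj / total for wj in w]
            if hard:  # one-hot on the most probable component
                best = max(range(k), key=lambda j: r[j])
                r = [1.0 if j == best else 0.0 for j in range(k)]
            resp.append(r)
        # M-step: responsibility-weighted mixing weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp)
            pis[j] = nj / n
            if nj > 0:  # an empty component keeps its mean/variance
                mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
                var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
                vars_[j] = max(var, min_var)  # floor avoids collapse to zero variance
    return pis, mus, vars_, resp


xs = [-0.2, -0.1, 0.0, 0.1, 0.2, 9.8, 9.9, 10.0, 10.1, 10.2]
print(em_gmm_1d(xs, 2)[:3])
print(em_gmm_1d(xs, 2, hard=True)[:3])
```

On well-separated data like this both variants end at similar parameters; they differ when components overlap, where soft EM lets a point contribute fractionally to several components.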
4
Q
GMM EM soft
A
5
Q
GMM EM hard
A
6
Q
K-Means vs GMM
A
- GMM allows for:
  - unequal cluster variances
  - unequal cluster probabilities (mixing weights)
  - non-spherical clusters
  - soft cluster assignments
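The first two differences can be made concrete with a small 1-D sketch comparing the K-means assignment rule (nearest mean) with the hard GMM assignment rule (component with the highest weighted density). With equal weights and equal variances the two rules agree; changing either one can move a point to the farther mean. The function names are hypothetical.

```python
import math


def log_gauss_1d(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def kmeans_assign(x, mus):
    """Hypothetical helper: K-means rule, nearest mean (Euclidean distance only)."""
    return min(range(len(mus)), key=lambda j: abs(x - mus[j]))


def gmm_hard_assign(x, pis, mus, vars_):
    """Hypothetical helper: hard GMM rule, argmax of pi_j * N(x | mu_j, var_j)."""
    return max(
        range(len(mus)),
        key=lambda j: math.log(pis[j]) + log_gauss_1d(x, mus[j], vars_[j]),
    )


x, mus = 1.8, [0.0, 4.0]  # x is closer to mean 0 (distance 1.8 vs 2.2)
print(kmeans_assign(x, mus))                            # 0
print(gmm_hard_assign(x, [0.5, 0.5], mus, [1.0, 1.0]))  # 0: agrees with K-means
print(gmm_hard_assign(x, [0.5, 0.5], mus, [1.0, 9.0]))  # 1: wider component wins
print(gmm_hard_assign(x, [0.1, 0.9], mus, [1.0, 1.0]))  # 1: heavier component wins
```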
7
Q
Kullback-Leibler divergence
A
8
Q
EM summary
A
9
Q
EM properties
A