Module 6 Flashcards
1
Q
K-means
A
- select K
- generate K random cluster centroids
- assign each training example to the nearest centroid
- update position of each centroid
- stop if the centroid positions didn’t change; otherwise go back to step 3 (the assignment step)
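The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the `seed` and `max_iter` parameters, squared Euclidean distance, and the choice to draw the initial centroids from the data points are all assumptions made for the example.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-means on a list of equal-length tuples.

    Returns (centroids, clusters). Assumes squared Euclidean distance.
    """
    rng = random.Random(seed)
    # Initialise: pick K data points at random as the starting centroids
    # (one common choice; the card only says "random centroids").
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
        # Stop when no centroid moved; otherwise repeat the assignment.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], k=2)` returns two centroids, one per group of nearby points, together with the points assigned to each.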
2
Q
Elbow method
A
- run K-means multiple times with different values of K
- keep track of the cost L(x) for each K value
- select the K at which the decrease in cost sharply slows down (the “elbow”)
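A rough sketch of this procedure is shown below. It assumes the cost is the sum of squared distances to the nearest centroid, and it locates the elbow with the largest second difference of the cost curve, which is only one possible heuristic (in practice the elbow is often read off a plot). The names `kmeans_cost`, `elbow` and `restarts` are hypothetical.

```python
import random

def kmeans_cost(points, k, seed=0, max_iter=100):
    """Run a basic K-means and return its final cost (sum of squared distances)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)

    def sq(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq(p, centroids[j]))
            clusters[nearest].append(p)
        new = [tuple(sum(d) / len(cl) for d in zip(*cl)) if cl else c
               for cl, c in zip(clusters, centroids)]
        if new == centroids:
            break
        centroids = new
    return sum(min(sq(p, c) for c in centroids) for p in points)

def elbow(points, k_values, restarts=5):
    """Return (chosen K, list of costs). Needs at least three K values."""
    # Track the cost for each K, keeping the best of a few random restarts.
    costs = [min(kmeans_cost(points, k, seed=s) for s in range(restarts))
             for k in k_values]
    # Assumed elbow rule: the K where the cost curve bends most sharply,
    # measured by the second difference cost[i-1] - 2*cost[i] + cost[i+1].
    bends = [costs[i - 1] - 2 * costs[i] + costs[i + 1]
             for i in range(1, len(costs) - 1)]
    return k_values[1 + bends.index(max(bends))], costs
```

The first and last K values can never be selected by this rule, so the candidate range should extend beyond the K values considered plausible.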
3
Q
Cross-validation to select K
A
- split the dataset into N folds
- on N-1 folds, compute the centroid positions with K-means
- compute the score on the held-out validation fold, and average it over the N folds
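As an illustration, the sketch below scores one value of K by cross-validation. The card does not say which score to use, so the mean squared distance from each validation point to its nearest centroid is an assumption here (lower is better), as are the names `fit_kmeans` and `cv_score`. The dataset must have at least `n_folds` points, and each training split at least K points.

```python
import random

def sq_dist(p, c):
    return sum((a - b) ** 2 for a, b in zip(p, c))

def fit_kmeans(points, k, seed=0, max_iter=100):
    """Basic K-means; returns the final centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        new = [tuple(sum(d) / len(cl) for d in zip(*cl)) if cl else c
               for cl, c in zip(clusters, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids

def cv_score(points, k, n_folds=5, seed=0):
    """Average validation score for a given K (lower is better)."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    # Split the dataset into N folds.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, val in enumerate(folds):
        # Fit the centroids on the other N-1 folds.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        centroids = fit_kmeans(train, k, seed=seed)
        # Score the held-out fold: mean squared distance to the nearest centroid.
        scores.append(sum(min(sq_dist(p, c) for c in centroids) for p in val) / len(val))
    # Average over the N validation folds.
    return sum(scores) / n_folds
```

To select K, `cv_score` would be evaluated for each candidate value and the scores compared. Note that a distance-based score tends to keep improving as K grows, so the comparison usually looks for where the improvement levels off rather than for the absolute minimum.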
4
Q
K-means pros
A
- easy to understand/implement
- used very often for clustering
- efficient (the cost of each iteration is linear in the number of training examples)
5
Q
K-means cons
A
- have to define K
- converges to a local optimum that depends on the initial centroid positions
- not suitable for discovering clusters that aren’t hyper-ellipsoids
6
Q
GMM-EM
A
- select K, initialise all parameters
- compute the responsibilities
- update the mean
- update the covariance
- update the mixing proportions
- stop if converged, otherwise go to step 2
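The loop above can be sketched for the one-dimensional case, where each covariance reduces to a single variance. This is a simplified illustration under stated assumptions: the caller supplies the initial means, the variances start at the overall data variance, the mixing proportions start uniform, and convergence is judged by the change in log-likelihood. The function name `gmm_em_1d`, the `tol` parameter and the variance floor are hypothetical choices, not part of the card.

```python
import math

def gmm_em_1d(xs, init_means, max_iter=200, tol=1e-8):
    """EM for a 1-D Gaussian mixture. Returns (means, variances, mixing proportions)."""
    n, k = len(xs), len(init_means)
    # Select K (= number of initial means) and initialise all parameters.
    mu = list(init_means)
    data_mean = sum(xs) / n
    var = [sum((x - data_mean) ** 2 for x in xs) / n] * k
    pi = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibilities resp[i][j] = p(component j | x_i).
        resp, ll = [], 0.0
        for x in xs:
            w = [pi[j] * math.exp(-(x - mu[j]) ** 2 / (2 * var[j]))
                 / math.sqrt(2 * math.pi * var[j]) for j in range(k)]
            total = sum(w)
            resp.append([wj / total for wj in w])
            ll += math.log(total)
        # Stop if converged (log-likelihood no longer improving).
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        nk = [sum(r[j] for r in resp) for j in range(k)]
        # M-step: update the means.
        mu = [sum(r[j] * x for r, x in zip(resp, xs)) / nk[j] for j in range(k)]
        # Update the variances (the 1-D covariance), floored to avoid collapse.
        var = [max(sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nk[j], 1e-6)
               for j in range(k)]
        # Update the mixing proportions.
        pi = [nk[j] / n for j in range(k)]
    return mu, var, pi
```

Unlike K-means, each point contributes to every component in proportion to its responsibility, so the assignments are soft rather than hard. A multivariate version would replace the variance update with a full covariance matrix per component.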