Module 6 Flashcards

1
Q

K-means

A
  • select K
  • generate K random cluster centroids
  • assign each training example to the nearest centroid
  • update the position of each centroid to the mean of its assigned examples
  • stop if no centroid moved; otherwise repeat from the assignment step
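The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the `seed` and `max_iter` parameters, and the choice to draw the initial centroids from the training points are all assumptions made for the example.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialisation: pick K distinct training points as starting centroids
    # (an assumption; the card only says "random").
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Stopping rule: no centroid moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```

Calling `kmeans(points, 2)` on two well-separated groups of 2-D tuples should give every point in a group the same label.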
2
Q

Elbow method

A
  • run K-means multiple times with different values of K
  • keep track of the cost L(x) for each K value
  • select the K at which the cost's rate of decrease sharply slows (the "elbow")
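A rough sketch of the procedure, under stated assumptions: the cost is taken to be the within-cluster sum of squared distances, the elbow is located with a second-difference heuristic (one of several possible rules), and initialisation is a deterministic farthest-point scheme rather than random so that the example is reproducible. The names `kmeans_cost` and `elbow` are hypothetical.

```python
import math

def farthest_point_init(points, k):
    # Deterministic stand-in for random initialisation (an assumption):
    # start at the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans_cost(points, k, max_iter=100):
    """Run K-means and return the sum of squared distances to the nearest centroid."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)

def elbow(points, k_values):
    """k_values must be consecutive integers with at least three entries."""
    costs = [kmeans_cost(points, k) for k in k_values]
    # Second difference of the cost curve: largest where the decrease
    # flattens most abruptly.
    bends = [
        costs[i - 1] - 2 * costs[i] + costs[i + 1]
        for i in range(1, len(costs) - 1)
    ]
    return k_values[1 + bends.index(max(bends))], costs
```

In practice the cost curve is often plotted and the elbow picked by eye; the second-difference rule is only a convenience for the sketch.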
3
Q

Cross-validation to select K

A
  • split the dataset into N folds
  • on N-1 folds, compute the centroid positions with K-means, holding out the remaining fold for validation
  • compute the average score over the validation folds
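One way to read these steps in code. The card does not name a score, so this sketch assumes the mean squared distance from each held-out point to its nearest centroid (lower is better). The function names, the interleaved fold split, and the deterministic farthest-point initialisation are also assumptions made for the example.

```python
import math

def fit_centroids(train, k, max_iter=100):
    """K-means on the training folds; returns the final centroids."""
    # Farthest-point initialisation (an assumption, chosen for reproducibility).
    centroids = [train[0]]
    while len(centroids) < k:
        centroids.append(
            max(train, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in train:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def validation_cost(centroids, held_out):
    # Assumed score: mean squared distance to the nearest centroid.
    return sum(
        min(math.dist(p, c) for c in centroids) ** 2 for p in held_out
    ) / len(held_out)

def cross_validate(points, k, n_folds=4):
    """Average validation cost of K-means with k clusters over n_folds folds."""
    scores = []
    for i in range(n_folds):
        # Fold i holds every n_folds-th point, starting at index i.
        held_out = points[i::n_folds]
        train = [p for j, p in enumerate(points) if j % n_folds != i]
        scores.append(validation_cost(fit_centroids(train, k), held_out))
    return sum(scores) / n_folds
```

Comparing `cross_validate(points, k)` across candidate values of K gives one curve per dataset. With this particular score the curve tends to keep falling as K grows, so it is usually read the same way as an elbow plot rather than minimised outright.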
4
Q

K-means pros

A
  • easy to understand/implement
  • used very often for clustering
  • efficient: each iteration is linear in the number of training examples
5
Q

K-means cons

A
  • K has to be chosen in advance
  • converges to a local optimum that depends on the initial centroid positions
  • not suitable for discovering clusters that aren’t hyper-ellipsoids
6
Q

GMM-EM

A
  • select K, initialise all parameters
  • compute the responsibilities
  • update the means
  • update the covariances
  • update the mixing proportions
  • stop if converged; otherwise repeat from the responsibilities step
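A minimal one-dimensional sketch of the loop above, so that each covariance reduces to a scalar variance. Everything not on the card is an assumption made for illustration: the name `gmm_em`, the quantile-based initialisation, the variance floor of `1e-6`, and the log-likelihood convergence test with tolerance `tol`.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_em(xs, k, max_iter=200, tol=1e-8):
    """Fit a 1-D Gaussian mixture; returns (weights, means, variances)."""
    n = len(xs)
    ordered = sorted(xs)
    # Initialisation (assumed scheme): means at evenly spaced quantiles,
    # a shared variance equal to the data variance, uniform mixing proportions.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall = sum(xs) / n
    variances = [max(sum((x - overall) ** 2 for x in xs) / n, 1e-6)] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # Responsibilities: posterior probability of each component per point.
        resp = []
        ll = 0.0
        for x in xs:
            joint = [
                weights[j] * gaussian_pdf(x, means[j], variances[j])
                for j in range(k)
            ]
            total = sum(joint)
            ll += math.log(total)
            resp.append([q / total for q in joint])
        # Convergence check: stop once the log-likelihood stops improving.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        for j in range(k):
            nj = sum(r[j] for r in resp)
            # Update the mean, then the variance, then the mixing proportion.
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                1e-6,  # floor to avoid a collapsing component
            )
            weights[j] = nj / n
    return weights, means, variances
```

The multivariate case follows the same loop with full covariance matrices in place of the scalar variances.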