Module 6 Flashcards by Aidan Smith

Q

K-means

A

select K
generate K random cluster centroids
assign each training example to the nearest centroid
move each centroid to the mean of the examples assigned to it
stop if no centroid moved; otherwise repeat from the assignment step
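
The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the names `k_means` and `squared_distance` are made up for this card, and picking K training examples as the starting centroids is an assumed way of "generating random centroids".

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=0, max_iter=100):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Assumption: the random initial centroids are k distinct training examples.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Assign each training example to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its assigned examples;
        # a centroid with no examples stays where it is.
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        # Stop once no centroid moved, otherwise repeat the assignment step.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```

Example use: `centroids, clusters = k_means([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)`.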

Q

Elbow method

A

run K-means several times, each with a different value of K
record the cost L(x) for each K
select the K at the elbow, where the decrease in cost sharply levels off
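
A rough sketch of this procedure, under stated assumptions: the cost is taken to be the sum of squared distances to the nearest centroid, each K is restarted several times to reduce the effect of bad initial centroids, and the elbow is located with a discrete second difference of the cost curve. That last heuristic is an assumption made for illustration (in practice the curve is often just inspected by eye), and the names `k_means_cost` and `elbow` are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means_cost(points, k, rng, max_iter=100):
    """Run K-means once and return the sum of squared distances to the nearest centroid."""
    centroids = rng.sample(points, k)  # assumption: start from k random examples
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: sq_dist(p, centroids[i]))].append(p)
        new = [
            tuple(sum(c) / len(m) for c in zip(*m)) if m else centroids[j]
            for j, m in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def elbow(points, k_values, restarts=10, seed=0):
    """Return (chosen K, {K: cost}); assumes evenly spaced values in k_values."""
    rng = random.Random(seed)
    ks = sorted(k_values)
    # Keep the best (lowest) cost over several random restarts for each K.
    costs = {k: min(k_means_cost(points, k, rng) for _ in range(restarts)) for k in ks}

    def bend(i):
        # Discrete second difference: large where the curve flattens abruptly.
        return costs[ks[i - 1]] - 2 * costs[ks[i]] + costs[ks[i + 1]]

    best_k = ks[max(range(1, len(ks) - 1), key=bend)]
    return best_k, costs
```

Only interior K values can be chosen this way, because the second difference needs a neighbour on each side.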

Q

Cross-validation to select K

A

split the dataset into N folds
for each fold, compute the centroid positions with K-means on the other N-1 folds and score them on the held-out fold
average the validation scores over the N folds
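
One way this could look in code, as a sketch only. The card does not say which score is used, so the mean squared distance from each validation example to its nearest centroid is assumed here (lower is better). Folds are taken by interleaving, which assumes the examples are already in an order that mixes the clusters; shuffle first otherwise. `fit_k_means` and `cv_score` are hypothetical names.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_k_means(points, k, rng, max_iter=100):
    """Plain K-means; returns the final centroid positions."""
    centroids = rng.sample(points, k)  # assumption: start from k random examples
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: sq_dist(p, centroids[i]))].append(p)
        new = [
            tuple(sum(c) / len(m) for c in zip(*m)) if m else centroids[j]
            for j, m in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def cv_score(points, k, n_folds=4, seed=0):
    """Average validation cost of K-means with k clusters over n_folds folds."""
    rng = random.Random(seed)
    fold_scores = []
    for i in range(n_folds):
        # Fold i is every n_folds-th example; the rest form the training set.
        validation = points[i::n_folds]
        training = [p for j, p in enumerate(points) if j % n_folds != i]
        centroids = fit_k_means(training, k, rng)
        # Assumed score: mean squared distance to the nearest centroid.
        cost = sum(min(sq_dist(p, c) for c in centroids) for p in validation)
        fold_scores.append(cost / len(validation))
    return sum(fold_scores) / n_folds
```

Calling `cv_score` for several values of K and comparing the averages gives the numbers on which the choice of K is based.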

Q

K-means pros

A

easy to understand/implement
used very often for clustering
efficient

Q

K-means cons

A

have to define K
converges to a local optimum that depends on the initial centroid positions
not suitable for discovering clusters that aren’t hyper-ellipsoids

Q

GMM-EM

A

select K and initialise all parameters (means, covariances, mixing proportions)
compute the responsibilities (E-step)
update the means (M-step)
update the covariances
update the mixing proportions
stop if converged; otherwise go to step 2
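
The loop above, sketched for one-dimensional data so that each covariance reduces to a single variance. Several choices here are assumptions made for illustration and are not from the card: means are initialised at evenly spaced quantiles of the data, every variance starts at the overall variance, convergence is judged by the change in log-likelihood, and a small floor keeps variances positive. `gmm_em_1d` is a hypothetical name.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_em_1d(xs, k, max_iter=200, tol=1e-8):
    """Fit a k-component 1-D Gaussian mixture to the list xs with EM."""
    n = len(xs)
    # Step 1: initialise all parameters (assumed scheme, see the note above).
    ordered = sorted(xs)
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # Step 2 (E-step): responsibility of each component for each point.
        resp, ll = [], 0.0
        for x in xs:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([v / total for v in joint])
        # Stop if the log-likelihood has converged, otherwise continue.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
        # M-step: update the mean, the variance, then the mixing proportion.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsing component
            weights[j] = nj / n
    return means, variances, weights
```

For multi-dimensional data the same loop applies with full covariance matrices in place of the variances.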
