K-means Clustering Flashcards

Question 1

Q

What is the goal of k-means clustering?

Answer

A

To find subgroups of participants with maximal within-group similarity and minimal between-group similarity.

Question 2

Q

What factors are Euclidean distance and k-means clustering sensitive to?

Answer

A

Outliers and the scale of the variables (both distort Euclidean distances), plus the random starting centroids that k-means begins from.
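
Here is a small Python sketch of the scale problem. It uses hypothetical (age, income) cases. Recording income in dollars rather than thousands of dollars changes which case is nearest, because Euclidean distance adds up raw differences across variables. The helper names `euclidean` and `nearest` are illustrative and do not come from any package.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(target, others):
    """Index of the point in `others` closest to `target`."""
    return min(range(len(others)), key=lambda i: euclidean(target, others[i]))

# Hypothetical cases: (age in years, income in dollars)
target = (25, 50_000)
others_dollars = [(60, 51_000), (27, 55_000)]

# The same cases with income recorded in thousands of dollars
target_thousands = (25, 50.0)
others_thousands = [(age, income / 1000) for age, income in others_dollars]

print(nearest(target, others_dollars))              # 0: income differences dominate
print(nearest(target_thousands, others_thousands))  # 1: age differences dominate
```

Only the units changed, yet the "closest" case flipped. This is why variables are usually standardised before clustering.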

Question 3

Q

What are the two quality criteria in kmeansruns(), and what are their goals?

Answer

A

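CH (Calinski-Harabasz index) finds larger, more evenly sized clusters. ASW (average silhouette width) forms smaller, less evenly sized clusters. kmeansruns() picks the K with the best value of whichever criterion is chosen.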
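
The sketch below shows roughly how the two criteria could be computed for one given clustering, using the usual textbook definitions. CH is the between-cluster sum of squares per degree of freedom divided by the within-cluster sum of squares per degree of freedom. ASW is the mean of each case's silhouette, (b - a) / max(a, b). This is not fpc's code, and the toy data are made up. Both criteria assume at least two clusters.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def calinski_harabasz(points, labels):
    """CH = [B / (k - 1)] / [W / (n - k)]; higher means better separated clusters."""
    n, groups = len(points), sorted(set(labels))
    k = len(groups)
    grand = centroid(points)
    within = between = 0.0
    for g in groups:
        members = [p for p, l in zip(points, labels) if l == g]
        cen = centroid(members)
        within += sum(sq_dist(p, cen) for p in members)
        between += len(members) * sq_dist(cen, grand)
    return (between / (k - 1)) / (within / (n - k))

def average_silhouette_width(points, labels):
    """Mean of s(i) = (b - a) / max(a, b); a case alone in its cluster gets s(i) = 0."""
    scores = []
    for i, (p, l) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the case's own cluster
        own = [euclidean(p, q) for j, (q, m) in enumerate(zip(points, labels))
               if m == l and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(euclidean(p, q) for q, m in zip(points, labels) if m == other)
            / labels.count(other)
            for other in set(labels) if other != l
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical data: two well-separated groups of three cases
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]
print(calinski_harabasz(points, labels))
print(average_silhouette_width(points, labels))
```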

Question 4

Q

What is Euclidean distance, and how is it calculated?

Answer

A

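The ordinary straight-line distance between two points in a multidimensional space. It is calculated with the Pythagorean theorem: take the difference on each dimension, square it, sum the squares, and take the square root.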
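
Here is a minimal Python version of that calculation. The function name `euclidean_distance` is illustrative. Python 3.8+ also ships the same computation as `math.dist`.

```python
import math

def euclidean_distance(p, q):
    """Square root of the summed squared differences across all dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))        # 5.0 (the 3-4-5 right triangle)
print(euclidean_distance((1, 2, 3), (4, 6, 3)))  # 5.0 (third dimension adds nothing)
```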

Question 5

Q

What metric is used to weight PCA components in the kmeansCBI() function?

Answer

A

The square root of each component's eigenvalue (that is, the component's standard deviation).

Question 6

Q

What is the rough process of k-means clustering?

Answer

A

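1. Randomly choose K cases as the starting centroids.
2. Assign each case to its nearest centroid, so that the sum of squared Euclidean distances to the centroids is minimised.
3. Recompute each centroid as the mean of the cases assigned to it.
4. Repeat steps 2 and 3 until the assignments no longer change or a predefined maximum number of iterations is reached.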
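
Those steps can be sketched in plain Python as Lloyd's algorithm. This is a teaching sketch, not R's `kmeans()` (which defaults to the Hartigan-Wong variant). The toy data and the `kmeans` and `sq_dist` names are illustrative. One assumption is labelled in a comment: if a cluster ever empties, it keeps its old centroid.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=None):
    rng = random.Random(seed)
    # Randomly choose k distinct cases as the starting centroids
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each case to its nearest centroid (smallest squared distance)
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Stop once no case changes cluster
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each centroid as the mean of its assigned cases
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # assumption: an emptied cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, 2, seed=0)
print(labels, centroids)
```

Different seeds can give different starting centroids. That is why tools such as kmeansruns() repeat the algorithm from several random starts.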

Question 7

Q

Why is using PCA components good for k-means clustering?

Answer

A

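The components are uncorrelated and standardised. As a result, no variable dominates the Euclidean distances because of its scale or because it overlaps with other correlated variables.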
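
Here is a two-variable Python sketch of why this holds, assuming ordinary covariance-based PCA. Raw component scores have a variance equal to their eigenvalue. Dividing each score by the square root of that eigenvalue (the component's standard deviation, which R's `prcomp()` reports as `sdev`) gives scores with variance 1 that are uncorrelated with each other. The data and function names are hypothetical, and the closed-form eigendecomposition only covers the 2 x 2 case.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pca_2d(xs, ys):
    """Eigenvalues (largest first) and unit eigenvectors of [[a, b], [b, c]]."""
    a, b, c = cov(xs, xs), cov(xs, ys), cov(ys, ys)
    if b == 0:  # already uncorrelated: the original axes are the components
        vectors = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
        return sorted([a, c], reverse=True), vectors
    half_trace = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = [half_trace + spread, half_trace - spread]
    vectors = []
    for lam in eigenvalues:
        v = (b, lam - a)  # solves (a - lam) * v1 + b * v2 = 0
        norm = math.hypot(*v)
        vectors.append((v[0] / norm, v[1] / norm))
    return eigenvalues, vectors

def standardised_component_scores(xs, ys):
    eigenvalues, vectors = pca_2d(xs, ys)
    mx, my = mean(xs), mean(ys)
    scores = []
    for lam, (v1, v2) in zip(eigenvalues, vectors):
        raw = [(x - mx) * v1 + (y - my) * v2 for x, y in zip(xs, ys)]
        # Raw scores have variance lam; weight them by 1 / sqrt(eigenvalue)
        scores.append([s / math.sqrt(lam) for s in raw])
    return scores

# Hypothetical, positively correlated measures
xs = [2.0, 4.0, 5.0, 7.0, 9.0, 10.0]
ys = [1.0, 3.0, 4.0, 4.0, 8.0, 9.0]
pc1, pc2 = standardised_component_scores(xs, ys)
print(cov(pc1, pc1), cov(pc2, pc2), cov(pc1, pc2))  # approximately 1, 1 and 0
```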

Question 8

Q

What are two functions used to decide K in k-means clustering?

Answer

A

NbClust() (NbClust package) and kmeansruns() (fpc package)
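
The Python sketch below mirrors the general idea behind kmeansruns(), but it is a simplified stand-in and not fpc's implementation. For each candidate K it runs k-means from several random starts and keeps the run with the smallest within-cluster sum of squares. It then picks the K with the highest Calinski-Harabasz index. The helper names, the toy data and the number of runs are illustrative assumptions.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans_once(points, k, rng, max_iter=100):
    """One Lloyd's k-means run from random starting cases; returns (labels, within_ss)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = mean_point(members)
    within = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return labels, within

def calinski_harabasz(points, labels):
    n, groups = len(points), sorted(set(labels))
    k = len(groups)
    grand = mean_point(points)
    within = between = 0.0
    for g in groups:
        members = [p for p, l in zip(points, labels) if l == g]
        cen = mean_point(members)
        within += sum(sq_dist(p, cen) for p in members)
        between += len(members) * sq_dist(cen, grand)
    return (between / (k - 1)) / (within / (n - k))

def choose_k(points, krange, runs=25, seed=0):
    rng = random.Random(seed)
    scores = {}
    for k in krange:
        # Keep the random start with the smallest within-cluster sum of squares
        best_labels, _ = min((kmeans_once(points, k, rng) for _ in range(runs)),
                             key=lambda r: r[1])
        scores[k] = calinski_harabasz(points, best_labels)
    return max(scores, key=scores.get), scores

# Hypothetical data: three well-separated groups of three cases
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (0, 10), (1, 10), (0, 11)]
best_k, scores = choose_k(points, range(2, 6))
print(best_k, scores)
```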

Question 9

Q

What function is used to find which clusters stand up to resampling?

Answer

A

kmeansCBI(), used as the clustering method inside clusterboot() (fpc package), which does the resampling

Question 10

Q

Which argument of the clusterboot() function is the output of kmeansCBI added to?

Answer

A

scaling =

Question 11

Q

What does a Jaccard similarity show?

Answer

A

How well each cluster from the best cluster solution is replicated across the bootstrap samples. Values range from 0 (never recovered) to 1 (recovered exactly).
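
Here is a simplified Python sketch of the per-resample comparison. Jaccard similarity is |A ∩ B| / |A ∪ B|. An original cluster is restricted to the cases actually drawn in a resample and then matched to its most similar resampled cluster. Averaging these best matches over all bootstrap samples gives a per-cluster mean, which is the kind of value clusterboot() reports. The case IDs, cluster memberships and function names here are hypothetical.

```python
def jaccard(a, b):
    """|A intersect B| / |A union B| for two sets of case IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_match(original_cluster, resampled_clusters, sampled_ids):
    """Highest Jaccard similarity between an original cluster and any cluster
    found on a resample, comparing only cases that were drawn in that resample."""
    restricted = set(original_cluster) & set(sampled_ids)
    return max(jaccard(restricted, c) for c in resampled_clusters)

# Hypothetical example: cases 0-9, one original cluster, one resample
original = {0, 1, 2, 3, 4}
sampled = {0, 1, 2, 4, 5, 6, 7, 9}               # cases 3 and 8 were not drawn
resample_clusters = [{0, 1, 2, 5}, {4, 6, 7, 9}]
print(best_match(original, resample_clusters, sampled))  # 0.6 (3 shared out of 5)
```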

Question 12

Q

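What is a reasonably robust Jaccard similarity?

Answer

A

A mean Jaccard similarity of 0.75 or higher. The fpc clusterboot() documentation treats this level as indicating a valid, stable cluster.
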
Question 13

Q

What are two factors to balance when deciding K, and what metrics are used to assess them?

Answer

A

Robustness of the clusters (Jaccard similarity) and the variability explained (between_SS / total_SS)
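
Here is a minimal Python sketch of the variability-explained ratio that R's `kmeans()` prints as between_SS / total_SS. The total sum of squares around the grand mean splits into a within-cluster part and a between-cluster part. The ratio generally rises as K grows, which is why it has to be weighed against cluster robustness. The data and function names are illustrative.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    return [sum(col) / len(points) for col in zip(*points)]

def variance_explained(points, labels):
    """between_SS / total_SS for a given cluster assignment."""
    grand = mean_point(points)
    total_ss = sum(sq_dist(p, grand) for p in points)
    between_ss = 0.0
    for g in set(labels):
        members = [p for p, l in zip(points, labels) if l == g]
        # Each cluster contributes size * squared distance of its centroid from the grand mean
        between_ss += len(members) * sq_dist(mean_point(members), grand)
    return between_ss / total_ss

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]
print(f"{variance_explained(points, labels):.1%}")  # 99.1%
```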