K means clustering Flashcards
what is the goal of k means clustering
to find subgroups of participants with maximal within-group similarity and minimal between-group similarity
what factors are euclidean distance and k means clustering sensitive to
outliers, scale and different random starting centroids
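A small illustration of the scale sensitivity, as a sketch only: the ages and incomes below are hypothetical, and z-scoring with the sample standard deviation is one common way to standardise, not the only one. On the raw scale the income column dominates the distance, so the nearest neighbour of case 0 changes once both variables are standardised.

```python
import math
import statistics

def euclidean_distance(p, q):
    # Straight-line distance between two points of equal dimension
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def standardise(column):
    # Convert values to z-scores: mean 0, sample standard deviation 1
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd for x in column]

# Hypothetical data: age in years and income in dollars for four cases
ages = [20, 60, 22, 58]
incomes = [30000, 30200, 30600, 31000]

raw = list(zip(ages, incomes))
scaled = list(zip(standardise(ages), standardise(incomes)))

# Distances from case 0 to cases 1 and 2, before and after standardising
raw_0_1 = euclidean_distance(raw[0], raw[1])
raw_0_2 = euclidean_distance(raw[0], raw[2])
scaled_0_1 = euclidean_distance(scaled[0], scaled[1])
scaled_0_2 = euclidean_distance(scaled[0], scaled[2])

print("raw:   ", raw_0_1, raw_0_2)        # case 1 is nearer on the raw scale
print("scaled:", scaled_0_1, scaled_0_2)  # case 2 is nearer after standardising
```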
what are the two quality criteria in kmeansruns() and what are their goals
ch (Calinski-Harabasz index) finds larger, more evenly sized clusters. asw (average silhouette width) forms smaller, less evenly sized clusters
What is euclidean distance and how is it found
the ordinary straight line distance between two points in a multidimensional space. calculated using the pythagorean theorem
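A minimal sketch of that calculation, assuming each point is given as a sequence of coordinates; the function name is only illustrative.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Pythagorean theorem extended to any number of dimensions:
    # square root of the sum of squared coordinate differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```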
what metric is used to weight PCA components in the kmeansCBI() function
square root of the eigenvalue
what is the rough process of K means clustering
randomly choose K cases as centroids, assign each case to its nearest centroid so the sum of euclidean distances is minimised, determine new centroids from the clusters, repeat until there is no change or a predefined maximum number of iterations is reached
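The steps above can be sketched in a few lines. This is a simplified illustration, not any particular library's implementation: the function name, the `seed` parameter and the rule of keeping the old centroid when a cluster becomes empty are all assumptions made for the example.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Return (assignments, centroids) for a list of coordinate tuples."""
    rng = random.Random(seed)
    # Step 1: randomly choose k cases as the starting centroids
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):  # stop at a predefined maximum number of iterations
        # Step 2: assign each case to its nearest centroid (euclidean distance)
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Stop early when no case changes cluster
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: recompute each centroid as the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids

# Hypothetical data: two well-separated groups of three cases
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8, 9.5)]
labels, centres = kmeans(points, k=2)
print(labels)
print(centres)
```

Changing `seed` changes the starting centroids. On well-separated data like this the same grouping is recovered either way, but on messier data different starts can give different solutions.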
why is using PCA components good for k means clustering
uncorrelated and standardised
what are two functions used to decide K in K means clustering
NbClust() and kmeansruns()
what function is used to find which clusters stand up to resampling
clusterboot(), with kmeansCBI as its clustering method
to what part of the clusterboot() function is the output of kmeansCBI added
scaling =
what does a jaccard similarity show
how well each cluster from the best cluster solution was replicated across the bootstrap samples
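For reference, the Jaccard similarity of two sets is the size of their intersection divided by the size of their union. A minimal sketch, assuming clusters are represented as sets of case ids; the ids below are hypothetical, and returning 1.0 for two empty sets is a convention chosen for the example.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty sets count as identical
    return len(a & b) / len(union)

# Hypothetical case ids in an original cluster and its best match in one bootstrap sample
original_cluster = {1, 2, 3, 4, 5}
bootstrap_cluster = {2, 3, 4, 5, 6}

similarity = jaccard_similarity(original_cluster, bootstrap_cluster)
print(similarity)  # 4 shared cases out of 6 distinct cases
```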
what is a reasonably robust jaccard similarity
0.6
what are two factors to be balanced when deciding K, and what metrics are used to consider them
robustness of clusters (jaccard similarity) and variability explained (between_ss/total_ss)
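The between_ss/total_ss ratio can be worked through by hand. A minimal sketch, assuming points are coordinate tuples and cluster labels are already known; the helper names and data are hypothetical, and the ratio is undefined when every point is identical (total_ss would be zero).

```python
def mean_point(points):
    # Coordinate-wise mean of a list of points
    return tuple(sum(dim) / len(points) for dim in zip(*points))

def sum_of_squares(points, centre):
    # Sum of squared euclidean distances from each point to the centre
    return sum(sum((x - c) ** 2 for x, c in zip(p, centre)) for p in points)

def variance_explained(points, assignments):
    """Return between_ss / total_ss for a given cluster assignment."""
    total_ss = sum_of_squares(points, mean_point(points))
    within_ss = 0.0
    for label in set(assignments):
        members = [p for p, a in zip(points, assignments) if a == label]
        within_ss += sum_of_squares(members, mean_point(members))
    # Total variability splits into within-cluster and between-cluster parts
    between_ss = total_ss - within_ss
    return between_ss / total_ss

# Hypothetical data: two tight clusters far apart
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
ratio = variance_explained(points, [0, 0, 1, 1])
print(ratio)  # total_ss = 104, within_ss = 4, so 100 / 104
```

Adding clusters can only push this ratio up, which is why it has to be balanced against how well the clusters replicate.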