K means clustering Flashcards

1
Q

what is the goal of k means clustering

A

to find subgroups of participants with maximal within-group similarity and minimal between-group similarity

2
Q

what factors are euclidean distance and k means clustering sensitive to

A

outliers, scale and different random starting centroids
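
The sensitivity to scale is usually handled by standardising each variable before clustering. A minimal sketch, assuming z-scores with the sample standard deviation (n - 1 denominator); the function name `zscore` is illustrative and does not come from the cards:

```python
import statistics


def zscore(values):
    """Rescale one variable to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator (assumption)
    return [(v - mean) / sd for v in values]


# Income in dollars would swamp age in years in a raw euclidean distance;
# after z-scoring, both variables sit on a comparable scale.
ages = zscore([20, 30, 40])
incomes = zscore([20000, 30000, 40000])
```

After rescaling, neither variable dominates the distance just because of its units.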

3
Q

what are the two quality criteria in kmeansruns() and what are their goals

A

ch (Calinski-Harabasz index) finds larger, more evenly sized clusters. asw (average silhouette width) forms smaller, less evenly sized clusters

4
Q

What is euclidean distance and how is it found

A

the ordinary straight-line distance between two points in a multidimensional space. calculated using the pythagorean theorem
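
As a small illustration of the definition above, the distance is the square root of the summed squared differences across dimensions. This is a sketch only; the function name `euclidean_distance` is illustrative:

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Pythagorean theorem generalised to any number of dimensions.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


d = euclidean_distance((0, 0), (3, 4))  # the 3-4-5 right triangle
```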

5
Q

what metric is used to weigh PCA components in the kmeansCBI() function

A

square root of the eigenvalue

6
Q

what is the rough process of K means clustering

A

randomly choose K cases as centroids, assign each case to its nearest centroid so the sum of euclidean distances is minimised, determine new centroids from the resulting clusters, repeat until there is no change or a predefined maximum number of iterations is reached
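
The steps above can be sketched in plain Python. This is a teaching sketch, not the implementation used by any particular package; the names `k_means` and `sq_dist`, the fixed seed, and the handling of empty clusters (the old centroid is kept) are assumptions:

```python
import random


def sq_dist(p, q):
    """Squared euclidean distance; gives the same nearest centroid as the distance itself."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: k random cases as starting centroids
    labels = None
    for _ in range(max_iter):  # stop at a predefined maximum number of iterations
        # step 2: assign every case to its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # stop when assignments no longer change
            break
        labels = new_labels
        # step 3: recompute each centroid as the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid (assumption)
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(points, k=2)
```

Because the starting centroids are random, different seeds can give different solutions on less clearly separated data, which is why multiple random starts are commonly used.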

7
Q

why is using PCA components good for k means clustering

A

uncorrelated and standardised

8
Q

what are two functions used to decide K in K means clustering

A

NbClust() and kmeansruns()

9
Q

what function is used to find which clusters stand up to resampling

A

clusterboot(), with kmeansCBI() as the clustering method

10
Q

to what part of the clusterboot() function is the output of kmeansCBI() added

A

scaling =

11
Q

what does a jaccard similarity show

A

how well each cluster from the best cluster solution replicated over the number of bootstrap samples
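
For intuition, the Jaccard similarity of two clusters is the number of shared cases divided by the number of cases in either cluster. A minimal sketch of the set calculation only, not of the bootstrap procedure itself; the function name `jaccard` and the value returned for two empty sets are assumptions:

```python
def jaccard(cluster_a, cluster_b):
    """Size of the intersection divided by the size of the union of two sets of case IDs."""
    a, b = set(cluster_a), set(cluster_b)
    union = a | b
    if not union:
        return 1.0  # two empty clusters treated as identical (assumption)
    return len(a & b) / len(union)


original = {1, 2, 3, 4}   # case IDs in a cluster from the original solution
resampled = {3, 4, 5}     # case IDs in the closest cluster from one bootstrap sample
similarity = jaccard(original, resampled)  # 2 shared cases out of 5 distinct cases
```

A value of 1 means the cluster was recovered exactly; values near 0 mean it largely dissolved in the resample.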

12
Q

what is a reasonably robust jaccard similarity

A

0.6

13
Q

what are two factors to be balanced when deciding K, and what metric is used to consider them

A

robustness of clusters (jaccard similarity) and variability explained (between_ss/total_ss)
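
The variability-explained ratio can be worked through by hand: total sum of squares is measured around the grand mean, within sum of squares around each cluster's own centroid, and the between sum of squares is the difference. A sketch under those definitions; the function name `ss_ratio` is illustrative:

```python
def ss_ratio(points, labels):
    """between_ss / total_ss for a given assignment of points to clusters."""
    dims = range(len(points[0]))

    def centre(group):
        return [sum(p[d] for p in group) / len(group) for d in dims]

    def sum_sq(group, c):
        return sum((p[d] - c[d]) ** 2 for p in group for d in dims)

    total_ss = sum_sq(points, centre(points))
    within_ss = 0.0
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        within_ss += sum_sq(members, centre(members))
    between_ss = total_ss - within_ss
    return between_ss / total_ss


# Two tight, well-separated clusters on one variable.
ratio = ss_ratio([(0,), (2,), (10,), (12,)], [0, 0, 1, 1])  # 100 / 104
```

Adding clusters always pushes this ratio up, so it has to be weighed against how well the extra clusters replicate.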
