applied statistics terms Flashcards
what is clustering about?
finding discrete groups with small differences between members of the same group and large differences between groups
is clustering classification?
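no, clustering is unsupervised: the groups are discovered from the data, while classification assigns data points to predefined, labelled classes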
hard clustering
each data point is only assigned to a single cluster
soft clustering
each data point is assigned to every cluster with a certain degree of strength (not used in this course)
can you add data to K-means clustering?
yes, by assigning it to the existing cluster whose center is closest
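example (a minimal sketch, not from the course material): assigning a new data point to the closest existing cluster center. the function name `nearest_center` and the center values are made up for illustration, and euclidean distance is assumed.

```python
import math

def nearest_center(point, centers):
    """Return the index of the center closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))

# hypothetical centers, standing in for an already fitted k-means result
centers = [(0.0, 0.0), (10.0, 10.0)]

# the new point lies much closer to the second center, so it joins cluster 1
new_point = (8.0, 9.0)
label = nearest_center(new_point, centers)
print(label)
```

the existing centers are not recomputed here; whether to update them after adding data is a separate choice.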
what method pre-specifies number of clusters?
K-means clustering
what method creates a dendrogram?
hierarchical clustering
how does k-means clustering work?
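you specify how many clusters you want (k) and it creates that many random centers. each data point is assigned to its closest center, the centers are recomputed as the mean of their assigned points, and this repeats until the assignments stop changing.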
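example (a minimal sketch of the standard k-means loop, not the course's own implementation): it assumes euclidean distance and picks k random data points as starting centers. the name `k_means` and the parameters `n_iter` and `seed` are chosen here for illustration.

```python
import math
import random

def k_means(points, k, n_iter=100, seed=0):
    """Cluster `points` (tuples of numbers) into k groups; returns (labels, centers)."""
    rng = random.Random(seed)
    # assumption: start from k randomly chosen data points as centers
    centers = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # assignment step: each point joins its nearest center
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # update step: move each center to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_means(points, k=2)
print(labels)
```

the cluster numbers themselves are arbitrary: which group is called 0 or 1 depends on the random start.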
what is agglomerative clustering?
from the bottom to the top: every data point starts as its own cluster and the closest clusters are merged step by step
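example (a minimal sketch, not from the course material): bottom-up merging where the distance between two clusters is assumed to be the smallest distance between any two of their members (single linkage). the name `agglomerate` is hypothetical.

```python
import math

def agglomerate(points):
    """Merge clusters bottom-up; returns the list of merges as (cluster_a, cluster_b, distance)."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # find the pair of clusters with the smallest single-linkage distance
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # replace the two merged clusters with their union
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges

for a, b, d in agglomerate([(0,), (1,), (5,), (7,)]):
    print(a, "+", b, "at distance", d)
```

the recorded merge distances are what a dendrogram draws as merge heights.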
what’s divisive clustering?
from top to bottom: all data points start in one cluster, which is split step by step
how to calculate the distance between two data points for binary data?
Jaccard distance (1 - intersection / union) or Manhattan
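example (a minimal sketch, not from the course material): jaccard distance for two binary vectors, computed as 1 minus intersection over union, where intersection counts positions that are 1 in both and union counts positions that are 1 in at least one. the name `jaccard_distance` and the handling of two all-zero vectors are assumptions.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length binary (0/1) sequences."""
    both = sum(1 for x, y in zip(a, b) if x and y)    # intersection
    either = sum(1 for x, y in zip(a, b) if x or y)   # union
    if either == 0:
        return 0.0  # assumed convention: two all-zero vectors count as identical
    return 1 - both / either

a = [1, 1, 0, 0, 1]
b = [1, 0, 0, 1, 1]
# intersection = 2 (positions 0 and 4), union = 4, so the distance is 1 - 2/4
print(jaccard_distance(a, b))  # 0.5
```

positions where both vectors are 0 do not affect the result.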
how to calculate the distance between two data points for continuous data?
Euclidean or Manhattan
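example (a minimal sketch, not from the course material): both distances for two continuous data points. the function names are chosen here for illustration.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """City-block distance: sum of the absolute differences per coordinate."""
    return sum(abs(x - y) for x, y in zip(a, b))

p = (1.0, 2.0)
q = (4.0, 6.0)
# the coordinate differences are 3 and 4
print(euclidean_distance(p, q))  # 5.0
print(manhattan_distance(p, q))  # 7.0
```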
which distance techniques look at absolute distances?
Euclidean and Manhattan
which distance techniques look at relative distances?
Jaccard and Bray-Curtis
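example (a minimal sketch, not from the course material): bray-curtis divides the summed absolute differences by the summed totals, so multiplying both data points by the same factor leaves it unchanged, while manhattan grows with the factor. the function names and the count values are made up for illustration, and non-negative values with a non-zero total are assumed.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum of |x - y| divided by sum of (x + y)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))

def manhattan_distance(a, b):
    """Sum of the absolute differences per coordinate."""
    return sum(abs(x - y) for x, y in zip(a, b))

a = [2, 4, 6]
b = [4, 4, 2]
a10 = [10 * x for x in a]  # same proportions, ten times the counts
b10 = [10 * x for x in b]

print(manhattan_distance(a, b), manhattan_distance(a10, b10))  # 6 60
print(bray_curtis(a, b), bray_curtis(a10, b10))  # both equal 6/22
```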
what is linkage?
calculating the distance between (sub)clusters in hierarchical clustering
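example (a minimal sketch, not from the course material): three common linkage choices, which are not named in these cards: single (smallest pairwise distance), complete (largest) and average (mean of all pairwise distances). the name `linkage_distance` and its `method` parameter are hypothetical, and euclidean distance between points is assumed.

```python
import math
from itertools import product

def linkage_distance(cluster_a, cluster_b, method="single"):
    """Distance between two clusters, derived from all pairwise point distances."""
    pairwise = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]
    if method == "single":
        return min(pairwise)
    if method == "complete":
        return max(pairwise)
    if method == "average":
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")

a = [(0,), (1,)]
b = [(4,), (6,)]
# pairwise distances are 4, 6, 3 and 5
print(linkage_distance(a, b, "single"))    # 3.0
print(linkage_distance(a, b, "complete"))  # 6.0
print(linkage_distance(a, b, "average"))   # 4.5
```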