Chapter 9: Cluster Analysis Flashcards
what is cluster analysis
an unsupervised learning task
grouping a set of objects so that objects in the same group are more similar to each other than to objects in other groups
give 3 similarity (or distance) measures
Euclidean distance
cosine similarity
Minkowski distance
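The three measures above can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function names and the parameter `p` (the order of the Minkowski distance) are assumptions chosen for this example.

```python
import math

def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def euclidean_distance(a, b):
    """Euclidean distance: the Minkowski distance with p = 2."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

a, b = [0.0, 0.0], [3.0, 4.0]
print(euclidean_distance(a, b))                   # 5.0
print(minkowski_distance(a, b, p=1))              # 7.0 (Manhattan distance)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal vectors)
```

Note that the two distances grow as points get less alike, while cosine similarity shrinks, so a clustering routine has to know which kind of measure it is given.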
what are the key tasks of cluster analysis
choose a distance measure
decide the number of clusters
perform the grouping
evaluate the result
what are the hyperparameters in cluster analysis
distance measure
cluster number
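what is k-means clustering
a clustering algorithm that calculates the distance from each point to each of K centre points
assigns each point to its nearest centre
updates each centre to the average of the points in its cluster
repeats until the assignments stop changing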
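The k-means steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the `max_iters` and `seed` parameters, and the choice to pick initial centres at random from the data are all assumptions made for this example.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, max_iters=100, seed=0):
    """Return (labels, centres) for a list of numeric vectors."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # assumption: start from k random data points
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged, so stop
            break
        labels = new_labels
        # Update step: each centre becomes the mean of its cluster's points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centres

points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centres = k_means(points, k=2)
print(labels, centres)
```

On this toy data the first two points end up in one cluster and the last two in the other; which cluster gets label 0 depends on the random initialisation.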
within each cluster, what is minimised
the sum of squared distances from each point to the cluster centre
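Written as a formula, the quantity being minimised looks like the sketch below. The symbols are assumptions introduced here, not taken from the cards: K is the number of clusters, C_k is the set of points in cluster k, and \mu_k is that cluster's centre (the mean of its points).

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

The inner sum is the within-cluster sum of squares for one cluster; k-means tries to make the total J over all clusters as small as possible.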
what is the running time of k-means clustering
O(TKN), where T = number of iterations, K = number of clusters, N = number of points
what are the drawbacks of k-means clustering (3)
doesn’t cope well with noise or outliers
the number of clusters must be chosen in advance
not suitable for complex patterns (e.g. non-spherical cluster shapes)
what does the distance between clusters tell us
the similarity between two clusters (smaller distance = more similar)
what is single link measure
distance between clusters = minimum distance between any pair of points, one from each cluster
what is complete link (maximum link) measure
distance between clusters = maximum distance between any pair of points, one from each cluster
what is average link measure
distance between clusters = average distance over all pairs of points, one from each cluster
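The three cluster-distance measures above (minimum, maximum, average) can be compared side by side. This is a small sketch assuming Euclidean distance between points; the function names are illustrative, and `complete_link` is the common name for the maximum-distance measure.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every pair (one point from each cluster)."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_link(cluster_a, cluster_b):
    """Minimum pairwise distance between the two clusters."""
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_link(cluster_a, cluster_b):
    """Maximum pairwise distance between the two clusters."""
    return max(pairwise_distances(cluster_a, cluster_b))

def average_link(cluster_a, cluster_b):
    """Mean of all pairwise distances between the two clusters."""
    d = pairwise_distances(cluster_a, cluster_b)
    return sum(d) / len(d)

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(3.0, 0.0), (5.0, 0.0)]
print(single_link(A, B))    # 2.0
print(complete_link(A, B))  # 5.0
print(average_link(A, B))   # 3.5
```

The pairwise distances here are 3, 5, 2 and 4, so the three measures give different answers for the same pair of clusters.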
describe hierarchical clustering
objects are grouped into a tree structure (dendrogram) of nested clusters
what is agglomerative clustering
bottom-up: start with each object as its own (atomic) cluster and repeatedly merge the closest clusters until you get one big cluster
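The agglomerative procedure can be sketched as a simple loop. This is a naive illustration, not an efficient or definitive implementation: it assumes Euclidean distance with the minimum-distance (single link) measure, and the names `agglomerative` and `target_clusters` are made up for this example.

```python
import math

def single_link(cluster_a, cluster_b):
    """Minimum Euclidean distance between any pair across two clusters."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

def agglomerative(points, target_clusters=1):
    """Merge the two closest clusters until target_clusters remain.

    Returns the final clusters and the list of merges performed,
    each recorded as (cluster_a, cluster_b, distance).
    """
    clusters = [[p] for p in points]  # start with atomic clusters
    merges = []
    while len(clusters) > target_clusters:
        # Find the closest pair of clusters under the chosen measure.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
clusters, merges = agglomerative(points, target_clusters=2)
print(clusters)
```

Running it with `target_clusters=1` merges all the way to one big cluster, and the recorded merges then describe the full tree. Stopping early, as above, corresponds to cutting the tree at a chosen number of clusters.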
what is divisive clustering
top-down: start with one big cluster and repeatedly split it until each object is its own (atomic) cluster