Cluster analysis Flashcards
Cluster analysis is a range of methods that determine if there are ___________________ of data
different groups or clusters
Cluster analysis assumes _____________________
distinct grouping
Cluster analysis is _____________. We are trying to see whether there are any hidden groups or clusters, but we don't know in advance how to ________ the groups. The data does not come with a ________ label.
‘unsupervised’; define; class
In multivariate data sets, we can use ___________ to define similarity
distance
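As a minimal sketch of "distance as similarity", the snippet below assumes Euclidean distance, which is one common choice; the cards do not name a specific metric. Each observation is a tuple with one value per variable, and `euclidean` is a hypothetical helper name.

```python
import math

def euclidean(p, q):
    """Straight-line (Euclidean) distance between two observations.

    Assumption: both observations are measured on the same variables, in the same order.
    """
    if len(p) != len(q):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two hypothetical observations, each measured on three variables
x = (1.0, 2.0, 2.0)
y = (4.0, 6.0, 2.0)
print(euclidean(x, y))  # 5.0  (sqrt(3^2 + 4^2 + 0^2))
```

A smaller distance means the two observations are more similar; identical observations have distance 0.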
If two things are similar, they are probably the same sort of thing.
If two things look very different, they are probably different sorts of thing.
For nearest neighbour clustering, define the distance between clusters as the ______________ between any object in one cluster and any object in the other
shortest distance
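A minimal sketch of the nearest neighbour rule, assuming Euclidean distance and clusters stored as lists of coordinate tuples (`math.dist` needs Python 3.8+). The function name is hypothetical: every cross-cluster pair is checked and the smallest distance is kept.

```python
import math
from itertools import product

def nearest_neighbour_distance(cluster_a, cluster_b):
    """Shortest distance between any object in cluster_a and any object in cluster_b."""
    return min(math.dist(a, b) for a, b in product(cluster_a, cluster_b))

a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (9.0, 0.0)]
print(nearest_neighbour_distance(a, b))  # 3.0  (from (1, 0) to (4, 0))
```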
Centroid clustering uses ____ between objects to define clusters. This time, consider the 'object of interest' to be the ________ rather than the individual points that make up the cluster
distance, cluster
Centroid clustering: replace _____________ objects with the __________ of the cluster to which they belong
individual; ‘centroid’
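Here is a minimal sketch of replacing individual objects with their cluster's centroid, assuming the centroid is the per-variable mean and distances are Euclidean. `centroid` and `centroid_distance` are hypothetical helper names.

```python
import math

def centroid(cluster):
    """Mean position of the objects in a cluster (one mean per variable)."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def centroid_distance(cluster_a, cluster_b):
    """Distance between clusters, measured between their centroids rather than their members."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))

a = [(0.0, 0.0), (2.0, 0.0)]   # centroid (1.0, 0.0)
b = [(5.0, 0.0), (7.0, 0.0)]   # centroid (6.0, 0.0)
print(centroid_distance(a, b))  # 5.0
```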
Centroid clustering: use ______________ to define clusters and replace the ___________ with a ________________
The new object lies _________________ (mean, median…)
This average is only meaningful if you use __________
use distance^2 to define clusters
replace individual objects with a new ‘combined object’
the new object is between the original objects’ positions
this average is only meaningful when using distance^2
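One way to see the link between averaging and distance² is a standard property of the mean: it is the point that minimises the total squared Euclidean distance to the objects it replaces. The sketch below checks this numerically for three hypothetical objects; `combine` and `sum_sq_dist` are illustrative names, not part of the original material.

```python
def sum_sq_dist(point, objects):
    """Total squared Euclidean distance from `point` to every object."""
    return sum(sum((p - o) ** 2 for p, o in zip(point, obj)) for obj in objects)

def combine(objects):
    """Replace several objects with one combined object at their mean position."""
    n = len(objects)
    return tuple(sum(c) / n for c in zip(*objects))

objs = [(0.0, 0.0), (4.0, 0.0), (2.0, 6.0)]
m = combine(objs)                        # (2.0, 2.0)
print(sum_sq_dist(m, objs))              # 32.0 at the mean
print(sum_sq_dist((2.0, 1.0), objs))     # 35.0 after moving away from the mean
```

Any other candidate position gives a larger total, which is why the mean is the natural "combined object" when squared distance is the measure.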
Generally, we would expect nearest neighbour clustering to give ______ distances than centroid clustering, especially as the ____________ of the cluster increases
shorter, size
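The constructed example below illustrates the card's "generally": objects are evenly spaced along a line (an assumption chosen for clarity, not a property of all data), and a new object sits one step beyond the end. As the cluster grows, the nearest neighbour distance stays at 1 while the distance to the centroid keeps increasing. Helper names are hypothetical.

```python
import math

def nearest_distance(point, cluster):
    """Distance from point to the closest member of the cluster."""
    return min(math.dist(point, obj) for obj in cluster)

def centroid_distance(point, cluster):
    """Distance from point to the mean position of the cluster."""
    n = len(cluster)
    centre = tuple(sum(v) / n for v in zip(*cluster))
    return math.dist(point, centre)

for n in (2, 4, 8):
    cluster = [(float(i), 0.0) for i in range(n)]  # objects at x = 0, 1, ..., n-1
    newcomer = (float(n), 0.0)                       # next object along the line
    print(n, nearest_distance(newcomer, cluster), centroid_distance(newcomer, cluster))
# 2 1.0 1.5
# 4 1.0 2.5
# 8 1.0 4.5
```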
What does single linkage cluster analysis measure?
The distance from an unclassified object to another object
e.g., nearest neighbour, centroid
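A minimal sketch of allocating an unclassified object with each of the two single-object measures named on the card, assuming Euclidean distance. The clusters and object are hypothetical and were chosen so that the two rules disagree: a spread-out cluster A has a member close to the object, but its centroid is far away.

```python
import math

def nearest_neighbour(obj, cluster):
    """Distance from obj to the closest member of the cluster."""
    return min(math.dist(obj, member) for member in cluster)

def to_centroid(obj, cluster):
    """Distance from obj to the cluster's mean position."""
    n = len(cluster)
    centre = tuple(sum(v) / n for v in zip(*cluster))
    return math.dist(obj, centre)

def assign(obj, clusters, rule):
    """Name of the cluster that is closest to obj under the given distance rule."""
    return min(clusters, key=lambda name: rule(obj, clusters[name]))

clusters = {"A": [(0.0, 0.0), (8.0, 0.0)], "B": [(12.0, 0.0), (13.0, 0.0)]}
x = (9.0, 0.0)
print(assign(x, clusters, nearest_neighbour))  # A  (1.0 to A vs 3.0 to B)
print(assign(x, clusters, to_centroid))        # B  (5.0 to A vs 3.5 to B)
```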
Nearest neighbour clustering is sensitive to __________
outliers
Including an object that lies far away greatly increases the cluster's 'capture power'
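The sketch below shows the 'capture power' effect with hypothetical points and Euclidean distance: adding one far-away member to a tight cluster brings a distant candidate much closer under the nearest neighbour rule, because the distance is now measured from the outlier.

```python
import math

def nearest_neighbour(obj, cluster):
    """Distance from obj to the closest member of the cluster."""
    return min(math.dist(obj, member) for member in cluster)

cluster = [(0.0, 0.0), (1.0, 0.0)]
candidate = (10.0, 0.0)
print(nearest_neighbour(candidate, cluster))       # 9.0

with_outlier = cluster + [(7.0, 0.0)]              # one far-away member joins
print(nearest_neighbour(candidate, with_outlier))  # 3.0
```

For comparison, the distance from the candidate to the cluster centroid only falls from 9.5 to about 7.33 when the same outlier joins.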
Centroid methods do not account for _________________
the spread of within-cluster objects
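A minimal sketch of why centroid methods ignore spread: two hypothetical clusters share the same centroid, so an outside object is exactly the same centroid distance from both, even though one cluster is six times more spread out. "Spread" here is measured as the mean distance of members from their centroid, an illustrative choice.

```python
import math

def centroid(cluster):
    """Mean position of the cluster's members."""
    n = len(cluster)
    return tuple(sum(v) / n for v in zip(*cluster))

def spread(cluster):
    """Mean distance of members from their centroid (one simple spread measure)."""
    c = centroid(cluster)
    return sum(math.dist(m, c) for m in cluster) / len(cluster)

tight = [(-1.0, 0.0), (1.0, 0.0)]
loose = [(-6.0, 0.0), (6.0, 0.0)]
x = (8.0, 0.0)
for cl in (tight, loose):
    print(math.dist(x, centroid(cl)), spread(cl))
# 8.0 1.0
# 8.0 6.0
```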
What do average linkage methods measure?
The average distance from an unclassified object to the other objects in a cluster
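A minimal sketch of the average linkage measure for a single unclassified object, assuming Euclidean distance; `average_linkage` is a hypothetical name. Every member of the cluster contributes, not just the closest one or the centroid.

```python
import math

def average_linkage(obj, cluster):
    """Mean distance from an unclassified object to every member of a cluster."""
    return sum(math.dist(obj, m) for m in cluster) / len(cluster)

cluster = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
print(average_linkage((6.0, 0.0), cluster))  # 4.0  ((6 + 4 + 2) / 3)
```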
Within-group linkage method
Creates clusters with the smallest average linkage distance among their members
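One reading of the within-group rule, sketched under stated assumptions: Euclidean distance, and "average linkage distance" taken as the mean over every pair of objects inside the merged cluster. The sketch scores each candidate merge that way and keeps the smallest. Clusters and helper names are hypothetical.

```python
import math
from itertools import combinations

def within_group_average(members):
    """Mean distance over every pair of objects inside one cluster (needs 2+ members)."""
    pairs = list(combinations(members, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def best_within_group_merge(clusters):
    """Indices of the two clusters whose union has the smallest within-group average."""
    return min(
        combinations(range(len(clusters)), 2),
        key=lambda ij: within_group_average(clusters[ij[0]] + clusters[ij[1]]),
    )

clusters = [[(0.0, 0.0), (1.0, 0.0)], [(3.0, 0.0)], [(10.0, 0.0), (11.0, 0.0)]]
print(best_within_group_merge(clusters))  # (0, 1)
```

Merging clusters 0 and 1 gives members at 0, 1 and 3, whose pairwise distances (1, 3, 2) average 2.0, lower than either alternative merge.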
Between-group linkage method
Creates clusters with the smallest average distance across the newly formed links
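A minimal sketch of the between-group rule, assuming Euclidean distance: the "newly formed links" are taken to be every pair with one object from each of the two clusters being merged, and the merge with the smallest mean link length wins. Existing links inside each cluster are not counted. Clusters and helper names are hypothetical.

```python
import math
from itertools import combinations, product

def between_group_average(a, b):
    """Mean length of the new links joining each member of a to each member of b."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

def best_between_group_merge(clusters):
    """Indices of the two clusters with the smallest average between-group distance."""
    return min(
        combinations(range(len(clusters)), 2),
        key=lambda ij: between_group_average(clusters[ij[0]], clusters[ij[1]]),
    )

clusters = [[(0.0, 0.0), (1.0, 0.0)], [(3.0, 0.0)], [(10.0, 0.0), (11.0, 0.0)]]
print(best_between_group_merge(clusters))                # (0, 1)
print(between_group_average(clusters[0], clusters[1]))   # 2.5  ((3 + 2) / 2)
```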