Clustering Flashcards
What is clustering?
The process of grouping a set of data objects into multiple groups, or clusters, so that objects within a cluster have high similarity to one another but are very dissimilar to objects in other clusters.
What is clustering or cluster analysis?
The process of partitioning a set of data objects (or observations) into subsets, where each subset is a cluster.
What is the other name for clustering?
data segmentation
Is clustering supervised or unsupervised learning? Why?
unsupervised, because class label information is not present.
Is clustering learning by observation or by examples?
Learning by observation
What are the 8 requirements for clustering in data mining?
Scalability; ability to deal with different types of attributes; discovery of clusters with arbitrary shape; minimal requirements for domain knowledge to determine input parameters; ability to deal with noisy data; and incremental clustering and insensitivity to input order.
What are the two main distance measures that many clustering algorithms use to determine clusters? What shape of clusters do they usually identify?
Euclidean and Manhattan distance. Algorithms based on these measures tend to find spherical clusters with similar size and density.
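Both measures can be written directly from their definitions. A minimal sketch, assuming points are equal-length tuples of numbers; the function names are illustrative, not taken from any particular library:

```python
import math

def euclidean(p, q):
    """Straight-line (L2) distance: square root of summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((1, 2), (4, 6)))  # 5.0
print(manhattan((1, 2), (4, 6)))  # 7
```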
What general input do many clustering algorithms require from the user?
The desired number of clusters.
Are clustering algorithms generally sensitive or insensitive to noisy data?
Sensitive: outliers and missing or erroneous data can lead to poor-quality clusters.
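One way to see this sensitivity: a cluster represented by its mean (as in k-means) is dragged far away by a single outlier, while a representative that must be an actual data point (a medoid, as in k-medoids) moves much less. A small 1-D illustration with made-up values; the helper names are hypothetical:

```python
def mean_1d(values):
    """Centroid of 1-D values: their arithmetic mean."""
    return sum(values) / len(values)

def medoid_1d(values):
    """The data point with the smallest total absolute distance to all others
    (ties go to the first such point)."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))

cluster = [1.0, 2.0, 3.0, 4.0]
noisy = cluster + [100.0]  # the same cluster plus one outlier

print(mean_1d(cluster), mean_1d(noisy))      # 2.5 22.0
print(medoid_1d(cluster), medoid_1d(noisy))  # 2.0 3.0
```

The mean jumps from 2.5 to 22.0, while the medoid shifts only from 2.0 to 3.0.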
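What happens to a clustering when new data arrives as an incremental update? What related problem do many clustering algorithms have?
Many algorithms must recompute the clusters from scratch. They can also be sensitive to input order: if the same data is presented in a different order, the resulting clusters may be completely different.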
What are algorithms that can incorporate incremental updates called?
Incremental clustering algorithms
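As one example of the idea (not the only incremental method), a sequential form of k-means can absorb each new point by updating only the nearest centroid's running mean, instead of reclustering everything. A minimal sketch, assuming each centroid was seeded from a single point (the hypothetical `counts` list); the function name is also hypothetical. The result of such updates can still depend on the order in which points arrive.

```python
import math

def incremental_update(centroids, counts, point):
    """Assign one new point to its nearest centroid and update that centroid's
    running mean in place, without revisiting earlier points."""
    nearest = min(range(len(centroids)),
                  key=lambda i: math.dist(centroids[i], point))
    counts[nearest] += 1
    n = counts[nearest]
    # Running-mean update: new_mean = old_mean + (x - old_mean) / n
    centroids[nearest] = tuple(c + (x - c) / n
                               for c, x in zip(centroids[nearest], point))
    return nearest

# Hypothetical starting state: each centroid was seeded from one point.
centroids = [(0.0, 0.0), (10.0, 10.0)]
counts = [1, 1]
label = incremental_update(centroids, counts, (2.0, 0.0))
print(label, centroids)  # 0 [(1.0, 0.0), (10.0, 10.0)]
```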
Along what orthogonal aspects can clustering methods be compared?
Partitioning criteria, separation of clusters, similarity measure, and clustering space.
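What two types of partitioning criteria exist?
Hierarchical partitioning (clusters organized on different levels) and non-hierarchical partitioning (no hierarchy among the clusters).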
What two types of cluster separation exist?
Mutually exclusive (each object belongs to exactly one cluster) and non-exclusive (an object can belong to two or more clusters).
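The difference shows up in how an object's membership is recorded. The sketch below contrasts a hard (exclusive) assignment to the nearest center with fuzzy c-means style membership degrees, one common way to express non-exclusive membership; fuzzy c-means is named here as an illustration and is not part of the original card. It assumes the point does not coincide with any center, and the fuzzifier `m = 2` is an assumed default:

```python
import math

def hard_assignment(point, centers):
    """Exclusive separation: the point belongs to exactly one cluster."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def fuzzy_memberships(point, centers, m=2.0):
    """Non-exclusive separation: fuzzy c-means membership degrees in [0, 1]
    that sum to 1. Assumes every distance to a center is greater than 0."""
    d = [math.dist(point, c) for c in centers]
    return [1.0 / sum((d[i] / d[k]) ** (2 / (m - 1)) for k in range(len(centers)))
            for i in range(len(centers))]

centers = [(0.0, 0.0), (4.0, 0.0)]
p = (1.0, 0.0)
print(hard_assignment(p, centers))                             # 0
print([round(u, 3) for u in fuzzy_memberships(p, centers)])    # [0.9, 0.1]
```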
In what kind of space can the distance used to measure the similarity of two objects be defined?
Euclidean space, a vector space, or any other space.
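In a vector space, such as term-frequency vectors for documents, a common choice (not named in the original card) is cosine similarity, which compares the direction of two vectors rather than their straight-line distance. A minimal sketch assuming non-zero vectors; `1 - cosine_similarity(u, v)` can serve as a distance:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity((3, 4), (4, 3)))  # 0.96
print(cosine_similarity((1, 0), (0, 1)))  # 0.0 (orthogonal vectors)
```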
What two types of similarity measures exist?
Distance-based methods, and density- and continuity-based methods.
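Density-based methods ask how many objects lie near a point rather than how far apart two particular points are. As an illustration, the sketch below uses the core-point test from DBSCAN, a well-known density-based algorithm that the card itself does not mention; `eps` (neighborhood radius) and `min_pts` (minimum neighbor count) are DBSCAN's usual parameters, and the data are made up:

```python
import math

def region_query(points, p, eps):
    """All points within distance eps of p (p itself included)."""
    return [q for q in points if math.dist(p, q) <= eps]

def is_core_point(points, p, eps, min_pts):
    """DBSCAN-style test: p lies in a dense region if its eps-neighborhood
    contains at least min_pts points."""
    return len(region_query(points, p, eps)) >= min_pts

points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
print(is_core_point(points, (0, 0), eps=1.5, min_pts=4))  # True
print(is_core_point(points, (8, 8), eps=1.5, min_pts=4))  # False
```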
Do basic partitioning methods adopt exclusive or non-exclusive cluster separation?
exclusive
Are partitioning methods usually distance-based or density-based?
Distance-based.
When do heuristic clustering methods work well? For what cluster shapes?
For finding spherical-shaped clusters in small- to medium-size databases.
What are two popular heuristic methods?
k-means and k-medoids
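The k-means heuristic alternates two steps: assign every object to its nearest centroid, then move each centroid to the mean of its assigned objects, stopping when the centroids no longer change. A minimal sketch of that loop (Lloyd's formulation) using only the standard library; picking the initial centroids at random from the data and keeping an empty cluster's old centroid are assumptions of this sketch, and the data are made up:

```python
import math
import random

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def k_means(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k distinct data points
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]  # keep an empty cluster's centroid
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = k_means(data, k=2)
print(labels)
```

For this data the first three points share one label and the last three share the other; which group is labeled 0 depends on the random initialization.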
What are the two classifications of hierarchical methods?
agglomerative and divisive
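What does the agglomerative approach in hierarchical methods consist of?
A bottom-up approach: it starts with each object as its own group and successively merges the objects or groups that are closest together, until all groups are merged into one (or a termination condition holds).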
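A minimal sketch of the bottom-up process. How close two groups are must be defined; this sketch assumes single linkage (the distance between the closest pair of objects across the two groups), which is one common option among several. `num_clusters` is a hypothetical termination parameter; leaving it at 1 merges everything into one group:

```python
import math

def single_link(c1, c2):
    """Single-linkage distance: closest pair of points across two clusters."""
    return min(math.dist(a, b) for a in c1 for b in c2)

def agglomerative(points, num_clusters=1):
    clusters = [[p] for p in points]  # bottom level: each object alone
    merge_distances = []
    while len(clusters) > num_clusters:
        # Find the closest pair of clusters (i < j); ties go to the lowest i, j.
        d, i, j = min((single_link(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        merge_distances.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, merge_distances

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
clusters, merges = agglomerative(pts, num_clusters=2)
print(clusters)  # [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
print(merges)    # [1.0, 1.0]
```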
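What does the divisive approach in hierarchical methods consist of?
A top-down approach: it starts with all objects in one cluster and, in each iteration, splits a cluster into smaller clusters, until each object is in its own cluster (or a termination condition holds).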
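A minimal sketch of the top-down process. Each step splits the cluster with the largest diameter; the splitting rule used here (seed with the two farthest-apart objects, then give every object to the nearer seed) is a simple hypothetical heuristic chosen for brevity, whereas established divisive methods such as DIANA use more careful rules. `num_clusters` is an assumed stopping parameter:

```python
import math

def diameter(cluster):
    """Largest pairwise distance inside a cluster (0 for a single point)."""
    return max(math.dist(p, q) for p in cluster for q in cluster)

def split_in_two(cluster):
    """Hypothetical splitting rule: seed with the two farthest-apart points and
    give every point to the nearer seed (ties go to the first seed)."""
    a, b = max(((p, q) for p in cluster for q in cluster),
               key=lambda pair: math.dist(*pair))
    first = [p for p in cluster if math.dist(p, a) <= math.dist(p, b)]
    second = [p for p in cluster if math.dist(p, a) > math.dist(p, b)]
    return first, second

def divisive(points, num_clusters):
    clusters = [list(points)]  # top level: every object in one cluster
    while len(clusters) < num_clusters:
        # Pick the widest cluster; stop if nothing can be split further.
        idx = max(range(len(clusters)), key=lambda i: diameter(clusters[i]))
        if diameter(clusters[idx]) == 0:
            break
        widest = clusters.pop(idx)
        clusters.extend(split_in_two(widest))
    return clusters

pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(divisive(pts, 2))  # [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
```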
Hierarchical methods can be based on what two kinds of measures?
Distance, or density and continuity.