Lecture 16 - Clustering Flashcards
What is clustering?
Finding a way to partition a set of unlabelled training examples into classes/groups
The resulting partition can then be used to determine the class of any new sample
Clustering is also known as ____________ learning
unsupervised
What criteria make a good partition for clustering?
Maximise similarity within classes
Minimise similarity between classes
Minimise number of classes created
Maximise the ability to predict unknown attribute values from class membership
Agglomerative hierarchical clustering and the k-means method are both methods of?
Clustering
What is the basic procedure of agglomerative hierarchical clustering?
Assign each sample to its own cluster
While more than X clusters remain (X = the desired number of clusters; X = 1 builds the full hierarchy)
Find the most similar pair of clusters
Merge them into a new larger cluster
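A minimal Python sketch of the merge loop above, for illustration only. The card does not say how similarity between two clusters is measured, so this sketch assumes single-link Euclidean distance (the closest pair of members); the names `agglomerate`, `single_link_distance` and `target_clusters` are hypothetical, and stopping once `target_clusters` clusters remain is an assumed reading of the stopping condition.

```python
from itertools import combinations
from math import dist


def single_link_distance(a, b):
    """Assumed cluster distance: smallest distance between any two members."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(samples, target_clusters=1):
    """Merge the closest pair of clusters until target_clusters remain."""
    target_clusters = max(target_clusters, 1)
    # Assign each sample to its own cluster.
    clusters = [[s] for s in samples]
    # Keep merging while more clusters remain than requested.
    while len(clusters) > target_clusters:
        # Find the most similar (here: closest) pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_link_distance(
                clusters[pair[0]], clusters[pair[1]]
            ),
        )
        # Merge them into a new larger cluster (i < j, so index i stays valid).
        merged = clusters[i] + clusters[j]
        clusters.pop(j)
        clusters[i] = merged
    return clusters


samples = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(agglomerate(samples, target_clusters=2))
```

Each pass compares every pair of clusters, so this is only practical for small sample sets. Recording the order of the merges would give the full hierarchy rather than a single partition.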
What are the prerequisites for agglomerative hierarchical clustering?
Similarity metric for samples
All examples must be available at the start
A human analyst to determine the optimal number of clusters
What is the basic procedure of k-means method?
k = number of clusters to form
Choose k items randomly to be cluster centers
repeat until no item changes clusters {
assign each item to its nearest cluster center
set each cluster center to the mean of the items in its cluster
}
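The k-means pseudocode above can be sketched in Python roughly as follows. This is a hedged illustration, not a reference implementation: it assumes items are tuples of numbers compared by Euclidean distance, the names `k_means` and `mean_point` and the `seed` parameter are hypothetical, and leaving an empty cluster's center unchanged is an assumption the card does not cover.

```python
import random
from math import dist


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(items, k, seed=0):
    """Return (centers, assignment) once no item changes cluster."""
    rng = random.Random(seed)
    # Choose k items randomly to be the initial cluster centers.
    centers = rng.sample(items, k)
    assignment = None
    while True:
        # Assign each item to its nearest cluster center.
        new_assignment = [
            min(range(k), key=lambda c: dist(item, centers[c]))
            for item in items
        ]
        # Stop when no item changed cluster since the previous pass.
        if new_assignment == assignment:
            return centers, assignment
        assignment = new_assignment
        # Set each cluster center to the mean of the items in its cluster.
        for c in range(k):
            members = [it for it, a in zip(items, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old center
                centers[c] = mean_point(members)


items = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, assignment = k_means(items, k=2)
print(centers, assignment)
```

Because the starting centers are random, different seeds can give different final clusters on less clearly separated data; running the procedure several times and keeping the tightest result is a common workaround.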