Clustering Flashcards
Clustering
An unsupervised learning task that groups unlabelled data points by similarity, typically measured with a distance metric.
4 Types of Image Segmentation
Image segmentation - Partitioning an image into multiple segments
Semantic segmentation - All pixels that are part of the same object type get assigned to the same segment
Instance segmentation - All pixels that are part of the same individual object are assigned to the same segment
Colour segmentation - Simply assign pixels to the same segment if they have a similar colour.
3 Types of Clusters
Centre-based clusters (prototype-based) - K-Means
Density-Based clusters - DBSCAN
Hierarchical-based clusters
2 Types of Clustering
Partitional clustering - Non-overlapping subsets; Unnested
Hierarchical clustering - Organised as a hierarchical tree; Nested
3 Clustering Algorithms
K-means Clustering
Density-based Clustering
Hierarchical Clustering
K-means Clustering
A prototype-based, partitional clustering method that seeks to identify a user-specified number of clusters (K) represented by their centroids.
4 Steps of K-means Clustering
Initialise: Choose the number of clusters K and randomly initialise K cluster centroids.
Assign: Assign each data point to its nearest centroid, measured by the Euclidean distance between the point and the centroid.
Update: Compute the mean of all data points assigned to each cluster and move that cluster's centroid to the mean.
Repeat: Repeat steps 2 and 3 until the assignments (and therefore the centroids) stop changing.
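The four steps can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the `max_iter` and `seed` parameters, and the sample points are all assumptions made for the example, and the initial centroids are drawn from the data points themselves.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 1: pick K distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

Which group receives label 0 depends on the random initialisation, so only the grouping itself is meaningful, not the label values.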
2 Types of Clustering in K-means
Hard Clustering - Assign each instance to a single cluster
Soft Clustering - Give each instance a score per cluster (score can be the distance between the instance and the centroid)
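A small sketch of the difference, assuming two centroids that have already been fitted (the centroid values and the instance are made up for illustration). The soft result keeps one distance per cluster; the hard result keeps only the index of the closest centroid.

```python
import math

centroids = [(1.0, 1.0), (8.0, 8.0)]  # assumed to come from a fitted K-means
instance = (2.0, 1.0)                 # hypothetical new data point

# Soft clustering: one score per cluster, here the distance to each centroid.
scores = [math.dist(instance, c) for c in centroids]

# Hard clustering: keep only the index of the nearest centroid.
hard_label = min(range(len(centroids)), key=lambda j: scores[j])
```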
3 Approaches to Mitigate the Risk of Converging to a Local Optimum
Provide the initial centroids manually
Run the algorithm many times with various random initializations and retain the best result.
Use K-means++
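K-means++ changes only the initialisation step: the first centroid is drawn uniformly at random, and each later centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. A minimal sketch of that seeding rule follows; the function name `kmeans_pp_init`, the `seed` parameter, and the sample points are assumptions for the example.

```python
import math
import random


def kmeans_pp_init(points, k, seed=0):
    """Choose K initial centroids with the K-means++ seeding rule.

    Assumes the data contains at least K distinct points.
    """
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        # Sample the next centroid with probability proportional to d2,
        # so points far from the existing centroids are favoured.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


# Hypothetical data: two small groups far apart.
points = [(1.0, 1.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0)]
init = kmeans_pp_init(points, k=2)
```

A point that is already a centroid has weight zero, so it cannot be picked twice.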
3 Limitations of K-means
Sizes - Struggles when clusters differ greatly in size
Densities - Struggles when clusters differ in density
Non-globular shapes - Struggles with clusters that are not roughly globular (spherical)
3 Types of Points in DBSCAN
Core point - Has at least a specified number of points (MinPts) within distance Eps of it
Border point - Not a core point, but lies within the Eps-neighbourhood of a core point
Noise point - Neither a core point nor a border point
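The three definitions translate into a short labelling routine. This is a sketch under one stated assumption: a point counts itself when its Eps-neighbourhood is compared with MinPts (a common convention, but conventions differ). The function name `label_points` and the sample data are hypothetical.

```python
import math


def label_points(points, eps, min_pts):
    """Return 'core', 'border' or 'noise' for each point."""
    n = len(points)
    # Eps-neighbourhood of each point (assumption: the point itself is counted).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbours]
    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbours[i]):
            labels.append("border")  # not core, but within Eps of a core point
        else:
            labels.append("noise")
    return labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
kinds = label_points(points, eps=1.0, min_pts=3)
```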
5 Steps of DBSCAN
Label all points as core, border, or noise points.
Eliminate noise points.
Put an edge between all core points within a distance Eps of each other.
Make each group of connected core points into a separate cluster.
Assign each border point to one of the clusters of its associated core points.
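The five steps can be sketched as a single function. This is an illustrative version, not an optimised one: it compares all pairs of points, counts a point as part of its own Eps-neighbourhood, and gives a border point to whichever cluster reaches it first. The name `dbscan`, the use of -1 for noise, and the sample data are assumptions for the example.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one cluster id per point; noise points get -1."""
    n = len(points)
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Step 1: find the core points (each point counts itself as a neighbour).
    core = [len(nb) >= min_pts for nb in neighbours]
    # Step 2: start with everything marked as noise (-1); points never
    # reached from a core point stay eliminated.
    cluster = [-1] * n
    next_id = 0
    for i in range(n):
        if not core[i] or cluster[i] != -1:
            continue
        # Steps 3 and 4: follow edges between core points within Eps of each
        # other, so each connected group of core points gets one cluster id.
        cluster[i] = next_id
        stack = [i]
        while stack:
            u = stack.pop()
            for v in neighbours[u]:
                if cluster[v] != -1:
                    continue
                # Step 5: a border point joins the first cluster that reaches it.
                cluster[v] = next_id
                if core[v]:
                    stack.append(v)  # only core points extend the cluster
        next_id += 1
    return cluster


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 0.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (10.0, 10.0)]
assignments = dbscan(points, eps=1.0, min_pts=3)
```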
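2 Types of Hierarchical Clustering
Agglomerative - Bottom-up: start with each point as its own cluster and repeatedly merge the closest pair (many to one)
Divisive - Top-down: start with one all-inclusive cluster and repeatedly split it (one to many)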
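4 Ways to Define Inter-Cluster Distance
MIN (single link) - Distance between the closest pair of points, one from each cluster
MAX (complete link) - Distance between the farthest pair of points, one from each cluster
Group Average - Average distance over all pairs of points, one from each cluster
Distance between centroids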
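The four definitions differ only in how the point-to-point distances between two clusters are summarised. A minimal sketch using Euclidean distance follows; the function names and the two sample clusters are assumptions for the example.

```python
import math
from itertools import product


def pairwise(a, b):
    """All point-to-point Euclidean distances between clusters a and b."""
    return [math.dist(p, q) for p, q in product(a, b)]


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))


def dist_min(a, b):
    """MIN: distance between the closest cross-cluster pair."""
    return min(pairwise(a, b))


def dist_max(a, b):
    """MAX: distance between the farthest cross-cluster pair."""
    return max(pairwise(a, b))


def dist_group_average(a, b):
    """Group average: mean over all cross-cluster pairs."""
    d = pairwise(a, b)
    return sum(d) / len(d)


def dist_centroid(a, b):
    """Distance between the two cluster centroids."""
    return math.dist(centroid(a), centroid(b))


a = [(0.0, 0.0), (0.0, 2.0)]  # hypothetical cluster A
b = [(3.0, 0.0), (5.0, 0.0)]  # hypothetical cluster B
```

For any two clusters the group average lies between the MIN and MAX values, which is one way to sanity-check an implementation.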
2 Types of Unsupervised Measures
Cluster Cohesion (compactness): Measures how closely related the objects in a cluster are. E.g. SSE (sum of squared errors within clusters)
Cluster Separation: Measures how distinct or well-separated a cluster is from other clusters. E.g. SSB (between-cluster sum of squares)
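One common way to put numbers on the two measures is the within-cluster sum of squared errors (SSE) for cohesion and the between-cluster sum of squares (SSB) for separation. The sketch below assumes Euclidean distance and cluster means as prototypes; the function names and sample clusters are hypothetical.

```python
import math


def mean(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def sse(clusters):
    """Cohesion: squared distances from each point to its cluster centroid."""
    return sum(math.dist(p, mean(c)) ** 2 for c in clusters for p in c)


def ssb(clusters):
    """Separation: size-weighted squared distances from each cluster
    centroid to the overall mean of the data."""
    overall = mean([p for c in clusters for p in c])
    return sum(len(c) * math.dist(mean(c), overall) ** 2 for c in clusters)


clusters = [[(1.0, 1.0), (1.0, 3.0)], [(7.0, 1.0), (7.0, 3.0)]]
cohesion = sse(clusters)
separation = ssb(clusters)
```

Lower SSE means tighter clusters and higher SSB means better-separated clusters. With these definitions the two values add up to the total sum of squares around the overall mean.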