Clustering Flashcards

1
Q

Clustering

A

An unsupervised learning technique that groups unlabelled data points by similarity, as measured by a distance metric.

2
Q

4 Types of Image Segmentation

A

Image segmentation - Partitioning an image into multiple segments
Semantic segmentation - All pixels that are part of the same object type get assigned to the same segment
Instance segmentation - All pixels that are part of the same individual object are assigned to the same segment
Colour segmentation - Simply assign pixels to the same segment if they have a similar colour.

3
Q

3 Types of Clusters

A

Centre-based clusters (prototype-based) - K-Means
Density-Based clusters - DBSCAN
Hierarchical-based clusters

4
Q

2 Types of Clustering

A

Partitional clustering - Divides the data into non-overlapping subsets, so each point belongs to exactly one cluster; unnested
Hierarchical clustering - A set of nested clusters organised as a hierarchical tree

5
Q

3 Clustering Algorithms

A

K-means Clustering
Density-based Clustering
Hierarchical Clustering

6
Q

K-means Clustering

A

A prototype-based, partitional clustering method that seeks to identify a user-specified number of clusters (K) represented by their centroids.

7
Q

4 Steps of K-means Clustering

A

1. Initialization: Choose the number of clusters K and randomly initialise K cluster centroids.

2. Assignment: Assign each data point to the nearest centroid based on the Euclidean distance between the point and the centroid.

3. Update centroids: Compute the mean of all data points assigned to each cluster and move that cluster’s centroid to the mean.

4. Repeat steps 2 and 3 until the assignments (and therefore the centroids) stop changing.
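
The four steps can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the `max_iter` cap, the fixed `seed`, and the choice to initialise from K randomly picked data points are all assumptions made for the example.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick K distinct data points as the initial centroids (an assumption;
    # other random initialisations are possible).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other; which cluster receives index 0 depends on the random initialisation.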

8
Q

2 Types of Clustering in K-Means

A

Hard Clustering - Assign each instance to a single cluster
Soft Clustering - Give each instance a score per cluster (score can be the distance between the instance and the centroid)
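
A small sketch of the difference, assuming three hypothetical centroids that have already been fitted and using the distance to each centroid as the soft score:

```python
import math

# Hypothetical centroids from an already-fitted K-means model.
centroids = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
instance = (1.0, 0.0)

# Soft clustering: one score per cluster, here the distance to each centroid.
scores = [math.dist(instance, c) for c in centroids]

# Hard clustering: the index of the single closest centroid.
hard_label = min(range(len(centroids)), key=lambda j: scores[j])
```

Here `scores` holds one distance per cluster, while `hard_label` collapses them to the single nearest cluster (index 0 for this instance).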

9
Q

3 Approaches to Mitigate the Risk of Converging to a Local Optimum

A

Provide the initial centroids manually
Run the algorithm many times with various random initializations and retain the best result
Use K-means++ initialization, which tends to spread the initial centroids apart
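
The K-means++ seeding idea can be sketched as follows. The function name `kmeans_pp_init` and the toy data are assumptions for illustration; the sketch covers only the choice of initial centroids, after which ordinary K-means would run.

```python
import math
import random

def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: favour points far from the centroids chosen so far."""
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    while len(centroids) < k:
        # Squared distance from each point to its closest centroid chosen so far.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        # Sample the next centroid with probability proportional to that distance.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
initial = kmeans_pp_init(points, k=2, rng=random.Random(0))
```

Because an already-chosen point has weight zero, it cannot be picked twice; distant points are the most likely to become the next centroid.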

10
Q

3 Limitations of K-means

A

Sizes - Struggles when clusters differ in size
Densities - Struggles when clusters differ in density
Non-globular shapes - Works well only on globular (roughly spherical) clusters

11
Q

3 Terms in DBSCAN

A

Core point - Has at least a specified number of points (MinPts) within distance Eps
Border point - Not a core point, but lies in the neighbourhood of a core point
Noise point - Neither a core point nor a border point
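
The three definitions can be turned into a short labelling routine. This is a sketch under stated assumptions: the function name `label_points` is made up for the example, and a point is counted as part of its own Eps-neighbourhood, which is a common convention but not stated on the card.

```python
import math

def label_points(points, eps, min_pts):
    """Return 'core', 'border', or 'noise' for each point."""
    # Eps-neighbourhood of each point (the point itself is included by assumption).
    neighbours = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(n) >= min_pts for n in neighbours]
    kinds = []
    for i in range(len(points)):
        if is_core[i]:
            kinds.append("core")
        elif any(is_core[j] for j in neighbours[i]):
            kinds.append("border")  # not core, but within Eps of a core point
        else:
            kinds.append("noise")
    return kinds

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]
kinds = label_points(points, eps=1.0, min_pts=3)
```

With these values the four corner points are core, `(2, 1)` is a border point because its only neighbour `(1, 1)` is core, and `(8, 8)` is noise.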

12
Q

5 Steps of DBSCAN

A

Label all points as core, border, or noise points.

Eliminate noise points.

Put an edge between all core points within a distance Eps of each other.

Make each group of connected core points into a separate cluster.

Assign each border point to one of the clusters of its associated core points.
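
The five steps map onto code roughly as below. This is a simplified sketch, not an optimised implementation: the function name `dbscan`, the use of `-1` as the noise label, and counting a point in its own neighbourhood are assumptions, and a border point near several clusters simply joins the first one found.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one cluster id per point; noise points get -1."""
    n = len(points)
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Step 1: find the core points (border vs noise is resolved in step 5).
    core = [len(neighbours[i]) >= min_pts for i in range(n)]
    # Step 2: noise points are left out of every cluster; they keep the label -1.
    labels = [-1] * n
    cluster_id = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        # Steps 3 and 4: follow edges between core points within Eps of each
        # other to collect one connected group of core points.
        labels[i] = cluster_id
        stack = [i]
        while stack:
            current = stack.pop()
            for j in neighbours[current]:
                if core[j] and labels[j] == -1:
                    labels[j] = cluster_id
                    stack.append(j)
        cluster_id += 1
    # Step 5: attach each border point to the cluster of a neighbouring core point.
    for i in range(n):
        if not core[i]:
            for j in neighbours[i]:
                if core[j]:
                    labels[i] = labels[j]
                    break
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8), (8, 9), (9, 8), (20, 20)]
labels = dbscan(points, eps=1.0, min_pts=3)
```

The toy data yields two clusters plus one noise point: the first five points share a label, the next three share another, and `(20, 20)` stays at `-1`.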

13
Q

2 Types of Hierarchical Clustering

A

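Agglomerative - Bottom-up: start with each point as its own cluster and repeatedly merge the closest pair (many to one)
Divisive - Top-down: start with one all-inclusive cluster and repeatedly split it (one to many)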

14
Q

4 Ways to Define Inter-Cluster Distance

A

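MIN (single link) - Distance between the two closest points, one from each cluster
MAX (complete link) - Distance between the two farthest points, one from each cluster
Group Average - Average distance over all pairs of points, one from each cluster
Distance between centroids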
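
All four definitions can be computed side by side for two small clusters. The helper names `centroid` and `inter_cluster_distances` and the toy clusters are assumptions made for this example.

```python
import math
from itertools import product

def centroid(cluster):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def inter_cluster_distances(a, b):
    # Every pairwise distance with one point from each cluster.
    pairwise = [math.dist(p, q) for p, q in product(a, b)]
    return {
        "min": min(pairwise),                            # closest pair
        "max": max(pairwise),                            # farthest pair
        "group_average": sum(pairwise) / len(pairwise),  # mean over all pairs
        "centroid": math.dist(centroid(a), centroid(b)),
    }

a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
d = inter_cluster_distances(a, b)
```

For these clusters the pairwise distances are 3, 5, 2 and 4, so MIN is 2, MAX is 5, and the group average is 3.5. The centroid distance is also 3.5 here, though in general it differs from the group average.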

15
Q

2 Types of Unsupervised Measures

A

Cluster Cohesion (compactness): Measures how closely related the objects in a cluster are. E.g. within-cluster sum of squared errors (SSE)

Cluster Separation: Measures how distinct or well-separated a cluster is from other clusters. E.g. between-cluster sum of squares (BSS)
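
A worked example of both measures, assuming cohesion is taken as the within-cluster sum of squared errors (SSE) and separation as the between-cluster sum of squares (BSS), each cluster weighted by its size. The helper names and the one-dimensional toy data are illustrative assumptions.

```python
def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sse(clusters):
    """Cohesion: squared distance from each point to its own cluster centroid."""
    return sum(sq_dist(p, mean(c)) for c in clusters for p in c)

def bss(clusters):
    """Separation: size-weighted squared distance from each centroid to the overall mean."""
    overall = mean([p for c in clusters for p in c])
    return sum(len(c) * sq_dist(mean(c), overall) for c in clusters)

clusters = [[(1.0,), (2.0,)], [(4.0,), (5.0,)]]
cohesion = sse(clusters)
separation = bss(clusters)
```

For these points the centroids are 1.5 and 4.5 and the overall mean is 3, giving SSE = 1 and BSS = 9. Lower SSE indicates tighter clusters; higher BSS indicates clusters that are further apart.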
