Unsupervised Learning Flashcards
What is clustering?
The task of grouping objects so that objects in the same group are more similar to each other than to objects in other groups.
What are some things to consider when clustering?
- The distance (similarity) measure used
- The number of clusters
- The clustering algorithm
- How to evaluate the clustering output
What is the Single Link Measure?
The smallest distance between a point in one cluster and a point in another cluster
What is the Complete Link Measure?
The largest distance between a point in one cluster and a point in another cluster
What is the Average Link Measure?
The average distance over all pairs of points, taking one point from each of the two clusters
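The three link measures can be sketched in a few lines of Python. This is only an illustration: it assumes Euclidean distance and clusters stored as lists of coordinate tuples, and the function names are made up for this example.

```python
from itertools import product
from math import dist  # Euclidean distance (Python 3.8+)


def pairwise_distances(cluster_a, cluster_b):
    """Distances between every point in cluster_a and every point in cluster_b."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_link(cluster_a, cluster_b):
    # Smallest distance between the two clusters.
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_link(cluster_a, cluster_b):
    # Largest distance between the two clusters.
    return max(pairwise_distances(cluster_a, cluster_b))


def average_link(cluster_a, cluster_b):
    # Mean of all cross-cluster distances.
    d = pairwise_distances(cluster_a, cluster_b)
    return sum(d) / len(d)


a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
print(single_link(a, b), complete_link(a, b), average_link(a, b))  # 2.0 5.0 3.5
```

The same pair of clusters gets three different distances, which is why the choice of measure can change which clusters are merged.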
What are some issues with K-Means Clustering?
- Need to specify the number of clusters in advance
- Different initialisations of the cluster centres give different solutions
- Sensitive to noisy data and outliers
- Not suitable for complex data patterns (for example, non-convex or elongated clusters)
What are the steps involved in K-Means Clustering?
1. Choose K initial cluster centres (for example, K random data points)
2. Compute the distance between each point and each cluster centre
3. Assign each point to the cluster centre that it is closest to
4. Move each cluster centre to the average of all the points that have been assigned to it
5. Repeat steps 2 to 4 until convergence (the assignments no longer change)
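The loop above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance, points stored as tuples, and initial centres supplied by the caller. The function name `k_means` and the `max_iter` parameter are made up for this example.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def k_means(points, centres, max_iter=100):
    """K-Means on tuples of floats; `centres` holds the initial guesses."""
    centres = list(centres)
    groups = []
    for _ in range(max_iter):
        # Assign each point to the centre it is closest to.
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: dist(p, centres[i]))
            groups[nearest].append(p)
        # Move each centre to the average of its assigned points.
        # Assumption of this sketch: an empty cluster keeps its old centre.
        new_centres = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centres[i]
            for i, g in enumerate(groups)
        ]
        # Converged when no centre moves.
        if new_centres == centres:
            break
        centres = new_centres
    return centres, groups


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centres, groups = k_means(points, centres=[(0.0, 0.0), (10.0, 10.0)])
print(centres)  # [(0.0, 0.5), (10.0, 10.5)]
```

Passing different initial centres is an easy way to see how the starting point can change the result.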
Will K-Means Clustering always converge?
Yes, although it may converge to a local optimum rather than the best possible clustering
What is one advantage of hierarchical clustering?
We do not have to specify the number of groups in advance
What are the two types of hierarchical clustering?
- Agglomerative
- Divisive
What is Agglomerative clustering?
Treat each data point as a cluster
Merge the atomic clusters into larger and larger clusters
What is Divisive Clustering?
Treat all data points as one single cluster
Then divide this cluster into smaller and smaller clusters
What are the steps of Agglomerative Clustering?
1. Assume each point is its own cluster
2. Compute the distances (single-link, complete-link or average-link) between every pair of clusters
3. Merge the two closest clusters
4. Repeat steps 2 and 3 until only one cluster remains
5. Draw a dendrogram to show the clusters at each iteration
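A naive version of this procedure might look like the sketch below. It assumes Euclidean distance and recomputes every cluster-to-cluster distance on each pass, so it is meant for small examples only. The function name `agglomerative` and its `linkage` parameter are made up for this illustration; the returned merge history is the information a dendrogram would display.

```python
from itertools import combinations, product
from math import dist  # Euclidean distance (Python 3.8+)


def agglomerative(points, linkage=min):
    """Naive agglomerative clustering.

    `linkage` reduces the cross-cluster distances to one number:
    `min` gives single-link, `max` gives complete-link.
    Returns the merge history as (distance, merged_cluster) tuples.
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        d, i, j = min(
            (linkage(dist(p, q) for p, q in product(clusters[i], clusters[j])), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        merged = clusters[i] + clusters[j]
        # Delete the higher index first (j > i) so index i stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
        history.append((d, merged))
    return history


pts = [(0.0,), (1.0,), (5.0,), (7.0,)]
print([d for d, _ in agglomerative(pts)])               # [1.0, 2.0, 4.0]
print([d for d, _ in agglomerative(pts, linkage=max)])  # [1.0, 2.0, 7.0]
```

The last merge happens at distance 4.0 with single-link but 7.0 with complete-link, which shows how much the result depends on the cluster distance measure.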
What is the lifetime of a cluster?
The distance at which it was merged into another cluster
MINUS
The distance at which the cluster was created
What are some drawbacks of Agglomerative Clustering?
- Sensitive to cluster distance measure
- Sensitive to noise
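- Less efficient than K-Means on large datasets, because the distances between all pairs of clusters must be compared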