17-Unsupervised learning Flashcards
What is unsupervised learning?
Unsupervised learning is a group of machine learning methods that learn from data without class labels, i.e. the class of each instance is unknown.
What is the difference between exclusive and overlapping clustering?
Exclusive clustering assigns each item to exactly one cluster, whereas overlapping clustering allows an item to belong to more than one cluster.
What is the difference between deterministic and probabilistic clustering?
Deterministic clustering assigns each item to a single cluster with certainty, whereas probabilistic clustering assigns each item a probability of belonging to each cluster.
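A minimal sketch of the contrast, assuming items are numeric tuples and clusters are represented by centroids. The probabilistic rule here is a softmax over negative squared distances, one illustrative choice rather than a standard method (Gaussian mixture models are a more common probabilistic approach). `hard_assign`, `soft_assign` and `beta` are hypothetical names.

```python
import math

def hard_assign(point, centroids):
    """Deterministic: return the index of the single nearest centroid."""
    dists = [math.dist(point, c) for c in centroids]
    return dists.index(min(dists))

def soft_assign(point, centroids, beta=1.0):
    """Probabilistic: return one membership probability per cluster.

    Softmax over negative squared distances; beta (hypothetical) controls
    how sharply probability concentrates on the nearest centroid.
    """
    scores = [-beta * math.dist(point, c) ** 2 for c in centroids]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

centroids = [(0.0, 0.0), (4.0, 0.0)]
print(hard_assign((1.0, 0.0), centroids))  # 0
print(soft_assign((2.0, 0.0), centroids))  # [0.5, 0.5]
```

A point exactly halfway between the centroids gets one definite label under the deterministic rule but equal probabilities under the probabilistic one.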
What is the difference between hierarchical and partitioning clustering?
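Hierarchical clustering produces nested clusters with subset relationships (a tree of clusters), whereas partitioning clustering divides the data into non-nested clusters.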
What is the difference between heterogeneous and homogeneous clustering?
Heterogeneous clustering produces clusters of different shapes and sizes, whereas homogeneous clustering produces clusters of the same shape.
What is the difference between partial and complete clustering?
Partial clustering only clusters some of the data, whereas complete clustering assigns every item to a cluster.
What is the difference between incremental and batch clustering?
In batch clustering, all items are clustered at the same time, whereas incremental clustering processes items one at a time as they arrive.
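A hypothetical sketch of the difference for a single cluster centroid, assuming numeric tuples: the batch version averages all items at once, while the incremental version folds each new item into a running mean, the kind of update used by online variants of k-means. `batch_mean` and `incremental_update` are illustrative names.

```python
def batch_mean(points):
    """Batch: compute the centroid from all points at once."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))

def incremental_update(centroid, count, point):
    """Incremental: fold one new point into an existing centroid.

    count is the number of points already summarised by centroid.
    Returns the updated centroid and count.
    """
    count += 1
    new = tuple(c + (x - c) / count for c, x in zip(centroid, point))
    return new, count

points = [(1.0, 2.0), (3.0, 4.0), (5.0, 0.0)]
centroid, count = points[0], 1
for p in points[1:]:  # points arrive one at a time
    centroid, count = incremental_update(centroid, count, p)
print(batch_mean(points))  # (3.0, 2.0)
print(centroid)            # (3.0, 2.0)
```

Both give the same centroid, but the incremental version never needs to hold all items in memory.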
How does the k-means algorithm work?
Initialise k random seed points as the initial centroids.
Assign each instance to the cluster with the nearest centroid.
Update each centroid to the mean of the instances assigned to it.
Repeat the assignment and update steps until the centroids no longer change.
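A minimal Python sketch of these steps, assuming instances are tuples of numeric attributes and Euclidean distance. The names `kmeans`, `max_iter` and `seed` are illustrative, not from any library; letting an empty cluster keep its previous centroid is one of several possible choices.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=None):
    """Plain k-means on a list of numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: k random seed points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # step 2: assign each instance to the cluster with the nearest centroid
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # step 3: move each centroid to the mean of its assigned instances
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster keeps its centroid
        # step 4: stop once the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(data, k=2, seed=0)
print(labels)  # the first three and last three instances share a label
```

Changing `seed` changes the initial centroids, which is why results on less well-separated data can differ between runs.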
What are the pros of k-means?
Relatively efficient: O(tkn) for n instances, k clusters and t iterations
Can be extended to hierarchical clustering (e.g. bisecting k-means)
What are the cons of k-means?
Sensitive to random centroid selection
Mean not well defined for nominal / ordinal attributes
May not work well with outliers
May not be able to handle clusters of different sizes
Need to specify k in advance
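How should k be chosen for k-means?
Calculate the within-cluster sum of squared errors (SSE), i.e. the squared distance of each instance from its cluster centroid, summed over all clusters, for a range of k values. As k increases, within-cluster SSE decreases, so the lowest SSE is not a useful criterion on its own. Use the elbow method: choose the k after which increasing k gives only a small further decrease in SSE.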
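A sketch of the elbow method, assuming numeric tuples and Euclidean distance. `within_cluster_sse` computes the SSE described above; `sse_curve` runs a compact k-means several times per k and keeps the lowest SSE, because a single run depends on its random seeds. All names and the `restarts` value are hypothetical; the elbow is read off the printed curve by eye.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=None):
    """Compact k-means: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # keep the old centroid if a cluster ends up empty
            new.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return centroids, labels

def within_cluster_sse(points, centroids, labels):
    """Sum of squared distances from each instance to its own cluster's centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def sse_curve(points, k_values, restarts=5):
    """Lowest within-cluster SSE over several random restarts, for each k."""
    curve = {}
    for k in k_values:
        runs = (kmeans(points, k, seed=s) for s in range(restarts))
        curve[k] = min(within_cluster_sse(points, c, lab) for c, lab in runs)
    return curve

data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
        (8.0, 8.0), (8.0, 9.0), (9.0, 8.0),
        (1.0, 8.0), (2.0, 8.0), (1.0, 9.0)]
curve = sse_curve(data, range(1, 6))
for k, sse in curve.items():
    print(k, round(sse, 2))  # look for the k where the drop levels off
```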
What are the two hierarchical clustering methods?
Agglomerative and divisive
What is agglomerative hierarchical clustering?
Bottom-up clustering - start with each instance in its own cluster and repeatedly merge the two closest clusters until a single cluster remains
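A naive sketch of bottom-up clustering, assuming numeric tuples and single-link (minimum) distance between clusters; the card does not fix a linkage, so that choice is an assumption. `agglomerative` and `single_link` are hypothetical names, and the code recomputes every pairwise cluster distance at each step, so it is only for illustration (library routines such as SciPy's `scipy.cluster.hierarchy.linkage` are far more efficient).

```python
import math
from itertools import combinations

def single_link(a, b):
    """Distance between clusters = distance between their two nearest points."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, linkage=single_link):
    """Naive bottom-up clustering; returns the merges in order.

    Each merge is (cluster_a, cluster_b, distance), where clusters are
    tuples of point indices.
    """
    clusters = [(i,) for i in range(len(points))]  # one instance per cluster

    def dist(a, b):
        return linkage([points[i] for i in a], [points[i] for i in b])

    merges = []
    while len(clusters) > 1:
        # find the pair of clusters with the smallest linkage distance
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((a, b, dist(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)  # join the two closest clusters
    return merges

data = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5)]
for a, b, d in agglomerative(data):
    print(a, b, round(d, 2))
# (0,) (1,) 1.0
# (2,) (3,) 1.5
# (0, 1) (2, 3) 5.0
```

The sequence of merges is the hierarchy: cutting it at a chosen distance gives a flat clustering.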
What is divisive hierarchical clustering?
Top-down clustering - start with one universal cluster containing all instances, split it into two clusters and proceed recursively on each
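The card does not say how the two clusters are found. This sketch assumes one common choice, bisecting each cluster with 2-means seeded by its two most distant points, and stops when a cluster has at most `min_size` instances. `split_in_two`, `divisive` and `min_size` are hypothetical names; points are assumed to be numeric tuples.

```python
import math
from itertools import combinations

def split_in_two(points, max_iter=100):
    """Bisect a cluster with 2-means, seeded by its two most distant points."""
    c0, c1 = max(combinations(points, 2), key=lambda pair: math.dist(*pair))
    left, right = [], []
    for _ in range(max_iter):
        left, right = [], []
        for p in points:
            (left if math.dist(p, c0) <= math.dist(p, c1) else right).append(p)
        if not left or not right:
            break  # degenerate split; the caller treats the cluster as a leaf
        new0 = tuple(sum(c) / len(left) for c in zip(*left))
        new1 = tuple(sum(c) / len(right) for c in zip(*right))
        if (new0, new1) == (c0, c1):
            break  # centroids stable: split found
        c0, c1 = new0, new1
    return left, right

def divisive(points, min_size=1):
    """Top-down clustering: nested tuple tree whose leaves are lists of points."""
    if len(points) <= min_size or len(set(points)) == 1:
        return points  # too small (or all identical): stop splitting
    left, right = split_in_two(points)
    if not left or not right:
        return points
    # proceed recursively on each half
    return (divisive(left, min_size), divisive(right, min_size))

data = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5)]
print(divisive(data))
# (([(0.0, 0.0)], [(0.0, 1.0)]), ([(5.0, 0.0)], [(5.0, 1.5)]))
```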
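What are the graph-based measures of proximity between two clusters?
Minimum (single link) - distance between the two nearest points, one from each cluster
Complete (maximum) - distance between the two furthest points, one from each cluster
Average - average distance over all pairs of points, one from each cluster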
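The three measures can be sketched directly, assuming clusters are lists of numeric tuples and Euclidean distance between points. The function names are illustrative.

```python
import math

def single_link(a, b):
    """Minimum: distance between the two nearest points, one from each cluster."""
    return min(math.dist(p, q) for p in a for q in b)

def complete_link(a, b):
    """Complete: distance between the two furthest points, one from each cluster."""
    return max(math.dist(p, q) for p in a for q in b)

def average_link(a, b):
    """Average: mean distance over all cross-cluster pairs of points."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

a = [(0.0, 0.0), (0.0, 4.0)]
b = [(3.0, 0.0), (3.0, 4.0)]
print(single_link(a, b))    # 3.0
print(complete_link(a, b))  # 5.0
print(average_link(a, b))   # 4.0
```

The four cross-cluster pairs have distances 3, 5, 5 and 3, so the three measures give the minimum, maximum and mean of those values.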