Unsupervised Learning Flashcards
Too much data?
* Reduce it
- by number of examples – clustering
- by number of dimensions – dimensionality reduction
Clustering:
Find meaningful groupings in data
Topic modeling:
Discover topics/groups with which we can tag data points
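A minimal sketch assuming scikit-learn's LatentDirichletAllocation, one common topic-modeling technique (the toy corpus and the number of topics are illustrative):

```python
# Topic modeling sketch with LDA (corpus and n_components are illustrative).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell as markets closed",
    "investors traded shares today",
]

counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # each row: topic mixture for one document
print(doc_topics.round(2))  # the topics act as tags for the data points
```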
Dimensionality reduction/compression:
Compress data set by removing redundancy and retaining only useful information
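A minimal sketch using PCA, one standard dimensionality-reduction technique (the synthetic data and the 95% variance threshold are illustrative):

```python
# Dimensionality-reduction sketch with PCA (synthetic data).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)  # a redundant dimension

pca = PCA(n_components=0.95)  # keep components explaining 95% of the variance
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)  # redundancy removed, information retained
```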
K-means
Unsupervised method for clustering
* Partitions the data into k clusters based on their features (k must be given)
* A cluster is represented by its centroid
* Each point is assigned to its nearest centroid/cluster
K-means Goal
To minimize intra-cluster distance – the total distance between each data point and its corresponding cluster centroid
K-means: Iterative
Pick k random points as the initial centroids
Repeat until no further reassignment (see the sketch below):
- Assign each point to its nearest centroid
- Move each cluster centroid to the mean of the points assigned to it
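A minimal from-scratch sketch of these two alternating steps (the synthetic 2D data, k = 3, and initializing from random data points are illustrative choices):

```python
# K-means from scratch, following the steps above.
import numpy as np

def kmeans(X, k, rng):
    # Pick k random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    assign = np.full(len(X), -1)
    while True:
        # Assignment step: each point goes to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):  # no reassignment -> converged
            return centroids, assign
        assign = new_assign
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = X[assign == j].mean(axis=0)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in [(0, 0), (3, 3), (0, 3)]])
centroids, labels = kmeans(X, k=3, rng=rng)
print(centroids.round(2))
```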
A good clustering is one that achieves
- High within-cluster similarity
- Low inter-cluster similarity
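Both criteria can be checked at once with the silhouette score, one common measure (a sketch assuming scikit-learn; the data and clustering are synthetic):

```python
# Silhouette score: within-cluster cohesion vs. inter-cluster separation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in [(0, 0), (3, 3)]])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(silhouette_score(X, labels))  # close to 1 = tight, well-separated clusters
```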
K-means clustering is fast and works well if the clusters:
- are spherically shaped
- are well separated
- have similar volumes
- have a similar number of points (similar densities)
K-means Issues
- clusters of very different sizes
- clusters with different densities
- non-spherical clusters (see the sketch below)
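A quick illustration of the non-spherical case, assuming scikit-learn's two-moons toy data: K-means splits the interlocking half-moons incorrectly.

```python
# K-means struggles on non-spherical clusters (two interlocking moons).
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=300, noise=0.05, random_state=0)
pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(adjusted_rand_score(y, pred))  # well below 1: the moons are not spherical
```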
Expectation Maximization (EM)
Algorithm
- Generate k random probability distributions
- Iterate until convergence:
  - Compute how probable each point is under each distribution (E-step)
  - Re-estimate each distribution's parameters from those soft assignments (M-step)
EM Algorithm
Learning mixtures of Gaussians
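A minimal sketch assuming scikit-learn: GaussianMixture is fitted with exactly this E-step/M-step loop (the synthetic data and k = 2 are illustrative):

```python
# Learning a mixture of Gaussians via EM with scikit-learn.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(5, 1, size=(100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)  # runs EM
print(gmm.means_.round(2))                # M-step result: estimated means
print(gmm.predict_proba(X[:3]).round(2))  # E-step result: soft assignments
```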
Hierarchical Unsupervised Learning
Find successive clusters using previously established ones, organized as a hierarchical tree (see the sketch after this list):
- Agglomerative (bottom-up)
- Divisive (top-down)
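A minimal agglomerative (bottom-up) sketch assuming SciPy: linkage successively merges the closest clusters into a tree (the synthetic data and the Ward criterion are illustrative choices):

```python
# Agglomerative hierarchical clustering with SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, size=(20, 2)),
               rng.normal(3, 0.3, size=(20, 2))])

Z = linkage(X, method="ward")  # the hierarchical tree (one merge per row)
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
print(labels)
```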
Non-Parametric Clustering – DBSCAN
- Find the points in the ε-distance neighbourhood of every point, and identify the core points: those with more than the minimum number of points needed to form a dense region
- Find the connected components of the core points on the neighbours graph, ignoring all non-core points
- Assign each non-core point to a nearby cluster if the cluster is an ε-distance neighbour, otherwise assign it to noise (see the sketch below)
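A minimal sketch assuming scikit-learn's DBSCAN, where eps is the ε-distance and min_samples the density threshold (both values illustrative); note that it handles the non-spherical moons that K-means struggled with above:

```python
# DBSCAN sketch: density-based clustering with noise detection.
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print(set(labels))  # cluster ids; -1 marks points assigned to noise
```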
SOM (Self-organizing Maps)
Similar to K-means, but . . .
* Representatives are organized (have neighbours) in a 1D or 2D mesh
* When a representative is selected to be adapted, its neighbours are also updated (at a lower rate); see the sketch below
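A minimal from-scratch 1D SOM sketch (the mesh size, learning rate, and neighbourhood width are illustrative assumptions):

```python
# 1D self-organizing map: the winner and its mesh neighbours move toward x.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))     # data to organize
nodes = rng.uniform(size=(10, 2))  # 10 representatives on a 1D mesh
grid = np.arange(10)               # their positions on the mesh

for t in range(1000):
    x = X[rng.integers(len(X))]
    winner = np.linalg.norm(nodes - x, axis=1).argmin()  # best-matching unit
    # Mesh neighbours are pulled toward x too, at a lower rate
    influence = np.exp(-((grid - winner) ** 2) / (2 * 1.0 ** 2))
    nodes += 0.1 * influence[:, None] * (x - nodes)

print(nodes.round(2))  # neighbouring nodes end up near each other in data space
```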