ML-07 - Unsupervised learning Flashcards
ML-07 - Unsupervised learning
What’s the requirement for unsupervised learning?
Your data needs some inherent structure.
ML-07 - Unsupervised learning
What are the two major types of unsupervised learning?
- Clustering
- Anomaly detection
ML-07 - Unsupervised learning
Name the techniques mentioned in the lecture slides. (5)
- K-Means Clustering
- Mean-Shift Clustering
- DBSCAN
- EM using GMM
- Agglomerative Hierarchical Clustering
ML-07 - Unsupervised learning
What is DBSCAN short for?
Density-Based Spatial Clustering of Applications with Noise
ML-07 - Unsupervised learning
What is EM short for?
Expectation Maximization
ML-07 - Unsupervised learning
What is GMM short for?
Gaussian Mixture Models
ML-07 - Unsupervised learning
What is “EM using GMM” short for?
Expectation Maximization Clustering using Gaussian Mixture Models
ML-07 - Unsupervised learning
How do you perform k-means clustering?
1) Place k centroids at random positions in your data.
2) Assign each data point to its nearest centroid.
3) Move each centroid to the mean of the data points assigned to it.
4) Repeat steps 2 and 3 until the assignments stop changing (convergence).
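The steps above can be sketched in plain Python. This is a minimal illustration, not the lecture's implementation: the function name `kmeans`, the parameters `max_iter` and `seed`, and the toy `points` are all assumptions, and the random start is done by picking k of the data points as initial centroids.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids (assumed init scheme).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; only which points share a label is meaningful.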
ML-07 - Unsupervised learning
How do you select the number of clusters to use in k-means?
- Eyeball data (if few dimensions)
- Elbow method
- Analyze within-cluster variance
ML-07 - Unsupervised learning
Describe the elbow method.
1) Plot cost vs. n_clusters
2) Look for the “elbow”, i.e. where the cost decrease slows down.
(See image)
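A rough sketch of the elbow method on toy data, assuming the cost is the sum of squared distances from each point to its assigned centroid. The helper names `sq_dist` and `kmeans_cost` and the data are hypothetical. To keep the numbers reproducible, this sketch swaps the random initialization for a deterministic farthest-point initialization.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_cost(points, k, max_iter=100):
    """Run k-means and return the final cost (within-cluster sum of squares)."""
    # Deterministic farthest-point initialization: an assumption made here for
    # reproducibility, not the random initialization described on the cards.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

# Hypothetical toy data: three well-separated groups of four points each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0),
    (5.0, 9.0), (5.0, 10.0), (6.0, 9.0), (6.0, 10.0),
]

# Step 1 of the elbow method: cost for each candidate number of clusters.
costs = {k: kmeans_cost(points, k) for k in range(1, 6)}
for k, cost in costs.items():
    print(k, round(cost, 2))
```

On this data the cost falls steeply up to k = 3 and barely changes afterwards, so the elbow sits at three clusters. In practice the costs would be plotted against k rather than printed.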
ML-07 - Unsupervised learning
When does the elbow method work the best?
When you have separated clusters.
(See image)
ML-07 - Unsupervised learning
When does the elbow method not work that well?
If you have non-separated clusters with uncertain boundaries. There might not be an elbow in the plot.
ML-07 - Unsupervised learning
What are the pros of using k-means?
Fast computation
ML-07 - Unsupervised learning
What are the drawbacks of k-means?
- Challenging to identify the groups/clusters, since the number of clusters has to be chosen up front.
- Random initialization means the results can lack consistency.
ML-07 - Unsupervised learning
What two classes of methods are used to evaluate clustering performance?
- Supervised
- Unsupervised
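To make the distinction concrete, here is a small sketch with one metric of each kind. The specific metrics are assumptions chosen for illustration and may differ from the ones in the lecture: the Rand index (supervised, compares against known true labels) and the within-cluster sum of squares (unsupervised, uses only the data and the predicted labels). The data and function names are hypothetical.

```python
from itertools import combinations

def rand_index(true_labels, pred_labels):
    """Supervised: fraction of point pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def within_cluster_ss(points, labels):
    """Unsupervised: sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        mean = tuple(sum(c) / len(members) for c in zip(*members))
        total += sum(sum((a - b) ** 2 for a, b in zip(p, mean)) for p in members)
    return total

# Hypothetical 1-D toy data with two obvious groups.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
true_labels = [0, 0, 0, 1, 1, 1]
good_pred = [1, 1, 1, 0, 0, 0]  # same grouping as the truth, cluster names swapped
bad_pred = [0, 0, 1, 1, 1, 1]   # one point placed in the wrong group

print(rand_index(true_labels, good_pred), rand_index(true_labels, bad_pred))
print(within_cluster_ss(points, good_pred), within_cluster_ss(points, bad_pred))
```

The supervised score needs ground-truth labels, which are often unavailable in unsupervised settings; the unsupervised score only measures how compact the clusters are.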