Lecture 11 Clustering Flashcards
K-means
Clusters are convex in feature space
Cluster boundaries lie midway between centers
Can’t model covariances
Only simple cluster shapes
——
Naive implementation
Fast
Not good for large datasets
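A minimal sketch of the naive K-means loop, assuming plain Lloyd iterations (assign each point to its nearest center, then move each center to the mean of its points). The function name `kmeans`, its parameters, and the toy data are made up for illustration; this is not a tuned implementation.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd iterations on a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initialize from k random data points
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center, so the
        # boundary between two clusters lies midway between their centers.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: centers stopped moving
        centers = new_centers
    return centers, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)
```

Every iteration computes k distances per point, which is why the naive loop gets expensive as the dataset grows.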
Agglomerative clustering
Start with each point in its own cluster, then merge clusters step by step -> hierarchical
Dendrograms
Merging criteria: complete / average / single / Ward (smallest increase in within-cluster variance)
——————
Can restrict merges to an input topology (connectivity graph)
Fast with sparse connectivity
Some linkage criteria may lead to imbalanced cluster sizes
Gives a more holistic view
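A rough sketch of agglomerative merging with the four criteria from the card above. It assumes Euclidean distance and a brute-force search over all cluster pairs (no connectivity restriction); `linkage_distance` and `agglomerative` are illustrative names, not a library API.

```python
import math


def linkage_distance(a, b, linkage):
    """Distance between two clusters (lists of points) under a linkage rule."""
    if linkage == "ward":
        # Increase in within-cluster sum of squares if a and b were merged.
        mean_a = [sum(c) / len(a) for c in zip(*a)]
        mean_b = [sum(c) / len(b) for c in zip(*b)]
        return len(a) * len(b) / (len(a) + len(b)) * math.dist(mean_a, mean_b) ** 2
    pairwise = [math.dist(p, q) for p in a for q in b]
    if linkage == "single":
        return min(pairwise)  # closest pair of points
    if linkage == "complete":
        return max(pairwise)  # farthest pair of points
    if linkage == "average":
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerative(points, n_clusters, linkage="single"):
    clusters = [[p] for p in points]  # every point starts in its own cluster
    merges = []  # merge history; this is the information a dendrogram draws
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], linkage),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters, merges


points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,)]
clusters, merges = agglomerative(points, n_clusters=2, linkage="ward")
```

Stopping at a different `n_clusters` corresponds to cutting the dendrogram at a different height.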
DBSCAN
Core points: points with enough neighbors within a given radius
Allows complex cluster shapes
Can detect outliers
Two parameters to adjust (neighborhood radius, minimum neighbor count)
Can learn arbitrary cluster shapes
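A small sketch of the DBSCAN idea, assuming Euclidean distance and a brute-force neighbor search. The parameter names `eps` (radius) and `min_samples` (neighbor count, including the point itself) are chosen here for illustration, and noise is marked with the label -1.

```python
import math


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id, or -1 for noise (outlier)."""

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means not visited yet
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # not a core point; stays noise unless claimed later
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is also a core point: keep expanding
    return labels


points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0),
    (20.0, 20.0),
]
labels = dbscan(points, eps=1.5, min_samples=3)
```

Clusters grow by chaining core points together, so their shape is not tied to a center, and points that no core point reaches are left as outliers.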
Mixture models
Data is a mixture of a small number of known distributions
Find p(x)
Gaussian
Non-convex optimization
Parametric density
p(x) tells how likely a new point is
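A minimal sketch of fitting a Gaussian mixture and evaluating p(x), assuming 1-D data, EM updates, and caller-supplied starting means. The names `fit_gmm_1d` and `density` and the small variance floor are illustrative choices. Because the objective is non-convex, a different start can end in a different solution.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, means, n_iter=50):
    """EM for a 1-D Gaussian mixture, started from the given initial means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k  # assumption: unit starting variances
    weights = [1.0 / k] * k  # assumption: equal starting mixture coefficients
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            # Small floor keeps a variance from collapsing to zero.
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj + 1e-6
    return weights, means, variances


def density(x, weights, means, variances):
    """Mixture density p(x): how likely a new point is under the fitted model."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


xs = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5]
weights, means, variances = fit_gmm_1d(xs, means=[0.0, 10.0])
p_near = density(0.0, weights, means, variances)  # close to a component
p_far = density(5.0, weights, means, variances)   # between the components
```

A low value of `density` for a new point is one way to flag it as unlikely under the model.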
Bayesian infinite mixture
Add priors on mixture coefficients and Gaussians
Can unselect components if they don’t contribute
Possibly more robust
Replace mixture coefficients by Dirichlet process
Automatically finds number of components
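A sketch of why a Dirichlet process prior tends to leave most components unused. It assumes the stick-breaking construction, truncated at a fixed number of components. It only draws mixture coefficients from the prior and does not fit anything to data; `stick_breaking_weights` and the 0.01 cutoff are made up for illustration.

```python
import random


def stick_breaking_weights(alpha, n_components, seed=0):
    """Truncated stick-breaking draw of mixture coefficients."""
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet given to any component
    for _ in range(n_components):
        fraction = rng.betavariate(1.0, alpha)  # break off a Beta(1, alpha) share
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights


weights = stick_breaking_weights(alpha=1.0, n_components=50)
# Count the components that received a non-negligible share of the stick.
effective = sum(1 for w in weights if w > 0.01)
```

The remaining stick shrinks at every break, so later components get vanishing weight; a smaller `alpha` concentrates the mass on fewer components.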