Cluster Analysis Flashcards
What type of learning is cluster analysis?
Unsupervised
What does cluster analysis do?
Group data points into clusters based on certain characteristics
In terms of maximizing and minimizing, what does cluster analysis do?
Maximize the similarity within a cluster and minimize the similarity between clusters
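What is intra-cluster distance?
The distance between points within the same cluster; it is minimized
What is inter-cluster distance?
The distance between points in different clusters; it is maximized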
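
To make the intra-cluster versus inter-cluster idea concrete, here is a minimal sketch that measures both on two toy clusters. The helper names (`mean_intra_distance`, `mean_inter_distance`) and the sample points are made up for illustration, and averaging pairwise Euclidean distances is only one of several reasonable ways to summarize these quantities.

```python
import math
from itertools import combinations

def mean_intra_distance(cluster):
    """Average pairwise Euclidean distance between points of one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def mean_inter_distance(cluster_a, cluster_b):
    """Average Euclidean distance between points of two different clusters."""
    total = sum(math.dist(p, q) for p in cluster_a for q in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

# Hypothetical toy data: two tight, well-separated groups.
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
cluster_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

intra_a = mean_intra_distance(cluster_a)
inter_ab = mean_inter_distance(cluster_a, cluster_b)

# A good clustering has small intra-cluster and large inter-cluster distance.
print(intra_a < inter_ab)
```
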
What are the two types of clustering methods?
Partitional clustering and hierarchical clustering
What are the 2 methods of partitional clustering?
K-means and k-medoids
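
As a rough illustration of partitional clustering, the sketch below runs the usual k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). The function name `k_means`, the `iterations` parameter, and the choice of the first k points as starting centroids are assumptions made to keep the example deterministic; real implementations typically initialize randomly. K-medoids follows the same loop but restricts each center to an actual data point.

```python
import math

def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def k_means(points, k, iterations=10):
    # Assumption: the first k points are the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Hypothetical toy data.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
```
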
What are 2 methods of hierarchical clustering?
Agglomerative (bottom up) and divisive (top-down)
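
The bottom-up (agglomerative) approach can be sketched as follows: start with every point in its own cluster and repeatedly merge the two closest clusters. The single-linkage rule (distance between the closest pair of points) is an assumption chosen for brevity, since the card does not name a linkage, and the names `single_linkage` and `agglomerative` are illustrative. A divisive method would go the other way, starting from one cluster and splitting it.

```python
import math

def single_linkage(a, b):
    """Distance between two clusters = distance of their closest pair."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, n_clusters):
    # Start with one singleton cluster per point.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical toy data.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
result = agglomerative(points, n_clusters=2)
print(result)
```
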
What are the 4 types of clusters?
- Center-based
- Contiguous
- Density-based
- Conceptual
What are center-based clusters?
Clusters that are each defined by their center (centroid)
What are the two ways to define the center of a cluster?
- Centroid: the average (mean) of all the points in the cluster
- Medoid: the most representative actual point in the cluster
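
The difference between the two centers shows up clearly when a cluster contains an outlier. This is a minimal sketch with made-up data; `centroid` and `medoid` are illustrative helper names, and the medoid is taken here as the member point with the smallest total Euclidean distance to all other members.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean; need not coincide with any actual point."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def medoid(cluster):
    """Member point with the smallest total distance to all other members."""
    return min(cluster, key=lambda p: sum(math.dist(p, q) for q in cluster))

# Hypothetical cluster with one outlier at (14, 14).
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (14.0, 14.0)]

center = centroid(cluster)
representative = medoid(cluster)
print(center, representative)
```

The mean is pulled toward the outlier, while the medoid stays on a real data point inside the dense part of the cluster.
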
What is contiguous clustering? And an example
Clusters are formed based on the proximity of data points, e.g. hierarchical clustering
What is density-based clustering? And an example
Clusters are identified as dense regions of points, e.g. DBSCAN
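
To show what "dense regions" means in practice, here is a compact, unoptimized sketch in the spirit of DBSCAN. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors, counting the point itself, for a point to be a core point) follow common convention but are assumptions here, as is the use of -1 for noise.

```python
import math

def dbscan(points, eps, min_pts):
    NOISE = -1
    labels = [None] * len(points)  # None = not yet visited

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point of this cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_nbrs)
        cluster_id += 1
    return labels

# Hypothetical toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Unlike k-means, the number of clusters is not given up front; it falls out of the density parameters, and isolated points are left unassigned as noise.
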
What is conceptual clustering? And an example
Latent class analysis, where clusters are formed based on shared underlying concepts or models
What is the notation that expresses similarity in clustering?
d(x, y), the distance between data points x and y
What is the most popular similarity measure in clusters?
Euclidean distance
What kind of data does Euclidean distance work well with?
Numeric, continuous data in spherical clusters
Name three more distance measures suited to cluster similarity.
- Correlation
- Cosine similarity
- Manhattan distance
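What is the correlation distance measure?
Correlation-based similarity: two points are similar when their values rise and fall together, regardless of magnitude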
What is the cosine similarity distance measure?
The cosine of the angle between two vectors; suited to high-dimensional text data or sparse vectors
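
The four measures named on these cards can be written out in a few lines each. This is a minimal sketch using only the standard library; the function names are illustrative, and correlation distance is taken here as 1 minus the Pearson correlation, which is a common convention but an assumption as far as these cards go.

```python
import math

def euclidean(x, y):
    """Straight-line distance: square root of summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_similarity(x, y):
    """Cosine of the angle between x and y (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def correlation_distance(x, y):
    """1 - Pearson correlation (cosine similarity of mean-centered vectors)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    x_centered = [a - mean_x for a in x]
    y_centered = [b - mean_y for b in y]
    return 1.0 - cosine_similarity(x_centered, y_centered)

# Hypothetical vectors: y is x scaled by 2.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

print(euclidean(x, y), manhattan(x, y))
print(cosine_similarity(x, y), correlation_distance(x, y))
```

Because y is just a scaled copy of x, the Euclidean and Manhattan distances are nonzero, while cosine similarity is (up to rounding) 1 and correlation distance is (up to rounding) 0: those two measures compare direction and pattern rather than magnitude.
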