Chapter 10 Flashcards
Cluster, potential class
a collection of data objects
similar to one another within the same group
dissimilar to the objects in other groups
cluster analysis, clustering, data segmentation…
finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters.
unsupervised learning
no predefined classes, i.e., learning by observations vs. learning by examples (supervised)
a stand-alone tool to gain insight
preprocessing step for other algorithms
Examples of clustering
biology: animal kingdom, class, order; economic science: market research
summarization
preprocessing for regression, PCA, classification, and association analysis
compression
image processing: vector quantization
finding k-nearest neighbors
localizing search to one or a small number of clusters
outlier detection
outliers are often viewed as those far away from any cluster
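A minimal sketch of this idea, assuming cluster centroids are already known and that a fixed distance threshold is acceptable. The names find_outliers, nearest_centroid_distance, and threshold are hypothetical and chosen only for illustration.

```python
import math

def nearest_centroid_distance(point, centroids):
    # Distance from the point to its closest cluster centroid.
    return min(math.dist(point, c) for c in centroids)

def find_outliers(points, centroids, threshold):
    # A point counts as an outlier when it is far from every centroid.
    # The threshold is an assumed, user-chosen cutoff.
    return [p for p in points
            if nearest_centroid_distance(p, centroids) > threshold]

centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.5, 0.2), (9.5, 10.1), (5.0, 5.0), (20.0, 0.0)]
outliers = find_outliers(points, centroids, threshold=3.0)
print(outliers)  # [(5.0, 5.0), (20.0, 0.0)]
```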
KNN
simplest model: k=1 uses the closest value, k=2 uses the two closest entries
distance calculation: euclidean distance, manhattan distance
challenge: high-dimensional data (3D, 4D, ...)
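The two distance measures and the nearest-neighbor lookup on this card can be sketched as follows. This is an illustrative version, assuming labeled training points stored as (point, label) pairs and a simple majority vote; knn_predict and the toy data are hypothetical names and values.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # City-block distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, query, k=1, dist=euclidean):
    # Sort the training pairs by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Majority vote over the labels of those k neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "a"), ((1.5, 2.0), "a"),
         ((5.0, 5.0), "b"), ((6.0, 5.5), "b")]

print(euclidean((0, 0), (3, 4)))              # 5.0
print(manhattan((0, 0), (3, 4)))              # 7
print(knn_predict(train, (1.2, 1.1), k=1))    # a
print(knn_predict(train, (5.5, 5.0), k=3))    # b
```

Sorting every training point for each query is fine for a small example, but it is the reason KNN becomes expensive on large or high-dimensional data.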
scalability
clustering all the data instead of only samples, which can lead to biased results
ability to deal with different types of attributes
numerical, binary, categorical, ordinal, linked, and mixture of these
constraint-based clustering
user may give inputs on constraints
use domain knowledge to determine input parameters
partitioning criteria
single-level vs. hierarchical partitioning
separation of clusters
exclusive (e.g., one customer belongs to only one region) vs. non-exclusive (e.g., one document may belong to more than one class)
similarity measure
distance-based vs. connectivity-based
clustering space
full space vs. subspaces
good partitioning
objects in the same cluster are close or related to each other, whereas objects in different clusters are far apart or very different.
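One rough way to check this property is to compare the average distance within a cluster to the average distance between clusters. The sketch below assumes Euclidean distance and clusters with at least two points each; mean_intra_distance and mean_inter_distance are hypothetical helper names, not a standard quality index.

```python
import math
from itertools import combinations

def mean_intra_distance(cluster):
    # Average distance over all pairs of points inside one cluster.
    # Assumes the cluster holds at least two points.
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def mean_inter_distance(c1, c2):
    # Average distance over all cross-cluster pairs of points.
    return sum(math.dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

left = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
right = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# In a good partitioning the first value is much smaller than the second.
print(mean_intra_distance(left), mean_inter_distance(left, right))
```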
typical methods
k-means, k-medoids; work well for finding spherical-shaped clusters in small to medium-sized databases
hierarchical approach
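A minimal sketch of k-means on 2-D points, assuming Euclidean distance and caller-supplied initial centroids so the run is repeatable (practical implementations usually pick starting centroids at random). The function names k_means and mean_point are hypothetical, and k-medoids is not shown.

```python
import math

def mean_point(points):
    # Coordinate-wise mean of a non-empty list of points.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, centroids, max_iter=100):
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid (an assumed policy).
        new_centroids = [mean_point(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters[0])  # [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
print(clusters[1])  # [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
```

Because every point is pulled toward the nearest mean, the method favors compact, roughly spherical groups, which matches the limitation noted on the card.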
create a hierarchical decomposition of data objects
agglomerative: bottom-up approach
starts with each object forming a separate group; successively merges groups until all are merged into one or a termination condition holds
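The bottom-up merging described above can be sketched as follows. This version assumes single-link (closest pair) distance between groups and uses a target number of clusters as the termination condition; both are illustrative choices, and agglomerative and single_link are hypothetical names.

```python
import math

def single_link(c1, c2):
    # Distance between two groups = distance between their closest members.
    return min(math.dist(a, b) for a in c1 for b in c2)

def agglomerative(points, num_clusters=1):
    # Assumes num_clusters >= 1. Start with every object in its own group.
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        # Find the pair of groups (i < j) that are closest to each other.
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        # Merge group j into group i, then drop group j.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
result = agglomerative(points, num_clusters=3)
print(result)
# [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 5.0), (5.0, 6.0)], [(20.0, 20.0)]]
```

With num_clusters=1 the loop keeps merging until a single group remains, which corresponds to the top of the hierarchy.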