Clustering Flashcards

Question

What is the main disadvantage of hierarchical methods?

Answer 1

Once a merge or split is done, it cannot be undone, so erroneous decisions cannot be corrected.

Answer 2

continues growing clusters as long as the density in the neighborhood exceeds a certain threshold

Answer 3

detecting outliers and discovering clusters of arbitrary shape

Answer 4

quantize the object space into a finite number of cells that form a grid structure
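
As a rough illustration of grid quantization, the sketch below maps each object to the grid cell that contains it and counts objects per cell. The function name `grid_cell_counts` and the single `cell_size` parameter are assumptions made for this example; real grid-based methods typically keep richer per-cell statistics.

```python
import math
from collections import Counter


def grid_cell_counts(points, cell_size):
    """Count how many points fall in each grid cell (hypothetical helper).

    Each coordinate is divided by cell_size and floored, so a point is
    identified by the integer index of its cell along every dimension.
    """
    return Counter(
        tuple(math.floor(coord / cell_size) for coord in point)
        for point in points
    )


# Four 2-D objects quantized into unit cells.
counts = grid_cell_counts([(0.2, 0.3), (0.8, 0.9), (1.5, 0.1), (-0.1, 0.0)], 1.0)
```

Later clustering steps can then operate on the cells (for example, by keeping only dense cells) instead of on the individual objects.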

Answer 5

fast processing time and possible integration with other clustering methods, such as density-based and hierarchical methods

Answer 6

assessing cluster tendency, determining the number of clusters in a data set, and measuring the quality of the clustering

Answer 7

determines if there is a non-random structure which may lead to meaningful clusters.

Answer 8

non-uniform distribution of data

Answer 9

spatial statistic that tests the spatial randomness of a variable as distributed in a space

Answer 10

that D is uniformly distributed (no meaningful clusters) or non-uniformly distributed (meaningful clusters), respectively
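
One widely used spatial statistic for this test is the Hopkins statistic. The sketch below is a minimal version under stated assumptions: probe points are drawn uniformly from the bounding box of D, and the statistic is computed as sum(y) / (sum(x) + sum(y)), so values near 0.5 suggest uniformly distributed data and values near 0 suggest clustered data. Some sources use the inverted ratio, where clustered data scores near 1. The names `hopkins_statistic` and `nearest_distance` are made up for this example.

```python
import math
import random


def nearest_distance(point, others):
    """Distance from point to its nearest neighbor among others."""
    return min(math.dist(point, other) for other in others)


def hopkins_statistic(data, sample_size, rng):
    """Hopkins statistic of data (a list of equal-length tuples)."""
    dims = len(data[0])
    lows = [min(p[d] for p in data) for d in range(dims)]
    highs = [max(p[d] for p in data) for d in range(dims)]

    # x_i: uniformly random probe point -> nearest object in D.
    x_sum = 0.0
    for _ in range(sample_size):
        probe = tuple(rng.uniform(lows[d], highs[d]) for d in range(dims))
        x_sum += nearest_distance(probe, data)

    # y_i: sampled object of D -> nearest *other* object in D.
    y_sum = 0.0
    for index in rng.sample(range(len(data)), sample_size):
        others = data[:index] + data[index + 1:]
        y_sum += nearest_distance(data[index], others)

    return y_sum / (x_sum + y_sum)
```

Because the probes are random, the value varies between runs; in practice the statistic is usually averaged over several repetitions.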

Answer 11

1 - INPUT parameter for some algorithms | 2 - controls proper granularity of cluster analysis

Answer 12

distribution shape and scale in the data set as well as clustering resolution needed by the user

Answer 13

A simple rule of thumb sets the number of clusters to sqrt(n/2), where n is the number of objects; each cluster would then have about sqrt(2n) points.
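
The arithmetic behind this rule of thumb can be checked with a short sketch. The function names are hypothetical, and rounding to a whole number of clusters is an assumption of this example, not part of the rule itself.

```python
import math


def rule_of_thumb_k(n):
    """Number of clusters suggested by sqrt(n / 2), rounded (assumption)."""
    return max(1, round(math.sqrt(n / 2)))


def expected_points_per_cluster(n):
    """Cluster size implied by the rule: n / sqrt(n / 2) = sqrt(2n)."""
    return math.sqrt(2 * n)


# For n = 200 objects: sqrt(100) = 10 clusters of sqrt(400) = 20 points each.
k = rule_of_thumb_k(200)
size = expected_points_per_cluster(200)
```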

Answer 14

observation that increasing the number of clusters helps reduce the sum of within-cluster variance of each cluster

Answer 15

Selecting the turning point in the curve of the sum of within-cluster variance with respect to number of clusters

Answer 16

the marginal effect of reducing the sum of within-cluster variance may drop if too many clusters are formed

Answer 17

cross validation

Answer 18

Divide the data set into m parts, build the clustering with m - 1 parts, and use the remaining part to test the quality of the clustering, for example by summing the squared distances from the test points to their closest centroids.

Answer 19

Extrinsic method, Intrinsic method

Answer 20

comparing the clustering against the ground truth and measuring how well they match

Answer 21

evaluate the goodness of a clustering by considering how well the clusters are separated

Answer 22

Cluster homogeneity, cluster completeness, rag bag, and small cluster preservation

Answer 23

That the more pure the clusters in a clustering are the better (using ground truth)

Answer 24

requires that a clustering should assign objects belonging to the same category (according to ground truth) to the same cluster

Answer 25

splitting a small category into pieces is more harmful than splitting a large category into pieces.

Answer 26

BCubed precision and recall

Answer 27

the precision and recall for every object in a clustering on a given data set according to ground truth

Answer 28

how many other objects in the same cluster belong to the same category as the object

Answer 29

The recall of an object reflects how many objects of the same category are assigned to the same cluster
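
BCubed precision and recall can be sketched as follows. This version averages the per-object values over all objects and, as an assumption that differs from the "other objects" wording on the cards, counts each object as a member of its own cluster and category so that singleton clusters do not cause a division by zero. The function name `bcubed` is made up for the example.

```python
def bcubed(clusters, categories):
    """Return (precision, recall) averaged over all objects.

    clusters[i] is the cluster assigned to object i and categories[i] is
    its ground-truth category. Each object counts itself (assumption).
    """
    n = len(clusters)
    precision_total = 0.0
    recall_total = 0.0
    for i in range(n):
        same_cluster = [j for j in range(n) if clusters[j] == clusters[i]]
        same_category = [j for j in range(n) if categories[j] == categories[i]]
        # Objects that share both the cluster and the category of object i.
        both = sum(1 for j in same_cluster if categories[j] == categories[i])
        precision_total += both / len(same_cluster)
        recall_total += both / len(same_category)
    return precision_total / n, recall_total / n


# Five objects: cluster 0 mixes categories "a" and "b", cluster 1 is pure.
precision, recall = bcubed([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"])
```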

Answer 30

a(o) is the average distance between o and all other objects in the cluster to which o belongs; b(o) is the minimum average distance from o to all clusters to which o does not belong

Answer 31

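silhouette coefficient; it ranges from -1 to 1, where a negative value means the object is on average closer to the objects of another cluster than to those of its own cluster, and a value near 1 means its cluster is compact and well separated from other clusters (good)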
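
A minimal sketch of the silhouette coefficient of one object, s(o) = (b(o) - a(o)) / max(a(o), b(o)), using Euclidean distance. The function name `silhouette` is hypothetical, and returning 0 for an object that is alone in its cluster is a convention assumed here, not something stated on the cards. At least two clusters are required.

```python
import math


def silhouette(index, points, labels):
    """Silhouette coefficient of points[index] given cluster labels."""
    own = labels[index]

    def mean_distance_to(cluster):
        # Average distance from the object to the other members of a cluster.
        members = [
            p for j, (p, label) in enumerate(zip(points, labels))
            if label == cluster and j != index
        ]
        if not members:
            return None
        return sum(math.dist(points[index], p) for p in members) / len(members)

    a = mean_distance_to(own)
    if a is None:
        return 0.0  # singleton cluster: assumed convention
    b = min(mean_distance_to(c) for c in set(labels) if c != own)
    return (b - a) / max(a, b)


# Well-placed object: far from the other cluster, close to its own.
good = silhouette(0, [(0, 0), (0, 1), (5, 0), (5, 1)], [0, 0, 1, 1])
# Misplaced object: its own cluster mate is farther away than the other cluster.
bad = silhouette(0, [(0, 0), (0, 1), (5, 0)], [0, 1, 0])
```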

Answer 32

classification groups data points with respect to a target and clustering with respect to a similarity metric

Answer 33

existing feature included in the clustering, existing feature not included (target), latent attribute you don't have access to

Answer 34

sum of squared errors between the data points and their respective cluster centers

Answer 35

1. randomly chooses k data points to serve as initial centroids 2. repeats the assignment and centroid-update steps until a maximum number of iterations is reached or the cluster assignments stop changing 3. returns the cluster membership of all data points
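
The three steps above can be sketched in plain Python. This is an illustrative version, not a reference implementation: the names `k_means` and `sse`, the `seed` parameter, and the choice to leave an empty cluster's centroid unchanged are all assumptions of the example.

```python
import math
import random


def sse(points, centroids, labels):
    """Sum of squared errors between points and their cluster centers."""
    return sum(math.dist(p, centroids[c]) ** 2 for p, c in zip(points, labels))


def k_means(points, k, max_iterations=100, seed=0):
    """Cluster a list of tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Step 1: randomly choose k data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    # Step 2: iterate until max_iterations or until assignments stop changing.
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its old centroid (assumption)
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    # Step 3: return the cluster membership of every data point.
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(points, k=2)
total_error = sse(points, centroids, labels)
```

Because the initial centroids are random, different seeds can give different results on less well separated data, which is why k-means is often run several times and the result with the lowest SSE is kept.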

Answer 36

choose a range of k, run the clustering for every k, calculate the SSE for each k, calculate the change in slope between consecutive SSE values, and choose the k where the change in slope is largest
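
The selection step can be sketched as below, assuming the SSE has already been computed for evenly spaced values of k. The function name `elbow_k` is hypothetical and the SSE numbers are made up purely for illustration.

```python
def elbow_k(sse_by_k):
    """Pick the k where the SSE curve bends the most.

    sse_by_k maps each candidate k to its SSE. For every interior k the
    slope before and after it is compared; the largest change wins.
    """
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        slope_in = sse_by_k[k] - sse_by_k[prev_k]
        slope_out = sse_by_k[next_k] - sse_by_k[k]
        bend = slope_out - slope_in  # change in slope at k
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Made-up SSE values: the curve flattens sharply after k = 3.
example_sse = {1: 1000, 2: 600, 3: 150, 4: 120, 5: 100, 6: 90}
chosen_k = elbow_k(example_sse)
```

The smallest and largest k in the range can never be chosen, since the change in slope needs a neighbor on both sides.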

(60 cards)