Cluster Analysis Flashcards

1
Q

is finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.

A

Cluster Analysis

2
Q

2 Types of Clustering

A
  1. Partitional Clustering
  2. Hierarchical Clustering
3
Q

4 Types of Clusters

A
  1. Center-Based Clusters
  2. Contiguity Clusters
  3. Density-Based Clusters
  4. Conceptual Clusters
4
Q

a type of cluster that is a set of objects such that each object in a cluster is closer to the “center” of its own cluster than to the center of any other cluster.

A

Center-Based Clusters

5
Q

the center of a cluster, computed as the average of all the points in the cluster.

A

Centroid

6
Q

the most “representative” point of a cluster; unlike the average of all points, it is always an actual data point.

A

Medoid
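
The difference between the two kinds of cluster center can be checked on a small example. This is a minimal sketch, assuming 2-D points and Euclidean distance; the helper names `centroid` and `medoid` and the sample data are illustrative and not part of the cards.

```python
import math


def centroid(points):
    """Coordinate-wise mean of the points (need not be an actual data point)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def medoid(points):
    """The data point with the smallest total distance to all points in the cluster."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


# Hypothetical cluster with one far-away point.
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 3.0), (8.0, 8.0)]
print(centroid(cluster))  # (3.0, 3.25)
print(medoid(cluster))    # (2.0, 1.0)
```

Here the centroid is pulled toward the distant point and is not itself a member of the cluster, while the medoid is one of the original points.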

7
Q

a type of cluster where each point is closer to at least one point in its cluster than to any point in any other cluster.

A

Contiguity Clusters

8
Q

a type of cluster where the clusters are regions of high density separated by regions of low density.

A

Density-Based Clusters

9
Q

a type of cluster where points in a cluster share some general property that derives from the entire set of points.

A

Conceptual Clusters

10
Q

2 Objective Functions

A
  1. Global Objective Function
  2. Local Objective Function
11
Q

the type of objective function typically used in partitional clustering.

A

Global Objective Function

12
Q

3 Clustering Algorithms

A
  1. K-Means Clustering
  2. Hierarchical Clustering
  3. Density-Based Clustering
13
Q

is a partitional clustering approach in which each point is assigned to the cluster with the closest centroid.

A

K-Means Clustering
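
The basic procedure can be sketched in a few lines. This is a minimal illustration, not a production implementation: the function name `kmeans`, its parameters, the handling of empty clusters, and the sample points are all assumptions made for the example. It picks K random initial centroids, then alternates between assigning points to the closest centroid and recomputing each centroid as the mean of its points.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-means sketch: returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # randomly chosen initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its assigned points.
        # An empty cluster keeps its previous centroid (one simple choice).
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

On well-separated data like this, the loop stops after a handful of iterations regardless of which points are drawn as initial centroids.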

14
Q

is the mean of the points in a cluster.

A

Centroid

15
Q

is commonly used to measure “closeness” between a point and a centroid in K-means.

A

Euclidean Distance
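
As a small worked example of measuring closeness, the sketch below computes the Euclidean distance from one point to two candidate centroids and picks the closer one. The function name `euclidean` and the sample coordinates are illustrative assumptions.

```python
import math


def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


point = (3.0, 4.0)
centroids = [(0.0, 0.0), (3.0, 7.0)]  # hypothetical centroids

distances = [euclidean(point, c) for c in centroids]
closest = distances.index(min(distances))
print(distances)  # [5.0, 3.0]
print(closest)    # 1
```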

16
Q

an algorithm in which most of the convergence typically happens in the first few iterations.

A

K-Means

17
Q

3 Solutions to the Initial Centroid Problem (randomly chosen initial centroids):

A
  1. Multiple Runs
  2. Sample and use hierarchical clustering to determine the initial centroids.
  3. Select more than k initial centroids, and then select among those the ones that are farthest from each other.
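
The third option can be sketched as follows. This is only one possible reading of it, assuming a greedy "farthest-first" selection among randomly sampled candidates; the function name `spread_out_centroids`, its parameters, and the sample data are hypothetical.

```python
import math
import random


def spread_out_centroids(points, k, n_candidates, seed=0):
    """Sample more than k candidates, then keep k that are far from each other."""
    rng = random.Random(seed)
    candidates = rng.sample(points, n_candidates)
    chosen = [candidates[0]]
    while len(chosen) < k:
        # Greedy step: take the candidate whose nearest already-chosen
        # centroid is as far away as possible.
        farthest = max(candidates, key=lambda c: min(math.dist(c, s) for s in chosen))
        chosen.append(farthest)
    return chosen


# Hypothetical data: three groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 10.0), (5.0, 11.0)]
initial = spread_out_centroids(points, k=3, n_candidates=6)
print(initial)
```

With these sample points, each of the three selected centroids comes from a different group, which avoids starting K-means with two centroids inside the same natural cluster.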
18
Q

is the most common measure for evaluating K-means clusters.

A

Sum of Squared Error (SSE)
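
SSE adds up, over every point, the squared distance to the centroid of the cluster it belongs to; a lower value means tighter clusters. A minimal sketch, assuming Euclidean distance and clusters given as lists of points (the function name `sse` and the data are illustrative):

```python
def sse(clusters):
    """Sum of squared distances from each point to its own cluster's centroid."""
    total = 0.0
    for members in clusters:
        center = tuple(sum(c) / len(members) for c in zip(*members))
        for p in members:
            total += sum((a - b) ** 2 for a, b in zip(p, center))
    return total


# The same four hypothetical points, clustered two different ways.
two_clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
one_cluster = [[(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]]

print(sse(two_clusters))  # 4.0
print(sse(one_cluster))   # 104.0
```

Comparing SSE across several runs with different random initial centroids, and keeping the run with the lowest value, is one way to apply the "multiple runs" idea.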

19
Q

2 Pre-Processing Methods for K-Means Clusters:

A
  1. Normalize the data
  2. Eliminate outliers.
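
Normalization matters because distance calculations are otherwise dominated by the attribute with the largest scale. The sketch below uses z-score standardization, which is one common choice and an assumption here (the cards do not say which normalization is meant); the function name `standardize` and the data are illustrative.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Rescale each column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has standard deviation 0; divide by 1.0 instead.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]


# Hypothetical data: the second attribute is on a much larger scale than the first.
rows = [(1.0, 1000.0), (2.0, 2000.0), (3.0, 3000.0)]
scaled = standardize(rows)
print(scaled)
```

After scaling, both attributes contribute equally to a Euclidean distance.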
20
Q

3 Post-Processing Methods for K-Means Clusters:

A
  1. Eliminate small clusters that may represent outliers.
  2. Split ‘loose’ clusters, i.e., clusters with relatively high SSE.
  3. Merge clusters that are ‘close’ and that have relatively low SSE.
21
Q

2 Limitations of K-Means Clusters:

A

  1. It has problems when clusters differ in size or density, or have non-globular shapes.
  2. It has problems when the data contains outliers.

22
Q

5 Different Aspects of Cluster Validation:

A
  1. Determining the clustering tendency of a set of data.
  2. External Validation
  3. Internal Validation
  4. Compare Clustering
  5. Determining the ‘correct’ number of clusters.
23
Q

comparing the results of a cluster analysis to externally known class labels.

A

External Validation

24
Q

evaluating how well the results of a cluster analysis fit the data without reference to external information.

A

Internal Validation

25
Q

comparing the results of two different sets of cluster analyses to determine which is better.

A

Compare Clustering

26
Q

3 Measures of Cluster Validity

A
  1. External Index
  2. Internal Index
  3. Relative Index
27
Q

used to measure the extent to which cluster labels match externally supplied class labels.

A

External Index
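
Purity is one simple example of an external index (chosen here for illustration; the cards do not name a specific one). Each cluster is credited with its most frequent class label, and the credited points are divided by the total. The function name `purity` and the labels below are hypothetical.

```python
from collections import Counter


def purity(cluster_ids, class_labels):
    """Fraction of points whose class is the majority class of their cluster."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, class_labels):
        by_cluster.setdefault(cid, []).append(label)
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in by_cluster.values()
    )
    return majority_total / len(class_labels)


cluster_ids = [0, 0, 0, 1, 1, 1]                 # output of some clustering
class_labels = ["a", "a", "b", "b", "b", "b"]    # externally supplied labels
print(purity(cluster_ids, class_labels))  # 5 of 6 points match their cluster's majority class
```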

28
Q

used to measure the goodness of a clustering structure without respect to external information.

A

Internal Index

29
Q

used to compare two different clusterings or clusters.

A

Relative Index

30
Q

2 Internal Measures

A

  1. Cluster Cohesion
  2. Cluster Separation

31
Q

measures how closely related objects in a cluster are.

A

Cluster Cohesion

32
Q

measures how distinct or well-separated a cluster is from other clusters.

A

Cluster Separation
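
Cohesion and separation can both be expressed as sums of squares. The sketch below assumes one common formulation: cohesion as the within-cluster sum of squares (WSS, the same quantity as SSE) and separation as the between-cluster sum of squares (BSS), where each cluster centroid's squared distance to the overall mean is weighted by cluster size. The function names and the 1-D sample data are illustrative.

```python
def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cohesion_wss(clusters):
    """Within-cluster sum of squares: lower means tighter clusters."""
    return sum(sq_dist(p, mean_point(c)) for c in clusters for p in c)


def separation_bss(clusters):
    """Between-cluster sum of squares: higher means better-separated clusters."""
    overall = mean_point([p for c in clusters for p in c])
    return sum(len(c) * sq_dist(mean_point(c), overall) for c in clusters)


# Hypothetical 1-D points 1, 2, 4, 5 split into two clusters.
clusters = [[(1.0,), (2.0,)], [(4.0,), (5.0,)]]
print(cohesion_wss(clusters))    # 1.0
print(separation_bss(clusters))  # 9.0
```

For this example WSS + BSS = 10.0, which equals the WSS obtained when all four points are placed in a single cluster (where BSS is 0).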