Cluster Analysis Flashcards

1
Q

is finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.

A

Cluster Analysis

2
Q

2 Types of Clustering

A
  1. Partitional Clustering
  2. Hierarchical Clustering
3
Q

4 Types of Clusters

A
  1. Center-Based Clusters
  2. Contiguity Clusters
  3. Density-Based Clusters
  4. Conceptual Clusters
4
Q

a type of cluster that is a set of objects such that each object in a cluster is closer to the “center” of its own cluster than to the center of any other cluster.

A

Center-Based Clusters

5
Q

the center of a cluster.

A

Centroid

6
Q

the most “representative” point of a cluster; unlike the average of all points (the centroid), it is always an actual data point.

A

Medoid
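
The difference between the two kinds of cluster center can be sketched in a few lines of Python. This is only an illustration: the function names and the sample points are assumptions, and the medoid is taken here to be the member with the smallest total Euclidean distance to the other members, which is one common definition.

```python
import math

def centroid(points):
    """Coordinate-wise mean of the points; need not be an actual data point."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def medoid(points):
    """The data point with the smallest total distance to all other points."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

# Hypothetical cluster with one far-away point.
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(centroid(cluster))  # (3.2, 0.0), pulled toward the far point
print(medoid(cluster))    # (2.0, 0.0), an actual member of the cluster
```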

7
Q

a type of cluster where each point is closer to at least one point in its cluster than to any point in any other cluster.

A

Contiguity Clusters

8
Q

a type of cluster where the clusters are regions of high density separated by regions of low density.

A

Density-Based Clusters

9
Q

a type of cluster where the points in a cluster share some general property that derives from the entire set of points.

A

Conceptual Clusters

10
Q

2 Objective Functions

A
  1. Global Objective Function
  2. Local Objective Function
11
Q

the type of objective function typically used in partitional clustering.

A

Global Objective Function

12
Q

3 Clustering Algorithms

A
  1. K-Means Clustering
  2. Hierarchical Clustering
  3. Density-Based Clustering
13
Q

is a partitional clustering approach.

A

K-Means Clustering
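
A minimal sketch of the basic K-means loop may help make the partitional idea concrete. It assumes Euclidean distance, randomly sampled initial centroids, and a stop condition of "centroids no longer change"; the function name `k_means` and its parameters are illustrative, not taken from any particular library.

```python
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    """Partition points into k clusters; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # randomly chosen initial centroids
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its assigned points.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no change: the partition is stable
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, groups = k_means(data, k=2)
print(sorted(len(g) for g in groups))  # [3, 3]
```

Every point ends up in exactly one cluster, which is what makes the approach partitional rather than hierarchical.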

14
Q

is the mean of the points in a cluster.

A

Centroid

15
Q

is commonly used to measure “closeness” between a point and a centroid in K-means.

A

Euclidean Distance
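
As a quick illustration, Euclidean distance is the square root of the summed squared coordinate differences. The helper name `euclidean_distance` below is an assumption made for this sketch.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```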

16
Q

typically converges within the first few iterations.

A

K-Means Clustering

17
Q

3 Solutions to the Initial Centroid Problem (randomly chosen centroids):

A
  1. Multiple Runs
  2. Sample and use hierarchical clustering to determine the initial centroids.
  3. Select more than k initial centroids, and then select among those the ones that are farthest from each other.
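
The third option can be sketched as follows. This is one possible reading, not a prescribed procedure: it draws more than k random candidates and then greedily keeps the candidate that is farthest from the centroids already chosen. The function name and the `n_candidates` parameter are hypothetical.

```python
import math
import random

def pick_separated_centroids(points, k, n_candidates, seed=0):
    """Draw n_candidates > k points at random, then keep k widely separated ones."""
    rng = random.Random(seed)
    candidates = rng.sample(points, n_candidates)
    chosen = [candidates[0]]
    while len(chosen) < k:
        # Keep the candidate whose distance to its nearest chosen centroid is largest.
        best = max(
            (c for c in candidates if c not in chosen),
            key=lambda c: min(math.dist(c, s) for s in chosen),
        )
        chosen.append(best)
    return chosen

# Hypothetical data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
initial = pick_separated_centroids(data, k=2, n_candidates=4)
print(initial)  # one starting centroid from each group
```

The first option, multiple runs, would simply repeat K-means with different seeds and keep the run with the lowest SSE.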
18
Q

is the most common measure for evaluating K-means clusters.

A

Sum of Squared Error (SSE)
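
SSE adds up, over every cluster, the squared distance from each point to its cluster's centroid; a lower value means tighter clusters. A minimal sketch, assuming Euclidean distance and illustrative names:

```python
import math

def sse(clusters, centroids):
    """Sum of squared distances from each point to the centroid of its cluster."""
    return sum(
        math.dist(p, c) ** 2
        for members, c in zip(clusters, centroids)
        for p in members
    )

# Hypothetical clustering: two clusters with their centroids.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]]
centroids = [(1.0, 0.0), (5.0, 5.0)]
print(sse(clusters, centroids))  # 2.0
```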

19
Q

2 Pre-Processing Methods for K-Means Clusters:

A
  1. Normalize the data
  2. Eliminate outliers.
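
Normalization matters because Euclidean distance is dominated by whichever feature has the largest scale. The sketch below uses z-score standardization, which is only one of several possible normalizations; the function name is an assumption.

```python
import statistics

def standardize(points):
    """Rescale each feature to zero mean and unit (population) standard deviation."""
    columns = list(zip(*points))
    means = [statistics.fmean(col) for col in columns]
    # A constant feature has zero spread; divide by 1.0 to avoid division by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        tuple((x - m) / s for x, m, s in zip(p, means, stdevs))
        for p in points
    ]

# Hypothetical data: the second feature is 100 times larger than the first.
raw = [(1.0, 100.0), (2.0, 200.0), (3.0, 300.0)]
scaled = standardize(raw)
print(scaled[1])  # (0.0, 0.0), the middle point sits at the mean of both features
```

After rescaling, both features contribute equally to the distance computation.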
20
Q

3 Post-Processing Methods for K-Means Clusters:

A
  1. Eliminate small clusters that may represent outliers.
  2. Split ‘loose’ clusters, i.e., clusters with relatively high SSE.
  3. Merge clusters that are ‘close’ and that have relatively low SSE.
21
Q

2 Limitations of K-Means Clusters:

A

  1. It has problems when clusters differ in size or density, or have non-globular shapes.
  2. It has problems when the data contains outliers.

22
Q

5 Different Aspects of Cluster Validation:

A
  1. Determining the clustering tendency of a set of data.
  2. External Validation
  3. Internal Validation
  4. Compare Clustering
  5. Determining the ‘correct’ number of clusters.
23
Q

comparing the results of a cluster analysis to externally known class labels.

A

External Validation

24
Q

evaluating how well the results of a cluster analysis fit the data without reference to external information.

A

Internal Validation

25
Q

comparing the results of two different cluster analyses to determine which is better.

A

Compare Clustering

26
Q

3 Measures of Cluster Validity

A
  1. External Index
  2. Internal Index
  3. Relative Index
27
Q

used to measure the extent to which cluster labels match externally supplied class labels.

A

External Index

28
Q

used to measure the goodness of a clustering structure without respect to external information.

A

Internal Index

29
Q

used to compare two different clusterings or clusters.

A

Relative Index

30
Q

2 Internal Measures

A
  1. Cluster Cohesion
  2. Cluster Separation
31
Q

measures how closely related objects in a cluster are.

A

Cluster Cohesion

32
Q

measures how distinct or well-separated a cluster is from other clusters.

A

Cluster Separation
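
One common way to quantify these two internal measures is with sums of squares: the within-cluster sum of squares (WSS) for cohesion and the between-cluster sum of squares (BSS) for separation. The sketch below assumes those definitions and Euclidean distance; the function names are illustrative.

```python
import math

def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def cohesion_wss(clusters):
    """Within-cluster sum of squares: lower means more cohesive clusters."""
    return sum(
        math.dist(p, mean_point(members)) ** 2
        for members in clusters
        for p in members
    )

def separation_bss(clusters):
    """Between-cluster sum of squares: higher means better-separated clusters."""
    overall = mean_point([p for members in clusters for p in members])
    return sum(
        len(members) * math.dist(mean_point(members), overall) ** 2
        for members in clusters
    )

# Hypothetical one-dimensional data split into two clusters.
clusters = [[(1.0,), (2.0,)], [(4.0,), (5.0,)]]
print(cohesion_wss(clusters))    # 1.0
print(separation_bss(clusters))  # 9.0
```

For a fixed data set, WSS plus BSS equals the total sum of squares around the overall mean (10.0 here), so improving cohesion also improves separation under these definitions.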