L9 Cluster Analysis Flashcards

Question 1

Q

What is the main difference between factor and cluster analysis?

Answer

A

Factor analysis groups items (variables) into factors.
Cluster analysis groups objects (cases) into clusters.

Question 2

Q

What is the goal of cluster analysis?

Answer

A

Find clusters such that the objects within a cluster are as similar as possible (internal homogeneity) while the clusters themselves are as distinct as possible (external heterogeneity).

Question 3

Q

What are two ways of quantifying similarity?

Answer

A

Distance and correlation

Question 4

Q

What are two types of distance measure?

Answer

A

Euclidean and city-block distance

Which one to use depends on the case, that is, on what you define as similar.
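
As a minimal sketch of the two measures (the function names and the example points are illustrative assumptions, not part of the lecture material):

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def city_block(p, q):
    """City-block (Manhattan) distance: sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Two hypothetical objects measured on two variables
p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))   # 5.0
print(city_block(p, q))  # 7.0
```

The same pair of objects is 5 units apart under one measure and 7 under the other, which is why the choice depends on what "similar" should mean for the data.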

Question 5

Q

What is agglomerative hierarchical clustering?

Answer

A

A bottom-up approach: each object starts as its own cluster, and clusters are merged step by step into fewer, larger clusters.

Question 6

Q

What is divisive hierarchical clustering?

Answer

A

A top-down approach: all objects start in one large cluster, which is split step by step into more, smaller clusters.

Question 7

Q

What is the procedure for deriving clusters in the agglomerative hierarchical approach?

Answer

A

Starting point: Calculate the pairwise similarity between objects (based on distance or correlation)
Step 1: Merge the two objects with the highest similarity (P and Q) into a cluster
Step 2: Calculate the linkage criterion between the new cluster and the other objects (or clusters)
Step 3: Merge the pair of objects or clusters that minimizes the linkage criterion
Then repeat steps 2 and 3 until all objects are in a single cluster
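
The steps above can be sketched in plain Python. This is only an illustration under stated assumptions: Euclidean distance as the dissimilarity measure, single linkage as the linkage criterion, and hypothetical example points. The first pass through the loop corresponds to Step 1, because the linkage criterion between two single objects is just their distance.

```python
import math
from itertools import combinations

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_linkage(c1, c2, points):
    """Assumed linkage criterion: distance between the two closest members."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

def agglomerate(points):
    """Return the merge history as (cluster_a, cluster_b, criterion) tuples."""
    clusters = [(i,) for i in range(len(points))]  # every object starts alone
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters that minimizes the linkage criterion
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: single_linkage(pair[0], pair[1], points))
        merges.append((a, b, single_linkage(a, b, points)))
        # Replace the two merged clusters with their union
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical objects; clusters are reported as tuples of object indices
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 3.0)]
for a, b, criterion in agglomerate(points):
    print(a, b, criterion)
# (0,) (1,) 1.0
# (2,) (3,) 3.0
# (0, 1) (2, 3) 5.0
```

The criterion value grows with each merge, reflecting that later merges join increasingly dissimilar clusters.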

Question 8

Q

What does the coefficient measure in the graph?

Answer

A

It is a heterogeneity index: the larger the coefficient, the greater the heterogeneity.

Question 9

Q

What are three linkage methods?

Answer

A

Single linkage
Complete linkage
Ward’s method

Question 10

Q

What is single linkage?

Answer

A

The distance between two clusters is the distance between their two closest members (nearest neighbor).

Question 11

Q

What is complete linkage?

Answer

A

The distance between two clusters is the distance between their two most distant members (farthest neighbor).
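
To contrast single and complete linkage, here is a small sketch. The function names, the choice of city-block distance, and the example clusters are assumptions made for illustration.

```python
def city_block(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def single_linkage(cluster_a, cluster_b, dist):
    """Nearest neighbor: smallest distance between any two members."""
    return min(dist(p, q) for p in cluster_a for q in cluster_b)

def complete_linkage(cluster_a, cluster_b, dist):
    """Farthest neighbor: largest distance between any two members."""
    return max(dist(p, q) for p in cluster_a for q in cluster_b)

# Two hypothetical clusters on a line
a = [(0, 0), (1, 0)]
b = [(3, 0), (6, 0)]
print(single_linkage(a, b, city_block))    # 2
print(complete_linkage(a, b, city_block))  # 6
```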

Question 12

Q

What is Ward’s method?

Answer

A

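Merge the pair of clusters whose merger gives the smallest increase in total within-cluster variance (sum of squared distances to the centroid).
Often regarded as the most reliable method.
For each candidate merger you compute a centroid, which is the mean of the hypothetical merged cluster.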
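
A rough sketch of the idea behind Ward's method: compute the centroid of a hypothetical merged cluster and see how much the within-cluster sum of squares would grow. The helper names and the example clusters are assumptions for illustration.

```python
def centroid(cluster):
    """Mean of the cluster on each variable."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def wss(cluster):
    """Sum of squared distances from each member to the cluster centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if a and b were merged."""
    return wss(a + b) - wss(a) - wss(b)

# Hypothetical clusters: `near` lies close to `a`, `far` does not
a = [(0.0, 0.0), (2.0, 0.0)]
near = [(3.0, 0.0)]
far = [(10.0, 0.0)]
print(ward_cost(a, near))  # roughly 2.67
print(ward_cost(a, far))   # 54.0
```

Under this criterion `a` would be merged with `near`, because that merger adds far less within-cluster variance.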

Question 13

Q

What are three indices for model evaluation (for deciding how many clusters to retain)?

Answer

A

Within-cluster sum of squares (WSS)
Information criteria:
Bayesian information criterion (BIC)
Akaike information criterion (AIC)

Question 14

Q

How can you minimize the within-cluster sum of squares?

Answer

A

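By using the maximum number of clusters available: WSS is always smallest when every object is its own cluster (WSS = 0), so WSS alone cannot tell you how many clusters to retain.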
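
A small illustration of why WSS keeps shrinking as clusters are added. The data and the three partitions are hypothetical, and one-dimensional values are used for simplicity.

```python
def wss(cluster):
    """Sum of squared deviations from the cluster mean (one variable)."""
    mean = sum(cluster) / len(cluster)
    return sum((x - mean) ** 2 for x in cluster)

def total_wss(partition):
    """Total within-cluster sum of squares over all clusters."""
    return sum(wss(cluster) for cluster in partition)

# The same four hypothetical objects split into 1, 2, and 4 clusters
one_cluster = [[1.0, 2.0, 9.0, 10.0]]
two_clusters = [[1.0, 2.0], [9.0, 10.0]]
four_clusters = [[1.0], [2.0], [9.0], [10.0]]

print(total_wss(one_cluster))    # 65.0
print(total_wss(two_clusters))   # 1.0
print(total_wss(four_clusters))  # 0.0
```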

Question 15

Q

What is the advantage of the information criteria over WSS?

Answer

A

The BIC and AIC trade off model fit against model complexity, so adding clusters only pays off if the fit improves enough.
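
The trade-off can be seen in the standard definitions, AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where L is the model likelihood, k the number of estimated parameters, and n the number of objects. The sketch below assumes a log-likelihood is available from some model-based clustering; all numbers are hypothetical.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: penalty of 2 per parameter."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: penalty of ln(n) per parameter."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical solutions fitted to n = 100 objects
n = 100
simple = {"log_likelihood": -120.0, "k": 4}   # fewer clusters
complex_ = {"log_likelihood": -118.0, "k": 8}  # more clusters, slightly better fit

print(aic(simple["log_likelihood"], simple["k"]))      # 248.0
print(aic(complex_["log_likelihood"], complex_["k"]))  # 252.0
print(bic(simple["log_likelihood"], simple["k"], n) <
      bic(complex_["log_likelihood"], complex_["k"], n))  # True
```

Lower values are better, so in this made-up case the small gain in fit does not justify the four extra parameters.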

Question 16

Q

Why might the scree plot not be suitable for determining the optimal number of clusters?

Answer

A

There might not be a clear elbow. Use the lowest BIC instead.

Question 17

Q

What does the Silhouette coefficient tell you?

Answer

A

Expresses how clearly the clusters are separated. It measures how similar an object is to its own cluster (cohesion) compared to other clusters (separation).

Question 18

Q

What is the range of the Silhouette coefficient, and what does a high score mean?

Answer

A

It ranges from -1 to 1. A high positive value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters.
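
For one object, the coefficient is usually computed as s = (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. A minimal sketch, with hypothetical data and function names, assuming at least two clusters and at least some non-identical objects:

```python
def silhouette(i, points, labels, dist):
    """Silhouette coefficient of object i (requires at least two clusters)."""
    own = labels[i]
    same = [dist(points[i], points[j])
            for j in range(len(points)) if labels[j] == own and j != i]
    if not same:
        return 0.0  # common convention for an object alone in its cluster
    a = sum(same) / len(same)  # cohesion: mean distance within own cluster
    b = min(                   # separation: nearest other cluster on average
        sum(dist(points[i], points[j])
            for j in range(len(points)) if labels[j] == other)
        / labels.count(other)
        for other in set(labels) - {own}
    )
    return (b - a) / max(a, b)

def abs_diff(p, q):
    return abs(p - q)

points = [0.0, 1.0, 10.0, 11.0]  # hypothetical objects on one variable
good = [0, 0, 1, 1]              # sensible assignment
bad = [0, 1, 1, 1]               # object 1 placed in the wrong cluster

print(silhouette(0, points, good, abs_diff))  # roughly 0.90
print(silhouette(1, points, bad, abs_diff))   # roughly -0.89
```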

Question 19

Q

What if many points have a low or negative silhouette coefficient?

Answer

A

Then the clustering configuration may have too many or too few clusters.

(19 cards)