Cluster Analysis Flashcards

Question 1

Q

Cluster Analysis is …

Answer

A

Grouping observations by their key characteristics so that observations within a cluster are similar to each other and different from observations in other clusters; identifying natural groups within the data with the aim of analyzing groups instead of individual values (data reduction)

Question 2

Q

Assumptions

Answer

A

Representativeness of the sample, no substantial multicollinearity, no outliers

Question 3

Q

To limit multicollinearity …

Answer

A

Standardize the variables, use a distance measure that accounts for correlation among variables (e.g. Mahalanobis), exclude highly correlated variables
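
As a rough illustration of this card, the sketch below standardizes a variable to z-scores and checks pairwise correlation so that a redundant variable can be spotted. The data and the names `standardize`, `pearson`, `income`, `income_k`, and `age` are hypothetical, chosen only for this example.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale a list of numbers to mean 0 and standard deviation 1 (z-scores)."""
    m, s = mean(column), pstdev(column)
    return [(x - m) / s for x in column]

def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical data: income in EUR, the same income in thousands of EUR, and age.
income = [20000, 35000, 50000, 80000]
income_k = [20, 35, 50, 80]
age = [25, 52, 31, 40]

z_income = standardize(income)
r_dup = pearson(income, income_k)  # perfectly correlated pair: one of the two is a candidate to drop
r_age = pearson(income, age)       # weaker correlation: both variables can stay
```

After standardizing, every variable has the same scale, so no variable dominates a distance calculation just because of its units.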

Question 4

Q

Similarity can be measured by …

Answer

A

Distance measures (the Minkowski family, e.g. Euclidean; Mahalanobis), correlation coefficients

Question 5

Q

A distance measure quantifies …

Answer

A

The dissimilarity between two objects; a large value means they are not similar
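
A minimal sketch of a Minkowski-type distance, to make the "large value means dissimilar" idea concrete. The function name `minkowski` and the three points are assumptions made for this example.

```python
def minkowski(a, b, p=2):
    """Minkowski distance between two points; p=1 is city-block, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

# Hypothetical objects described by two variables each.
a, b, c = (0, 0), (3, 4), (1, 1)

d_ab = minkowski(a, b)            # Euclidean distance between a and b
d_ac = minkowski(a, c)            # smaller than d_ab, so c is more similar to a than b is
d_ab_city = minkowski(a, b, p=1)  # city-block distance between a and b
```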

Question 6

Q

Hierarchical clustering means …

Answer

A

The final number of clusters is not fixed in advance; two types: agglomerative and divisive

Question 7

Q

Agglomerative clustering means …

Answer

A

Starts with every object in its own cluster, then successively merges the most similar clusters
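
The bottom-up idea can be sketched as follows for one-dimensional data. This is only an illustration: the merge rule used here (smallest distance between any two members, i.e. single linkage) is one possible choice, and `agglomerate` and the sample values are hypothetical.

```python
def agglomerate(points):
    """Merge 1-D points bottom-up; return the clustering after each merge step."""
    clusters = [[p] for p in points]  # every object starts in its own cluster
    history = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest to each other.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: min(abs(a - b) for a in clusters[ij[0]] for b in clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        history.append([list(c) for c in clusters])
    return history

steps = agglomerate([1.0, 1.5, 5.0, 5.2, 9.0])
```

Each entry of `steps` is one level of the hierarchy, from five single-object clusters down to one cluster containing everything, so the number of clusters can be chosen after the fact.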

Question 8

Q

Divisive clustering means …

Answer

A

Starts with all objects in one cluster and splits it successively until every object is in its own cluster

Question 9

Q

Single linkage method is …

Answer

A

Based on the shortest distance between members of two clusters (nearest neighbor); good for detecting outliers

Question 10

Q

Complete linkage method is …

Answer

A

Based on the largest distance between members of two clusters (farthest neighbor); sensitive to outliers

Question 11

Q

Average linkage method is …

Answer

A

Considers the average similarity between all pairs of individuals from the two clusters

Question 12

Q

Centroid linkage method is …

Answer

A

Considers the distance between the cluster centroids
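
To compare the single, complete, average, and centroid linkage rules side by side, here is a small sketch for one-dimensional clusters. The helper `linkage_distances` and the two example clusters are assumptions for illustration only.

```python
from statistics import mean

def linkage_distances(c1, c2):
    """Distance between two clusters of 1-D values under four linkage rules."""
    pairwise = [abs(a - b) for a in c1 for b in c2]
    return {
        "single": min(pairwise),               # nearest pair of members
        "complete": max(pairwise),             # farthest pair of members
        "average": mean(pairwise),             # mean over all cross-cluster pairs
        "centroid": abs(mean(c1) - mean(c2)),  # distance between the cluster means
    }

# Hypothetical clusters; the value 9 in the first one acts like an outlier.
d = linkage_distances([1, 2, 9], [4, 5])
```

The same two clusters get quite different distances depending on the rule, which is why the choice of linkage method changes the resulting hierarchy.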

Question 13

Q

Ward’s method

Answer

A

Uses within-cluster variance: merges the two clusters whose combination increases it the least; good when equally sized clusters are expected; sensitive to outliers
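
A minimal sketch of the variance criterion, assuming one-dimensional data and using the within-cluster sum of squares as the variance measure. The names `wss` and `ward_cost` and the three clusters are hypothetical.

```python
def wss(cluster):
    """Within-cluster sum of squared deviations from the cluster mean (1-D values)."""
    m = sum(cluster) / len(cluster)
    return sum((x - m) ** 2 for x in cluster)

def ward_cost(c1, c2):
    """Increase in total within-cluster sum of squares if c1 and c2 were merged."""
    return wss(c1 + c2) - wss(c1) - wss(c2)

a, b, c = [1.0, 2.0], [3.0, 4.0], [10.0, 11.0]
cost_ab = ward_cost(a, b)  # small increase: a and b are close together
cost_ac = ward_cost(a, c)  # large increase: a and c are far apart
```

Under this criterion `a` and `b` would be merged first, because that merge adds the least within-cluster variation.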

Question 14

Q

Seed points are for …

Answer

A

Creating clusters around them when the number of clusters is fixed in advance (non-hierarchical clustering)

Question 15

Q

k-means clustering …

Answer

A

Calculates the distance between each object and the seeds (cluster centers), assigns each object to the nearest one, then recomputes the centers and repeats
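
The assign-and-update loop can be sketched as below for one-dimensional data. This is a simplified illustration with a fixed number of iterations, not a production implementation; `k_means`, the seed values, and the data points are assumptions.

```python
def k_means(points, seeds, iterations=10):
    """Minimal 1-D k-means: assign each point to the nearest center, then update centers."""
    centers = list(seeds)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Index of the center closest to this point (ties go to the first center).
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its members; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = k_means([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], seeds=[1.0, 3.0])
```

Even though both seeds start in the lower group, the centers move after each pass and end up in the middle of the two natural groups.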

Question 16

Q

Hierarchical or non-hierarchical clustering?

Answer

A

Hierarchical when the sample is small and the number of clusters is not known in advance; non-hierarchical when the number of clusters is fixed in advance

Question 17

Q

Why is multicollinearity a problem?

Answer

A

Highly correlated variables may represent the same characteristic, so that characteristic is implicitly weighted more heavily and has a greater impact on the cluster solution than other characteristics
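
The implicit weighting can be seen in a small sketch: adding a perfectly correlated copy of one variable raises that characteristic's share of the squared Euclidean distance. The variables (standardized income and age) and all names here are hypothetical.

```python
def sq_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Two hypothetical objects described by (income_z, age_z).
p, q = (0.0, 0.0), (1.0, 1.0)
base = sq_euclidean(p, q)
income_share_before = 1.0 / base  # income accounts for half of the distance

# Same objects with a redundant income copy: (income_z, income_copy_z, age_z).
p_dup, q_dup = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
dup = sq_euclidean(p_dup, q_dup)
income_share_after = 2.0 / dup    # income now accounts for two thirds of the distance
```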

(17 cards)