Cluster Analysis Flashcards

1
Q

Cluster Analysis is …

A

Grouping observations by their key characteristics so that observations within a cluster are similar to each other and different from observations in other clusters; the aim is to identify natural groups within the data and analyze groups instead of individual values (data reduction)

2
Q

Assumptions

A

Representativeness of the sample, no substantial multicollinearity, no outliers

3
Q

To limit multicollinearity …

A

Scale (standardize) the variables, use a distance measure that accounts for correlation (e.g. Mahalanobis), exclude highly correlated variables

4
Q

A similarity can be measured by …

A

Distance measures (the Minkowski family, e.g. Euclidean; Mahalanobis), correlation coefficients
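
The distance measures named on this card can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names are made up for this example, and the Mahalanobis version assumes exactly two variables and a known 2x2 covariance matrix.

```python
import math

def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def euclidean(a, b):
    """Euclidean distance is the Minkowski distance with p = 2."""
    return minkowski(a, b, 2)

def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance for two variables.

    cov is an assumed 2x2 covariance matrix [[s11, s12], [s21, s22]];
    the distance rescales differences by the inverse covariance.
    """
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    inv = [[s22 / det, -s12 / det], [-s21 / det, s11 / det]]
    dx, dy = a[0] - b[0], a[1] - b[1]
    return math.sqrt(dx * (inv[0][0] * dx + inv[0][1] * dy)
                     + dy * (inv[1][0] * dx + inv[1][1] * dy))
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance; p = 1 in `minkowski` gives the city-block distance.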

5
Q

A distance measure measures …

A

Dissimilarity between two objects; a large value means they are not similar

6
Q

Hierarchical cluster technique means …

A

The final number of clusters is not fixed in advance; there are two variants: agglomerative and divisive

7
Q

Agglomerative clustering means …

A

Starts with every object in its own cluster, then successively merges the most similar clusters
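
A minimal sketch of the agglomerative procedure, assuming Euclidean distance and single linkage as the merge criterion (neither is prescribed by the card; `agglomerate` is a name invented for this example):

```python
import math
from itertools import combinations

def agglomerate(points, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain.

    Closeness is single linkage (minimum pairwise distance), an
    assumption made for this sketch. Requires Python 3.8+ for math.dist.
    """
    clusters = [[p] for p in points]   # every object starts in its own cluster
    merges = []                        # history of merged cluster pairs
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(math.dist(a, b)
                               for a in clusters[ij[0]]
                               for b in clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]   # i < j, so deleting j is safe
        del clusters[j]
    return clusters, merges
```

Calling `agglomerate(points)` with the default `n_clusters=1` runs the merging to the end; the `merges` list then holds the full merge history from which a dendrogram could be drawn.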

8
Q

Divisive clustering means …

A

Starts with all objects in one cluster and splits successively until every object is in its own cluster

9
Q

Single linkage method is …

A

Good for detecting outliers

10
Q

Complete linkage method is …

A

Sensitive to outliers

11
Q

Average linkage method is …

A

Considers the average similarity between all pairs of individuals in the two clusters

12
Q

Centroid linkage method is …

A

Considers the distance between the cluster centroids
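
The single, complete, average and centroid linkage rules differ only in how they turn point-to-point distances into a cluster-to-cluster distance. A minimal sketch, assuming Euclidean distance and clusters given as lists of coordinate tuples (function names are illustrative):

```python
import math
from itertools import product

def single_linkage(c1, c2):
    """Smallest distance between any pair of members (nearest neighbours)."""
    return min(math.dist(a, b) for a, b in product(c1, c2))

def complete_linkage(c1, c2):
    """Largest distance between any pair of members (farthest neighbours)."""
    return max(math.dist(a, b) for a, b in product(c1, c2))

def average_linkage(c1, c2):
    """Mean distance over all cross-cluster pairs."""
    return sum(math.dist(a, b) for a, b in product(c1, c2)) / (len(c1) * len(c2))

def centroid(cluster):
    """Coordinate-wise mean of a cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def centroid_linkage(c1, c2):
    """Distance between the two cluster centroids."""
    return math.dist(centroid(c1), centroid(c2))
```

Because complete linkage takes the maximum, a single distant point in a cluster inflates the result, which is one way to see its sensitivity to outliers.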

13
Q

Ward’s method

A

Uses variance within clusters, good when equally sized clusters are expected, sensitive to outliers
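
One common way to express Ward's criterion is as the increase in the within-cluster sum of squared distances that a merge would cause; the pair with the smallest increase is merged. A minimal sketch under that assumption (names are illustrative):

```python
def centroid(cluster):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def within_ss(cluster):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)

def ward_cost(c1, c2):
    """Increase in within-cluster sum of squares if c1 and c2 are merged."""
    return within_ss(c1 + c2) - within_ss(c1) - within_ss(c2)
```

Since squared distances are used, a point far from the rest contributes disproportionately to the cost, which is consistent with the method's sensitivity to outliers.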

14
Q

Seed points are for …

A

Creating clusters around them when the number of clusters is fixed in advance (non-hierarchical clustering)

15
Q

k-means clustering …

A

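Calculates the similarity between the seeds and the objects, then assigns each object to its most similar seed; the cluster centers are then recomputed and the steps repeated until the assignments stop changing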
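
A minimal k-means sketch, assuming Euclidean distance as the (dis)similarity measure and seed points supplied by the caller. The function name and the `max_iter` parameter are illustrative, not taken from the card.

```python
import math

def kmeans(points, seeds, max_iter=100):
    """Assign points to the nearest centroid, then update the centroids.

    seeds are the initial centroids; iteration stops when assignments
    no longer change or after max_iter rounds.
    """
    centroids = [tuple(s) for s in seeds]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the closest centroid for every point.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids
```

The result depends on the seeds chosen, so in practice several different starting configurations are often tried.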

16
Q

Hierarchical or non-hierarchical clustering?

A

Hierarchical when the sample size is small and the number of clusters is not known in advance

17
Q

Why multicollinearity is not good

A

Highly correlated variables may capture the same characteristic, so that characteristic is implicitly weighted more heavily in the cluster solution than the other characteristics
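
The implicit weighting can be illustrated with made-up numbers: duplicating one variable (the extreme case of perfect correlation) doubles its contribution to the squared Euclidean distance and can change which object counts as the nearest neighbour. The points below are hypothetical.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def with_duplicate_first(p):
    """Append a copy of the first variable, mimicking a perfectly correlated one."""
    return p + (p[0],)

# Hypothetical objects described by two variables (x, y).
p = (0.0, 0.0)
q = (2.0, 0.0)   # differs from p only in x
r = (0.0, 2.5)   # differs from p only in y

# Without the duplicate, q is closer to p than r is (4.0 vs 6.25).
before = (squared_euclidean(p, q), squared_euclidean(p, r))

# With x counted twice, q moves farther away (8.0) and r becomes the nearer one.
after = (
    squared_euclidean(with_duplicate_first(p), with_duplicate_first(q)),
    squared_euclidean(with_duplicate_first(p), with_duplicate_first(r)),
)
```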