Cluster Analysis Flashcards

1
Q

What does cluster analysis involve?

A

Grouping a number of observations into a smaller number of groups (clusters) of similar observations

2
Q

What are the 3 pairing methods for creating clusters?

A

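Euclidean = straight-line (shortest) distance between two points
Manhattan = sum of the absolute differences between two cases' scores
Pearson = correlation between two cases' scores (a similarity measure rather than a distance)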
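
The three measures above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names and the example scores (case_1, case_2, case_3) are made up for the example and do not come from any particular package.

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of the absolute differences on each variable
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson(a, b):
    # Pearson correlation between two profiles of scores
    # (higher = more similar, unlike the two distances above)
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical scores for three cases on three variables
case_1 = [1.0, 2.0, 3.0]
case_2 = [4.0, 6.0, 3.0]
case_3 = [2.0, 4.0, 6.0]

print(euclidean(case_1, case_2))  # 5.0
print(manhattan(case_1, case_2))  # 7.0
print(pearson(case_1, case_3))    # 1.0 (same profile shape, different scale)
```

Note that case_1 and case_3 are far apart on both distances yet perfectly correlated, which is why the choice of measure matters.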

3
Q

What is the main objective for pairing methods?

A

Maximise similarity within clusters whilst also maximising dissimilarity between clusters

4
Q

What are the 4 linkage procedures?

A

Single - combining clusters based on their most similar (closest) members
Complete - combining clusters based on their most different (furthest apart) members
Centroid - combining clusters based on the distance between their central values (centroids)
Average - combining clusters based on the average distance between all pairs of cases in the two clusters
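
As a rough sketch of how the four procedures differ, the functions below each compute the distance between two clusters of points. The function names and the two example clusters are hypothetical, and Euclidean distance is assumed throughout.

```python
import math
from itertools import product

def single_linkage(c1, c2):
    # Distance between the closest pair of members
    return min(math.dist(a, b) for a, b in product(c1, c2))

def complete_linkage(c1, c2):
    # Distance between the furthest-apart pair of members
    return max(math.dist(a, b) for a, b in product(c1, c2))

def average_linkage(c1, c2):
    # Mean distance over every pair of cases across the two clusters
    return sum(math.dist(a, b) for a, b in product(c1, c2)) / (len(c1) * len(c2))

def centroid(cluster):
    # Mean of each coordinate across the cluster's members
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def centroid_linkage(c1, c2):
    # Distance between the two cluster centroids
    return math.dist(centroid(c1), centroid(c2))

# Hypothetical clusters of two cases each, measured on two variables
cluster_a = [(0.0, 0.0), (0.0, 4.0)]
cluster_b = [(3.0, 0.0), (3.0, 4.0)]

print(single_linkage(cluster_a, cluster_b))    # 3.0
print(complete_linkage(cluster_a, cluster_b))  # 5.0
print(average_linkage(cluster_a, cluster_b))   # 4.0
print(centroid_linkage(cluster_a, cluster_b))  # 3.0
```

Whichever procedure is chosen, the pair of clusters with the smallest resulting value is the pair combined next.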

5
Q

What are the two structures for clustering?

A

Hierarchical = combining clusters sequentially, from most to least similar
Non-hierarchical = combining cases in a specific way that is informed by prior knowledge

6
Q

What are dendrograms?

A

Visual representations of the clustering process; they show the distances at which clusters were combined

7
Q

When identifying clusters in a dendrogram, what are we looking for?

A

The longest lines. A long line marks a big jump in distance before those clusters were combined, which suggests that they were not very similar
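
The "longest line" rule can be approximated numerically by looking for the biggest jump between successive merge distances. The sketch below is only a heuristic illustration: the function name suggest_cluster_count and the coefficients are invented for the example.

```python
def suggest_cluster_count(coefficients):
    """Suggest a number of clusters from merge distances listed in stage order.

    Assumes one coefficient per merge, so n cases give n - 1 coefficients.
    """
    n_cases = len(coefficients) + 1
    # Size of the jump from each merge distance to the next one
    jumps = [later - earlier for earlier, later in zip(coefficients, coefficients[1:])]
    # Index of the biggest jump; the merge just after it joins dissimilar clusters
    biggest = max(range(len(jumps)), key=jumps.__getitem__)
    # Stop before that merge: biggest + 1 merges have been made by then
    return n_cases - (biggest + 1)

# Hypothetical merge distances for 5 cases (4 merges)
example_coefficients = [1.0, 1.5, 2.0, 9.0]
print(suggest_cluster_count(example_coefficients))  # 2
```

Here the final merge happens at a much larger distance than the others, so stopping before it leaves two clusters.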

8
Q

What can assist a dendrogram?

A

Agglomeration schedule

9
Q

What is an agglomeration schedule?

A

A table listing which clusters were combined at each stage, along with the coefficient (distance) at which each combination was made
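
To make the idea concrete, here is a minimal sketch that builds a schedule of this kind for a handful of scores. It assumes single linkage on one variable with absolute difference as the distance; the function name and the scores are hypothetical, cases are numbered from 0, and the layout is not meant to match SPSS output exactly.

```python
def agglomeration_schedule(scores):
    # Every case starts in its own cluster (clusters hold case indices)
    clusters = [[i] for i in range(len(scores))]
    schedule = []
    stage = 1
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(scores[a] - scores[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Record the stage, the two clusters combined, and the coefficient
        schedule.append((stage, clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        stage += 1
    return schedule

# Hypothetical scores for five cases
example_scores = [1.0, 2.0, 6.0, 7.5, 20.0]
schedule = agglomeration_schedule(example_scores)
for stage, first, second, coefficient in schedule:
    print(stage, first, second, coefficient)
```

The coefficients rise from 1.0 to 1.5 to 4.0 and then jump to 12.5 at the last stage, which is the same information a dendrogram shows as a long line.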

10
Q

What are 5 key considerations for when conducting a cluster analysis?

A
  1. Logic/subjectivity of the clusters: are they realistic?
  2. Is there missing data, and is the data suitable?
  3. Are we aware of the impact of outliers and skewness?
  4. The clusters produced may not be all the clusters that exist in the real world
  5. Clusters that exist in the real world may not have been identified by the SPSS clustering process