Cluster Analysis Flashcards
What does cluster analysis involve?
Grouping a number of observations into a smaller number of groups (clusters) of similar observations
What are the 3 pairing methods for creating clusters?
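Euclidean = straight-line (shortest) distance between two points
Manhattan = sum of the absolute differences between two cases' scores
Pearson = correlation between two cases' scores (a similarity rather than a distance measure)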
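The three measures above can be sketched in a few lines of Python. This is only an illustration: the function names and the two example cases are made up for this card, and statistical packages may scale or transform these measures differently.

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of the absolute differences between the two cases' scores.
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson(a, b):
    # Correlation between the two cases' scores (higher = more similar).
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)

# Hypothetical scores for two cases on three variables.
case_1 = [1.0, 2.0, 3.0]
case_2 = [4.0, 6.0, 8.0]

print(euclidean(case_1, case_2))  # square root of 50, about 7.07
print(manhattan(case_1, case_2))  # 12.0
print(pearson(case_1, case_2))    # 1.0 (up to rounding): same profile shape
```

Note that the two cases are far apart on Euclidean and Manhattan distance yet perfectly similar on Pearson, because Pearson compares the pattern of scores rather than their size.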
What is the main objective for pairing methods?
Maximise similarity within clusters whilst also maximising dissimilarity between clusters
What are the 4 linkage procedures?
Single - clusters combined based on their two most similar (nearest) members
Complete - clusters combined based on their two most different (furthest) members
Centroid - clusters combined based on the distance between their central values (centroids)
Average - clusters combined based on the average distance between all pairs of cases in the two clusters
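As a rough sketch, the four linkage procedures can be seen as four different ways of measuring the distance between the same two clusters. The example assumes Euclidean distance and Python 3.8+ (for `math.dist`); the function names and the two small clusters are hypothetical.

```python
import math

def pairwise_distances(cluster_a, cluster_b):
    # Euclidean distance between every case in A and every case in B.
    return [math.dist(p, q) for p in cluster_a for q in cluster_b]

def centroid(cluster):
    # Central value: the mean of each variable across the cluster's cases.
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def single_linkage(a, b):
    return min(pairwise_distances(a, b))   # most similar members

def complete_linkage(a, b):
    return max(pairwise_distances(a, b))   # most different members

def average_linkage(a, b):
    distances = pairwise_distances(a, b)
    return sum(distances) / len(distances)  # mean over all pairs of cases

def centroid_linkage(a, b):
    return math.dist(centroid(a), centroid(b))

# Two hypothetical clusters, each with two cases measured on two variables.
cluster_a = [(0, 0), (0, 2)]
cluster_b = [(3, 0), (6, 4)]

for name, rule in [("single", single_linkage), ("complete", complete_linkage),
                   ("centroid", centroid_linkage), ("average", average_linkage)]:
    print(name, round(rule(cluster_a, cluster_b), 3))
```

Whichever rule is chosen, the pair of clusters with the smallest resulting distance is the pair combined next.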
What are the two structures for clustering?
Hierarchical = combining clusters sequentially from most to least similar
Non-hierarchical = cases are combined in a specific way informed by prior knowledge
What are dendrograms?
Visual representations of clustering; they show the distances/steps at which clusters were combined
When looking to denote clusters using dendrograms what are we looking for?
The longest lines: a long line marks a big jump in the distance needed to combine those clusters, which suggests they were not very similar
What can assist a dendrogram?
Agglomeration schedule
What is an agglomeration schedule?
It lists the clusters combined at each stage and the coefficient (distance) at which they were combined
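A toy example may help show how a schedule of coefficients points to the "big jump". The sketch below assumes single linkage on one-dimensional scores; the function names and data are hypothetical, and the layout is not the SPSS output format.

```python
def agglomeration_schedule(scores):
    # Single-linkage hierarchical clustering on 1-D scores.
    # Returns rows of (stage, cluster_1, cluster_2, coefficient).
    clusters = [[s] for s in scores]
    schedule = []
    stage = 1
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(abs(x - y) for x in clusters[i] for y in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        schedule.append((stage, sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        stage += 1
    return schedule

def suggested_cluster_count(schedule):
    # Find the largest increase in coefficient between consecutive stages
    # and stop just before that merge.
    coefficients = [row[3] for row in schedule]
    jumps = [later - earlier
             for earlier, later in zip(coefficients, coefficients[1:])]
    last_stage_kept = jumps.index(max(jumps)) + 1
    n_cases = len(schedule) + 1
    return n_cases - last_stage_kept

scores = [1, 2, 4, 20, 23, 50]  # hypothetical scores for six cases
schedule = agglomeration_schedule(scores)
for row in schedule:
    print(row)                    # coefficients run 1, 2, 3, 16, 27
print(suggested_cluster_count(schedule))  # 3
```

Here the coefficient leaps from 3 to 16 between stages 3 and 4, so stopping after stage 3 leaves three clusters: {1, 2, 4}, {20, 23} and {50}. This jump corresponds to the longest lines on the dendrogram.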
What are 5 key considerations when conducting a cluster analysis?
- Logic/subjectivity of the clusters: are they realistic?
- Is any data missing, and is the data suitable?
- Awareness of the impact of outliers and skewness
- Clusters produced may not be all the clusters that exist in the real world
- Clusters in the real world may not have been identified by the SPSS clustering process