L4 Cluster Analysis Flashcards
What does cluster analysis involve?
Grouping the cases identified in our investigation into a smaller number of clusters, based on the similarity of their characteristics
Give a human and a physical geography application of cluster analysis
Physical - an investigation into plant composition in the Amazon Rainforest may lead to clustering certain plants based on their similar characteristics
Human - commercial use of the ‘if you like this, you might like this’ approach. This is based on a number of cases (people) who bought an item and also bought the suggested one. They therefore exhibit similar behavioural characteristics and have been clustered together. A new customer who displays similar characteristics is invited to potentially join that cluster
What does the process of ‘pairing’ involve?
Progressively grouping the most similar cases into pairs, then joining those pairs (and the clusters they form) together into larger clusters
What is pairing between cases based upon?
Statistical distance between cases
What are the 3 methods of measuring statistical distance?
Euclidean = square root of the sum of the squared differences between the scores of two observations. Essentially the straight-line distance between 2 individual cases
Manhattan = sum of the absolute differences between the scores of two observations. Essentially the distance travelled along a grid ("city block") between 2 individual cases
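Pearson = based on the Pearson correlation coefficient (r) between the scores of two observations. Cases whose scores rise and fall together are treated as similar regardless of their magnitude; often expressed as a distance of 1 - r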
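A minimal Python sketch of the three distance measures, using made-up scores. It assumes "Pearson" means a correlation-based distance of 1 - r; the function names are illustrative only.

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared differences between paired scores
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of the absolute differences between paired scores
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson_distance(a, b):
    # 1 - r, where r is the Pearson correlation between the two cases' scores.
    # Undefined if either case has identical scores on every variable.
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    r = cov / math.sqrt(var_a * var_b)
    return 1 - r

print(euclidean([1, 2, 3], [2, 4, 5]))            # 3.0
print(manhattan([1, 2, 3], [2, 4, 5]))            # 5
print(pearson_distance([1, 2, 3], [10, 20, 30]))  # 0.0 (same pattern, different size)
print(euclidean([1, 2, 3], [10, 20, 30]))         # about 33.67
```

The last two lines show the key difference: the two cases share the same pattern of scores, so Pearson treats them as identical, while Euclidean treats them as far apart because of their different magnitudes.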
What is Manhattan the best method for?
Handling outliers: the differences are not squared (unlike Euclidean), so a single extreme score has less impact on the clustering
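A small illustration with made-up scores. Squaring lets one extreme difference dominate Euclidean distance, so the two measures can disagree about which case is nearer.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

query = [0, 0, 0, 0]
case_a = [3, 3, 3, 3]   # moderately different on every variable
case_b = [0, 0, 0, 10]  # identical except for one extreme (outlying) score

for name, dist in [("Euclidean", euclidean), ("Manhattan", manhattan)]:
    d_a, d_b = dist(query, case_a), dist(query, case_b)
    nearer = "case_a" if d_a < d_b else "case_b"
    print(f"{name}: d(a)={d_a}, d(b)={d_b}, nearer={nearer}")
# Euclidean: d(a)=6.0, d(b)=10.0, nearer=case_a
# Manhattan: d(a)=12, d(b)=10, nearer=case_b
```

Under Euclidean distance the single outlying score pushes `case_b` further away. Under Manhattan distance it counts only in proportion to its size.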
What is the Pearson method best for?
Handling large datasets
What are the 4 linkage procedures for making clusters?
- Single
- Complete
- Centroid
- Average
What does single linkage clustering involve?
Joining clusters based on the distance between their most similar (closest) cases, one from each cluster
What does complete linkage clustering involve?
Joining clusters based on the distance between their most different (furthest apart) cases, one from each cluster
What does centroid linkage clustering involve?
Joining clusters based on the distance between the central point (centroid, the mean of each variable) of each cluster
What does average linkage clustering involve?
Joining clusters based on the average of the distances between all pairs of cases, one taken from each cluster
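A minimal sketch of the four linkage rules, each measuring the distance between two clusters. It assumes Euclidean distance between cases, and the two example clusters are made up.

```python
import math
from itertools import product

def single_linkage(c1, c2):
    # Distance between the closest pair of cases, one from each cluster
    return min(math.dist(a, b) for a, b in product(c1, c2))

def complete_linkage(c1, c2):
    # Distance between the furthest pair of cases, one from each cluster
    return max(math.dist(a, b) for a, b in product(c1, c2))

def average_linkage(c1, c2):
    # Mean of every between-cluster pairwise distance
    dists = [math.dist(a, b) for a, b in product(c1, c2)]
    return sum(dists) / len(dists)

def centroid(cluster):
    # Mean of each variable across the cases in the cluster
    return [sum(values) / len(cluster) for values in zip(*cluster)]

def centroid_linkage(c1, c2):
    # Distance between the two cluster centroids
    return math.dist(centroid(c1), centroid(c2))

c1 = [(0, 0), (2, 0)]
c2 = [(5, 0), (9, 0)]
print(single_linkage(c1, c2))    # 3.0
print(complete_linkage(c1, c2))  # 9.0
print(average_linkage(c1, c2))   # 6.0
print(centroid_linkage(c1, c2))  # 6.0
```

The same two clusters can be 3 or 9 units apart depending on the rule, which is why the choice of linkage changes the order in which clusters are joined.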
What are the two forms of a clustering process?
Hierarchical = cases are combined in order from most similar to least similar until just one cluster is left
Non-hierarchical = the number of desired clusters is specified in advance (most likely due to prior knowledge), and the clustering process continues until that number of definable clusters (e.g. 3) is identified
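A minimal sketch of hierarchical (agglomerative) clustering with made-up cases. It assumes single linkage and Euclidean distance; any of the other linkage rules could be swapped in. Each step pairs the two most similar clusters and records the distance at which they were joined. A dendrogram plots these join distances.

```python
import math
from itertools import combinations

def single_linkage(c1, c2):
    return min(math.dist(a, b) for a in c1 for b in c2)

def hierarchical(cases):
    # Start with every case in its own cluster (label -> list of points)
    clusters = {label: [point] for label, point in cases.items()}
    merges = []
    while len(clusters) > 1:
        # Pair the two most similar (closest) clusters
        l1, l2 = min(
            combinations(clusters, 2),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        d = single_linkage(clusters[l1], clusters[l2])
        merges.append((l1, l2, d))
        # Replace the pair with one combined cluster
        clusters["".join(sorted(l1 + l2))] = clusters.pop(l1) + clusters.pop(l2)
    return merges

cases = {"A": (0, 0), "B": (1, 0), "C": (5, 0), "D": (7, 0), "E": (20, 0)}
merges = hierarchical(cases)
for l1, l2, d in merges:
    print(f"join {l1} + {l2} at distance {d}")
# join A + B at distance 1.0
# join C + D at distance 2.0
# join AB + CD at distance 4.0
# join E + ABCD at distance 13.0
```

The loop runs until one cluster is left, joining the most similar clusters first and the least similar last.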
Describe the process of non-hierarchical clustering.
- specify the number of clusters sought
- the clustering process begins; cases can continue to move between the different clusters as their degrees of similarity change
- once the pre-defined number of clusters has been formed and cases stop moving, the cases are fixed in place
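A minimal sketch of the steps above using k-means, one common non-hierarchical method (the notes do not name a specific algorithm, so this choice is an assumption). The number of clusters is fixed in advance by the starting centroids. Cases are reassigned until none moves. The data and starting centroids are made up.

```python
import math

def k_means(points, initial_centroids, max_iter=100):
    # k (the number of clusters sought) = number of starting centroids
    centroids = [list(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assign each case to its nearest centroid (cases may switch clusters)
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no case moved: membership is now fixed
        assignment = new_assignment
        # Recompute each centroid as the mean of its current members
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = [sum(v) / len(members) for v in zip(*members)]
    return assignment, centroids

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centres = k_means(points, initial_centroids=[(1, 1), (1, 2)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this run the case (1, 2) starts in cluster 1 and moves to cluster 0 on the second pass. After that no case moves, so the memberships are fixed.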
What is a dendrogram?
A tree diagram representing the different stages of the (hierarchical) clustering process: which cases or clusters are joined at each step, and at what distance