cluster analysis Flashcards
what is cluster analysis?
It simplifies a mass of individual cases into fewer groups, or ‘clusters’, by putting the most similar cases together.
why is cluster analysis useful?
once cases are grouped, we can explore the characteristics of each cluster and the relationships between clusters and other variables we are interested in.
cluster analysis is an analysis of ___
interdependence
what is an individual observation (e.g. a person or unit) in cluster analysis called?
a case
in cluster analysis, what do you want to do within clusters and between clusters?
maximize similarity within clusters and maximize dissimilarity between clusters
what is the process of putting certain cases together based on their similarities called?
pairing
what are the names of the three statistical distances (measures of how similar or dissimilar cases are)?
Euclidean distance
Manhattan distance
Pearson distance
what is Euclidean distance?
the square root of the sum of the squared differences between two observations' scores on each variable
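A minimal Python sketch of the definition above, assuming each case is a list of numeric scores on the same variables. The function name is illustrative, not taken from any particular package.

```python
import math

def euclidean_distance(a, b):
    """Square root of the sum of squared differences between two cases' scores."""
    if len(a) != len(b):
        raise ValueError("both cases must have scores on the same variables")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical cases scored on three variables
print(euclidean_distance([1, 2, 3], [4, 6, 3]))  # 5.0
```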
what is the most commonly used statistical distance measure?
Euclidean distance
what is Manhattan distance?
"along the corridor and up the stairs" (city-block distance): the sum of the absolute differences between two observations' scores on each variable
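The same two hypothetical cases under Manhattan distance, as a short sketch (the function name is illustrative). Each difference is taken as an absolute value and added, rather than squared.

```python
def manhattan_distance(a, b):
    """Sum of the absolute differences between two cases' scores (city-block distance)."""
    if len(a) != len(b):
        raise ValueError("both cases must have scores on the same variables")
    return sum(abs(x - y) for x, y in zip(a, b))

# Differences of 3, 4 and 0 are simply added: 3 + 4 + 0
print(manhattan_distance([1, 2, 3], [4, 6, 3]))  # 7
```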
what is an advantage of Manhattan distance?
it reduces the influence of outliers, because the differences are not squared
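A small illustration with hypothetical data: because Euclidean distance squares each difference, one large difference on a single variable counts for more than the same total difference spread across several variables. It uses Python's built-in `math.dist` for the Euclidean case; `manhattan` is a helper defined here.

```python
import math

def manhattan(a, b):
    # Sum of absolute differences: no squaring, so large gaps are not amplified
    return sum(abs(x - y) for x, y in zip(a, b))

base = [0, 0, 0, 0]
spread = [1, 1, 1, 1]   # small difference on every variable
outlier = [4, 0, 0, 0]  # one large difference on a single variable

print(math.dist(base, spread), math.dist(base, outlier))  # 2.0 4.0
print(manhattan(base, spread), manhattan(base, outlier))  # 4 4
```

Under Euclidean distance the single large difference makes the case twice as far away; under Manhattan distance both cases are equally far from `base`.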
what is Pearson distance?
the square root of the sum of the squared differences between two observations, with each squared difference divided by the variance of that variable
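A sketch of Pearson distance, assuming "variance" means the sample variance of each variable across all cases (computed with `statistics.variance`). The function name and the height/income data are hypothetical.

```python
import math
import statistics

def pearson_distance(a, b, variances):
    """Euclidean distance with each squared difference divided by that variable's variance."""
    return math.sqrt(sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances)))

# Hypothetical cases: height in cm and income in pounds, on very different scales
data = [
    [170, 65_000],
    [180, 72_000],
    [160, 58_000],
    [175, 80_000],
]
# One variance per variable (column), computed across all cases
variances = [statistics.variance(col) for col in zip(*data)]
print(pearson_distance(data[0], data[1], variances))
```

Without the division, the 7,000 difference in income would swamp the 10 cm difference in height; after it, both variables contribute on a comparable scale.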
what is an advantage of Pearson distance?
it suits data whose variables differ in scale (different magnitude ranges), because dividing by each variable's variance stops large-scale variables from dominating the distance
what is single linkage (nearest neighbor)?
the distance between two clusters is the shortest distance between any two members of the two clusters: the two closest cases are joined first, then the next closest, and so on, until a small number of clusters remains
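A minimal single-linkage sketch, assuming Euclidean distance between cases: every case starts as its own cluster, and the two clusters with the shortest member-to-member distance are merged repeatedly until `n_clusters` remain. The helper names, the stopping rule and the data are assumptions for illustration; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method='single'` would normally be used.

```python
import math

def single_linkage(c1, c2):
    """Distance between clusters = shortest distance between any two members."""
    return min(math.dist(p, q) for p in c1 for q in c2)

def agglomerate(cases, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[c] for c in cases]  # start: every case is its own cluster
    while len(clusters) > n_clusters:
        # Pair of cluster indices with the smallest single-linkage distance
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

cases = [(1, 1), (1, 2), (5, 5), (6, 5), (10, 1)]
print(agglomerate(cases, 2))  # [[(1, 1), (1, 2), (5, 5), (6, 5)], [(10, 1)]]
```

The pair (5, 5), (6, 5) joins the (1, 1), (1, 2) pair rather than (10, 1) because single linkage only compares the nearest members of each cluster.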
what is complete linkage (furthest neighbor)?
the distance between two clusters is the longest distance between any two members of the two clusters
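For contrast, a sketch of the complete-linkage distance next to the single-linkage one, on two hypothetical clusters of 2-D points, assuming Euclidean distance between members (via `math.dist`).

```python
import math

def single_linkage(c1, c2):
    # Shortest distance between any member of c1 and any member of c2
    return min(math.dist(p, q) for p in c1 for q in c2)

def complete_linkage(c1, c2):
    # Longest distance between any member of c1 and any member of c2
    return max(math.dist(p, q) for p in c1 for q in c2)

a = [(0, 0), (0, 3)]
b = [(4, 0), (4, 3)]
print(single_linkage(a, b))    # 4.0, e.g. (0, 0) to (4, 0)
print(complete_linkage(a, b))  # 5.0, e.g. (0, 0) to (4, 3)
```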