W8: Cluster Analysis Flashcards
Cluster analysis can be used to classify
individuals and separate them for further study
In cluster analysis we find groups of (2)
similar individuals based on their covariate information
These groups are known as clusters
Aim to extract a small number of clusters of individuals who share similar characteristics and who have
different characteristics than those in other clusters
How can we measure the degree of similarity between individuals' scores across a number of variables?
Using 2 measures: similarity coefficients and dissimilarity coefficients
The correlation coefficient, r, is a measure of
similarity between 2 variables
What does Pearson's correlation coefficient tell us?
Whether one variable changes by a similar amount as the other variable changes
We could use the Pearson’s correlation coefficient, r, to work out the correlation between 2
individuals
However, although the correlation tells us whether the pattern of responses between people is similar
It does not tell us anything about the
distance between 2 individual profiles
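A minimal sketch of this point, using made-up scores for two hypothetical individuals (`person_a`, `person_b`) on four variables. The helper `pearson_r` is written out by hand for illustration. The two profiles have the same pattern, so r is 1, yet every score differs by 5.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical scores of two individuals on the same four variables.
person_a = [1, 2, 3, 4]
person_b = [6, 7, 8, 9]  # same pattern, shifted up by 5 on every variable

r = pearson_r(person_a, person_b)
gaps = [b - a for a, b in zip(person_a, person_b)]

print(r)     # the profiles rise and fall together
print(gaps)  # yet the individuals are 5 points apart on every variable
```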
An alternative measure to Pearson's correlation coefficient, r, is
Euclidean distance
What is Euclidean distance?
Geometric distance between 2 individuals
The Euclidean distance between individuals i and j is given by the formula below:
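For reference, the standard Euclidean distance between individuals i and j over p variables can be written as follows. The notation is an assumption made here: x_ik is the score of individual i on variable k.

```latex
d_{ij} = \sqrt{\sum_{k=1}^{p} \left( x_{ik} - x_{jk} \right)^{2}}
```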
With Euclidean distance, the smaller the distance
the more similar the individuals
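A short sketch of computing Euclidean distance, assuming each individual is a list of scores on the same variables. The three individuals are made up for illustration.

```python
import math

def euclidean_distance(x, y):
    """Square root of the summed squared differences across variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical scores on four variables.
person_a = [1, 2, 3, 4]
person_b = [6, 7, 8, 9]  # differs from person_a by 5 on every variable
person_c = [1, 2, 3, 5]  # differs from person_a by 1 on one variable

d_ab = euclidean_distance(person_a, person_b)  # sqrt(4 * 25) = 10
d_ac = euclidean_distance(person_a, person_c)  # sqrt(1) = 1

print(d_ab, d_ac)
```

Here person_c has the smaller distance to person_a, so person_c is the more similar individual.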
Euclidean distances are heavily affected by
variables with large values or large variance
So if cases are being compared across variables that have different variances, then
Euclidean distances will be inaccurate
If Euclidean distances would be inaccurate because the variables being compared have different variances, then (2)
May standardise the scores by subtracting the mean of each variable and dividing by its SD
(value - mean)/SD
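A rough sketch of why standardising helps, using two made-up variables on very different scales (`income`, `rating`) for three hypothetical individuals. The sample SD (n - 1 denominator) is used here as an assumption, since the cards do not say which SD is meant.

```python
import math
import statistics

def standardise(values):
    """Convert raw scores to z-scores: (value - mean) / SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption
    return [(v - mean) / sd for v in values]

def distance(i, j, columns):
    """Euclidean distance between individuals i and j across all variables."""
    return math.sqrt(sum((col[i] - col[j]) ** 2 for col in columns))

# Hypothetical data: one variable on a large scale, one on a 1-5 scale.
income = [20000, 21000, 40000]
rating = [1, 5, 1]

raw = [income, rating]
z = [standardise(income), standardise(rating)]

raw_d01 = distance(0, 1, raw)  # almost entirely the income gap of 1000
z_d01 = distance(0, 1, z)      # the rating gap now contributes too

print(raw_d01, z_d01)
```

On the raw scores, individuals 0 and 1 sit at opposite ends of the rating scale, but that gap barely registers next to the income difference. After standardising, both variables have mean 0 and SD 1, so each can contribute to the distance.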
How to calculate SD on your calculator? (8)
Mode
Statistics (2)
1
Input values
OPTN
1 variable (3)
Most methods of grouping individuals based on similarity work in a hierarchical way (2)
- Begin with each individual treated as a cluster in its own right
- At each subsequent stage, clusters are merged based on Euclidean distance
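The two steps above can be sketched as a small merging loop. This is an illustration only: the cards do not say how the distance between two clusters is measured, so the closest pair of members (single linkage) is assumed here. The names `agglomerate`, `cluster_distance` and the data in `scores` are made up.

```python
import math

def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cluster_distance(c1, c2, data):
    # Assumption: single linkage, i.e. distance between the closest members.
    return min(euclidean_distance(data[i], data[j]) for i in c1 for j in c2)

def agglomerate(data, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    # Start with each individual as a cluster in its own right.
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b], data)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters

# Hypothetical scores of five individuals on two variables.
scores = [[1, 1], [1, 2], [8, 8], [9, 8], [2, 1]]
result = agglomerate(scores, 2)
print(result)  # lists of individual indices, one list per cluster
```

With these scores, individuals 0, 1 and 4 end up in one cluster and individuals 2 and 3 in the other.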