W8: Cluster Analysis Flashcards

1
Q

Cluster analysis can be used to classify

A

individuals and separate them for further study

2
Q

In cluster analysis we find groups of (2)

A

similar individuals based on their covariate information
These groups are known as clusters

3
Q

Aim to extract a small number of clusters of individuals who share similar characteristics and who have

A

different characteristics than those in other clusters

4
Q

How can we measure the degree of similarity between individuals’ scores across a number of variables?

A

Using 2 measures: similarity coefficients and dissimilarity coefficients

5
Q

The correlation coefficient, r, is a measure of

A

similarity between 2 variables

6
Q

What does Pearson’s correlation coefficient tell us?

A

whether, as one variable changes, the other changes by a similar amount

7
Q

We could use the Pearson’s correlation coefficient, r, to work out the correlation between 2

A

individuals

9
Q

However, although the correlation tells us whether the pattern of responses between people is similar

It does not tell us anything about the

A

distance between 2 individual profiles
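
As a small illustration of this point (a sketch with made-up scores, not data from the course), the Python below computes Pearson's r between two hypothetical individuals whose profiles have the same shape but sit 5 points apart on every variable. The names `pearson_r`, `person_a` and `person_b` are assumptions for the example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length score profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores on 4 variables: same pattern, different level
person_a = [1, 2, 3, 4]
person_b = [6, 7, 8, 9]

r = pearson_r(person_a, person_b)
gaps = [b - a for a, b in zip(person_a, person_b)]

print("r =", r)        # the pattern of responses is identical
print("gaps =", gaps)  # yet B scores 5 higher on every variable
```

The correlation alone would treat these two individuals as perfectly similar, which is why a distance measure is also needed.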

10
Q

An alternative measure to Pearson’s correlation coefficient, r, is

A

Euclidean distance

11
Q

What is Euclidean distance?

A

Geometric distance between 2 individuals

12
Q

The Euclidean distance between individuals i and j is given by the formula:

A

d_ij = sqrt( sum over k = 1, ..., n of (x_ik - x_jk)^2 )

where x_ik is individual i's score on variable k and n is the number of variables
13
Q

With Euclidean distance, the smaller the distance

A

the more similar the individuals
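
A minimal sketch of the calculation, assuming three hypothetical individuals (A, B, C) scored on n = 4 variables. The scores and the helper name `euclidean` are invented for illustration.

```python
import math
from itertools import combinations

def euclidean(p, q):
    """Square root of the summed squared differences across all variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical scores on n = 4 variables
scores = {
    "A": [1, 2, 3, 4],
    "B": [6, 7, 8, 9],
    "C": [2, 2, 4, 4],
}

# Distance for every pair of individuals
distances = {
    (i, j): euclidean(scores[i], scores[j])
    for i, j in combinations(sorted(scores), 2)
}

# The pair with the smallest distance is the most similar
closest = min(distances, key=distances.get)

for pair, d in distances.items():
    print(pair, round(d, 3))
print("most similar pair:", closest)
```

Here A and C have the smallest distance, so they would be judged the most similar pair.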

14
Q

Euclidean distances are heavily affected by

A

variables measured on a large scale (large values or large variances)

15
Q

So if cases are being compared across variables that have different variances, then

A

Euclidean distances will be inaccurate

16
Q

If Euclidean distances would be inaccurate because the variables being compared have different variances, then we (2)

A

May standardise the scores by subtracting the mean of each variable and dividing by its SD

(value - mean)/SD
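
A short sketch of this standardisation using Python's `statistics` module. It assumes the sample SD (dividing by n - 1), which may differ from the SD your course or calculator uses. The salary and rank values are hypothetical.

```python
from statistics import mean, stdev

def standardise(values):
    """Return z-scores: (value - mean) / SD, using the sample SD (n - 1)."""
    m = mean(values)
    sd = stdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical variables on very different scales
salary = [30000, 40000, 50000]
rank = [1, 2, 3]

z_salary = standardise(salary)
z_rank = standardise(rank)

print(z_salary)
print(z_rank)
```

After standardising, both variables are on the same scale, so neither dominates the Euclidean distance simply because its raw values are larger.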

17
Q

How to calculate SD on your calculator? (8)

A

Mode
Statistics (2)
1
Input values
OPTN
1 variable (3)

18
Q

Most methods of grouping individuals based on similarity work in a hierarchical way (2)

A
  1. Begin with each individual treated as its own cluster
  2. At each subsequent stage, the closest clusters are merged based on Euclidean distance
19
Q

For example, when calculating Euclidean distances, we first say

A

We calculate the Euclidean distance between the individuals

20
Q

In the Euclidean distance formula, n represents

A

number of variables

21
Q

What is n here in calculating Euclidean distance?

A

4

22
Q

For the first pair of individuals, say A and B, what do we write before applying the formula?

A

For A and B:

23
Q

At the end of the Euclidean distance calculations,
let's say A and B = 3.26
A and C = 2.13
B and C = 1.23

Ending sentence:

A

So we would cluster B and C first, then A and C and finally A and B
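
The ordering in this answer can be checked by sorting the three distances from the card. The sketch below also works out, under nearest neighbour clustering, the distance at which A would join the cluster {B, C}. The variable names are illustrative only.

```python
# Pairwise Euclidean distances quoted in the card
distances = {
    ("A", "B"): 3.26,
    ("A", "C"): 2.13,
    ("B", "C"): 1.23,
}

# Pairs ordered from most similar (smallest distance) to least similar
ranked_pairs = sorted(distances, key=distances.get)

# Nearest neighbour: once B and C are merged, A joins at its
# smallest distance to any member of that cluster
a_joins_at = min(distances[("A", "B")], distances[("A", "C")])

print(ranked_pairs)
print("A joins {B, C} at distance", a_joins_at)
```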

24
Q

Consider 3 ways of merging clusters at each step of the method (3)

A
  1. Nearest neighbour
  2. Furthest neighbour
  3. Average linkage
25
Q

What is nearest neighbour?

A

add individuals to clusters one at a time based
on the lowest Euclidean distance to any member of the cluster.

26
Q

What is furthest neighbour?

A

add individuals to clusters one at a time based
on the lowest Euclidean distance to the furthest member of the cluster, so the individual must be close to all members of the cluster.

27
Q

What is average linkage?

A

add individuals to clusters one at a time based on
the lowest average Euclidean distance to the cluster.
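
The three merging rules can be compared side by side. This is a minimal pure-Python sketch of agglomerative clustering, not the exact SPSS or R implementation. It assumes each individual starts as its own cluster and uses four hypothetical individuals measured on a single variable so the distances are easy to check by hand. The function names are invented for the example.

```python
import math
from itertools import combinations

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cluster_distance(c1, c2, points, linkage):
    """Distance between two clusters under the chosen merging rule."""
    pair_dists = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "nearest":      # closest pair of members
        return min(pair_dists)
    if linkage == "furthest":     # most distant pair of members
        return max(pair_dists)
    if linkage == "average":      # mean over all pairs of members
        return sum(pair_dists) / len(pair_dists)
    raise ValueError("unknown linkage: " + linkage)

def agglomerate(points, linkage="nearest"):
    """Merge the two closest clusters repeatedly; return the merge history."""
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], points, linkage),
        )
        height = cluster_distance(a, b, points, linkage)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a), sorted(b), height))
    return merges

# Hypothetical individuals scored on one variable
points = {"A": (0.0,), "B": (1.0,), "C": (3.0,), "D": (7.0,)}

for rule in ("nearest", "furthest", "average"):
    heights = [round(h, 3) for _, _, h in agglomerate(points, rule)]
    print(rule, heights)
```

In this toy data the merge order is the same under all three rules, but the merge distances differ, which changes the shape of the resulting dendrogram.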

28
Q

What is a dendrogram?

A

It shows how and when individuals are combined in the clustering algorithm

29
Q

Interpret this dendrogram, which used nearest neighbour clustering (5)

A

We see clusters forming:

First one is between individuals 1,4,11,7 and 13
Second one is between individuals 10,12,9,15 and 2
Third one is between 6,8,5,3 and 14

The first and second clusters are more closely linked to each other than to the third cluster

30
Q

Interpret this table:
Measured trait anxiety, depression, intrusive thoughts, impulsive
Patients with the same disorder should report a similar pattern across scores
We asked 2 psychologists to agree on a diagnosis of GAD, depression or OCD

A

We see an exact mapping between the 3 clusters and the psychologists' diagnoses, in spite of the clustering algorithm having no knowledge of the diagnoses

31
Q

Interpret the furthest neighbour (FN) dendrogram as compared to the nearest neighbour (NN) one

A

We end up with the same 3 clusters, although individuals are combined in a different order

32
Q

Which of the methods provides a more useful dendrogram? Justify
your answer (2)

A

The furthest neighbour clustering provides a more useful dendrogram.

It forms a more natural set of clusters than the nearest neighbour
clustering algorithm, which produces lots of clusters of size 1.

33
Q

Using the dendrogram you selected:
(a) At what distance should we cut the dendrogram?
(b) What are the memberships of the resulting clusters? - (5)

A

Using the dendrogram from the furthest neighbour clustering, we
would cut the dendrogram somewhere between a distance of around
15 and 16.

The resulting cluster memberships would be

Cluster a: Ernie, Carla, Christa, Ernest, Christopher, Beulah, Linette, Marie, Bo.

Cluster b: Tony, Martina, Randolph, Raul, Catalina, Louis, Sunila,
Johnson, Mickey.

Cluster c: Rosalyn, Lawrence
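
Cutting a dendrogram at a chosen distance amounts to keeping only the merges that happen below that distance. The sketch below illustrates the idea with a hypothetical merge history (items P to T and invented heights), not the staff data from this card. `cut_dendrogram` is an illustrative helper, and each merge is written as (member of one cluster, member of the other cluster, merge distance).

```python
def cut_dendrogram(items, merges, cut_height):
    """Return cluster memberships after applying merges at or below cut_height."""
    clusters = {item: {item} for item in items}  # every item starts alone
    for left, right, height in merges:
        if height > cut_height:
            continue  # this merge happens above the cut, so ignore it
        merged = clusters[left] | clusters[right]
        for member in merged:
            clusters[member] = merged
    unique = {frozenset(c) for c in clusters.values()}
    return sorted(sorted(c) for c in unique)

# Hypothetical merge history read off a dendrogram
items = ["P", "Q", "R", "S", "T"]
merges = [
    ("P", "Q", 2.0),
    ("R", "S", 3.0),
    ("P", "T", 9.0),
    ("P", "R", 20.0),
]

print(cut_dendrogram(items, merges, 15.5))  # cut between 15 and 16
print(cut_dendrogram(items, merges, 2.5))   # a much lower cut
```

A higher cut gives fewer, larger clusters; a lower cut leaves more individuals in clusters of their own.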

34
Q

We have collected data on the salary, rank (from 1 to 5, where 5 is most
senior), FTE (hours worked, where 1 is full time), number of articles
published and years of experience of 20 university staff.

(c) Discuss the cluster memberships in light of the summary statistics and plots (4)

A

From the boxplots and classification table we can see that:

Cluster 1 is made up of individuals with a high salary, a large
number of articles and many years of experience, all of whom are
professors. It appears that cluster 1 is the most senior academics.
This corresponds to cluster c above.

Cluster 2 is made up of individuals with a lower salary, fewer articles and less experience than cluster 1. However, most are still professors. Therefore, cluster 2 is made up of a slightly lower level
of senior academics.

Cluster 3 is made up of individuals who have relatively low salaries,
few articles and less experience, and who are not yet full professors.
That is, cluster 3 is made up of early career academics.

35
Q

In SPSS what does this table tell us?

A
  • The cluster membership table tells us which cluster each individual belongs to
36
Q

In SPSS, how many clusters are there in this dendrogram?

A

3 clusters

37
Q

In this R dendrogram, how many clusters are there?

A

3 clusters

38
Q

Since variables are on the same scale there is no need to

A

standardise them

39
Q

Interpret this furthest neighbour (FN) dendrogram, which shows different aspects of disgust (2)

A

Three broad clusters appear, which seem to be distinct types of disgust.
We may want to further disaggregate into up to seven clusters.

40
Q

Interpret this nearest neighbour (NN) dendrogram, which shows different aspects of disgust (2)

A

Clustering of bread, deer, crisp and foxes, but other clustering is hard to distinguish.

41
Q

Which of the two clustering methods used appears to provide the more interpretable
clusters of the different aspects of disgust?

A

Furthest Neighbour

42
Q

Data on 7 different measurements of 41 cities

(c) What do you observe from the NN dendrogram?

A

The clustering is a little tricky to make out, although Chicago appears different to the other cities.

43
Q

Now perform furthest neighbour clustering on the data. What conclusions can you
make this time? Are they different to the nearest neighbour algorithm?

A

This time, two/three large clusters appear, plus a cluster with just Chicago.

44
Q

Based on the furthest neighbour clustering, find the cluster membership for a 3
cluster solution. Are Wichita and St Louis in the same cluster? Which city is the
“odd one out”?

A

Wichita is in cluster 1 and St Louis in cluster 2. Chicago is the odd one out.

45
Q

Interpret this dendrogram - is it similar to or different from the previous NN/FN results?

A

The results are different again. Chicago is clustered with Philadelphia this time.