cluster analysis Flashcards

1
Q

what is cluster analysis

A

It allows us to simplify a mass of individual cases into fewer groups, or ‘clusters’ based on putting the most similar cases together.

2
Q

why is cluster analysis useful

A

we can then explore the characteristics of a cluster and explore the relationships between clusters and variables we might be interested in.

3
Q

cluster analysis is an analysis of ___

A

interdependence

4
Q

what is an individual observation in cluster analysis called

A

a case

5
Q

in cluster analysis what do you want to do between clusters and within clusters?

A

increase similarity within clusters and increase dissimilarity between clusters as much as possible

6
Q

what is the process of putting certain cases together based on their similarities called

A

pairing

7
Q

what are the names of the different statistical distances (how similar or dissimilar cases are) - 3

A

euclidean distance
manhattan distance
pearson distance

8
Q

what is euclidean distance?

A

square root of the sum of the squared differences between the scores of two observations on each variable
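
The definition above can be written out as a short sketch. The function name `euclidean_distance` and the two example cases are illustrative assumptions, not part of the original card.

```python
import math


def euclidean_distance(a, b):
    """Square root of the sum of squared differences between two cases."""
    if len(a) != len(b):
        raise ValueError("both cases need a score on every variable")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical cases measured on two variables.
case_a = [1.0, 2.0]
case_b = [4.0, 6.0]
print(euclidean_distance(case_a, case_b))  # 5.0, i.e. sqrt(3**2 + 4**2)
```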

9
Q

what is the most commonly used statistical distance measure

A

euclidean distance

10
Q

what is manhattan distance

A

the "along the corridor and up the stairs" distance: the sum of the absolute differences between the scores of two observations
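
As a minimal sketch of that definition (the function name `manhattan_distance` and the example cases are assumptions made for illustration):

```python
def manhattan_distance(a, b):
    """Sum of the absolute differences between two cases, variable by variable."""
    if len(a) != len(b):
        raise ValueError("both cases need a score on every variable")
    return sum(abs(x - y) for x, y in zip(a, b))


# Two hypothetical cases: 3 units "along the corridor", 4 units "up the stairs".
case_a = [1.0, 2.0]
case_b = [4.0, 6.0]
print(manhattan_distance(case_a, case_b))  # 7.0
```

Because the differences are not squared, one very large difference on a single variable contributes proportionally rather than quadratically.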

11
Q

what is an advantage of manhattan distance

A

reduces the influence of outliers

12
Q

what is pearson distance

A

square root of the sum of the squared differences between two observations, with each squared difference divided by the variance of that variable
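
A sketch of this definition, under two stated assumptions: each variable's variance is the sample variance across all cases, and each squared difference is divided by that variable's variance before summing. The function name and the data are hypothetical.

```python
import math
import statistics


def pearson_distance(a, b, variances):
    """Euclidean-style distance with each squared difference scaled by a variance."""
    return math.sqrt(sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances)))


# Hypothetical cases: the second variable has a much larger range than the first.
cases = [
    [1.0, 100.0],
    [2.0, 300.0],
    [3.0, 200.0],
]

# Assumption: one sample variance per variable, computed across all cases.
variances = [statistics.variance(column) for column in zip(*cases)]

print(variances)                                        # [1.0, 10000.0]
print(pearson_distance(cases[0], cases[1], variances))  # sqrt(1 + 4), about 2.236
```

Without the scaling, the 200-point gap on the second variable would swamp the 1-point gap on the first.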

13
Q

what is an advantage of pearson distance

A

good for data whose variables differ in scale (different magnitude ranges)

14
Q

what is single linkage (nearest neighbor)

A

the distance between two clusters is based on the shortest distance between any two members of the two clusters: join the 2 cases that are closest together, then the next closest, etc., creating a small number of meaningful clusters

15
Q

what is complete linkage (furthest neighbor)

A

the distance between the two clusters is based on the longest distance between any two members in the two clusters

16
Q

what is average linkage

A

The distance between the two clusters is defined as the average distance between all pairs of the two clusters’ members

17
Q

what is centroid linkage

A

the centre of each cluster is computed first. the distance between two clusters is then the distance between their centres
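
The four linkage rules (single, complete, average, centroid) can be compared side by side. This is a sketch that assumes Euclidean distance between cases; the function names and the two example clusters are hypothetical.

```python
import math
from itertools import product


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_distances(c1, c2):
    """All distances between a member of c1 and a member of c2."""
    return [euclidean(a, b) for a, b in product(c1, c2)]


def single_linkage(c1, c2):
    return min(pair_distances(c1, c2))    # nearest pair of members


def complete_linkage(c1, c2):
    return max(pair_distances(c1, c2))    # furthest pair of members


def average_linkage(c1, c2):
    d = pair_distances(c1, c2)
    return sum(d) / len(d)                # mean over all pairs


def centroid(cluster):
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def centroid_linkage(c1, c2):
    return euclidean(centroid(c1), centroid(c2))  # distance between centres


cluster_1 = [(0.0, 0.0), (0.0, 2.0)]
cluster_2 = [(3.0, 0.0), (3.0, 2.0)]

print(single_linkage(cluster_1, cluster_2))    # 3.0
print(complete_linkage(cluster_1, cluster_2))  # sqrt(13), about 3.606
print(average_linkage(cluster_1, cluster_2))   # between the two values above
print(centroid_linkage(cluster_1, cluster_2))  # 3.0
```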

18
Q

can a cluster move once it has been joined

A

no

19
Q

when is the number of clusters specified in non-hierarchical clustering

A

specified in advance

20
Q

within non hierarchical clustering do you know about the factors beforehand?

A

yes - know and understand the factors

21
Q

in non-hierarchical clustering, clusters are ___

A

fluid - can move up until the end of the process

22
Q

how does hierarchical clustering work

A

clusters are combined sequentially until one cluster is left (each case starts as a separate cluster)
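
The sequential merging can be sketched as a naive loop. This version assumes single linkage with Euclidean distance purely for illustration; the function names and data are hypothetical, and real software uses faster algorithms.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(c1, c2, cases):
    """Shortest distance between any member of c1 and any member of c2."""
    return min(euclidean(cases[i], cases[j]) for i in c1 for j in c2)


def agglomerate(cases):
    """Merge the two closest clusters repeatedly until one cluster is left.

    Returns the list of merges as (cluster_a, cluster_b, distance) tuples,
    where clusters are lists of case indices.
    """
    clusters = [[i] for i in range(len(cases))]  # every case starts alone
    schedule = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_linkage(clusters[a], clusters[b], cases)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        schedule.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        # Once joined, cases stay together: the merged cluster replaces both.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return schedule


cases = [(0.0,), (1.0,), (5.0,), (7.0,)]
for step in agglomerate(cases):
    print(step)
# ([0], [1], 1.0)
# ([2], [3], 2.0)
# ([0, 1], [2, 3], 4.0)
```

The printed list of merges and distances is the kind of information an agglomeration schedule records.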

23
Q

in hierarchical clustering, clusters are ___

A

static (once a case is joined it does not move)

24
Q

ward’s method is associated with which form of clustering

A

hierarchical clustering

25
Q

k-means clustering is associated with which form of clustering

A

non-hierarchical

26
Q

what is ward’s method

A

uses ANOVA to evaluate the distances between clusters. looking for similarity and dissimilarity between cases and clusters
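
One common way to express the ANOVA idea is that the cost of joining two clusters is the increase in the within-cluster sum of squares that the merge would cause, and the pair with the smallest increase is joined first. The sketch below assumes that formulation; the function names and the example clusters are hypothetical.

```python
def centroid(cluster):
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def within_ss(cluster):
    """Sum of squared deviations of each case from its cluster centroid."""
    centre = centroid(cluster)
    return sum((x - m) ** 2 for case in cluster for x, m in zip(case, centre))


def ward_cost(c1, c2):
    """Increase in within-cluster sum of squares if c1 and c2 were merged."""
    return within_ss(c1 + c2) - within_ss(c1) - within_ss(c2)


a = [(0.0,), (2.0,)]
b = [(3.0,), (5.0,)]
c = [(20.0,), (22.0,)]

print(ward_cost(a, b))  # 9.0   (cheap merge: a and b are similar)
print(ward_cost(a, c))  # 400.0 (expensive merge: a and c are dissimilar)
```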

27
Q

what is k-means clustering

A

an approach that produces clusters with the greatest possible distinction between clusters
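
A minimal sketch of the usual k-means loop (assign each case to its nearest centre, recompute the centres, repeat until nothing moves). It assumes squared Euclidean distance and starting centres chosen by hand; the function names and data are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    return tuple(sum(col) / len(points) for col in zip(*points))


def k_means(cases, centroids, max_iter=100):
    """The number of clusters is fixed in advance by len(centroids)."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each case goes to its nearest centre.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(case, centroids[k]))
            for case in cases
        ]
        if new_labels == labels:  # no case moved, so stop
            break
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for k in range(len(centroids)):
            members = [c for c, lab in zip(cases, labels) if lab == k]
            if members:
                centroids[k] = mean_point(members)
    return labels, centroids


cases = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
         (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centres = k_means(cases, centroids=[cases[0], cases[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Cases can change cluster on any pass until the loop stops, unlike in hierarchical clustering.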

28
Q

give an explanation of how ward’s method and k-means clustering can be used in succession

A

Ward’s method to get a sense of the possible number of clusters and then k-means clustering with the optimum number of clusters used to place all the cases in those clusters.

29
Q

in a dendrogram, a cut that crosses 3 lines is equal to ___

A

3 clusters

30
Q

what method of clustering does a dendrogram start with

A

hierarchical

31
Q

what is an agglomeration schedule

A

a numerical version of a dendrogram

32
Q

what are the limitations of cluster analysis (PROAM)

A

Presence of groups does not mean they are meaningful
how many clusters to Retain
Outliers can produce clustering issues
Absence of groups does not mean they don’t exist
can have no Missing data as need to calculate distances for all cases