Cluster Analysis Flashcards

Question 1

Q

Cluster analysis is based on and involves

Answer

A

Cluster analysis is based on the concept of similarity. Groups are formed by pairing individual cases within a dataset according to how similar they are across a series of two or more scales or measures.

Question 2

Q

Cluster analysis techniques

Answer

A

Allow the researcher to examine how cases in the dataset are related to each other across a range of variables

Question 3

Q

Cluster analysis - distance

Answer

A

Observations are related and paired to each other on the basis of distances, which are defined by the differences between the scores of one observation and the corresponding scores of another. Distance (D) is based on the notion of an n x v dimensional space, D = (n, v), where D = dimensional space, n = number of observations and v = number of variables.

Question 4

Q

Euclidean distance

Answer

A

Square root of the sum of the squared differences between corresponding scores for two observations.
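
As a minimal sketch of this definition (the function name and the two example observations are made up for illustration):

```python
import math

def euclidean_distance(a, b):
    """Square root of the sum of squared differences between paired scores."""
    if len(a) != len(b):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical observations measured on two variables
print(euclidean_distance([1.0, 2.0], [4.0, 6.0]))  # 5.0
```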

Question 5

Q

Manhattan distance

Answer

A

Sum of the absolute differences between corresponding scores for two observations.
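
A short sketch of the same idea in code (function name and example values are illustrative only):

```python
def manhattan_distance(a, b):
    """Sum of the absolute differences between paired scores."""
    if len(a) != len(b):
        raise ValueError("observations must have the same number of variables")
    return sum(abs(x - y) for x, y in zip(a, b))

# Same two hypothetical observations: |1 - 4| + |2 - 6|
print(manhattan_distance([1.0, 2.0], [4.0, 6.0]))  # 7.0
```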

Question 6

Q

Pearson distance

Answer

A

Square root of the sum of the squared differences between the scores of two observations, with each squared difference divided by the variance of the corresponding variable.
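
A hedged sketch of one reading of this definition: each squared difference is divided by the variance of its variable, and the variance is assumed to be the sample variance taken over the whole dataset (the card does not say which variance is meant). The dataset is made up.

```python
import math
from statistics import variance

def pearson_distance(a, b, variances):
    """Variance-scaled Euclidean distance between two observations.

    Assumes every variance is non-zero; a constant variable would
    cause a division by zero.
    """
    return math.sqrt(sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances)))

# Hypothetical dataset: 3 observations on 2 variables with very different scales
data = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
variances = [variance(column) for column in zip(*data)]  # sample variance per variable

print(pearson_distance(data[0], data[1], variances))
```

Scaling by the variance means the second variable does not dominate the distance simply because it is measured in larger units.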

Question 7

Q

Pairing process

Answer

A

Depending on the distance measure and linkage method adopted, a process of agglomeration takes place: observations are added to clusters step by step, and the pairing continues until a single cluster has been formulated that contains every case in the dataset.
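
The agglomeration process can be sketched as a naive loop. This is an illustration only, assuming Euclidean distance and single linkage; the function names and the five one-variable observations are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_link(c1, c2, points):
    """Minimum distance between any member of c1 and any member of c2."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

def agglomerate(points, cluster_distance=single_link):
    """Merge the two closest clusters repeatedly until one cluster remains."""
    clusters = [(i,) for i in range(len(points))]  # every case starts alone
    history = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], points),
        )
        height = cluster_distance(c1, c2, points)
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        history.append((c1, c2, height))
    return history

points = [[0.0], [1.0], [5.0], [6.0], [20.0]]
for c1, c2, height in agglomerate(points):
    print(c1, "+", c2, "at distance", height)
```

The recorded merge heights are the information a dendrogram displays.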

Question 8

Q

Single linkage

Answer

A

This is based on the minimum distance between an observation in one cluster and that of another.

Question 9

Q

Average linkage

Answer

A

This method examines not only the distances between two observations in different clusters but also the distance between the cluster centres.

Question 10

Q

Centroid linkage

Answer

A

This uses another averaging technique that attempts to link clusters according to the cluster means.
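
A minimal sketch, assuming the distance between cluster means is measured with Euclidean distance (the clusters are made up):

```python
import math

def centroid(cluster):
    """Mean of each variable across the observations in a cluster."""
    return [sum(column) / len(cluster) for column in zip(*cluster)]

def centroid_linkage(c1, c2):
    """Euclidean distance between the means of two clusters."""
    m1, m2 = centroid(c1), centroid(c2)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(m1, m2)))

cluster_a = [[0.0, 0.0], [2.0, 0.0]]  # mean is [1.0, 0.0]
cluster_b = [[3.0, 4.0], [5.0, 4.0]]  # mean is [4.0, 4.0]
print(centroid_linkage(cluster_a, cluster_b))  # 5.0
```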

Question 11

Q

Complete linkage

Answer

A

This is based on the maximum possible distance between an observation in one cluster and an observation in another. The clusters tend to be of relatively similar size and uniformity. It can also make outliers more influential, stretching the maximum limits of a cluster and skewing the result.
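
To make the contrast with single linkage concrete, here is a small sketch assuming Euclidean distance; the clusters are hypothetical, and the point at 9.0 plays the role of an outlier.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(c1, c2):
    """Minimum distance between a member of c1 and a member of c2."""
    return min(euclidean(p, q) for p in c1 for q in c2)

def complete_linkage(c1, c2):
    """Maximum distance between a member of c1 and a member of c2."""
    return max(euclidean(p, q) for p in c1 for q in c2)

cluster_a = [[0.0, 0.0], [1.0, 0.0]]
cluster_b = [[4.0, 0.0], [9.0, 0.0]]  # [9.0, 0.0] is far from everything else

print(single_linkage(cluster_a, cluster_b))    # 3.0
print(complete_linkage(cluster_a, cluster_b))  # 9.0
```

The single outlying point sets the complete-linkage distance on its own, which is how outliers can skew the result.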

Question 12

Q

Hierarchical clustering

Answer

A

Each case starts as a separate cluster, and clusters are combined sequentially until one cluster is left. Once a case has been joined to a cluster it does not move.

Question 13

Q

Median linkage

Answer

A

Ensures that the median, and not the mean, distance between two clusters provides the distance measure.

Question 14

Q

Method to evaluate the distances between clusters

Answer

A

The most popular approach is Ward’s method, which uses an analysis of variance (ANOVA) approach to evaluate the distances between clusters.
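
One common way to express Ward's criterion is as the increase in the within-cluster (error) sum of squares that a merge would cause; the pair of clusters with the smallest increase is merged. The sketch below assumes that formulation, with made-up clusters and hypothetical function names.

```python
def error_sum_of_squares(cluster):
    """Sum of squared deviations of each score from its cluster mean."""
    n = len(cluster)
    means = [sum(column) / n for column in zip(*cluster)]
    return sum((x - m) ** 2 for row in cluster for x, m in zip(row, means))

def ward_merge_cost(c1, c2):
    """Increase in the error sum of squares if c1 and c2 were merged."""
    merged = c1 + c2
    return (error_sum_of_squares(merged)
            - error_sum_of_squares(c1)
            - error_sum_of_squares(c2))

a = [[0.0], [2.0]]
b = [[10.0], [12.0]]
c = [[3.0], [5.0]]

print(ward_merge_cost(a, b))  # 100.0
print(ward_merge_cost(a, c))  # 9.0, so a and c would be merged first
```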

Question 15

Q

Non-hierarchical clustering

Answer

A

The number of clusters is specified in advance, and cases can move between clusters right up until the end of the process.

Question 16

Q

Most popular non-hierarchical clustering approach

Answer

A

k-means clustering - produces clusters with the greatest possible distinction between clusters.
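
A minimal sketch of the usual k-means iteration (assign each case to its nearest centre, recompute the centres, repeat until assignments stop changing). The starting centres are passed in explicitly here to keep the example deterministic; that is an assumption of this sketch, and the data are made up.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centres, max_iter=100):
    """Return (labels, centres) after iterating assignment and update steps."""
    centres = [list(c) for c in initial_centres]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each case joins the nearest centre
        new_labels = [
            min(range(len(centres)), key=lambda k: squared_distance(p, centres[k]))
            for p in points
        ]
        if new_labels == labels:  # no case moved, so the process has converged
            break
        labels = new_labels
        # Update step: each centre becomes the mean of its members
        for k in range(len(centres)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centre if a cluster is empty
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

points = [[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]]
labels, centres = kmeans(points, initial_centres=[points[0], points[2]])
print(labels)   # [0, 0, 1, 1]
print(centres)  # [[1.25, 1.5], [8.5, 8.75]]
```

Unlike the hierarchical case, a case can be reassigned on any pass of the loop.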

Question 17

Q

Dendrograms

Answer

A

Diagrammatic representation of the pairing process, indicating how many clusters existed at any particular stage of the process. Every observation is represented individually; clusters and individuals not found to be identical at the first stage are therefore represented by vertical lines.

Question 18

Q

Cutting of Dendrograms

Answer

A

Cutting is a subjective and somewhat difficult process, as the number of clusters one wants to retain for further analysis depends on the research objectives. The decision may also depend on whether the work is exploratory or confirmatory. Where there is a confirmatory element in the research, the dendrogram might be cut according to the relevant number of clusters sought.
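
Cutting to a chosen number of clusters can be sketched by replaying the merges in order and stopping once that number remains. The merge list format (two member indices plus a height) and its values are hypothetical, chosen only for illustration.

```python
def cut_into_k(n_observations, merges, k):
    """Replay merges in order until only k clusters remain.

    Each merge is (a, b, height), where a and b are any members of the
    two clusters being joined. Heights are carried along but not used here.
    """
    clusters = [{i} for i in range(n_observations)]
    for a, b, _height in merges:
        if len(clusters) <= k:
            break
        first = next(c for c in clusters if a in c)
        second = next(c for c in clusters if b in c)
        if first is not second:
            clusters.remove(second)
            first |= second  # absorb the second cluster into the first
    return sorted(sorted(c) for c in clusters)

# Hypothetical merge history for five observations
merges = [(0, 1, 1.0), (2, 3, 1.0), (0, 2, 4.0), (4, 0, 14.0)]

print(cut_into_k(5, merges, k=3))  # [[0, 1], [2, 3], [4]]
print(cut_into_k(5, merges, k=2))  # [[0, 1, 2, 3], [4]]
```

The choice of k remains the researcher's judgement; the code only applies it.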

Question 19

Q

Cautions/Limitations

Answer

A

The number of clusters to retain is a subjective choice; no missing data can be permitted; outliers or skewed data can produce problems for clustering; the absence of groups doesn’t mean they don’t exist; and the presence of groups doesn’t make them meaningful.

(19 cards)