Multi-Dimensional Scaling and Cluster Analysis Flashcards
why are we covering both CA and MDS techniques in one class?
because they are complementary to each other
can merge the two techniques and use them together
what are CA and MDS?
Two types of exploratory techniques
- help us to understand and locate structure and relationships in the data
- groups objects together based on their characteristics
- looks for patterns of information
what's the difference between factor analysis and cluster analysis/MDS?
FA
- starts with individual variables and reduces these into dimensions or factors
- different ways to run factor analysis - look at the correlation structure and try to reduce it using the factor loadings
- interpret what the dimensions are based on how individual variables load on these factors
cluster analysis/MDS
- start again with individual variables
- then determine which ones go together
difference - we don't extract dimensions; instead we just try to determine which variables in the dataset go together. this is something YOU do. you aren't presented with extracted factors, you're only presented with patterns of how things might go together, and then you decide which go together.
in which discipline would you use cluster analysis
Used in almost every discipline: psychology, neuroscience, biology, etc.
sometimes we need to sort variables together
the criteria we use to do the sorting will affect the outcome of the sorted variables
what is cluster analysis
Humans are good at identifying patterns - e.g., just looking at the residual plot reveals a pattern
very difficult to identify patterns mathematically
CA provides you with information that you can use to identify what the patterns are. human-machine work together
what is a dissimilarity matrix
where the larger the number, the more dissimilar our two objects (e.g., the distance between two cities)
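A minimal sketch of building such a dissimilarity matrix in Python (the three "cities" and their 2-D coordinates are made up for illustration):

```python
import math

# A dissimilarity matrix: the larger the entry, the more dissimilar
# the two objects. Here each entry is the Euclidean distance between
# made-up 2-D coordinates for three "cities".
points = {"A": (0, 0), "B": (3, 4), "C": (6, 8)}
names = list(points)
matrix = [[math.dist(points[p], points[q]) for q in names] for p in names]
# matrix[0][1] is the A-B distance; the diagonal is 0 because each
# object is identical to itself.
```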
what is a similarity matrix and can you give an example of this
where the larger number indicates two objects are more similar e.g., a correlation table
what do we need to be aware of when running cluster analysis with regards to the matrix
whether it is a dissimilarity or similarity matrix
what does cluster analysis actually do in terms of points of data
it puts points that are most similar together and pushes points most dissimilar apart
clusters things together
what different techniques are used to cluster things together
- k means clustering - a non-hierarchical method. you decide at the beginning how many clusters you want, run it, then get a suggested membership of data points to clusters
- hierarchical methods (agglomerative or divisive) - covered below
what is k means clustering?
a non-hierarchical clustering method
- we pick some starting cluster numbers - e.g., I want 3 clusters
- algorithm starts by randomly picking 3 cluster points in your data set
- at each step - the clustering algorithm calculates the distance between each data point and the cluster centers and assigns each data point membership to the nearest cluster
- THEN - each cluster center is moved by a certain algorithm - which calculates whether the move improved the distance measure between all data points and their cluster center
so the goal is an iterative procedure to
- find the cluster centers
- given the goal number of clusters (e.g., 3)
- and find the positions of those cluster centers that minimise the distance of all data points that could be assigned to each cluster
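The iterative procedure above can be sketched in plain Python (toy data; the function and variable names are my own, not from the lecture):

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Toy k-means: points is a list of (x, y) tuples, k the desired
    number of clusters. Returns (centers, labels)."""
    rng = random.Random(seed)
    # Start by randomly picking k data points as initial cluster centers.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign each data point to its nearest cluster center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its member points.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster emptied out
                new_centers.append(tuple(sum(dim) / len(members)
                                         for dim in zip(*members)))
            else:
                new_centers.append(centers[c])
        # Stop when a further change no longer moves the centers.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Two obvious groups: points near (0, 0) and points near (10, 10).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, k=2)
```

Note how a data point can change cluster membership whenever a center has shifted far enough, exactly as the card above says.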
Explain what’s happening in this k means clustering slide
Well, 3 clusters have been identified; the three cluster centers have been shifted 4 times to find the ideal locations, where the data points are closest to their clusters
with k means clustering what is shifted around the screen - the data points to find those fitting best to the clusters OR the cluster points moving until the data points are closest to them
The data points STAY PUT - it's the cluster center that shifts bit by bit and stops when the data points are closest to the desired number of cluster centers
with k means clustering, if the cluster centroid shifts far enough, is it possible for data points to be assigned a different cluster membership
yes
when does k means clustering stop
when any further change in the cluster center doesn’t reduce the differences anymore.
what does the p-value in cluster analysis tell us
there is no p-value or test statistic of any sort. you are only presented with, e.g., for k means clustering, a suggestion of cluster membership for different data points
describe non-hierarchical cluster analysis
non-hierarchical methods
- where clusters are formed by assigning membership to clusters
- you decide how many clusters you want before the analysis, e.g., k means clustering
- individual data points are assigned to one of the clusters according to some particular criteria
in non-hierarchical cluster analysis how might you decide on the number of clusters?
- have a certain theory
- use previous literature - look at the number of clusters they used
- run it with varying numbers e.g., 2-5 then see which one gives the most reasonable cluster groups
hierarchical methods for cluster analysis: what are the two groups?
- agglomerative method
- divisive method
in any hierarchical method it goes from 1 to many clusters or from many to 1. typically presented as either a dendrogram or an icicle plot. then YOU determine the meaningful number of clusters using a cut off.
in both cases you get a tree diagram (dendrogram) and an icicle plot - helpful in deciding a feasible cut off point
hierarchical cluster analysis: agglomerative methods
different types: single link (nearest neighbour), maximum link (furthest neighbour) or average link (centroid clustering) - they differ in the way they compute the distances
- start by treating each data point as a one-member cluster
- then proceed to put things together - agglomerate clusters
- once a pair of objects have been put together - can't split them up again
- means new clusters are formed based on clusters already created at a previous step
hierarchical cluster analysis: divisive methods
- treat all data points as one giant cluster
- then split things up - once a pair has been separated they can never join again
what is the single link aka nearest neighbour technique
one method of hierarchical agglomerative clustering
- start with each city by itself
- then start amalgamating them
- looks at the data, finds the points with the closest relationship to each other (Durham-Sunderland) and groups these together in a cluster
- then the distance matrix is recalculated and it finds the cities that are next in line closest together (treating the Durham-Sunderland cluster as one)
single link aka nearest neighbour technique
look at the dendrogram and name the cluster groups in order
- durham and sunderland
- exeter and plymouth
- birmingham + (Exeter + plymouth)
horizontal axis is a measure of the relative proximity of the variables - e.g., the relationship between Durham and Sunderland is closer than the relationship between Exeter and Plymouth. knowing the relative distance between cities can help you to create a cut off point (e.g., a cut off point at about 3 on the x axis number line would give us only 1 cluster, but if it was at 24 we would have 3)
single link aka nearest neighbour technique
so after Durham and Sunderland form 1 cluster, SPSS recalibrates and computes a NEW dissimilarity matrix. how does it do this?
when we compare other cities, e.g., Exeter, to these two cities, which distance is used in the dissimilarity matrix?
whichever gives us the smaller value - in this case the Durham-Exeter distance, since Durham is Exeter's closest link to the cluster
This is the matrix used to make the second clustering decision - and we see the smallest value in this table is the exeter-plymouth link
single link aka nearest neighbour technique
let's say in our dissimilarity matrix we're comparing the distance between a cluster of 2 cities and another cluster of 2 cities - how do we decide the distance value to use in the dissimilarity matrix?
(A, B) vs (1, 2)
compare A with 1 and 2, and compare B with 1 and 2 - four pairwise distances
whichever of these gives us the smallest value, we use that in the matrix
keep going until the last 2 clusters form 1
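The whole single-link procedure - merge the pair of clusters whose closest members are nearest, recompute, repeat - can be sketched like this (the city mileages below are invented for illustration, not the real distances):

```python
from itertools import combinations

# Illustrative road distances (made-up numbers for the demo).
cities = ["Durham", "Sunderland", "Birmingham", "Exeter", "Plymouth"]
d = {
    ("Durham", "Sunderland"): 19,
    ("Durham", "Birmingham"): 180, ("Sunderland", "Birmingham"): 190,
    ("Durham", "Exeter"): 320, ("Sunderland", "Exeter"): 330,
    ("Birmingham", "Exeter"): 160,
    ("Durham", "Plymouth"): 360, ("Sunderland", "Plymouth"): 370,
    ("Birmingham", "Plymouth"): 200, ("Exeter", "Plymouth"): 42,
}

def city_dist(a, b):
    return d.get((a, b)) or d[(b, a)]

def single_link(items, pair_dist):
    """Agglomerate one-member clusters; at each step merge the two
    clusters whose *closest* members are nearest (single link)."""
    clusters = [frozenset([i]) for i in items]
    merges = []
    while len(clusters) > 1:
        # Pair of clusters with the smallest nearest-neighbour distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda ab: min(pair_dist(x, y)
                                      for x in ab[0] for y in ab[1]))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append(sorted(a | b))  # record each merged cluster
    return merges

merges = single_link(cities, city_dist)
```

With these toy distances the merge order matches the dendrogram on the card: Durham+Sunderland first, then Exeter+Plymouth, then Birmingham joining the Exeter-Plymouth cluster.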
Single link aka nearest neighbour technique
looking at the dendrogram how do we now decide how many clusters to have
is it 2 or 3? it's a judgement call - YOU make the decision
Maximum link (furthest neighbour)
- again, Durham and Sunderland will be the first cluster - because their distance was smallest (19)
- but then the distance computed between this cluster and the other cities (or between clusters) will use the largest distance between their members
- the SMALLEST value in the matrix is still used to determine the next cluster
Average link (Centroid clustering)
Still assigns the two points with the smallest distance from one another together – but distances within the table are based on the average distance between the objects in the clusters.
What’s the difference between single and maximum link hierarchical agglomerative method?
Single link method tends to produce more “chaining” while the maximum link method creates several tightly defined clusters
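The three linkage rules differ only in how they score the distance between two clusters. A sketch (1-D points for simplicity; note the lecture's "centroid clustering" is approximated here by the average of all pairwise distances, a common average-link variant, rather than a literal distance between centroids):

```python
def single_link_dist(A, B):
    """Nearest neighbour: distance between the closest pair."""
    return min(abs(a - b) for a in A for b in B)

def max_link_dist(A, B):
    """Furthest neighbour: distance between the farthest pair."""
    return max(abs(a - b) for a in A for b in B)

def average_link_dist(A, B):
    """Average of all pairwise distances between the clusters."""
    return sum(abs(a - b) for a in A for b in B) / (len(A) * len(B))

A, B = [0, 1], [5, 9]
# pairwise distances: 5, 9, 4, 8
# single link -> 4, maximum link -> 9, average link -> 6.5
```

Because single link only needs one close pair to merge two clusters, it tends to chain; maximum link requires every member to be close, so it produces tighter clusters.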
combining objects and using this as a value in the matrix - why is this bad?
SCALE EFFECTS!!!
because the distance matrix is based on the combined scores, whichever variable is bigger (e.g., percent on a maths test > height in metres) will dominate
So if you were going to run a distance matrix you would have to account for this scaling issue
What are some scaling issues
- you need to account for any scaling issues when comparing the distance between objects in a dissimilarity matrix
- when similar data are rescaled – e.g., scores on a test, one out of 50 and another out of 75 – the raw scores might join child A and B, but the percentage scores join child B and C into one cluster
all this is because we use the Euclidean distance as a measure of proximity. When we combine or rescale scores this measure does not maintain the rank ordering you might have in each variable.
scaling effects - the problem with Euclidean distance is that it doesn't maintain the rank order
how can we fix this?
- rescale data – z transformation: rescale all scores so they have a mean of 0 and SD of 1. Puts all variables on the same scale so when combining them they're all weighted equally.
- In SPSS there are different ways to rescale data – basically just try to put everything on the same scale
- Which way to choose depends on your data – if all variables are equally important then z transformation is the way to go
- But if the raw data is meaningful by itself and it's not so important that the rank ordering is maintained, then you might not want to do any transformation
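A quick sketch of the z transformation and why it matters (the test scores and heights below are made-up values):

```python
import math

def zscores(xs):
    """z transformation: rescale a variable to mean 0 and SD 1."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return [(x - m) / sd for x in xs]

# Made-up values: percent on a maths test vs height in metres.
maths_pct = [55.0, 60.0, 90.0]
height_m = [1.2, 1.5, 1.1]

# Raw Euclidean distance between child 0 and child 1 is dominated by
# the maths variable (difference of 5); the height difference (0.3)
# barely registers.
raw_01 = math.dist((maths_pct[0], height_m[0]), (maths_pct[1], height_m[1]))

# After z-scoring, both variables contribute on the same scale.
z_maths, z_height = zscores(maths_pct), zscores(height_m)
z_01 = math.dist((z_maths[0], z_height[0]), (z_maths[1], z_height[1]))
```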
With what type of data is it meaningful to compute Euclidean distance. With what is it not?
Interval/ratio scale data. Wouldn’t be meaningful with binary data
How can we do a cluster analysis on binary data
Transform the counts into some measure – these can now be subject to clustering
- In SPSS there are different ways to re-jig this data (a, b, c, d) to get a measure of the similarity between x and y
- Measures differ in whether they treat the absence of a feature as more important than the presence of a feature, or vice versa
Different ways to judge the similarity between 2 binary variables
Simple matching similarity measure (possibly the most common) – (a + d) / (a + b + c + d), where a = both present, b = present in x only, c = present in y only, d = both absent
- Basically it's the total number of matches divided by the total number of measures
Jaccard similarity measure or similarity ratio – a / (a + b + c)
- Basically the same as SMS but with the double negatives (d) removed
Phi
- Binary form of the Pearson product-moment correlation coefficient
After looking at the dendrogram of these birds how can we decide how to cluster them
Go back to your data – ask questions
- Are they woodland/farmland birds
- Are they going up or down in abundance
- Is any species particularly different from the rest
Dendrogram doesn't answer these questions!! It just gives you clues about where in your data you should look
What do you need to be aware of when conducting cluster analysis
- Similarity/dissimilarity matrix - do larger values mean data is more or less similar?
- What decisions to make about the criteria used to cluster the objects (e.g., hierarchical max link)
- Type of data you have (interval/ratio/binary) i.e., if you have binary data you may have to transform that into another measure first like simple matching similarity/phi
- Different techniques provide different solutions!
What kind of matrix is a Euclidean distance matrix?
A dissimilarity matrix