Multidimensional Scaling and Cluster Analysis Flashcards

1
Q

why are we covering both CA and MD techniques in one class?

A

because they are complementary to each other

the two techniques can be merged and used together

2
Q

what are CA and MDS?

A

Two types of exploratory techniques

  • help us to understand and locate structure and relationships in the data
  • group objects together based on their characteristics
  • look for patterns of information
3
Q

what's the difference between factor analysis and cluster analysis/MDS?

A

FA

  • starts with individual variables and reduces these into dimensions or factors
  • there are different ways to run factor analysis - look at the correlation structure and try to reduce it using the factor loadings
  • interpret what the dimensions are based on how individual variables load on these factors

cluster analysis/MDS

  • start again with individual variables
  • then determine which ones go together

the difference - we don't extract dimensions; instead we just try to determine which variables in the dataset go together. This is something YOU do. You aren't presented with extracted factors; you're only presented with patterns of how things might go together, and then you decide which go together.

4
Q

in which discipline would you use cluster analysis

A

Used in almost every discipline: psychology, neuroscience, biology, etc.

5
Q

sometimes we need to sort variables together

A

the criteria we use to do the sorting will affect the outcome of the sorted variables

6
Q

what is cluster analysis

A

Humans are good at identifying patterns - e.g., just looking at a residual plot reveals a pattern

it is very difficult to identify patterns mathematically

CA provides you with information that you can use to identify what the patterns are - human and machine work together

7
Q

what is a dissimilarity matrix

A

where the larger the number, the more dissimilar our two objects (e.g., the distance between two cities)
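As a toy illustration (the city names and coordinates below are invented for the sketch, not the deck's actual data), a dissimilarity matrix of Euclidean distances might be computed like this:

```python
import math

# Hypothetical 2-D coordinates for three cities (illustrative values only)
cities = {"Durham": (0.0, 0.0), "Sunderland": (3.0, 4.0), "Exeter": (60.0, 80.0)}

def dissimilarity_matrix(points):
    """Pairwise Euclidean distances: the larger the value, the more dissimilar."""
    names = list(points)
    return {(a, b): math.dist(points[a], points[b]) for a in names for b in names}

d = dissimilarity_matrix(cities)
print(d[("Durham", "Sunderland")])  # 5.0 -> the most similar pair in this toy set
```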

8
Q

what is a similarity matrix and can you give an example of this

A

where the larger the number, the more similar our two objects - e.g., a correlation table

9
Q

what do we need to be aware of when running cluster analysis with regards to the matrix

A

whether it is a dissimilarity or similarity matrix

10
Q

what does cluster analysis actually do in terms of points of data

A

it puts points that are most similar together and pushes points most dissimilar apart

clusters things together

11
Q

what different techniques are used to cluster things together

A
  • k-means clustering - a non-hierarchical method. You decide at the beginning how many clusters you want, run it, then get a suggested membership of data points to clusters
12
Q

what is k means clustering?

A

a non-hierarchical clustering method

  • we pick a starting number of clusters - e.g., I want 3 clusters
  • the algorithm starts by randomly picking 3 cluster centres in your data set
  • at each step the clustering algorithm calculates the distance between each data point and the cluster centres, and assigns each data point membership of the nearest cluster
  • THEN the cluster centres are moved by a certain algorithm, which calculates whether this improved the distance measure between all data points and their cluster centre

so the goal of the iterative procedure is to

  • find the cluster centres
  • having the goal number (e.g., 3)
  • find the positions of those cluster centres that minimise the distance of all data points assigned to each cluster
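The iterative procedure above can be sketched in plain Python. This is a minimal sketch, not a production implementation: the toy data and the fixed iteration count are my assumptions for illustration (real k-means stops once assignments no longer change).

```python
import math
import random

def kmeans(points, k, n_iter=20, seed=0):
    """Minimal k-means sketch: assign each point to its nearest centre,
    move each centre to the mean of its members, repeat."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k randomly picked data points
    for _ in range(n_iter):          # fixed iteration count for simplicity
        clusters = [[] for _ in range(k)]
        for p in points:
            # assignment step: each point joins its nearest cluster centre
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # update step: move the centre to the mean of its members
                centres[i] = tuple(sum(x) / len(members) for x in zip(*members))
    return centres, clusters

# Two obvious groups of toy 2-D points
points = [(1, 1), (1.5, 2), (8, 8), (9, 9), (0.5, 1.5), (8.5, 9.5)]
centres, clusters = kmeans(points, k=2)
```

With this toy data the two suggested clusters recover the two visible groups, matching the card's point that the output is a suggested membership of data points to clusters.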
13
Q

Explain what's happening in this k-means clustering slide

A

A solution with 3 clusters has been identified; the three cluster centres have been shifted 4 times to find the ideal locations, where data points are closest to their cluster centre

14
Q

with k-means clustering, what is shifted around the screen - the data points (to find those fitting best to the clusters) or the cluster centres (moving until the data points are closest to them)?

A

The data points STAY PUT; it's the cluster centre that shifts bit by bit and stops when the data points for the desired number of clusters are closest

15
Q

with k-means clustering, if the cluster centroid shifts far enough, is it possible for data points to be assigned a different cluster membership?

A

yes

16
Q

when does k means clustering stop

A

when any further change in the cluster centres no longer reduces the distances.

17
Q

what does the p-value in cluster analysis tell us

A

there is no p-value nor end statistic of any sort. You are only presented with - e.g., for k-means clustering - a suggestion of cluster membership for the different data points

18
Q

describe non-hierarchical cluster analysis

A

non-hierarchical methods

  • clusters are formed by assigning membership to clusters
  • you decide how many clusters you want before the analysis - e.g., k-means clustering
  • individual data points are assigned to one of them according to some particular criterion
19
Q

in non-hierarchical cluster analysis, how might you decide on the number of clusters?

A
  • have a certain theory
  • use previous literature - look at the number they used
  • run it with varying numbers, e.g., 2-5, then see which one gives the most reasonable cluster groups
20
Q

hierarchical methods for cluster analysis: what are the two groups?

A
  • agglomerative method
  • divisive method

any hierarchical method goes either from 1 cluster to many or from many to 1. The results are typically presented as either a dendrogram or an icicle plot; then YOU determine what the meaningful number of clusters is using a cut-off.

in both cases you get a tree diagram (dendrogram) and an icicle plot - helpful in deciding a feasible cut-off point

21
Q

hierarchical cluster analysis: agglomerative methods

A

different types: single link (nearest neighbour), maximum link (furthest neighbour) or average link (centroid clustering) - they differ in the way they compute the distances

  • start by treating each data point as a one-member cluster
  • then proceed to put things together - agglomerate clusters
  • once a pair of objects have been put together, they can't be split up again
  • this means new clusters are formed based on clusters already created at a previous step
22
Q

hierarchical cluster analysis: divisive methods

A
  • treat all data points as one giant cluster
  • then split things up - once a pair has been separated, they can never join again
23
Q

what is the single link aka nearest neighbour technique

A

one method of hierarchical agglomerative clustering

  • start with each city by itself
  • then start amalgamating them
  • look at the data, find the points with the closest relationship to each other (Durham-Sunderland) and group these together in a cluster
  • then the distance matrix is recalculated and the cities that are next in line closest together are found (treating the Durham-Sunderland cluster as one)
24
Q

single link aka nearest neighbour technique

look at the dendrogram and name the cluster groups in order

A
  1. Durham and Sunderland
  2. Exeter and Plymouth
  3. Birmingham + (Exeter + Plymouth)

the horizontal axis is a measure of the relative proximity of the variables - e.g., the relationship between Durham and Sunderland is closer than the relationship between Exeter and Plymouth. Knowing the relative distance between cities can help you to choose a cut-off point (e.g., a cut-off point at about 3 on the x-axis number line would give us only 1 cluster, but if it was at 24 we would have 3)

25
Q

single link aka nearest neighbour technique

so after Durham and Sunderland form 1 cluster, SPSS recalibrates and computes a NEW dissimilarity matrix. how does it do this?

when we compare other cities, e.g., Exeter, to these two cities, which distance is used in the dissimilarity matrix?

A

whichever gives us the smaller value - in this case the Durham-Exeter distance. Exeter's closest link is Durham.

This is the matrix used to make the second clustering decision - and we see the smallest value in this table is the Exeter-Plymouth link

26
Q

single link aka nearest neighbour technique

let's say in our dissimilarity matrix we're comparing the distance between a cluster of 2 cities and another cluster of 2 cities - how do we decide the distance value to use in the dissimilarity matrix?

A

(A, B) vs (1, 2)

compare A with 1 and 2, and B with 1 and 2 (equivalently, compare 1 with A and B, and 2 with A and B)

whichever of these comparisons gives us the smallest value, we use that in the matrix

keep going until the last 2 clusters form 1

27
Q

Single link aka nearest neighbour technique

looking at the dendrogram how do we now decide how many clusters to have

A

is it 2 or 3? it's a judgement call - YOU make the decision

28
Q

Maximum link (furthest neighbour)

A
  • again, Durham and Sunderland will be the first cluster - because their distance was smallest (19)
  • but then the distance computed between this cluster and the other cities // between clusters uses the largest distance
  • the SMALLEST value in the matrix is still used to determine the next cluster
29
Q

Average link (Centroid clustering)

A

Still joins the two points with the smallest distance from one another - but the distances within the table are based on the average distance between the objects.
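To contrast the three agglomerative criteria on one toy set of cross-cluster distances (the values are invented for illustration; note this sketch averages the pairwise distances, whereas true centroid clustering measures the distance between cluster centroids):

```python
from statistics import mean

# Hypothetical cross-cluster distances for the pairs A-1, A-2, B-1, B-2
cross = [5, 9, 7, 3]

print(min(cross))   # single link (nearest neighbour) -> 3
print(max(cross))   # maximum link (furthest neighbour) -> 9
print(mean(cross))  # average link -> 6
```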

30
Q

What’s the difference between single and maximum link hierarchical agglomerative method?

A

Single link method tends to produce more “chaining” while the maximum link method creates several tightly defined clusters

31
Q

combining objects and using this as a value in the matrix - why is this bad?

A

SCALE EFFECTS!!!

because the distance matrix is based on the combined scores, whichever variable is bigger (e.g., percent on a maths test > height in metres) will dominate

So if you were going to run a distance matrix you would have to account for this scaling issue

32
Q

What are some scaling issues

A
  • you need to account for any scaling issues when comparing the distance between objects in a dissimilarity matrix
  • when similar data are rescaled - e.g., scores on a test, one out of 50 and another out of 75 - the raw scores might join child A and B, but the percentage scores join child B and C into one cluster

all this is because we use the Euclidean distance as a measure of proximity. When we combine or rescale scores, this measure does not maintain the rank ordering you might have in each variable.

33
Q

scaling effects - the problem with Euclidean distance is that it doesn't maintain the rank order

how can we fix this?

A
  • rescale the data - a z transformation basically rescales all scores so they have a mean of 0 and an SD of 1. This puts all variables on the same scale, so when combining them they're all weighted equally.
  • In SPSS there are different ways to rescale data - basically just try to put everything on the same scale
  • Which way to choose depends on your data - if all variables are equally important, then the z transformation is the way to go
  • But if the raw data is meaningful by itself and it's not so important that the rank ordering is maintained, then you might not want to do any transformation
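A minimal z-transformation sketch (the score values and variable names are made up for illustration): after rescaling, both variables sit on the same scale and contribute equally to a combined distance.

```python
from statistics import mean, stdev

def z_rescale(scores):
    """Rescale scores to mean 0, SD 1, so variables measured on
    different scales are weighted equally when combined."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

maths_pct = [40.0, 60.0, 80.0]   # percent on a maths test (toy data)
height_m = [1.2, 1.3, 1.4]       # height in metres (toy data)

print(z_rescale(maths_pct))      # [-1.0, 0.0, 1.0]
print(z_rescale(height_m))       # ~[-1.0, 0.0, 1.0] -> same scale now
```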
34
Q

With what type of data is it meaningful to compute Euclidean distance. With what is it not?

A

Interval/ratio scale data. It wouldn't be meaningful with binary data

35
Q

How can we do a cluster analysis on binary data

A

Transform the counts into some measure - which can then be subjected to clustering

  • In SPSS there are different ways to re-jig this data (a, b, c, d) to get a measure of the similarity between x and y
  • Measures differ in whether they treat the absence of a feature as more important than the presence of a feature, or vice versa
36
Q

Different ways to judge the similarity between 2 binary variables

A

Simple matching similarity measure (possibly the most common) - (a + d) / (a + b + c + d)

  • Basically it's the total number of matches divided by the total number of measures

Jaccard similarity measure or similarity ratio - a / (a + b + c)

  • Basically the same as SMS but with the double negatives removed

Phi

  • Binary form of the Pearson product-moment coefficient
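The three measures as code. The counts a-d are invented, and the 2x2 phi formula used here is the standard Pearson-based one - an assumption on my part, since the card only names phi without giving its formula.

```python
import math

def binary_similarities(a, b, c, d):
    """a = both present, b and c = mismatches, d = both absent (toy counts)."""
    n = a + b + c + d
    sms = (a + d) / n              # simple matching: all matches / all measures
    jaccard = a / (a + b + c)      # drops the double negatives (d)
    # standard 2x2 phi coefficient (assumed formula, not from the card)
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return sms, jaccard, phi

sms, jac, phi = binary_similarities(a=10, b=2, c=3, d=5)
print(sms)  # 0.75
```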
37
Q

After looking at the dendrogram of these birds how can we decide how to cluster them

A

Go back to your data – ask questions

  • Are they woodland/farmland animals?
  • Are they going up or down in abundance?
  • Is any species particularly different from the rest?

The dendrogram doesn't answer these questions!! It just gives you clues about where in your data you should look

38
Q

What do you need to be aware of when conducting cluster analysis

A
  • Similarity/dissimilarity matrix - do larger values mean the data are more or less similar?
  • What decisions to make about the criteria used to cluster the objects (e.g., hierarchical max link)
  • The type of data you have (interval/ratio/binary) - i.e., if you have binary data you may have to transform it into another measure first, like simple matching similarity or phi
  • Different techniques provide different solutions!
39
Q

What is a Euclidean matrix?

A

A dissimilarity matrix