MDS Flashcards
what is MDS
a set of data analysis techniques that display the structure in the data in pictures
pictures have 2 dimensions (x, y); MDS reduces the dimensions of the data to 2 and uses these to draw the picture
when is MDS most helpful
when you have data with many dimensions and you reduce these down to 2
use these 2 to group things together
how many dimensions do I need to display the objects
- at most 1 less than however many objects/data points you have.
- n objects can have their relationships represented in n-1 dimensions.
- 3 objects can be represented in 2 dimensions while 4 objects can need 3 dimensions
Since humans can’t easily visualise relationships in higher-dimensional space we simplify. 3 dimensions is about what most people can manage; anything above that gets tricky
MDS helps reduce that number of dimensions
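A minimal sketch of the "3 objects fit in 2 dimensions" point, assuming the three dissimilarities obey the triangle inequality. The function name place_three and the 3/4/5 values are made up for illustration; this is not how SPSS places points.

```python
import math

def place_three(d12, d13, d23):
    """Place 3 objects in 2D so that their pairwise distances match exactly."""
    p1 = (0.0, 0.0)
    p2 = (d12, 0.0)
    # The law of cosines gives the x-coordinate of the third point.
    x = (d12**2 + d13**2 - d23**2) / (2 * d12)
    # max(..., 0.0) guards against tiny negative values from rounding.
    y = math.sqrt(max(d13**2 - x**2, 0.0))
    return p1, p2, (x, y)

# Hypothetical dissimilarities between three objects A, B and C.
a, b, c = place_three(3.0, 4.0, 5.0)
print(math.dist(a, b), math.dist(a, c), math.dist(b, c))
```

Two axes are enough to reproduce all three distances with no error; a fourth object would in general need a third axis.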
discuss MDS protocol
takes the initial matrix with n-1 dimensions then twists it to reduce it to 2
it goes down in steps, removing one dimension at a time until it reaches 2, then creates a plot
data is a matrix of the relationships between different cities
can input this directly then perform the MDS analysis or you can enter the raw data and ask SPSS to generate your distance matrix for the analysis
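A rough sketch of what "generating the distance matrix from raw data" amounts to, assuming the raw data are coordinates and the distance is Euclidean. The city names and positions are invented, and SPSS offers other distance measures as well.

```python
import math

# Hypothetical raw data: (x, y) map positions for three made-up cities.
cities = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 8.0)}

def distance_matrix(points):
    """Return the square matrix of Euclidean distances between all points."""
    names = list(points)
    return [[math.dist(points[r], points[c]) for c in names] for r in names]

D = distance_matrix(cities)
for name, row in zip(cities, D):
    print(name, [round(v, 2) for v in row])
```

The result is symmetric with zeros on the diagonal, which is the shape of matrix that goes into the MDS analysis.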
what kind of matrix does the data need to be in for MDS analysis
dissimilarity matrix such that larger values indicate 2 things are more different
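If the ratings were collected as similarities, they have to be flipped first. A minimal sketch, assuming a 1-9 rating scale and the simple "subtract from the maximum" conversion; both are illustrative assumptions and other conversions exist.

```python
# Hypothetical similarity ratings on a 1-9 scale (9 = very similar).
similarity = [
    [9, 7, 2],
    [7, 9, 4],
    [2, 4, 9],
]
max_rating = 9

# Flip the scale so that larger values mean "more different".
dissimilarity = [[max_rating - s for s in row] for row in similarity]

for row in dissimilarity:
    print(row)
```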
what is stress (MDS output)
matrix stress
a standardised summation of the difference between the distances in the data matrix and the corresponding distances between the objects on the plot
basically compares the distances you had originally to the 2D reduced version. measure of goodness of fit. tells us how well the 2D configuration of points matches the original configuration of points
- varies from 0-1
- the lower the value the better
- < 0.1 excellent ; > 0.15 unacceptable
- if it’s large it tells us the original higher-dimensional representation of the data is not represented well by your new 2D solution - so having taken out so many dimensions has a cost - the fit does not adequately represent the data
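A sketch of one way to compute a stress value, assuming Kruskal's Stress-1 formula (square root of the summed squared differences divided by the summed squared fitted distances). The notes do not say which formula SPSS reports, and the matrix and coordinates below are invented.

```python
import math
from itertools import combinations

def stress1(original, coords):
    """Kruskal's Stress-1 between original distances and plot distances."""
    num = den = 0.0
    for i, j in combinations(range(len(coords)), 2):
        fitted = math.dist(coords[i], coords[j])
        num += (original[i][j] - fitted) ** 2
        den += fitted ** 2
    return math.sqrt(num / den)

# Hypothetical dissimilarity matrix for three objects.
original = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]

perfect = [(0, 0), (3, 0), (0, 4)]    # reproduces every distance
distorted = [(0, 0), (3, 0), (0, 3)]  # third point pulled inwards

print(round(stress1(original, perfect), 3))
print(round(stress1(original, distorted), 3))
```

The configuration that reproduces the distances gives a stress of 0; the distorted one gives a clearly larger value, signalling a worse fit.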
what is RSQ (MDS output)
proportion of explained variance
proportion of the variance in the original higher-dimensional space (the reference)
RSQ tells us how much of this variance is explained by the new 2D solution
- varies from 0-1
- the higher the value the better
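A sketch of RSQ, assuming it is computed as the squared Pearson correlation between the original distances and the distances on the 2D plot. That is an assumption about the output, and the data are invented.

```python
import math
from itertools import combinations

def rsq(original, coords):
    """Squared correlation between original distances and plot distances."""
    pairs = list(combinations(range(len(coords)), 2))
    x = [original[i][j] for i, j in pairs]
    y = [math.dist(coords[i], coords[j]) for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical dissimilarity matrix and two candidate 2D solutions.
original = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
perfect = [(0, 0), (3, 0), (0, 4)]
distorted = [(0, 0), (3, 0), (0, 3)]

print(round(rsq(original, perfect), 3))
print(round(rsq(original, distorted), 3))
```

The exact reproduction gives an RSQ of 1; the distorted solution gives a lower value, meaning less of the original variance is accounted for.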
multidimensional scaling using a Euclidean distance model
What super important meaning do the x and y axes have in the Euclidean distance model
Nothing!!! They are meaningless! And the orientation of the diagram is arbitrary
Extracts dimensions but won’t attach any meaning to them. Does whatever works best to reduce the data into two dimensions, maximising RSQ and minimising the stress
2 dimensions used to represent the data - using original ratings/judgments of similarity
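A small sketch of why the orientation is arbitrary: rotating or mirroring the plot changes the x and y values but none of the distances between points, and the distances are all that MDS fits. The coordinates and the 37-degree angle are arbitrary illustrative choices.

```python
import math
from itertools import combinations

def rotate(points, degrees):
    """Rotate every point about the origin by the given angle."""
    t = math.radians(degrees)
    return [(x * math.cos(t) - y * math.sin(t),
             x * math.sin(t) + y * math.cos(t)) for x, y in points]

def pairwise(points):
    """List the distances between every pair of points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]

# Hypothetical 2D solution for three objects.
config = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
turned = rotate(config, 37)
flipped = [(-x, y) for x, y in config]  # mirror image

same_after_turn = all(math.isclose(a, b)
                      for a, b in zip(pairwise(config), pairwise(turned)))
same_after_flip = all(math.isclose(a, b)
                      for a, b in zip(pairwise(config), pairwise(flipped)))
print(same_after_turn, same_after_flip)
```

All three versions are equally good solutions, which is why the axes themselves carry no meaning.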
Scatter plot of linear fit also outputted in MDS – what is this
Plots the deviation/disparity. Think of this as the relationship between the original distances and the distances fitted by the 2D model. We want these to match very well. Objects that are very dissimilar in the original should be dissimilar in the new solution
Ideally want all the points to lie along the diagonal
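A sketch of the pairs behind such a scatter plot, assuming each point is (original distance, fitted distance) for one pair of objects. The data are invented and the exact quantities SPSS plots may differ.

```python
import math
from itertools import combinations

# Hypothetical dissimilarity matrix and a hypothetical 2D solution.
original = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
coords = [(0, 0), (3, 0), (0, 3)]

# One scatter-plot point per pair of objects.
points = [(original[i][j], math.dist(coords[i], coords[j]))
          for i, j in combinations(range(len(coords)), 2)]

for orig_d, fitted_d in points:
    # A point on the diagonal has fitted_d equal to orig_d.
    print(f"original={orig_d:.2f} fitted={fitted_d:.2f} "
          f"off-diagonal by {fitted_d - orig_d:+.2f}")
```

The first pair sits exactly on the diagonal; the other two fall below it, which is how a poorer fit shows up in the plot.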
if the stress is less than .1 and RSQ is .99 and scatter plot is diagonal. what can we say
There is an excellent fit between the higher-dimensional original data and the 2D solution
can say about 99% of the variance in the original distance space is accounted for by the distances in the lower 2-dimensional space.
we see this also in the scatter plot with most points lying along the diagonal
what do we need to ask after seeing the 2D masterpiece MDS has produced
The points closest together - do they make sense?
yes.
key points for MDS
- Need to be aware of the data – needs to be dissimilarity matrix that goes into calculations
- Raw data can be converted into these, done fairly automatically in SPSS
- In terms of output: RSQ, stress and looking at scatter plot diagonals
- Again exploratory technique – no p value no final solution
- provided only with a plot - a visual representation of your data that can help you determine a sensible way to group objects