Multidimensional Scaling and Cluster Analysis Flashcards
Why are we covering both cluster analysis (CA) and multidimensional scaling (MDS) in one class?
Because they are complementary to each other
The two techniques can be merged and used together
What are CA and MDS?
Two types of exploratory techniques
- help us to understand and locate structure and relationships in the data
- group objects together based on their characteristics
- look for patterns of information
What's the difference between factor analysis (FA) and cluster analysis/MDS?
FA
- starts with individual variables and reduces these into dimensions or factors
- there are different ways to run factor analysis
- look at the correlation structure and try to reduce it using the factor loadings
- interpret what the dimensions are based on how individual variables load on these factors
Cluster analysis/MDS
- start again with individual variables
- then determine which ones go together
Difference: we don't extract dimensions; instead we just try to determine which variables in the dataset go together. This is something YOU do. You aren't presented with extracted factors; you're only presented with patterns of how things might go together, and then you decide which go together.
In which disciplines would you use cluster analysis?
Used in almost every discipline: psychology, neuroscience, biology, etc.
sometimes we need to sort variables together
the criteria we use to do the sorting will affect the outcome of the sorted variables
What is cluster analysis?
Humans are good at identifying patterns - e.g., just looking at a residual plot reveals a pattern
It is very difficult to identify patterns mathematically
CA provides you with information that you can use to identify what the patterns are. Human and machine work together
What is a dissimilarity matrix?
A matrix where the larger the number, the more dissimilar the two objects (e.g., the distance between two cities)
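As a rough illustration of the city-distance idea, here is a minimal sketch that builds a dissimilarity matrix from 2-D coordinates. The city names and coordinates are made up for the example, and straight-line (Euclidean) distance is assumed as the dissimilarity measure.

```python
import math

# Hypothetical coordinates for three made-up cities (illustrative only).
cities = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 8.0)}

def dissimilarity_matrix(points):
    """Return a nested dict of pairwise Euclidean distances."""
    names = list(points)
    return {a: {b: math.dist(points[a], points[b]) for b in names} for a in names}

D = dissimilarity_matrix(cities)

# Larger numbers mean the two cities are further apart (more dissimilar);
# the diagonal is 0 because every city is identical to itself.
for name, row in D.items():
    print(name, [round(value, 1) for value in row.values()])
```

Here A and C get the largest entry, so they are the most dissimilar pair.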
What is a similarity matrix, and can you give an example of one?
A matrix where a larger number indicates that two objects are more similar, e.g., a correlation table
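To make the correlation-table example concrete, the sketch below computes a small Pearson correlation matrix by hand and then flips it into a dissimilarity matrix. The variables and their values are made up, and using 1 - r as the conversion is an assumption (one common convention, not the only one).

```python
import math

# Made-up scores on three hypothetical variables (illustrative only).
data = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],   # rises with x
    "z": [5, 4, 3, 2, 1],    # falls as x rises
}

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Similarity matrix: larger value = more similar.
similarity = {p: {q: pearson(data[p], data[q]) for q in data} for p in data}

# Assumed conversion to a dissimilarity matrix: larger value = more dissimilar.
dissimilarity = {p: {q: 1 - similarity[p][q] for q in data} for p in data}
```

This is why it matters which kind of matrix is fed to the clustering routine: the same pair (x, y) scores high in one matrix and low in the other.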
What do we need to be aware of when running cluster analysis with regard to the matrix?
whether it is a dissimilarity or similarity matrix
What does cluster analysis actually do in terms of points of data?
it puts points that are most similar together and pushes points most dissimilar apart
clusters things together
What different techniques are used to cluster things together?
- k-means clustering: a non-hierarchical method. You decide at the beginning how many clusters you want, run it, then get a suggested membership of data points to clusters
What is k-means clustering?
A non-hierarchical clustering method
- we pick the number of clusters at the start - e.g., I want 3 clusters
- the algorithm starts by randomly picking 3 cluster centers in your data set
- at each step, the clustering algorithm calculates the distance between each data point and each cluster center and assigns each data point to the nearest cluster
- THEN each cluster center is moved by the algorithm, which calculates whether this improved the distance measure between the data points and the cluster centers
So the goal of the iterative procedure is to
- find the cluster centers
- given the target number of clusters (e.g., 3)
- and find the positions of those cluster centers that minimise the distance to all data points assigned to each cluster
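The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a definitive implementation: it assumes the usual update rule (often called Lloyd's algorithm), where each center is moved to the mean of the points currently assigned to it, and it stops when memberships no longer change. The function name `k_means`, its parameters, and the toy data are all made up for the example.

```python
import math
import random

def k_means(points, k, seed=0, max_iter=100):
    """Toy k-means: returns (centers, labels) for a list of coordinate tuples."""
    rng = random.Random(seed)
    # Start by randomly picking k data points as the initial cluster centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # memberships stopped changing, so the centers have settled
        labels = new_labels
        # Update step (assumed rule): move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

# Made-up data: two tight groups that are far apart.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = k_means(points, k=2)
print(labels)
```

Because the starting centers are random, different seeds can give different label numbers (and, on messier data, different clusters), which is one reason k-means is often run several times.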
Explain what's happening in this k-means clustering example
Three clusters have been requested, and the three cluster centers have been shifted 4 times to find the locations closest to the data points in each cluster
With k-means clustering, what is shifted around the screen: the data points, to find those fitting best to the clusters, OR the cluster centers, moving until the data points are closest to them?
The data points STAY PUT. It's the cluster center that shifts bit by bit and stops when the data points are closest to the desired number of cluster centers
With k-means clustering, if the cluster centroid shifts far enough, is it possible for data points to be assigned a different cluster membership?
yes
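A tiny sketch of why the answer is yes: the data point below never moves, but once one centroid shifts closer, the point's nearest centroid (and so its cluster membership) changes. The helper name `nearest_center` and all coordinates are made up for illustration.

```python
import math

def nearest_center(point, centers):
    """Index of the center closest to the point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))

point = (4.0, 0.0)  # the data point stays put throughout

# Before the shift: center 0 is 4 units away, center 1 is 6 units away.
before = nearest_center(point, [(0.0, 0.0), (10.0, 0.0)])

# After center 1 shifts from (10, 0) to (6, 0): it is now only 2 units away.
after = nearest_center(point, [(0.0, 0.0), (6.0, 0.0)])

print(before, after)
```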