Cluster Analysis Flashcards

Question 1

Q

Multivariate Statistics

Answer

A

Cluster analysis is a multivariate statistical technique
Two uses of multivariate statistics: dimensionality reduction and uncovering structure (unsupervised learning)

Question 2

Q

EFA Summary

Answer

A

Finds structure in a set of variables by processing correlations between them
Extracts factors to best represent the variables’ interrelations
Measurements provide a dimensional model

Question 3

Q

Cluster Analysis

Answer

A

Finds structure in data
Looks for categories
Uses information about similarity between objects rather than correlations between variables
Classifies objects into categories based on similarities

Question 4

Q

Grouping Data into Clusters

Answer

A

Given a set of points and a notion of distance between points, creates clusters
Members of the same cluster are similar to each other; members of different clusters are dissimilar
Points in high-dimensional space
Similarity defined using a distance measure

Question 5

Q

Euclidean Distance

Answer

A

Length of the straight line between two points, calculated as the square root of the sum of squared differences on each dimension (e.g. x and y)
Very sensitive - affected by outliers
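
As a minimal sketch of the calculation, assuming points are given as equal-length tuples of numbers (the function name `euclidean_distance` is chosen here for illustration), the second example shows how a single extreme coordinate dominates the result because differences are squared:

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared differences per dimension, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Differences of 3 and 4 form a 3-4-5 right triangle.
d_close = euclidean_distance((0, 0), (3, 4))

# Same x difference, but one outlying y value swamps the distance.
d_outlier = euclidean_distance((0, 0), (3, 100))
```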

Question 6

Q

Hierarchical Clustering - Agglomerative

Answer

A

Bottom up
Initially each datapoint is a cluster and clusters are recursively combined
Combines the two nearest clusters into a large one and keeps going
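
The bottom-up procedure can be sketched roughly as follows. This is an illustrative, unoptimised version that assumes "nearest" means the smallest distance between cluster centroids and that merging stops at a chosen number of clusters `k`; both are assumptions made for the example, and other definitions of closeness are possible.

```python
import math

def centroid(cluster):
    """Mean point of a cluster (a list of equal-length tuples)."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def agglomerate(points, k):
    """Merge the two nearest clusters repeatedly until only k remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Bottom up: every point starts as its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        # Find the pair of clusters whose centroids are closest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i and drop j.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
result = agglomerate(points, k=2)
```

With these six points, the three points near (1, 1) end up in one cluster and the three near (8, 8) in the other.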

Question 7

Q

Hierarchical Clustering - Divisive

Answer

A

Top down
Initially all datapoints form a single cluster, which is recursively split into smaller ones
Assumes all points belong to one big cluster, then breaks it down into smaller clusters

Question 8

Q

Point Assignments

Answer

A

K-means
Maintain a set of clusters (k)
Points belong to ‘nearest’ cluster

Question 9

Q

Dendrogram

Answer

A

Tree-like diagram showing how points relate to one another (which points and clusters are joined, and in what order)

Question 10

Q

Euclidean Space

Answer

A

Stopping combining clusters
- pick a number (k) upfront and stop when clusters = k
- stop when next merge will create low cohesion
Cohesion - cluster diameter, radius from centroid
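
The two cohesion measures can be sketched as below, assuming diameter means the largest distance between any two points in a cluster and radius means the largest distance from the centroid to any point (one common definition; an average is sometimes used instead). The helper `merge_is_cohesive` and its `max_diameter` threshold are hypothetical names used only to illustrate the second stopping rule.

```python
import math
from itertools import combinations

def centroid(cluster):
    """Mean point of a cluster (a list of equal-length tuples)."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def diameter(cluster):
    """Largest distance between any two points in the cluster."""
    return max(
        (math.dist(p, q) for p, q in combinations(cluster, 2)),
        default=0.0,  # a single-point cluster has zero diameter
    )

def radius(cluster):
    """Largest distance from the centroid to any point in the cluster."""
    c = centroid(cluster)
    return max(math.dist(c, p) for p in cluster)

def merge_is_cohesive(a, b, max_diameter):
    """True if the merged cluster would stay within the diameter limit."""
    return diameter(a + b) <= max_diameter

tight = [(0, 0), (0, 3), (4, 0)]
nearby = [(1, 1)]
far = [(50, 50)]

ok_merge = merge_is_cohesive(tight, nearby, max_diameter=6.0)
bad_merge = merge_is_cohesive(tight, far, max_diameter=6.0)
```

Here merging `tight` with `nearby` keeps the diameter at 5, so the merge would go ahead, while merging with `far` would exceed the limit and merging would stop.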

Question 11

Q

K-Means Clustering

Answer

A

Point-assignment method
Preferable for very large datasets
Assumes Euclidean space
Things become more manageable

Question 12

Q

Method of k-means Clustering

Answer

A

Selects k initial centroids
Creates clusters by assigning each point to the cluster whose centroid it is closest to
Centroids are then recalculated as the average of the points assigned to them
Points are reassigned to the nearest updated centroid
Repeats until assignments stop changing or the maximum number of iterations is reached
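
A minimal sketch of this loop is shown below. It assumes the starting centroids are passed in explicitly so the example is repeatable (in practice they are often chosen at random), and it also stops early once no point changes cluster. The names `k_means`, `initial_centroids` and `max_iterations` are illustrative.

```python
import math

def mean_point(points):
    """Average of a list of equal-length tuples, dimension by dimension."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, initial_centroids, max_iterations=100):
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: move each centroid to the average of its points.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # a centroid with no points stays where it is
                centroids[i] = mean_point(members)
    return centroids, assignment

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = k_means(points, initial_centroids=[(1, 1), (8, 8)])
```

For these points, the first three are assigned to cluster 0 and the last three to cluster 1, and each centroid moves to the average of its three points.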

Question 13

Q

Limitations of Clustering

Answer

A

Number of clusters
Objects to cluster - representative and random
Only include variables with good reason
Require interpretation
Validation of classification
Define number of clusters after research
Choose objects carefully
Meaningless data = meaningless clusters
Validate using other measures: check whether the clusters differ significantly on measures that were not used to form them